
DEREK RAYSIDE & ECE351 STAFF

ECE351 LAB MANUAL

UNIVERSITY OF WATERLOO
2 derek rayside & ece351 staff

Copyright © 2023 Derek Rayside & ECE351 Staff


Compiled September 2, 2023

acknowledgements:
• Prof Paul Ward suggested that we look into something with vhdl to have synergy with ece327.
• Prof Mark Aagaard, as the ece327 instructor, consulted throughout the development of this material.
• Prof Patrick Lam generously shared his material from the last offering of ece251.
• Zhengfang (Alex) Duanmu & Lingyun (Luke) Li [1b Elec] wrote solutions to most labs in txl.
• Jiantong (David) Gao & Rui (Ray) Kong [3b Comp] wrote solutions to the vhdl labs in antlr.
• Aman Muthrej and Atulan Zaman [3a Comp] wrote solutions to the vhdl labs in Parboiled.
• Michael Thiessen [3a Comp] improved the vhdl grammar in Parboiled.
• TAs Jon Eyolfson, Vajih Montaghami, Alireza Mortezaei, Wenzhu Man, and Mohammed Hassan.
• TA Wallace Wu developed the vhdl labs.
• High school students Brian Engio and Tianyu Guo drew a number of diagrams for this manual, wrote
Javadoc comments for the code, and provided helpful comments on the manual.

Licensed under Creative Commons Attribution-ShareAlike (CC BY-SA) version 2.5 or greater.
http://creativecommons.org/licenses/by-sa/2.5/ca/
http://creativecommons.org/licenses/by-sa/3.0/

Contents

0 Overview 15
Compiler Concepts: call stack, heap
Programming Concepts: version control, push, pull, merge, SSH keys, IDE, debugger, objects, pointers
0.1 How the Labs Fit Together 15
0.2 Learning Progressions 17

0.3 How this project compares to CS241, the text book, etc. 19
0.4 Student work load 20
0.5 How this course compares to MIT 6.035 21
0.6 Where do I learn more? 21
0.7 Full-Adder Circuit Example 22
0.7.1 Picturing Git† 24
0.8 What To Do This Week 25
0.9 Metadata 27
0.10 Checklist for Every Lab 27
0.11 How to do these labs 28
0.12 I think I found a bug in the skeleton code 29
0.13 I want to change the skeleton code for my own usage 29
0.14 Testing† 30
0.15 Phases of Compilation† 31
0.16 Engineers and Other Educated Persons 32

1 Recursive Descent Parsing of W 35
Compiler Concepts: regular languages, regular expressions, ebnf, recognizers, parsing, recursive descent, lexing, pretty-printing, abstract syntax tree (ast)
Programming Concepts: classes, objects, variables, aliasing, immutability, test-driven development
1.1 Grammars & ebnf† 35
1.1.1 Summary of Grammar Notation Conventions 36
1.1.2 Derivations† 37

1.2 Write a regular expression recognizer for W 38


1.2.1 Test-Driven Development† 39
1.2.2 Complete the regular expression recognizer for W 39
1.3 Write a recursive descent recognizer for W † 39
1.4 Write a pretty-printer for W 40
1.5 Write a recursive descent parser for W 40
1.6 Object Diagram of Example W ast 41
1.7 Steps to success 42
1.8 Evaluation 44
1.9 Reading 44
1.9.1 Tiger Book 44
1.9.2 Programming Language Pragmatics 44
1.9.3 Web Resources 44

2 Transforming W → SVG for Visualization 45
Compiler Concepts: trees, transformations, xml, svg
Programming Concepts: object contract, object equality, mathematical equivalence classes, dom vs. sax parser styles, call-backs, iterator design pattern
2.1 Write a translator from W → svg 46
2.2 Write a translator from svg → W 46
2.3 Introducing the Object Contract† 47
2.3.1 Engineering Apparatus and Controls† 47
2.3.2 Learning to write professional code† 49
2.3.3 Reading 49
2.4 Evaluation 50
2.5 Steps to success 50

3 Recursive Descent Parsing of F 53
Compiler Concepts: context-free grammars, ll(1) grammars, predict sets, parse trees, precedence, associativity, commutativity, program equivalence
Programming Concepts: inheritance, polymorphism, dynamic dispatch, type tests, casting, memory safety, composite design pattern, template design pattern, singleton design pattern, recursive functions, recursive structures, higher-order functions
3.1 A Tale of Three Hierarchies† 53
3.1.1 Parse Tree 54
3.1.2 ast 54
3.1.3 A peek at the Expr class hierarchy 56
ece351 lab manual 5

3.2 Polymorphism & Dynamic Dispatch† 57


3.3 Write a recursive-descent recognizer for F 58
3.4 Write a pretty-printer for the ast 58
3.5 Equals, Isomorphic, and Equivalent† 58
3.6 Write Equals and Isomorphic for the F ast classes 59
3.7 Write a recursive-descent parser for F 59
3.8 Steps to success 60
3.9 Common Missteps 60
3.10 Evaluation 61
3.11 Background & Reading 61

4 Circuit Optimization: F Simplifier 63
Compiler Concepts: intermediate languages, identity element, absorbing element, equivalence of logical formulas, term rewriting, termination, confluence, convergence
Programming Concepts: interpreter design pattern, template design pattern, representation invariants
4.1 Iteration to a Fixed Point and Termination† 64
4.2 Confluence† 65
4.3 Mathematical Properties of Binary Operators† 66
4.3.1 Commutativity 66

4.3.2 Associativity 66
4.4 Transforming BinaryExprs to NaryExprs 67
4.5 Identity Element & Absorbing Element† 68
4.6 Simplify Once 68
4.7 Object sharing in the simplifier 70
4.8 Class Representation Invariants† 71
4.9 Logical equivalence for boolean formulas† 72
4.10 BinaryExpr.examine() is a Higher-Order Function† 73
4.11 Choice of Data Structures for NaryExpr.children† 73
4.12 Evaluation 74

5 Parsing W with Parboiled 75
Compiler Concepts: parser generators, Parsing Expression Grammars (peg), push-down automata
Programming Concepts: domain specific languages (dsl): internal vs. external, debugging generated code, stacks
5.1 Introducing Parboiled 76
5.2 Write a recognizer for W using Parboiled 77
5.3 Add actions to recognizer to make a parser 77
5.3.1 An alternative Name() rule 78
5.3.2 The stack for our W parser will have two objects 79

5.4 Parsing Expression Grammars (pegs)† 79


5.5 Expressions, Statements, and Side Effects† 80
5.6 A few debugging tips 81
5.7 Evaluation 81
5.8 Reading 81

6 Parsing F with Parboiled 83
Compiler Concepts: parser generators, Parsing Expression Grammars (peg), push-down automata
Programming Concepts: domain specific languages (dsl): internal vs. external, debugging generated code, stacks
6.1 Write a recognizer for F using Parboiled 83
6.2 Add actions to recognizer to make a parser 84
6.3 Composite Design Pattern† 84
6.4 Template Method Design Pattern† 85


6.5 Singleton Design Pattern† 87
6.6 Evaluation 88

7 Technology Mapping: F → Graphviz 89
Compiler Concepts: common subexpression elimination
Programming Concepts: hash structures, iteration order, object identity, non-determinism, Visitor design pattern, tree traversals: preorder, postorder, inorder
7.1 Interpreters vs. Compilers† 90
7.2 Introducing the Interpreter Design Pattern† 90
7.3 Introducing the Visitor Design Pattern† 92
7.3.1 Is Visitor always better than Interpreter?† 95
7.4 Hash Structures, Iteration Order, and Object Identity† 99
7.5 Non-determinism: Angelic, Demonic, and Arbitrary† 102
7.6 Translate one formula at a time 102
7.7 Common Subexpression Elimination 102
7.8 Designing a Common Subexpression Eliminator for F 103
7.8.1 An n² design using isomorphic and memory address 104
7.9 Evaluation 105

8 Simulation: F → Java 107
Compiler Concepts: program generation, name capture
Programming Concepts:
8.1 Name Collision/Capture† 107
8.2 Testing a code generator† 110
8.3 Evaluation 110

9 Simulation: F → x64 Assembly 111


9.1 Reading the W Program 111
9.2 Register Allocation† 111
9.2.1 Register Allocation in This Lab 112
9.3 Assembling & Linking 112
9.4 Evaluation 112
9.4.1 Bonus Marks 112

10 vhdl Recognizer & Parser 115


Compiler Concepts:
Programming Concepts:
10.1 Keywords and Whitespaces 118
10.2 vhdl Recognizer 118
10.3 vhdl Parser 118
10.4 Engineering a Grammar† 119
10.5 Evaluation 120

11 vhdl → vhdl: Desugaring & Elaboration 121


Compiler Concepts: desugaring, function inlining
Programming Concepts:
11.1 vhdl → vhdl: Desugaring 121
11.2 Elaboration 122
11.2.1 Inlining Components without Signal List in Architecture 122
11.2.2 Inlining Components with Signal List in Architecture 124
11.2.3 Inlining Components with Processes in Architecture 126
11.2.4 Notes 126
11.3 Evaluation 126

12 vhdl Process Splitting & Combinational Synthesis 127


12.1 Process Splitter 127
12.2 Splitter Notes 129
12.3 Synthesizer 130
12.3.1 If/Else Statements 130
12.3.2 Example Synthesizing Signal Assignment Statements 131
12.3.3 Example Synthesizing If/Else Statements 131

12.4 Evaluation 132

13 Simulation: F → Assembly 133


13.1 Which assembler to use? 133
13.2 masm32 Configuration 134
13.3 masm32 JNI and x86 Assembly 134
13.4 Evaluation 137

14 Simulation: F → JVM 139
Compiler Concepts: code generation
Programming Concepts: assembly
14.1 How javac compiles boolean logical operators† 139

14.2 How we will generate code 141


14.3 Negation 144
14.4 Evaluation 144
14.5 Bonus Lab: F to x86/x64/etc 144

A Extra vhdl Lab Exercises 145


A.1 Define-Before-Use Checker 145
A.1.1 Some Tips 146
A.2 Inline F intermediate variables 146

B Advanced Programming Concepts 147


B.1 Immutability 147
B.2 Representation Invariants 148
B.3 Functional Programming 148

C Design Patterns 149
See the Design Patterns book or Wikipedia or SourceMaking.com
C.1 Iterator 149

C.2 Singleton 149


C.3 Composite 149
C.4 Template Method 150
C.5 Interpreter 150
C.6 Visitor 150

D Lab Instructor Notes (GitLab) 151


D.1 Repository URLs 151
D.2 Setting Up 151
D.2.1 Creating the Wildcard Repo ece351/term 151
D.2.2 Notes/PDF Repository 152
D.2.3 Lib Repository 152
D.2.4 Course Offering Metadata Repository 152
D.2.5 Skeleton Repository 152
D.3 Forking Student Repos 154
D.4 Marking Script: Build.xml 154
D.5 Web User Interface for Gitolite Server 154
D.6 Exporting to Skeleton: export.sh 155
D.6.1 Update header.txt at the start of term 155
D.6.2 Export on an as-needed basis 155
D.6.3 Exporting to skeleton dev branch 155
D.6.4 Files to Release for Each Lab 156

E Lab Instructor Notes (gitolite) 159


E.1 Repository URLs 159
E.2 Setting Up 159
E.2.1 Creating the Wildcard Repo ece351/term 159
E.2.2 Notes/PDF Repository 160
E.2.3 Lib Repository 160
E.2.4 Course Offering Metadata Repository 160
E.2.5 Skeleton Repository 160
E.3 Forking Student Repos 162
E.4 Marking Script: Build.xml 162
E.5 Web User Interface for Gitolite Server 162
E.6 Exporting to Skeleton: export.sh 163
E.6.1 Update header.txt at the start of term 163
E.6.2 Export on an as-needed basis 163
E.6.3 Exporting to skeleton dev branch 163
E.6.4 Files to Release for Each Lab 164

F Bibliography 167

† denotes conceptual sections


List of Figures

1 Overview of ece351 Labs 15


2 Lab dependencies 16
3 Descriptions of individual labs with the compiler and programming
concepts introduced in each 18
4 Student hours spent on labs in winter 2013 20
5 Student hours spent on labs in summer 2013. (Partial data to lab 9.) 20
6 Source code for full adder circuit (vhdl) 22
7 Input waveform for full adder (W ) 22
8 Boolean formulas for full adder (F generated from source code in
Figure 6) 22
9 Gates for full adder (generated from formulas in Figure 8) 22
10 Input and output waveforms for full adder (generated from formulas in Figure 8 and input waveform in Figure 7) 23
11 Git topology for ece351 24
12 Phases of compilation and the labs in which we see them 31

1.1 Example waveform file for an or gate 35


1.2 Grammar for W . 35
1.3 Object diagram for example W ast 41
1.4 Steps to success for §1 42
1.5 Legend for Figure 1.4 43

2.1 Example waveform file for an or gate 45


2.2 Rendered svg of W file from Figure 2.1 45
2.3 First 10 lines of svg text of W file from Figure 2.1 46
2.4 Code comprehension exercises 47
2.5 Infographic and photo of the International Kilogram Prototype 48
2.6 Steps to success for §2 50
2.7 How §1 and §2 fit together 51

3.1 ll(1) Grammar for F 53


3.2 Object diagram of ast for F program X <= A or B; 54
3.3 Object diagram for example F ast 55
3.4 Some highlights from the Expr class hierarchy 56

3.5 Steps to success for §3 60

4.1 Code listing for Expr.simplify() showing iteration to a fixed point 64


4.2 Example of two confluent rewrites converging on a common solution 65
4.3 Four trees that represent logically equivalent circuits. The sorted n-ary representation makes equivalence comparisons easier. 67
4.4 NaryExpr.simplifyOnce() 68
4.5 Simplifications for F programs 69
4.6 Object sharing in the F simplifier 70
4.7 NaryExpr.repOk() 71
4.8 Equivalent truth tables 72
4.9 BinaryExpr.examine() is a higher-order function 73

5.1 Snippet of a recognizer written in Parboiled’s dsl 77


5.2 Snippet of a parser written in Parboiled’s dsl 78
5.3 Stack of Parboiled parser from Figure 5.2 while processing input ‘Ren Stimpy ’. 78
5.4 An alternative definition of the Name() rule in Figure 5.2 78
5.5 Snippet of a parser written in Parboiled’s dsl (Figure 5.2). Extended with debugging commands. Corresponding to the recognizer snippet in Figure 5.1. 79

6.1 ll(1) Grammar for F (reproduced from Figure 3.1) 83


6.2 uml class diagram for Composite Design Pattern 84
6.3 Template Design Pattern illustrated by UML Class Diagram 85
6.4 The only two instances of class ConstantExpr 87
6.5 Preventing clients from instantiating class ConstantExpr 87

7.1 Example Graphviz input file and rendered output 89


7.2 A uml class diagram for a subset of F expression class hierarchy 91
7.3 uml class diagram for F Expr hierarchy, Interpreter design pattern 91
7.4 Matrix view of Interpreter Design Pattern 91
7.5 uml class diagram for F Expr hierarchy, Visitor design pattern 92
7.6 uml class diagram for F Expr hierarchy, Visitor design pattern 93
7.7 Matrix view of Visitor Design Pattern 93
7.8 Abstract super class for Visitors for F expression asts 93
7.9 Signature and implementation of accept method. 94
7.10 The traverse methods of ExprVisitor 96
7.11 Implementation of PostOrderExprVisitor 97
7.12 Implementation of ExtractAllExprs 98
7.13 Iteration order for different data structures 99
7.14 Object identity for different data structures 100
7.15 Synthesized circuit without common subexpression elimination 104
7.16 Synthesized circuit with common subexpression elimination 104

8.1 Simulator for F program x <= a or b; 108


8.2 Implementation of DetermineInputVars 109
8.3 Dataflow of lab8 110

9.1 x64 assembly for x <= a; 114

10.1 Grammar for vhdl 116


10.2 Object diagram for example vhdl ast 117
10.3 Implementation of rule used to match the keyword ‘signal’ 118

11.1 vhdl program used to illustrate elaboration. 123


11.2 Elaborated architecture body, four_port_structure. 124
11.3 Extension of the vhdl program shown in Figure 11.1. 125
11.4 Elaborated architecture body, eight_port_structure. 125

12.1 Example vhdl program used to illustrate process splitting. 128


12.2 The resulting vhdl program of Figure 12.1 after process splitting. 129
12.3 Example used to illustrate synthesizing assignment statements. 131
12.4 Synthesized output of the program in Figure 12.3. 131
12.5 Example used to illustrate synthesizing assignment statements. 132
12.6 Synthesized output of the program in Figure 12.5. 132

13.1 Generate this method in assembly rather than in Java 133


13.2 Example assembly for an F Program 135
13.3 Example definitions file for a dynamic link library (DLL) 136

A.1 vhdl program used to illustrate signal declarations and the use of
undefined signals in signal assignment statements. 146
Overview

Compiler Concepts: call stack, heap
Programming Concepts: version control, push, pull, merge, SSH keys, IDE, debugger, objects, pointers

0.1 How the Labs Fit Together

The overall structure of our vhdl synthesizer and simulator is depicted in Figure 1.

[Figure 1 diagram. Labels recoverable from the diagram: VHDL, desugaring, elaboration, process splitting, combinational synthesis, F, simplifier, technology mapper, simulator generator, dot, .java, Graphviz, javac, PNG, .class, W, vizW, SVG.]

Figure 1: Overview of ece351 Labs. Nodes represent file types (e.g., vhdl). All of these file types, with the exception of .class and png files, are text files. Edges represent translators between different file types. Solid edges represent translators that we will implement in ece351. Dotted edges represent translators provided by third parties such as Sun/Oracle (javac) or AT&T Research (dot). The three-part edge between .class and W nodes is intended to indicate that the .class file we generate will read a waveform (W) file as input and write a waveform (W) file as output. Labels on edges describe the translation(s) performed. Numbers on edges indicate the order in which we will implement those translators in ece351. For example, the first translator that we will implement will transform waveform files to svg files (svg is an xml-based graphics file format). The general direction of our work in ece351 will be from the bottom of the figure towards the top. We will start with file types that have the simplest grammars and work towards file types with more complicated grammars.
16 ece351 lab manual [september 2, 2023]

[Figure 2 diagram. Nodes: 1. W recursive descent; 2. W to SVG; 3. F recursive descent; 4. F simplifier; 5. W parboiled; 6. F parboiled; 7. F tech. mapper; 8. F simulator; 9. V parboiled; 10. V elaborator; 11. V splitter.]

Figure 2: Lab dependencies. A solid line indicates that the code is needed to test the future lab. For example, you need to have a working implementation of lab1 in order to test lab2. A dotted line indicates that ideas from one lab feed into the next lab. For example, you learn the grammar of W in lab1 and then use that idea again in lab5. Not all of the dotted lines are shown, in order to simplify the graph. The shaded nodes indicate labs that you need to focus on because they are used by future labs. If you must skip a lab, skip something like 2, 5, 8, 10, or 11, which are not used as much by future labs.
overview 17

0.2 Learning Progressions

There are many undergraduate compiler projects. Almost all of them involve compiling a subset of some imperative programming language to assembly. In the past Pascal was the popular language to compile, whereas these days Java is.

The ece351 labs are different. On a superficial level, the difference is that these labs use a subset of vhdl as the input language, and produce circuit gate diagrams and simulations as outputs. The deeper difference is that our project is designed around two parallel learning progressions rather than around the logical structure of a compiler. This project comprises both a programming skills progression and a compiler concepts progression.

The key technical decision that enables these progressions is to restrict our subset of vhdl to combinational circuits: no loops, no timing, etc. From this decision flows the design of a simple intermediate language, F, for boolean formulas, and the design of a simple auxiliary language, W, for boolean waveforms. The project progresses from the simplest language (W) to the most complex (vhdl), performing parsing, transformation, and translation on each of the three languages. Repetition with increasing complexity (hopefully) leads to mastery.

The idea of a learning progression has been used in hockey for several decades, and it has recently been attracting attention in educational circles. For example, suppose that the goal of the practice is to have the players skate quickly around the circles with the puck. To progress to that goal the team might start with skating around circles, then skating in a straight line with the puck, then skating around the circles slowly with the puck, and finally skating around the circles quickly with the puck. The skills are first practiced in isolation and at slower speeds, and then finally all together at high speed.

# | Description | Compiler Concepts | Programming Concepts
0 | Prelab | call stack, heap | version control, push, pull, merge, SSH keys, IDE, debugger, objects, pointers
1 | Parsing W by recursive descent | regular languages, regular expressions, ebnf, recognizers, parsing, recursive descent, lexing, pretty-printing, abstract syntax tree (ast) | classes, objects, variables, aliasing, immutability, test-driven development
2 | Translating W to svg (visualization) | trees, transformations, xml, svg | object contract, object equality, mathematical equivalence classes, dom vs. sax parser styles, call-backs, iterator design pattern
3 | Parsing F by recursive descent | context-free grammars, ll(1) grammars, predict sets, parse trees, precedence, associativity, commutativity, program equivalence | inheritance, polymorphism, dynamic dispatch, type tests, casting, memory safety, composite design pattern, template design pattern, singleton design pattern, recursive functions, recursive structures, higher-order functions
4 | Simplifying F programs (optimization) | intermediate languages, identity element, absorbing element, equivalence of logical formulas, term rewriting, termination, confluence, convergence | interpreter design pattern, template design pattern, representation invariants
5 | Parsing W with a parser generator | parser generators, Parsing Expression Grammars (peg), push-down automata | domain specific languages (dsl): internal vs. external, debugging generated code, stacks
6 | Parsing F with a parser generator | parser generators, Parsing Expression Grammars (peg), push-down automata | domain specific languages (dsl): internal vs. external, debugging generated code, stacks
7 | Translating F to Graphviz (technology mapping) | common subexpression elimination | hash structures, iteration order, object identity, non-determinism, Visitor design pattern, tree traversals: preorder, postorder, inorder
8 | Translating F to Java (circuit simulation) | program generation, name capture |
9 | Parsing vhdl with a parser generator | |
10 | vhdl elaboration | |
11 | vhdl process splitting and translation to F (combinational synthesis) | desugaring, function inlining |
B | Translating F to assembly | instruction selection, register allocation | assembly, linking
∗ ‘DP’ stands for ‘design pattern’; ‘B’ stands for bonus lab

Figure 3: Descriptions of individual labs with the compiler and programming concepts introduced in each

0.3 How this project compares to CS241, the text book, etc.

[Comparison table. Columns: ece351, cs241, Tiger, cs444, mit 6.035. Language(s): vhdl, F, W for ece351; Java for the other four. Compiler Phases rows: parsing (checked for all five), symbol tables, type checking, dataflow analysis, optimization, translation, assembly. Pedagogy rows: skills progression, concept progression, background, tests, workload.]

0.4 Student work load

[Box plot titled “Lab Hours Data”: Hours (0 to 40) on the vertical axis; Lab (lab0 through lab11) on the horizontal axis.]

Figure 4: Student hours spent on labs in winter 2013. The target is five hours per lab. About half the students hit this target on most labs, as indicated by the bars in the middle of the boxes. On most labs, 75% of the students completed the lab within eight hours, as indicated by the top of the boxes. Some students took much longer than eight hours. If you are one of those students, please consider taking advantage of our course collaboration policy. The point of the collaboration policy is for you to learn more in less time.
The exceptional labs that took more time were 4 and 9. This term we will be increasing the ta resources for lab 4, and also changing the way we teach lab 9, in an effort to get these numbers down.

[Box plot: Hours (0 to 40) on the vertical axis; Lab (lab0 through lab9) on the horizontal axis.]

Figure 5: Student hours spent on labs in summer 2013. (Partial data to lab 9.)

0.5 How this course compares to MIT 6.035

mit’s (only) undergraduate compilers course is 6.035. It differs from ece351 in two important ways:

a. 6.035 is rated at 12 hours per week for 14 weeks, whereas ece351 is rated at 10 hours per week for 12 weeks: a 48 hour nominal difference. Moreover, 6.035 makes no effort to keep the workload near the course rating, whereas ece351 actively tracks student hours and over half the students get the work done in the rated time. So the actual difference is much larger than the nominal one. 6.035 also comes in an 18 hour per week variant.

b. 6.035 is an elective course, whereas ece351 is a required course. Elective courses only get students who have the interest and ability; required courses get everyone. On the flip side, not every graduate of mit eecs will know compilers: some will be totally ignorant of them. At uw we guarantee a minimum level of quality in our graduates (this approach is also required for ceab accreditation).
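The 48 hour figure in item a. is the product of hours per week and weeks for each course:

$$
\underbrace{12 \times 14}_{\text{6.035}} - \underbrace{10 \times 12}_{\text{ece351}} = 168 - 120 = 48 \text{ hours}
$$

Assuming the 18 hour per week variant of 6.035 also runs for 14 weeks, the nominal gap grows to 18 × 14 − 120 = 252 − 120 = 132 hours.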

0.6 Where do I learn more?

If you are interested in learning more about compilers at uw, your
next step is cs444. If you take ece351 and cs444 then you will know
more than if you took mit 6.035. cs462 covers the theoretical end
more than if you took mit 6.035. cs462 covers the theoretical end
of formal languages and parsing, although this knowledge won’t
greatly improve your practical skills. In the graduate curriculum
cs644 and cs744 are offered on a semi-regular basis. There are also a
number of graduate courses in both ece and cs on program analysis
that are offered irregularly.

0.7 Full-Adder Circuit Example

Figure 6: Source code for full adder circuit (vhdl)

entity full_adder is port (
  A, B, Cin: in bit;
  S, Cout: out bit
);
end full_adder;

architecture full_adder_arch of full_adder is
begin
  S <= (A xor B) xor Cin;
  Cout <= ((A xor B) and Cin) or (A and B);
end full_adder_arch;

Figure 7: Input waveform for full adder (W)

A: 0 1 0 1 0 1 0 1;
B: 0 0 1 1 0 0 1 1;
Cin: 0 0 0 0 1 1 1 1;
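Each line of the W input in Figure 7 has the same simple shape: a signal name, a colon, a sequence of 0/1 values, and a semicolon. As a minimal sketch, assuming exactly that shape (the authoritative grammar for W is Figure 1.2, which is not reproduced here), a regular-expression recognizer for a single line could look like this; the class name WLineRecognizer is hypothetical and not part of the course skeleton:

```java
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: a regular-expression recognizer for a single W line,
 * assuming the shape seen in Figure 7 (name, colon, space-separated bits, semicolon).
 */
public class WLineRecognizer {

    // identifier : one or more 0/1 values separated by whitespace ;
    private static final Pattern W_LINE = Pattern.compile(
            "\\s*[A-Za-z_][A-Za-z0-9_]*\\s*:\\s*[01](\\s+[01])*\\s*;\\s*");

    /** Returns true if the whole line matches the assumed W line shape. */
    static boolean recognize(String line) {
        return W_LINE.matcher(line).matches();
    }

    public static void main(String[] args) {
        String[] lines = {
            "A: 0 1 0 1 0 1 0 1;",
            "B: 0 0 1 1 0 0 1 1;",
            "Cin: 0 0 0 0 1 1 1 1;",
            "Cin: 0 2;"            // rejected: 2 is not a bit
        };
        for (String line : lines) {
            System.out.println(recognize(line) + "\t" + line);
        }
    }
}
```

A recognizer only answers yes or no; a parser for the same line would additionally build an ast holding the signal name and its list of values.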

Figure 8: Boolean formulas for full adder (F generated from source code in Figure 6)

S <= ((not (((not A) and B) or ((not B) and A))) and Cin) or ((((not A) and B) or ((not B) and A)) and (not Cin));
Cout <= ((((not A) and B) or ((not B) and A)) and Cin) or (A and B);

Figure 9: Gates for full adder (generated from formulas in Figure 8)

[Gate diagram; recoverable labels: Cin, Cout.]

Figure 10: Input and output waveforms for full adder (generated from formulas in Figure 8 and input waveform in Figure 7)

[Waveform plot; recoverable signal labels: A, Cin, Cout, S.]
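Figures 7, 8, and 10 together describe a simulation: evaluate each F formula at every time step of the W input. A minimal Java sketch of that idea, written by hand for this one circuit, is below. The class and method names are hypothetical; the labs generate simulation code from the F formulas rather than writing it by hand, and the sketch assumes all input rows have the same length.

```java
/**
 * Hypothetical sketch: evaluates the F formulas of Figure 8 on the
 * input waveform of Figure 7, one time step at a time.
 */
public class FullAdderSim {

    // A xor B, desugared into and/or/not exactly as in Figure 8
    static boolean xorAB(boolean a, boolean b) {
        return (!a && b) || (!b && a);
    }

    // S <= ((not X) and Cin) or (X and (not Cin)), where X = A xor B
    static boolean s(boolean a, boolean b, boolean cin) {
        boolean x = xorAB(a, b);
        return (!x && cin) || (x && !cin);
    }

    // Cout <= (X and Cin) or (A and B)
    static boolean cout(boolean a, boolean b, boolean cin) {
        return (xorAB(a, b) && cin) || (a && b);
    }

    /** Splits a row of space-separated 0/1 values, e.g. "0 1 0 1", into booleans. */
    static boolean[] parseBits(String row) {
        String[] tokens = row.trim().split("\\s+");
        boolean[] bits = new boolean[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            bits[i] = tokens[i].equals("1");
        }
        return bits;
    }

    /** Returns {S row, Cout row} for equally long input rows, formatted like W values. */
    static String[] simulate(String aRow, String bRow, String cinRow) {
        boolean[] a = parseBits(aRow), b = parseBits(bRow), cin = parseBits(cinRow);
        StringBuilder sOut = new StringBuilder(), coutOut = new StringBuilder();
        for (int t = 0; t < a.length; t++) {
            if (t > 0) { sOut.append(' '); coutOut.append(' '); }
            sOut.append(s(a[t], b[t], cin[t]) ? '1' : '0');
            coutOut.append(cout(a[t], b[t], cin[t]) ? '1' : '0');
        }
        return new String[] { sOut.toString(), coutOut.toString() };
    }

    public static void main(String[] args) {
        String[] out = simulate("0 1 0 1 0 1 0 1", "0 0 1 1 0 0 1 1", "0 0 0 0 1 1 1 1");
        System.out.println("S: " + out[0] + ";");     // S: 0 1 1 0 1 0 0 1;
        System.out.println("Cout: " + out[1] + ";");  // Cout: 0 0 0 1 0 1 1 1;

        // Exhaustive check against the usual full-adder definitions on all 8 inputs
        for (int i = 0; i < 8; i++) {
            boolean a = (i & 1) != 0, b = (i & 2) != 0, c = (i & 4) != 0;
            if (s(a, b, c) != (a ^ b ^ c) || cout(a, b, c) != ((a && b) || (c && (a ^ b)))) {
                throw new AssertionError("mismatch at input " + i);
            }
        }
    }
}
```

The final loop compares the desugared formulas with S = A xor B xor Cin and Cout = (A and B) or (Cin and (A xor B)) on every input combination; checking all rows of the truth table like this is one way to establish that two boolean formulas are logically equivalent.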

0.7.1 Picturing Git†

Git is a powerful distributed version control system. Figure 11 will help you understand how we use it in ece351. If you want to read up on Git, here are some references:

• http://git-scm.com/book/en/Getting-Started-Git-Basics
• http://git-scm.com/book/en/Git-Basics-Getting-a-Git-Repository

While Git is quite powerful, its usability leaves something to be desired. Usually the fastest way to learn Git is to have someone who knows it sit with you and teach you what you need to do. Come in to the lab for help, or ask a classmate or friend: anyone who is willing to help you. Getting help with Git is outside the scope of the lab collaboration policy.

[Figure 11 diagram. Machines: Prof Computer, Student Computer, TA Computer, and the GitLab Server. Repositories and working copies: Working Copy, Eclipse Workspace, Git Working Copy, Local Skeleton Repo, Local Student Repo, Skeleton Repo, Student Repo. Operations on edges: commit, push, pull, pull skeleton master.]

Figure 11: Git topology for ece351



0.8 What To Do This Week

Install and Configure Software

☐ Create an ssh key (with a password!)
ssh-keygen -t rsa -C "[email protected]"
☐ GitLab (git.uwaterloo.ca)
☐ Make account on git.uwaterloo.ca
☐ Upload your ssh public key to git.uwaterloo.ca
☐ Set up your ssh agent
* Might be PuTTY’s ssh agent
* Might be adding eval `ssh-agent` to your dotfiles
☐ Eclipse (or other ide you prefer)
☐ Download Eclipse JDT from http://www.eclipse.org/
☐ Might need to install Java jdk (not jre)
☐ Configure Git at command line (if you are using it that way)
* git config --global user.name "FIRSTNAME LASTNAME"
* git config --global user.email "[email protected]"
* git config --global push.default simple
* git config --global color.ui "true"
☐ Connect Eclipse to GitLab (might need your ssh key)
☐ Configure Eclipse’s JUnit Launcher to enable assertions:
* Go to Window / Preferences / Java / JUnit
* Check Add -ea to VM arguments for new launch configurations
☐ Graphviz (you can live without this, but nice to have)

Common Problem 1: JDK vs. JRE test fails. To fix this, click Window and go to Preferences. In the tree view select Java, then Installed JREs, and click Add. Hit Next (Standard VM should be selected already), then Directory, navigate to C:\Program Files\Java\jdk (or something similar), and hit Finish. Finally, check the jdk in the Installed JREs pane and click OK.

Common Problem 2: testJUnitConfiguration fails, but you have followed these steps. Probably you tried to run TestPrelab before configuring Eclipse’s JUnit launcher. The configuration only adds ‘-ea’ to new run configurations; it does not change existing run configurations. You now need to change the TestPrelab run configuration individually so that it includes ‘-ea’ in the VM arguments.

Common Problem 3: Lots of compiler errors. Ensure that Eclipse is configured to use compiler/language level 11 or greater, and that the runtime JDK/JRE is set to 11.
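Common Problems 1 to 3 all come down to properties of the running JVM. As a hedged sketch (the class EnvCheck is hypothetical and is not the course’s TestPrelabConfig), each property can be queried from plain Java:

```java
import javax.tools.ToolProvider;

/**
 * Hypothetical sketch of the environment checks behind Common Problems 1 to 3.
 * Not part of the course skeleton; names are made up for illustration.
 */
public class EnvCheck {

    /** True only when the JVM was started with -ea (assertions enabled). */
    static boolean assertionsEnabled() {
        boolean enabled = false;
        assert enabled = true; // this assignment runs only when assertions are on
        return enabled;
    }

    /** A JDK ships the system Java compiler; a bare JRE typically returns null here. */
    static boolean hasJavaCompiler() {
        return ToolProvider.getSystemJavaCompiler() != null;
    }

    /** Feature release of the running JVM, e.g. 11 or 17. */
    static int javaFeatureVersion() {
        return Runtime.version().feature();
    }

    public static void main(String[] args) {
        System.out.println("assertions enabled (-ea): " + assertionsEnabled());
        System.out.println("running on a JDK:         " + hasJavaCompiler());
        int version = javaFeatureVersion();
        System.out.println("Java feature version:     " + version
                + (version >= 11 ? "" : "  (need 11 or greater)"));
    }
}
```

If it is launched from an Eclipse run configuration and the first line prints false, that configuration lacks -ea (Common Problem 2); if the second line prints false, Eclipse is probably running a JRE rather than a JDK (Common Problem 1).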

Clone and Connect to Course Repositories (see §0.7.1 above)

☐ git clone git@git.uwaterloo.ca:ece351-TERM/pdfs ece351-pdfs
☐ git clone --recursive git@git.uwaterloo.ca:ece351-TERM/USERID ece351-labs
☐ cd ece351-labs
☐ git remote add skeleton git@git.uwaterloo.ca:ece351-TERM/skeleton

Replace TERM with the term number, e.g. ‘1201’. Replace USERID with your user ID, e.g. ‘p24gill’.

Show us that you can do stuff

☐ git pull skeleton main
☐ git submodule init
☐ git submodule update
☐ Check that there is a non-empty lib directory
☐ Run TestPrelabConfig.java
☐ Run TestPrelabExceptions.java
☐ Run TestImports.java
☐ Run build.xml (this is the marking script)

☐ Right-click on a single test to run it individually
☐ Edit meta/hours.txt to specify your time spent getting set up (see §0.9 below)
☐ Edit meta/collaboration.txt
☐ git commit -m 'edited meta files' meta/*.txt
☐ git push

In collaboration.txt, indicate that for lab0 you were mentored by one of the course staff (this is just an exercise to check that you know who the staff are and that you know how to edit the collaboration.txt file properly).
overview 27

0.9 Metadata

Your workspace has a directory named meta that contains the following two files, in which you describe a few things about how your work on the lab went.

collaboration.txt  Records your collaborators. Each line is a triple of lab number, collaboration role, and userid. Legal values for collaboration roles are: converser, partner, mentor, protege. The role field describes the role of the other person, so lab2 mentor jsmith says that J Smith was your mentor for lab2. Similarly, lab3 protege jsmith says that J Smith was your protégé for lab3 (i.e., you were their mentor). Both parties are required to report collaborations. If you collaborated with more than one person on a lab, put one line in this file for each collaborator on that lab.

You must edit this file for every lab, even if all of your collaborations were just conversation.
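Because every line of collaboration.txt is a fixed-format triple, it can be checked mechanically before you push. The sketch below is not part of the skeleton or the marking script: the class CollaborationCheck is hypothetical, and it assumes that lab numbers are written labN and that userids consist only of lowercase letters and digits (as in jsmith or p24gill).

```java
import java.util.List;
import java.util.Set;

final class CollaborationCheck {
    // Legal roles listed in this section; the role describes the OTHER person.
    static final Set<String> ROLES = Set.of("converser", "partner", "mentor", "protege");

    // Hypothetical checker for one "labN role userid" line of collaboration.txt.
    // Assumption: userids are lowercase letters and digits (e.g. jsmith, p24gill).
    static boolean isValid(String line) {
        String[] parts = line.trim().split("\\s+");
        return parts.length == 3
                && parts[0].matches("lab[0-9]+")
                && ROLES.contains(parts[1])
                && parts[2].matches("[a-z0-9]+");
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "lab2 mentor jsmith",    // J Smith mentored you on lab2
                "lab3 protege jsmith",   // you mentored J Smith on lab3
                "lab1 friend jdoe");     // 'friend' is not a legal role
        for (String line : lines) {
            System.out.println(line + " -> " + (isValid(line) ? "ok" : "INVALID"));
        }
    }
}
```

Rejecting unknown roles outright, rather than guessing what was meant, mirrors the rule that only the four listed roles are legal.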
hours.txt  An estimate of the hours you worked on each lab. This file has a line for each lab, like so: lab1 5 (indicating five hours spent on lab 1). For pre-lab / computing environment time, use lab0.

These data will be used solely for the staff to assess the difficulty of the labs. This assessment will be made in aggregate, and not on an individual basis. These data will not be used to assess your grade. However, we will not mark your lab until you report an estimate of your hours.

0.10 Checklist for Every Lab

Before you start working on the lab:

☐ cd ~/git/ece351-labs
☐ git checkout main
☐ git pull
☐ git pull skeleton main
☐ git submodule update
☐ Manually resolve any conflicts

During/after every working session:

☐ Run tests in Eclipse
☐ Run tests via build.xml
☐ git commit -am 'description of work you did since last commit'
☐ git push

Before the deadline:

☐ Update meta/collaboration.txt
☐ Update meta/hours.txt
☐ git add meta
☐ git commit -m 'updated metadata for labX'
☐ git push

0.11 How to do these labs

The lab manual will tell you what files you need to edit, what libraries you should use, and what tests to run. The lab manual will also explain what you need to do at each point in the code, and the background concepts you need to understand the lab. Most of the dagger (†) sections in the lab manual explain these background concepts. Every place in the skeleton code that you need to edit is marked by both a Todo351Exception and a TODO marker. We will discuss the next week's lab in class every Friday.
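A minimal sketch of what such a marked stub might look like, and why the first exception the test harness reports points you at the next thing to edit. This is an illustration under assumptions: the skeleton supplies its own Todo351Exception (its real constructors and message may differ), so a stand-in with the same name is declared here, and countWaveforms is a hypothetical method, not one from the skeleton.

```java
final class SkeletonStubDemo {
    // Hypothetical skeleton-style stub: both markers sit exactly where you must edit.
    static int countWaveforms(String input) {
        // TODO: replace the following line with your implementation
        throw new Todo351Exception();
    }

    public static void main(String[] args) {
        try {
            System.out.println(countWaveforms("A: 0 1;"));
        } catch (Todo351Exception e) {
            // The stack trace names the unimplemented method: edit that one first.
            e.printStackTrace();
        }
    }
}

// Stand-in for the skeleton's exception type; the real class may be defined differently.
final class Todo351Exception extends RuntimeException {
    Todo351Exception() {
        super("TODO: not implemented yet");
    }
}
```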
Despite this clear and explicit instruction, some students have difficulty getting started on these labs. Why? What is missing from the instructions described above? The lab manual doesn't tell you the order in which you should edit the files. There is an important reason for this: execution order is the wrong way to think about developing object-oriented software (most of the time). The right way to think about object-oriented software is to focus on developing cohesive and extensible modules (classes). In this way object-oriented programming (in the large) is mentally quite different from procedural programming (in the small), in which one thinks primarily about the order in which the steps are performed.

In ece250 you did object-oriented programming in the small. That is, you defined structures that spanned one or two classes and operations on those structures that spanned one or two methods. The programs you worked on were perhaps several hundred lines long. In ece351 you will work on a code base that is over 8,000 lines of code, of which you will write about 1,000 lines, at an average rate of about 100 lines per week. The structures we will work with are defined across dozens of classes, and the operations on those structures are similarly defined across dozens of methods. This is the first time in the ece curriculum that you are exposed to programming in the large. At this scale, modularity and code structure really matter.

Now don't misunderstand this to mean that execution order doesn't matter: it does. It's just that execution order is generally a separate design concern from modularity and extensibility. The best way to understand the execution order of a large object-oriented program is to run it and observe it in the debugger or via some other tracing mechanism (e.g., printf).

If you want to figure out what code you should edit first, run the test harness and see where an exception is thrown. Fix that. Then run again, get another exception, etc. The 'fix' step might be quick and easy, or it might require reading several pages of the lab manual to understand what needs to be done at that particular point. Remember, the lab manual describes everything that needs to be done and where it needs to be done; it just doesn't describe the order in which you should do it. Aligning your editing order with the program execution order is one way to guide your work, and it will help you build an understanding of the code.

By the standards of modern industrial software development, 8,000+ lines is just approaching medium sized. The code structuring ideas you will learn in these labs can take you up to maybe 100,000 lines: beyond that you will need new ideas and modularity mechanisms.

Thinking on your own to develop an understanding of the skeleton code is an important part of these labs. I promise you that it takes less time and effort to study 8,000+ lines of code than to write it from scratch.

0.12 I think I found a bug in the skeleton code

You are welcome to do what you like with the skeleton code. If you want to make a change, our recommendation is the following:

a. Report the change you want to make (in the forum, or to a staff member), preferably by generating a patch.

b. Either we tell you that the change is misguided and that you really want to do something else,

c. or we say thanks and patch the skeleton code with the change so that everyone can use it. Potentially you earn bonus marks for participation.

If you make the change yourself without reporting it, then you lose out on class participation points. Also, if someone else reports the same change and we apply a patch for it, that patch will fail to apply cleanly to your code. This might or might not end up being a problem for you.

0.13 I want to change the skeleton code for my own usage

You may change the skeleton code for your own usage. Perhaps you have a better idea of how to write something. If you make such changes, be careful not to create more problems than you solve. Some things to look out for include:

• Breaking the JUnit test harnesses.

• Breaking some other code that depends on the code you are changing.

• Messing up one of the design patterns that are an explicit part of what you are supposed to be learning.

• Changing something from immutable to mutable because you do not want to learn to work with immutable data. First, this deprives you of one of the important lessons of the labs. Second, you will likely introduce bugs that will be very difficult to fix later. Immutable data is a good engineering practice that helps you avoid many classes of bugs that are difficult to diagnose and fix.

Algorithmic changes are usually safe, because the algorithmic parts of this code are usually encapsulated.
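Working with immutable data in Java mostly comes down to three habits: make fields final, take an unmodifiable copy of any collection you are given, and have "modifier" methods return a new object instead of changing the current one. The sketch below shows those habits on a hypothetical Signal class (not a skeleton class); java.util.List.copyOf (Java 10 or later) stands in for whatever immutable collection type the skeleton actually uses.

```java
import java.util.ArrayList;
import java.util.List;

final class ImmutabilityDemo {
    public static void main(String[] args) {
        Signal a = new Signal("A", List.of("0", "1"));
        Signal b = a.withBit("0");            // "modifying" a yields a new object
        System.out.println(a);                // A: 0 1 ;
        System.out.println(b);                // A: 0 1 0 ;
    }
}

// Hypothetical immutable value class (not part of the skeleton code).
final class Signal {
    private final String name;
    private final List<String> bits;

    Signal(String name, List<String> bits) {
        this.name = name;
        // Unmodifiable copy: later changes to the caller's list cannot leak in.
        this.bits = List.copyOf(bits);
    }

    // Returns a new Signal with one more bit; this object is left unchanged.
    Signal withBit(String bit) {
        List<String> longer = new ArrayList<>(bits);
        longer.add(bit);
        return new Signal(name, longer);
    }

    // Safe to hand out: the list is already unmodifiable.
    List<String> bits() {
        return bits;
    }

    @Override
    public String toString() {
        return name + ": " + String.join(" ", bits) + " ;";
    }
}
```

Because a is never changed, code elsewhere that still holds a reference to it (an alias) cannot be surprised by b's extra bit.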

0.14 Testing†

Program testing can be used to show the presence of bugs, but never to show their absence!
– Edsger W. Dijkstra

Just because your code passes all of the tests does not mean it is correct: there could be some as yet unidentified test that it does not pass. Worse, that not-yet-identified test might occur during some lab later in the term.

This might be the first course in which you have to write a 'real' program: that is, a non-trivial program that you will have to depend on in the future. First, the labs are non-trivial: the total size of the code for this course is about 8500 lines, of which you will have to write about 1500, and roughly 7000 will be provided for you. Second, the labs are cumulative: each week you will run code that you wrote in earlier labs. At the end of the term you will run all of it together. So testing will be important.

Test inputs may be generated either manually or automatically with a testing tool. There are two main approaches used by tools: systematic and random. Because the space of possible inputs is large, possibly infinite, systematic approaches tend to generate all small inputs and no big inputs. Random approaches will generate a mixture of small and large inputs. Programmers can use human insight to generate interesting inputs that might be missed by automated tools.

The Korat tool generates test inputs systematically based on representation invariants. The Randoop tool generates inputs randomly.
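To make the systematic/random distinction concrete, here is a small sketch that generates bit strings (the shape of a waveform's bits) both ways. It is illustrative only: it does not use Korat or Randoop, and the class and method names are hypothetical. The systematic generator enumerates every string up to a length bound, so its output grows as 2 + 4 + ... + 2^n; the random generator reaches long inputs cheaply but guarantees nothing about which ones it covers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class InputGenerators {
    public static void main(String[] args) {
        System.out.println(allBitStrings(2));                     // [0, 1, 00, 01, 10, 11]
        System.out.println(randomBitString(new Random(351), 20)); // seeded, so repeatable
    }

    // Systematic: every bit string of length 1..maxLen. There are
    // 2 + 4 + ... + 2^maxLen of them, so only small lengths are feasible.
    static List<String> allBitStrings(int maxLen) {
        List<String> result = new ArrayList<>();
        List<String> previous = List.of("");
        for (int len = 1; len <= maxLen; len++) {
            List<String> current = new ArrayList<>();
            for (String prefix : previous) {
                current.add(prefix + "0");
                current.add(prefix + "1");
            }
            result.addAll(current);
            previous = current;
        }
        return result;
    }

    // Random: choose a length in 1..maxLen, then each bit independently.
    static String randomBitString(Random rng, int maxLen) {
        int len = 1 + rng.nextInt(maxLen);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < len; i++) {
            sb.append(rng.nextBoolean() ? '1' : '0');
        }
        return sb.toString();
    }
}
```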

There are different strategies for evaluating whether a test passed or failed. One is to check the computed result against a known answer. In other words, the test suite consists of a set of known input/output pairs.

Another strategy is to check general properties of functions: for example, that a function never returns null. An advantage of this property-based approach is that one only needs the test inputs; the corresponding outputs do not need to be known. Some properties that we will be interested in for this course include:

reflexive       x.equals(x)
symmetric       x.equals(y) ⇒ y.equals(x)
transitive      x.equals(y) and y.equals(z) ⇒ x.equals(z)
antisymmetric   x ≤ y and y ≤ x ⇒ x = y
total           x ≤ y or y ≤ x
idempotent      f(f(x)) = f(x)
invertible      f′(f(x)) = x, where f′ is the inverse of f

Some functions do not have any of these properties. But if we expect a function to have one or more of these properties, then it is a good idea to test for that property explicitly.
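Checking a property needs only inputs, not expected outputs. A minimal sketch of two such checks in Java, written as generic helpers over java.util.function.Function; the helper names are hypothetical, and String::trim and string reversal are just convenient functions whose properties are easy to see. With pretty-printing as f and parsing as its inverse, an isInverse-style check is the kind of round-trip test used in Lab 1.

```java
import java.util.List;
import java.util.function.Function;

final class PropertyChecks {
    // idempotent: f(f(x)) = f(x) for every test input x
    static <T> boolean isIdempotent(Function<T, T> f, List<T> inputs) {
        return inputs.stream().allMatch(x -> f.apply(f.apply(x)).equals(f.apply(x)));
    }

    // invertible: inverse(f(x)) = x for every test input x
    static <A, B> boolean isInverse(Function<A, B> f, Function<B, A> inverse, List<A> inputs) {
        return inputs.stream().allMatch(x -> inverse.apply(f.apply(x)).equals(x));
    }

    public static void main(String[] args) {
        // Only inputs are needed; no expected outputs are written down anywhere.
        List<String> inputs = List.of("", "  a ", "abc", " x y ");
        Function<String, String> reverse = s -> new StringBuilder(s).reverse().toString();
        System.out.println(isIdempotent(String::trim, inputs));   // true
        System.out.println(isInverse(reverse, reverse, inputs));  // true: reverse undoes reverse
        System.out.println(isIdempotent(s -> s + "!", inputs));   // false
    }
}
```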
In particular, the first three properties define a mathematical equivalence relation. We expect the equals() method in Java to represent a mathematical equivalence relation.

A total order is defined to be transitive (here for ≤: x ≤ y and y ≤ z ⇒ x ≤ z), antisymmetric, and total. The integers, for example, are totally ordered. We expect the compareTo() method in Java to represent a total order.
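Here is a sketch of how the equivalence-relation and total-order properties can be checked exhaustively over a small list of test inputs (the class and method names are hypothetical). It reads x ≤ y as x.compareTo(y) <= 0 and uses equals() for the x = y in antisymmetry. That last choice is deliberate: the java.lang.Comparable documentation strongly recommends, but does not require, that compareTo be consistent with equals, and java.math.BigDecimal is a well-known exception (1.0 and 1.00 compare as equal but are not equals), which this check reports.

```java
import java.math.BigDecimal;
import java.util.List;

final class OrderProperties {
    // equals() should be reflexive, symmetric and transitive on the test inputs.
    static <T> boolean equalsIsEquivalence(List<T> xs) {
        for (T x : xs) {
            if (!x.equals(x)) return false;                                    // reflexive
            for (T y : xs) {
                if (x.equals(y) != y.equals(x)) return false;                  // symmetric
                for (T z : xs) {
                    if (x.equals(y) && y.equals(z) && !x.equals(z)) return false; // transitive
                }
            }
        }
        return true;
    }

    // compareTo() should be total, antisymmetric and transitive, reading
    // "x <= y" as x.compareTo(y) <= 0 and "x = y" as x.equals(y).
    static <T extends Comparable<T>> boolean compareToIsTotalOrder(List<T> xs) {
        for (T x : xs) {
            for (T y : xs) {
                boolean xy = x.compareTo(y) <= 0;
                boolean yx = y.compareTo(x) <= 0;
                if (!xy && !yx) return false;                                  // total
                if (xy && yx && !x.equals(y)) return false;                    // antisymmetric
                for (T z : xs) {
                    if (xy && y.compareTo(z) <= 0 && x.compareTo(z) > 0) return false; // transitive
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Integer> ints = List.of(-3, 0, 0, 7, 351);
        System.out.println(equalsIsEquivalence(ints));     // true
        System.out.println(compareToIsTotalOrder(ints));   // true
        // BigDecimal's compareTo treats 1.0 and 1.00 as equal but equals() does not,
        // so antisymmetry (with equals() as "=") fails.
        System.out.println(compareToIsTotalOrder(
                List.of(new BigDecimal("1.0"), new BigDecimal("1.00"))));  // false
    }
}
```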
A good test suite will contain some of everything mentioned above: test inputs that are generated manually and automatically, both systematically and randomly; evaluations that look at specific input/output pairs and at general properties.

In a previous offering of this course the staff provided tests that only looked at general properties and none that examined specific input/output pairs. It turned out that students could write code that had the general property without actually computing anything useful. They became unhappy when they discovered that their code that passed the staff-provided tests did not actually work when they wanted to run it on a future lab.

0.15 Phases of Compilation†

Figure 12: Phases of compilation and the labs in which we see them. The scanner, parser, and type checker are considered the front end, whereas the optimizer and code generator are considered the back end of the compiler.

Scanner/Lexer/Tokenizer    labs 1, 3
Parser                     labs 1, 3, 5, 6, 9
Type Checker
Optimizer                  labs 2, 4, 7, 10, 11
Code Generator             labs 7, 8, 11

0.16 Engineers and Other Educated Persons

There are a number of important general skills that all educated persons, including engineers, should possess. The first two years of engineering education need to focus on technical details in isolation in order for you to have the technical competency to tackle more interesting things. This focus on minutiæ sometimes comes at the cost of limited growth in these larger skills. In this course you will not only learn lots of tiny technical details, but you will also need to develop some of the larger skills that all educated persons should possess.

Simplicity and elegance are unpopular because they require hard work and discipline to achieve and education to be appreciated.
– Edsger W. Dijkstra, 1997

The ability to quickly find relevant information. Information is more accessible now than at any previous point in human history. There is a wide variety of sources available to you: the course notes, the lab manual, old exams, the skeleton code, the recommended textbooks, other textbooks, lecture notes and slides and videos from other professors, etc. You should be facile in using all of these sources to find what you are looking for.

Books, in particular, have helpful features for navigating them, such as the table of contents, the index, and the preface or introduction. You should know how to use these features.

Fools ignore complexity; pragmatists suffer it; experts avoid it; geniuses remove it.
– Alan Perlis

The ability to quickly assess what information is relevant. As old-time engineers used to say, there is always more heat than light. Learn to see the light without being overwhelmed by the heat.

When doctors, lawyers, accountants, and other professionals assess a case study problem, the first order of business is to discern what the relevant facts and issues are. Those professions explicitly train their people to separate the wheat from the chaff. Engineering education, especially in the first two years, is often guilty of spoon-feeding its students with only and all of the relevant information, and hence developing a sense of intellectual complacency and entitlement in its students.

For example, you should be able to take a list of topics to be covered from the course outline and use that to determine which sections of the textbook are relevant and which questions from old exams are applicable.

There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
– C.A.R. Hoare, 1980 Turing Award Lecture

The ability to accurately assess the credibility of a source. Information comes from a variety of sources. Some of them are more credible than others. An educated person knows, for example, that a report from the Transportation Safety Board of Canada is more likely to be accurate than a newspaper report: more likely, but not infallibly so.

The ability to manage large amounts of complex, interconnected details. The human body is a complex system that doctors work with. The law is a complex system that lawyers work with. Engineers work with complex socio-technical systems.

In the first two years of engineering education you typically encounter only small textbook problems in order to gain understanding of specific technical concepts in isolation. The labs for this course might be the first time in your engineering education where you have had to work with a non-toy system. Our skeleton code is still small by industrial standards, but it might be an order of magnitude larger than what you have worked with in your previous courses. Learning to manage this volume of complexity and interdependency is an essential part of your professional education.

The total code base that we work with in this course is about 8,000 lines. We provide about 7,000 lines of that to you, and you are expected to understand them while developing the remaining 1,000 lines. For reference, the source code for Microsoft's Windows operating system is somewhere around 25 million lines of code. The source code for ati's video card driver is about 60 million lines, more than double the Windows operating system. Modern chip designs also get into millions of lines of code.

Back where I come from we have universities, seats of great learning, where men go to become great thinkers. And when they come out, they think deep thoughts, and with no more brains than you have.
– The Wizard of Oz, 1939

Of course, in the modern world, women now go to (and graduate from) universities in greater numbers than men.

The ability to think deep thoughts. In elementary school you learned to multiply positive integers. In middle school you learned to multiply all integers (including negative ones). In high school you learned to multiply vectors, perhaps in two different ways (dot product and cross product), and perhaps matrices. In first year you learned to write programs that multiply matrices. In this course you will learn a bit about how to design and implement programming languages in which someone might write a program to multiply matrices. You could go on in pure math and logic courses to learn more about multiplication. It's still the multiplication of your childhood, but your understanding of it is much deeper for the twenty years of study that you have devoted to it. Understanding one or a few things in some depth like this hopefully cultivates in you the ability to think deeper thoughts in other domains and circumstances.

For example, Gödel's first incompleteness theorem shows that any consistent, effectively axiomatized formal system that includes basic arithmetic (addition and multiplication of natural numbers) can express propositions that are true but which cannot be proven within that system. This result rocked the world in 1931, and was an important intellectual precursor to Turing's proof of the undecidability of the halting problem in 1936.
Lab 1

Recursive Descent Parsing of W

Compiler Concepts: regular languages, regular expressions, ebnf, recognizers, parsing, recursive descent, lexing, pretty-printing, abstract syntax tree (ast)
Programming Concepts: classes, objects, variables, aliasing, immutability, test-driven development

We treat the inputs and outputs of a circuit as a set of waveforms. We use the W language for expressing waveforms in text files. Figure 1.1 shows an example W file and Figure 1.2 gives the grammar for W.

Figure 1.1: Example waveform file for an or gate. The input pins are named A and B, and the output pin is named OR. Our vhdl simulator will read a W file with lines A and B and will produce a W file with all three lines.

A: 0 1 0 1 0 1 ;
B: 1 0 1 0 1 0 ;
OR: 1 1 1 1 1 1 ;

Figure 1.2: Grammar for W in ebnf. By convention we will call the top production Program.

Program  → (Waveform)+
Waveform → Id ‘:’ Bits ‘;’
Id       → Char ( Char | Digit | ‘_’ )*
Char     → [A-Za-z]
Digit    → [0-9]
Bits     → (‘0’ | ‘1’)+

1.1 Grammars & ebnf†

Crafting: §4.3

Figure 1.2 lists the grammar for the language W in Extended Backus-Naur Form (ebnf). A grammar is a formal specification of the syntax of a language. A grammar tells us which sentences are included in the language and which are excluded. For certain classes of grammars, we can systematically derive a program to recognize and parse sentences written in the language.

bnf was developed by John Backus while working on what became the Algol-60 language. Backus led the Fortran compiler team at ibm in the 1950s, which produced the first commercially successful compiler.

The grammar in Figure 1.2 also includes the lexical specification of the language W: that is, it also tells us what sequences of characters make legal tokens (words). For example, it tells us that an identifier (i.e., a name) must start with an alphabetic character, but may contain numerals or underscores in subsequent positions. Sometimes grammars do not explicitly include a lexical specification, and instead assume that characters not separated by whitespace form legal tokens.

People do not usually have numerals in their names. A notable exception is former New York Times reporter Jennifer 8. Lee. She says that many computer programs prohibit her middle name, forcing her to write out 'Eight'. She chose the middle initial 8. because as a teenager she realized that there were about 10,000 other 'Jennifer Lee's in the United States.
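The lexical rule for Id can be read directly as code. In the labs the provided Lexer recognizes identifiers for you, so the following is only a sketch of how the rule Id → Char ( Char | Digit | ‘_’ )* translates into character tests; the class IdentifierRule is hypothetical.

```java
final class IdentifierRule {
    // Id -> Char ( Char | Digit | '_' )*   with   Char -> [A-Za-z],  Digit -> [0-9]
    static boolean isId(String s) {
        if (s.isEmpty() || !isChar(s.charAt(0))) {
            return false;                            // the first symbol must be a Char
        }
        for (int i = 1; i < s.length(); i++) {       // then zero or more of Char | Digit | '_'
            char c = s.charAt(i);
            if (!(isChar(c) || isDigit(c) || c == '_')) {
                return false;
            }
        }
        return true;
    }

    // ASCII only, exactly as in the grammar; Character.isLetter() would also
    // accept non-ASCII letters that Char -> [A-Za-z] excludes.
    static boolean isChar(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    public static void main(String[] args) {
        for (String s : new String[] {"A", "OR", "x_1", "1A", "_a", ""}) {
            System.out.println("\"" + s + "\" -> " + isId(s));
        }
    }
}
```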

There are a number of important concepts and notations for understanding grammars and ebnf. Examples are given with respect to the grammar for the W language in Figure 1.2.

Production Rules: Each line with an arrow (→) represents a production rule. A rule has a left-hand side (lhs) and a right-hand side (rhs). We say that the lhs derives the rhs, by which we mean that the rhs can be substituted for the lhs. The lhs will always be a single non-terminal, whereas the rhs can be some combination of terminals and non-terminals. The name 'production rule' is sometimes shortened to just 'production' or 'rule'.

Terminals: A terminal is a symbol that cannot be derived any further. e.g., colon (:), semi-colon (;), zero (0), one (1), and underscore (_).

Non-terminals: A non-terminal is a symbol that can be derived. Every non-terminal must appear on the lhs of at least one production. e.g., Program, Waveform, Id, Char, Digit, and Bits.

Alternation: The bar (|) character indicates alternatives. e.g., Bit → ‘0’ | ‘1’ says that a Bit can be a zero or a one.

Repetition: The star (∗) and plus (+) characters are used to indicate repetition. Star means zero or more, whereas plus means one or more. e.g., Program → Waveform+ says that a Program derives one or more Waveforms. The plus, star and bar are part of Extended Backus-Naur Form (ebnf), but are not part of regular Backus-Naur Form (bnf). We will learn how to convert from ebnf to bnf in [N 2.71].

Derivation: Consider the input W program A: 1 0; and let's derive this string from the W grammar in Figure 1.2.

Program → (Waveform)+                  top-level rule
        → (Id ‘:’ Bits ‘;’)+           derive Waveform
        → (‘A’ ‘:’ Bits ‘;’)+          derive Id
        → (‘A’ ‘:’ ‘1’ ‘0’ ‘;’)+       derive Bits
        → ‘A’ ‘:’ ‘1’ ‘0’ ‘;’          reached end of input

In this example we are doing a leftmost derivation: that is, we are expanding the leftmost non-terminal at each step. If we expanded Bits before Id then it would be a rightmost derivation.

Pragmatics: §2.1.3, p.48
Crafting: §4.1.1, §4.1.2

The recursive descent recognizer and parser that you will write in this lab will work in this way: it will descend from the top of the grammar, recursively if necessary (the grammar of W is not recursive, but the grammar of F in lab3 is).

1.1.1 Summary of Grammar Notation Conventions

When writing grammars in ebnf or bnf we will often follow these notational conventions for names. The conventions are adopted from Programming Language Pragmatics by Michael L. Scott. This chart will be repeated elsewhere in the Lab Manual and Course Notes as convenient.

Pragmatics: §2
Crafting: §4.5.1

ABC   non-terminals
abc   terminals
XYZ   non-terminals or terminals
xyz   token strings
αβγ   strings of arbitrary symbols
lhs   Left Hand Side
rhs   Right Hand Side

1.1.2 Derivations†

Pragmatics: §2.1.3, p.48 + j1
Tiger: j1
Crafting: §4.1.1, §4.1.2

A derivation is how we show, on paper, that an input string can be recognized by a grammar. We start with the top production in the grammar and do substitutions until we derive the input string. In most textbooks this is done algebraically, but it can also be done graphically by drawing the partial ast at each step of the derivation. Here we will show both ways.

Consider the input string ‘1+2+3’ and the following grammar, with productions numbered for easy reference:

1. S → E
2. E → E+E
3. E → INT

Here is a derivation of that input string with that grammar:

S → E                  start at the top
  → (E + E)            substitute by rule 2
  → ((E + E) + E)      substitute by rule 2
  → ((1 + 2) + 3)      substitute by rule 3 (3 times)

The tree after the final step:

          S
          |
          E
        / | \
       E  +  E
     / | \   |
    E  +  E  3
    |     |
    1     2

Here is another derivation of that input string with that grammar:

S → E                  start at the top
  → (E + E)            substitute by rule 2
  → (E + (E + E))      substitute by rule 2
  → (1 + (2 + 3))      substitute by rule 3 (3 times)

The tree after the final step:

      S
      |
      E
    / | \
   E  +  E
   |   / | \
   1  E  +  E
      |     |
      2     3

Since the same input string has more than one parse tree, we say that this grammar is ambiguous. [N 2.4]
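One way to see how ambiguous this grammar is: count the parse trees. Under E → E+E | INT, the topmost + of a tree can split the operands at any position, and the two sides are parsed independently, so the counts satisfy a simple recurrence (they are the Catalan numbers). A sketch, with a hypothetical class name, that reproduces the two trees above for three operands:

```java
final class AmbiguityCounter {
    // Number of distinct parse trees for INT + INT + ... + INT (n operands)
    // under the grammar E -> E + E | INT.
    static long parseTrees(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("need at least one operand");
        }
        long[] trees = new long[n + 1];
        trees[1] = 1;                             // rule 3: E -> INT, exactly one tree
        for (int len = 2; len <= n; len++) {
            // rule 2: E -> E + E; the topmost '+' can follow any of the first len-1
            // operands, and the left and right subtrees are chosen independently.
            for (int left = 1; left < len; left++) {
                trees[len] += trees[left] * trees[len - left];
            }
        }
        return trees[n];
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.println(n + " operand(s): " + parseTrees(n) + " parse tree(s)");
        }
    }
}
```

For ‘1+2+3’ (n = 3) this gives 2, matching the two derivations; a fourth operand already allows 5 trees.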

1.2 Write a regular expression recognizer for W

Sources:
ece351.w.regex.TestWRegexSimple
ece351.w.regex.TestWRegexSimpleData
ece351.w.regex.TestWRegexAccept
ece351.w.regex.TestWRegexReject

A recognizer is a program that accepts or rejects a string input based on whether that string is a valid sentence in some language. A recognizer for W will accept or reject a file depending on whether that file is a legal W 'sentence'/'program'.

W is a regular language. Regular is a technical term here that describes the complexity of the grammar of W. Regular languages are the simplest kind of languages that we will consider. The grammar of a regular language can be described by a regular expression.

The website regexper.com will draw railroad diagrams of your regular expressions. The website regexpal.com lets you test out your regular expressions. There are many other similar websites.

Your first task is to write a regular expression describing the grammar of W. You will do this in two steps: first editing TestWRegexSimpleData and running TestWRegexSimple, and second editing TestWRegexAccept.REGEX and running both TestWRegexAccept and TestWRegexReject. Your final answer should be stored in the field TestWRegexAccept.REGEX.
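Java's java.util.regex package is one way to try a regular expression before putting it into TestWRegexSimpleData. The sketch below uses a made-up comma-separated-integers language rather than W, so that it does not give away the lab. It assumes, without our having checked the test classes, that the tests require the expression to match the entire file, which is what Matcher.matches() does; Matcher.find() would instead accept any input that merely contains a match.

```java
import java.util.regex.Pattern;

final class RegexRecognizerDemo {
    // Hypothetical toy language (NOT W): integers separated by commas, e.g. "1,22,333".
    // No whitespace is allowed in this version.
    static final String REGEX = "[0-9]+(,[0-9]+)*";
    private static final Pattern PATTERN = Pattern.compile(REGEX);

    // A recognizer judges the WHOLE input, so use matches(), which must consume
    // every character, rather than find(), which succeeds on any matching substring.
    static boolean accepts(String input) {
        return PATTERN.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("1,22,333"));                  // true
        System.out.println(accepts("1,,2"));                      // false
        System.out.println(accepts("1,2 garbage"));               // false
        System.out.println(PATTERN.matcher("1,2 garbage").find()); // true: only a prefix matched
    }
}
```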

1.2.1 Test-Driven Development†

Sources:
ece351.w.regex.TestWRegexSimple
ece351.w.regex.TestWRegexSimpleData

In traditional waterfall software development, one writes the code first and then the tests; moreover, the code is written in a deductive mental style, thinking about the problem generally.

Modern agile software development advocates a different approach: Test-Driven Development (TDD). In TDD, one writes the tests first and then the code. The code is written in a more inductive mental style: what needs to be done to pass the next test case?

The danger of the traditional approach is that it produces general code that does not actually pass any specific tests. Obviously if the test suite is inadequate then TDD will result in code that is not sufficiently general. But it will pass some test cases.

The first exercise is to follow a Test-Driven Development approach to building a regular expression recognizer for the W language. There are test files r1.wave through r8.wave. You will start by running TestWRegexSimple and observing that the first test passes and the second test fails. The first test passes because the regular expression you are given, in TestWRegexSimpleData, matches the contents of r1.wave: this regular expression will work only for r1.wave.

Your job is to copy/paste/generalize that regular expression, making the minimal modifications necessary for it to accept both r1.wave and r2.wave. Then you will see that it fails r3.wave. Again, copy/paste/generalize, and you will see it fails r4.wave. Eventually you will build up a regular expression that accepts r1.wave through r8.wave.

A challenge that you will face is that the W grammar in Figure 1.2 does not explicitly specify whitespace, whereas your regular expression will have to explicitly specify the whitespace. Grammars given in ebnf usually implicitly assume some lexical specification that describes how characters in the input string are to be grouped together to form tokens. Usually any amount of whitespace may occur between tokens.
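The TDD loop and the whitespace problem can both be seen on a toy comma-separated-integers language (not W; the test strings and regexes below are invented for illustration). Each version of the regex is the smallest change that passes one more test, and the last step adds \s* because the grammar never mentions whitespace but the inputs contain it.

```java
import java.util.List;
import java.util.regex.Pattern;

final class TddRegexDemo {
    // Hypothetical accept-tests for the toy language, added one at a time.
    static final List<String> ACCEPT = List.of("1", "1,22", " 1 , 22 ,333 \n");

    // Step 1: the most specific regex that passes the first test only.
    static final String V1 = "1";
    // Step 2: generalized just enough to pass the second test too.
    static final String V2 = "[0-9]+(,[0-9]+)*";
    // Step 3: the grammar never mentions whitespace, but the third input contains
    // spaces and a newline, so allow optional whitespace around every token.
    static final String V3 = "\\s*[0-9]+\\s*(,\\s*[0-9]+\\s*)*";

    static boolean passesAll(String regex) {
        Pattern p = Pattern.compile(regex);
        return ACCEPT.stream().allMatch(s -> p.matcher(s).matches());
    }

    public static void main(String[] args) {
        System.out.println(passesAll(V1));   // false: fails "1,22"
        System.out.println(passesAll(V2));   // false: fails the input with whitespace
        System.out.println(passesAll(V3));   // true
    }
}
```

Rejection tests matter as much as acceptance tests: a regex such as ".*" would pass every accept-test above while recognizing nothing.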

1.2.2 Complete the regular expression recognizer for W

Sources:
ece351.w.regex.TestWRegexAccept
ece351.w.regex.TestWRegexReject

Copy your final regular expression recognizer for W from TestWRegexSimpleData to TestWRegexAccept.REGEX, and then run TestWRegexAccept and TestWRegexReject. You will see that some tests pass and some tests fail. Fix your regular expression until all of the tests pass.

1.3 Write a recursive descent recognizer for W†

Sources:
ece351.w.rdescent.WRecursiveDescentRecognizer
Libraries:
ece351.util.Lexer
Tests:
ece351.w.rdescent.TestWRDRecognizerAccept
ece351.w.rdescent.TestWRDRecognizerReject

Recursive descent is a style of writing parsers (or recognizers) by hand (i.e., without using a parser generator tool). In this style we write a method for each non-terminal in the grammar. For recognizers these methods return void.

The bodies of these non-terminal methods consume the input one token at a time. Kleene stars (*) or plusses (+) become loops. Alternation bars (|) become conditionals (if statements).

Execution of these non-terminal methods starts at the 'top' of the grammar. This is what the term descent in recursive descent refers to. The term recursive in recursive descent simply means that the various non-terminal methods may call each other (and so, directly or indirectly, themselves).

By convention we will name the top production of our grammars Program, and so execution of our hand-written recursive descent parsers and recognizers will start in a method called program.

Your W recognizer code will make use of the Lexer library class that we provide. This class tokenizes the input string: i.e., it strips out the whitespace and groups the input characters into chunks called tokens. The lexer provides a number of convenience methods, including methods for recognizing identifiers, so you won't need to write an id() method in your recognizer.

The lexer has two main kinds of operations: inspect the next token to see what it looks like, and consume the next token. The inspect methods are commonly used in the tests of loops and conditionals. The consume methods are commonly used in regular statements.
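The recipe above (one method per non-terminal, loops for * and +, if statements for |, start at the top) is shown here on a deliberately different toy grammar, so that it illustrates the technique without solving the lab. Unlike W, this grammar is recursive. The tiny tokenizer, peek(), and consume() are hypothetical stand-ins for the inspect/consume operations of ece351.util.Lexer, whose actual method names are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy grammar (NOT the W grammar):
//   List -> '(' Item* ')'
//   Item -> Num | List
//   Num  -> [0-9]+
final class ToyRecognizer {
    private final List<String> tokens;
    private int pos = 0;

    private ToyRecognizer(String input) {
        // Stand-in lexer: pad parentheses with spaces, then split on whitespace.
        List<String> out = new ArrayList<>();
        for (String t : input.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+")) {
            if (!t.isEmpty()) out.add(t);
        }
        this.tokens = out;
    }

    /** True iff the whole input is a sentence of the toy language. */
    static boolean accepts(String input) {
        ToyRecognizer r = new ToyRecognizer(input);
        try {
            r.list();                                // execution starts at the top production
            return r.pos == r.tokens.size();         // ...and must consume all the input
        } catch (IllegalStateException e) {
            return false;
        }
    }

    // Inspect the next token without consuming it (null at end of input).
    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : null;
    }

    // Consume the next token, which must be exactly 'expected'.
    private void consume(String expected) {
        if (!expected.equals(peek())) {
            throw new IllegalStateException("expected " + expected + " but saw " + peek());
        }
        pos++;
    }

    // List -> '(' Item* ')' : the star becomes a while loop guarded by peek()
    private void list() {
        consume("(");
        while (peek() != null && !peek().equals(")")) {
            item();
        }
        consume(")");
    }

    // Item -> Num | List : the bar becomes an if/else on the next token
    private void item() {
        String t = peek();
        if (t != null && t.matches("[0-9]+")) {
            pos++;                                   // consume the number
        } else if ("(".equals(t)) {
            list();                                  // recursion: list() -> item() -> list()
        } else {
            throw new IllegalStateException("unexpected token " + t);
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"( 1 ( 2 3 ) )", "()", "(1 2", "(1) 2", "(a)"}) {
            System.out.println(s + " -> " + accepts(s));
        }
    }
}
```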

1.4 Write a pretty-printer for W

Sources:
ece351.w.ast.WProgram
ece351.w.ast.Waveform
Libraries:
java.lang.String
System.lineSeparator()
org.parboiled.common.ImmutableList

Pretty-printing is the opposite of parsing. Parsing is the process of constructing an abstract syntax tree (ast) from a string input. Pretty-printing is the process of producing a string from an ast.

For this step you will write the toString() methods for the W ast classes. These methods will be tested in the next step. Note that these toString() methods just return a string: they do not actually print to the console nor to a file. Pretty-printing is the name for the inverse of parsing, and does not necessarily involve actually printing the result to an output stream. Whereas parsing constructs a tree from a string, pretty-printing produces a string from a tree.

Once we have a function and its inverse we can test that f′(f(x)) = x for any input x. Here f is pretty-printing and f′ is parsing: parsing the pretty-printed text of an ast should give back an equal ast.
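A sketch of the toString() technique on hypothetical ast classes for a toy grammar of nested number lists (List → '(' Item* ')', Item → Num | List). NumNode, ListNode, and ProgramNode are not the W classes, and java.util.List stands in for org.parboiled.common.ImmutableList. Each node returns its text and prints nothing; the top-level node joins its children with System.lineSeparator(), the way a multi-line format might.

```java
import java.util.List;
import java.util.stream.Collectors;

final class PrettyPrintDemo {
    public static void main(String[] args) {
        ListNode tree = new ListNode(List.of(new NumNode("1"),
                new ListNode(List.of(new NumNode("2"), new NumNode("3")))));
        String text = tree.toString();       // "(1 (2 3))"; nothing has been printed yet
        System.out.println(text);            // the caller decides where the string goes
        ProgramNode program = new ProgramNode(List.of(tree, new ListNode(List.of())));
        System.out.println(program);         // two lines: (1 (2 3)) and ()
    }
}

// Hypothetical ast node types for the toy grammar (not the W ast classes).
interface Node {}

final class NumNode implements Node {
    final String digits;
    NumNode(String digits) { this.digits = digits; }
    @Override public String toString() { return digits; }
}

final class ListNode implements Node {
    final List<Node> items;
    ListNode(List<Node> items) { this.items = List.copyOf(items); }

    // Canonical layout: items separated by single spaces, wrapped in parentheses.
    @Override public String toString() {
        return items.stream().map(Object::toString)
                .collect(Collectors.joining(" ", "(", ")"));
    }
}

// A sequence of top-level lists, one per line.
final class ProgramNode {
    final List<ListNode> lists;
    ProgramNode(List<ListNode> lists) { this.lists = List.copyOf(lists); }

    @Override public String toString() {
        return lists.stream().map(ListNode::toString)
                .collect(Collectors.joining(System.lineSeparator()));
    }
}
```

Emitting one canonical layout, whatever whitespace the original input used, is what makes a parse, print, re-parse comparison meaningful.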

1.5 Write a recursive descent parser for W

Sources:
ece351.w.rdescent.WRecursiveDescentParser
Libraries:
ece351.w.ast.WProgram
ece351.w.ast.Waveform
ece351.util.Lexer
org.parboiled.common.ImmutableList
Tests:
ece351.w.rdescent.TestWRDParserBasic
ece351.w.rdescent.TestWRDParserAccept
ece351.w.rdescent.TestWRDParserReject

A parser reads a string input and constructs a tree output: specifically, an abstract syntax tree (ast).

To write your recursive descent parser for W, start by copying over the code from your recursive descent recognizer. The parser will be a superset of this code. The difference is that the recognizer discards the input, whereas the parser builds up the ast based on the input. The ast classes are WProgram and Waveform.

The test performed here is to parse the input, pretty-print it, re-parse the pretty-printed output, and then compare the asts from the two parses to see that they are the same.
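A sketch of a recursive descent parser for a toy grammar of nested number lists (List → '(' Item* ')', Item → Num | List), together with the round-trip test described above: parse, pretty-print, re-parse, and compare. Nothing here is taken from the W skeleton: the tokenizer is a stand-in for ece351.util.Lexer, and the hypothetical ast classes define structural equals() and hashCode(), which such a comparison relies on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical toy grammar (NOT W):  List -> '(' Item* ')'   Item -> Num | List
final class ToyParser {
    private final List<String> tokens;
    private int pos = 0;

    private ToyParser(List<String> tokens) { this.tokens = tokens; }

    static Node parse(String input) {
        ToyParser p = new ToyParser(tokenize(input));
        Node result = p.list();                        // start at the top production
        if (p.pos != p.tokens.size()) {
            throw new IllegalArgumentException("trailing input at token " + p.pos);
        }
        return result;
    }

    // Stand-in lexer: pad parentheses with spaces, then split on whitespace.
    static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        for (String t : s.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : null; }

    private void consume(String expected) {
        if (!expected.equals(peek())) {
            throw new IllegalArgumentException("expected " + expected + " but saw " + peek());
        }
        pos++;
    }

    // List -> '(' Item* ')': same shape as a recognizer, but it returns a node.
    private Node list() {
        consume("(");
        List<Node> items = new ArrayList<>();
        while (peek() != null && !peek().equals(")")) {
            items.add(item());
        }
        consume(")");
        return new ListNode(items);
    }

    // Item -> Num | List
    private Node item() {
        String t = peek();
        if (t != null && t.matches("[0-9]+")) {
            pos++;
            return new NumNode(t);
        }
        if ("(".equals(t)) {
            return list();
        }
        throw new IllegalArgumentException("unexpected token " + t);
    }

    public static void main(String[] args) {
        Node first = parse("(1   (2 3)\n ())");
        Node second = parse(first.toString());         // re-parse the pretty-printed text
        System.out.println(first);                     // (1 (2 3) ())
        System.out.println(first.equals(second));      // true
    }
}

interface Node {}

final class NumNode implements Node {
    final String digits;
    NumNode(String digits) { this.digits = digits; }
    @Override public String toString() { return digits; }
    @Override public boolean equals(Object o) {
        return o instanceof NumNode && ((NumNode) o).digits.equals(digits);
    }
    @Override public int hashCode() { return digits.hashCode(); }
}

final class ListNode implements Node {
    final List<Node> items;
    ListNode(List<Node> items) { this.items = List.copyOf(items); }
    @Override public String toString() {
        return items.stream().map(Object::toString).collect(Collectors.joining(" ", "(", ")"));
    }
    // Structural equality, so two separately parsed trees can be compared.
    @Override public boolean equals(Object o) {
        return o instanceof ListNode && ((ListNode) o).items.equals(items);
    }
    @Override public int hashCode() { return items.hashCode(); }
}
```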

1.6 Object Diagram of Example W ast

For the W file below, the corresponding ast constructed will be:

W file:

A: 1 0 1;
B: 0 1 0;

ast:

WProgram
  + waveforms : ImmutableList<Waveform> = [Waveform, Waveform]
      Waveform
        + name : String = A
        + bits : ImmutableList<String> = [1, 0, 1]
      Waveform
        + name : String = B
        + bits : ImmutableList<String> = [0, 1, 0]

Figure 1.3: Object diagram for example W ast

1.7 Steps to success

[Figure: Lab 1 Programming Procedure. Boxes for the W files, the Lexer, Waveform and WProgram, the simple recursive descent recognizer, the recursive descent recognizer, the pretty printer and the recursive descent parser, each with its Basic, Accept or Reject tests, grouped into steps 1.1 to 1.4.]

Figure 1.4: Steps to success for §1.


Legend for icon meanings in Figure 1.5.

Programming Procedure Legend: icons mark class files / txt files that need to be imported or exported, sources that need to be edited, and tests that need to be run. A box contains all the files in one section of the lab and shows the name of the section (e.g., "1.1 Recursive Descent Recognizer").

Logical Process Legend: class; txt files that need to be imported or exported; method.

Figure 1.5: Legend for Figure 1.4



1.8 Evaluation

The last pushed commit before the deadline is evaluated both on the shared test inputs and on a set of secret test inputs according to the weights in the table below.

                              Shared   Secret
    TestWRegexAccept            10        5
    TestWRegexReject             5        5
    TestWRDRecognizerAccept     15        5
    TestWRDRecognizerReject      5        5
    TestWRDParserBasic           5        0
    TestWRDParserAccept         20       10
    TestWRDParserReject          5        5

Note that you don't earn any points for the rejection tests until some of the corresponding acceptance tests pass. Also, you don't earn any points for the parser until the TestWRDParserBasic tests pass.

1.9 Reading

1.9.1 Tiger Book [1]

• 1 Introduction
• 2.0 Lexical Analysis
• 2.1 Lexical Tokens
• 2.2 Regular Expressions
• 3.0 Parsing
• 3.2.0 Predictive Parsing / Recursive Descent
  – skip First and Follow Sets for now
  – read Constructing a Predictive Parser on p.50
  – skip Eliminating Left Recursion and everything that comes after it, for now
• 4.1.1 Recursive Descent

[1] A. W. Appel and J. Palsberg. Modern Compiler Implementation in Java. Cambridge, 2004.

1.9.2 Programming Language Pragmatics [2]

• 2.2 Scanning
• 2.3.1 Recursive Descent

[2] M. L. Scott. Programming Language Pragmatics. Morgan Kaufmann, 3rd edition, 2009.

1.9.3 Web Resources

Thinking in Java [3]

Prof Alex Aiken @ Stanford on recursive descent:
https://www.youtube.com/watch?v=O7PXN0aHfZg
https://www.youtube.com/watch?v=q7qQn76l8ww

[3] B. Eckel. Thinking in Java. Prentice-Hall, 2002. http://www.mindview.net/Books/TIJ/

Lab 2
Transforming W → SVG for Visualization

Compiler Concepts: trees, transformations, xml, svg
Programming Concepts: object contract, object equality, mathematical equivalence classes, dom vs. sax parser styles, call-backs, iterator design pattern

While our circuit simulator programs will read and write W files, people often prefer to look at waveforms in a graphical format. In this lab we will translate W files into svg files for visualization. The svg format is an xml-based graphics file format: in other words, an svg file is structured text that describes some vectors to be rendered. Any modern web browser will render svg.

An example W file is shown in Figure 2.1. Figure 2.2 shows what the corresponding svg looks like when visualized with a web browser such as Firefox or a vector graphics program such as Inkscape. Figure 2.3 shows the first few lines of the textual content of the svg file.

    A: 0 1 0 1 0 1 ;
    B: 1 0 1 0 1 0 ;
    OR: 1 1 1 1 1 1 ;

Figure 2.1: Example waveform file for an or gate. The input pins are named A and B, and the output pin is named OR. Our vhdl simulator will read a W file with lines A and B and will produce a W file with all three lines.

Figure 2.2: Rendered svg of W file from Figure 2.1



    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
    <svg xmlns="http://www.w3.org/2000/svg" width="100%" height="100%" version="1.1">
    <style type="text/css"><![CDATA[line{stroke:#006600;fill:#00cc00;} text{font-size:"large";font-family:"sans-serif"}]]></style>

    <text x="50" y="150">A</text>

    <line x1="100" x2="100" y1="150" y2="200" />
    <line x1="100" x2="200" y1="200" y2="200" />
    <line x1="200" x2="200" y1="200" y2="100" />
    <line x1="200" x2="300" y1="100" y2="100" />

Figure 2.3: First 10 lines of svg text of W file from Figure 2.1. The boilerplate at the top is already in W2SVG.java for your convenience.

Note that the origin of an svg canvas is the top left corner, not the bottom left corner (as you might expect from math). Consequently, Y=200 is visually lower than Y=100, since Y counts down from the top.

2.1 Write a translator from W → svg


Sources:
ece351.w.svg.TransformW2SVG
Libraries:
ece351.w.ast.WProgram
ece351.w.svg.Line
ece351.w.svg.Pin
Tests:
ece351.w.svg.TestLegalSVG

The idea of the transformation is simple: for each bit (zero or one) in the input file, produce a vertical line and a horizontal line. To do this the transformer needs to remember three things: the current X and Y position of the (conceptual) cursor, and the YOffset so we can move the cursor down the page when we start drawing the next waveform.

Run your transformations and inspect the output both visually, in a program such as Firefox, and textually, using the text editor of your choice. Ensure that the output is sensible.

Exercise 2.1.1 Where is the origin on an svg canvas? Which directions are positive and which are negative?
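The cursor bookkeeping can be sketched as follows. This is a minimal sketch under stated assumptions: the coordinates (a bit width of 100, the cursor starting at x=100 level with the label at y=150, a 1 drawn at y=100 and a 0 at y=200) are read off the lines in Figure 2.3, and the Line record and linesFor method are hypothetical, not the API of ece351.w.svg.Line or TransformW2SVG.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the W -> svg cursor idea (not the lab's TransformW2SVG). */
class W2SvgSketch {
    record Line(int x1, int y1, int x2, int y2) {
        String toSvg() {
            return String.format("<line x1=\"%d\" x2=\"%d\" y1=\"%d\" y2=\"%d\" />", x1, x2, y1, y2);
        }
    }

    static final int BIT_WIDTH = 100;
    static final int HIGH = 100; // y of a 1 bit: smaller y is higher on the canvas
    static final int LOW = 200;  // y of a 0 bit
    static final int MID = 150;  // where the cursor starts, level with the label

    /** Emits one vertical and one horizontal line per bit, shifted down by yOffset. */
    static List<Line> linesFor(List<String> bits, int yOffset) {
        List<Line> out = new ArrayList<>();
        int x = 100;
        int y = MID + yOffset;
        for (String bit : bits) {
            int target = (bit.equals("1") ? HIGH : LOW) + yOffset;
            out.add(new Line(x, y, x, target));                  // vertical edge to the new level
            out.add(new Line(x, target, x + BIT_WIDTH, target)); // horizontal run for this bit
            x += BIT_WIDTH;
            y = target;
        }
        return out;
    }
}
```

Calling linesFor with the bits 0 and 1 and a yOffset of 0 reproduces the four line elements of Figure 2.3. A larger yOffset for each subsequent waveform moves it down the page. Note that in this sketch a repeated bit produces a zero-length vertical line; the staff implementation may handle that case differently.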

2.2 Write a translator from svg → W


Sources:
ece351.w.svg.TransformSVG2W
Libraries:
ece351.w.ast.WProgram
ece351.w.svg.Line
ece351.w.svg.Pin
Tests:
ece351.w.svg.TestW2SVG2W

This translator is the inverse of the previous one. We write this inverse function as a way to test the previous translator. Now we can take advantage of the general property that $x = f'(f(x))$ or, in other words, $w = \mathrm{toW}(\mathrm{toSVG}(w))$.

This inverse translation is a bit tricky because the svg files do not contain any explicit information about which line segments are associated with which waveform: the svg file just contains a bunch of text labels and a bunch of line segments. We have to infer, from the y values, which line segments belong to which waveform/label. Then we use the x values to infer the ordering of the bits, and the y values to infer the bit values (0 or 1).

Many program analysis tasks used in optimizing compilers involve this kind of inference: recovering information that is implicit in a lower-level representation but was explicit in some higher-level representation.
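One way to do this inference is sketched below. It is a minimal sketch, not the lab's TransformSVG2W, and it rests on assumptions: each label's y coordinate sits midway between that waveform's high and low levels (as in Figure 2.3), consecutive waveforms are spaced far enough apart that every horizontal segment is closest to its own label, and there is at least one label. The Label and Line records are hypothetical stand-ins for the lab's Pin and Line classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of svg -> W inference (not the lab's TransformSVG2W). */
class Svg2WSketch {
    record Label(String name, int y) {}
    record Line(int x1, int y1, int x2, int y2) {}

    /** Recovers name -> bits by assigning each horizontal segment to the nearest label. */
    static Map<String, List<String>> recover(List<Label> labels, List<Line> lines) {
        Map<String, List<Line>> byLabel = new LinkedHashMap<>();
        for (Label l : labels) byLabel.put(l.name(), new ArrayList<>());
        for (Line ln : lines) {
            if (ln.y1() != ln.y2()) continue; // vertical edges add no information
            Label nearest = labels.get(0);
            for (Label l : labels) {
                if (Math.abs(l.y() - ln.y1()) < Math.abs(nearest.y() - ln.y1())) nearest = l;
            }
            byLabel.get(nearest.name()).add(ln);
        }
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Label l : labels) {
            List<Line> segs = byLabel.get(l.name());
            segs.sort(Comparator.comparingInt(Line::x1)); // x order gives bit order
            List<String> bits = new ArrayList<>();
            for (Line s : segs) bits.add(s.y1() < l.y() ? "1" : "0"); // above the label means 1
            result.put(l.name(), bits);
        }
        return result;
    }
}
```

Only horizontal segments are used: their y value gives the bit and their x1 gives its position. The vertical segments are redundant once the levels are known.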
[lab 2] transforming W → svg for visualization 47

2.3 Introducing the Object Contract†


Sources:
ece351.objectcontract.TestObjectContract
Libraries:
ece351.objectcontract.TestObjectContractBase
Tests:
ece351.objectcontract.TestObjectContract

The object contract is a term sometimes used to describe properties that all objects should have. It is particularly concerned with the equals and hashCode methods. The equals method should represent a mathematical equivalence relation, and it should be consistent with the hashCode method. An equivalence relation has three properties:

    reflexive     x.equals(x)
    symmetric     x.equals(y) ⇔ y.equals(x)
    transitive    x.equals(y) && y.equals(z) ⇒ x.equals(z)

Additionally, by consistent with hashCode we mean the following:

    consistent    x.equals(y) ⇒ x.hashCode() == y.hashCode()

Finally, no object should be equal to null:

    not null      !x.equals(null)

The object contract looks simple (and it is), but even expert programmers have difficulty getting it right in all circumstances.

Exercise 2.3.1 What would happen if every hashCode method returned 42?
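Step (b) of §2.3.2 asks for property check methods. A minimal sketch of what such methods might look like is below; the class ContractChecks and its method names are hypothetical and are not the methods in TestObjectContractBase. Each implication p ⇒ q from the list above is checked as !p || q.

```java
/** Hypothetical property check methods for the object contract (not the lab's API). */
class ContractChecks {
    static boolean reflexive(Object x) {
        return x.equals(x);
    }

    static boolean symmetric(Object x, Object y) {
        return x.equals(y) == y.equals(x);
    }

    static boolean transitive(Object x, Object y, Object z) {
        // "p implies q" is checked as "!p || q"
        return !(x.equals(y) && y.equals(z)) || x.equals(z);
    }

    static boolean consistentWithHashCode(Object x, Object y) {
        return !x.equals(y) || x.hashCode() == y.hashCode();
    }

    static boolean notEqualToNull(Object x) {
        return !x.equals(null);
    }
}
```

Passing these checks on a few String and Integer values, as in step (c), is evidence rather than proof: a buggy check that always returned true would pass too. That is why step (d) also needs control objects.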

The file TestObjectContract contains some code stubs for testing the object contract. In order to fill in those stubs correctly you might need to read and understand the code in TestObjectContractBase.

Exercise 2.3.2 What would happen if every hashCode method returned a random value each time it was called?

What is the default behaviour of the equals and hashCode methods? In other words, what happens if you do not implement those methods for your objects? Consider the code listings in Figure 2.4. What value is printed out by each statement?

The == ('double equals' or 'equals equals') comparison operator compares references: it is true only when the two variables refer to the same object (conceptually, the same memory address).

Exercise 2.3.3

    Double x = new Double(1.5);
    Double y = new Double(1.5);
    System.out.println(x.equals(y));
    System.out.println(x == y);

Exercise 2.3.4

    List a = new LinkedList();
    List b = new LinkedList();
    System.out.println(a.equals(b));
    System.out.println(a == b);

Exercise 2.3.5

    Object p = new Object();
    Object q = new Object();
    System.out.println(p.equals(q));
    System.out.println(p == q);

Figure 2.4: Code comprehension exercises. Draw an object diagram for each statement in each listing. What will the result of the println statements be?

2.3.1 Engineering Apparatus and Controls†

What is a kilogram? Consider first the metre: since 1983, a metre has been defined as the length of the path travelled by light in vacuum during a time interval of 1/299,792,458 of a second.[1] What is a second? Since 1967, a second has been defined as the duration of 9,192,631,770 periods of the radiation corresponding to the transition between the two hyperfine levels of the ground state of the caesium 133 atom.[2] So we have definitions of distance and time in terms of physical constants of nature.

The kilogram, by contrast, was until 2019 defined by a cylinder (with nicely chamfered edges) of platinum-iridium alloy in a vault in Paris: the International Prototype Kilogram (IPK). It was the last remaining base unit of measure to be defined by a physical artefact rather than a property of nature (Figure 2.5). (Since 20 May 2019 the kilogram has been defined in terms of the Planck constant, but the IPK still makes a good analogy for what follows.)

[1] 17th Conférence Générale des Poids et Mesures (CGPM), Resolution 1 of the CGPM (1983): Definition of the metre. Bureau international des poids et mesures (BIPM). http://www.bipm.org/en/CGPM/db/17/1/
[2] Unit of time (second). SI Brochure. BIPM. http://www.bipm.org/en/publications/si-brochure/second.html

Figure 2.5: Infographic and photo of the International Kilogram Prototype stored in Paris. From Phys.org http://phys.org/news/2011-11-quandary-kilo-triggers-weighty-reflexion.html

Suppose that we want to build a scale, an engineering apparatus to measure mass. The purpose of such a device, in principle, is to
compare other masses to the IPK. This comparison has three possible
outcomes: less than, more than, or same.
Many developed countries, including Canada, have an official
replica of the IPK. One would expect that all of these official replicas
have the same mass as the IPK. Suppose that we tested our scale with
only these official replicas, and every time the scale reported same, as
expected. Can we thus conclude that our scale is really accurate? No:
perhaps it always returns same; maybe it never returns more or less.
We need some control objects that actually have more or less mass
than the IPK, so that we can also check that our scale does return
more or less when appropriate.
Part of the software exercise here is to build the apparatus (scale), and part of it is to build these control objects so that we know the apparatus works properly. Then, in the future, we can use the apparatus on unknown objects and trust the results it gives us.
In this analogy, it is obvious that the control objects are supposed to weigh more or less than the IPK. There are many objects in the world that weigh more or less than the IPK. In our software world, no object is ever supposed to violate the object contract, and in practice very few objects violate that contract. So it might feel weird to intentionally create control objects that violate the object contract. Nobody would ever want such an object in a regular program. But we need these control objects to assess the accuracy of our apparatus.

2.3.2 Learning to write professional code†


It is important, in general, for you to learn the Object Contract because it is a set of properties that must hold for all code you write in any object-oriented language (Java, C++, C#, Python, etc.). It is also specifically important that you learn the Object Contract to do the labs in this course, because checking the equality of objects forms the basis of the automated marking scripts. In this exercise you are also learning how to write professional code, with the Object Contract as an example. The steps that we are following here are:

a. Identify the mathematical properties that the code should have. (Object contract: reflexive, symmetric, transitive, etc.)
b. Write methods that check these mathematical properties on some input object(s) (e.g., checkEqualsIsReflexive(x)). Call these the property check methods.
c. Verify that the property check methods return true when expected by using third party inputs (e.g., java.lang.String and java.lang.Integer).
d. Verify that the property check methods return false when expected by constructing pathological control objects with known deviant behaviour (e.g., constructAlwaysTrue(), constructAlwaysFalse(), constructToggler(), etc.).
e. Use the verified property check methods to assess real code. (We aren't doing this step in this exercise.)
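Step (d) can be sketched the same way: a few control objects whose deviance is known in advance, so we can confirm that a property check really does return false for them. The class names below are hypothetical; they only echo the spirit of constructAlwaysTrue(), constructAlwaysFalse() and constructToggler(), whose actual implementations are not reproduced here.

```java
/** Hypothetical control objects with known deviant behaviour (not the lab's constructX() methods). */
class ControlObjects {
    /** Claims to be equal to everything, including null. */
    static final class AlwaysTrue {
        @Override public boolean equals(Object o) { return true; }
        @Override public int hashCode() { return 0; }
    }

    /** Claims to be equal to nothing, not even itself. */
    static final class AlwaysFalse {
        @Override public boolean equals(Object o) { return false; }
        @Override public int hashCode() { return 0; }
    }

    /** Alternates between answering true and false on successive calls. */
    static final class Toggler {
        private boolean nextAnswer = true;

        @Override public boolean equals(Object o) {
            boolean answer = nextAnswer;
            nextAnswer = !nextAnswer;
            return answer;
        }

        @Override public int hashCode() { return 0; }
    }

    // Two property checks, repeated here so that this listing stands alone.
    static boolean reflexive(Object x) { return x.equals(x); }
    static boolean notEqualToNull(Object x) { return !x.equals(null); }
}
```

Notice that the first reflexive check on a Toggler happens to pass; only the second call exposes it. A property check that calls equals just once per object can therefore miss this kind of deviance.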

In future labs we will use staff-provided property check methods for Object Contract properties. The purpose of this exercise is for you to see what these property check methods look like, both so you understand the Object Contract and so you understand how your future labs will be graded.

2.3.3 Reading
http://docs.oracle.com/javase/7/docs/api/java/lang/Object.html#equals(java.lang.Object)
http://www.artima.com/lejava/articles/equality.html
http://www.angelikalanger.com/Articles/JavaSolutions/SecretsOfEquals/Equals.html
Implementation of java.util.AbstractList.equals()
Effective Java [3]
§0.14 of this lab manual on Testing

[3] J. Bloch. Effective Java. Addison-Wesley, 2001.

2.4 Evaluation

The last pushed commit before the deadline is evaluated both on the shared test inputs and on a set of secret test inputs according to the weights in the table below. We provide you with about 30 W files for these equations.

                          Current   New   Equation
    TestLegalSVG             20      10   legalSVG(TransformW2SVG(w))
    TestW2SVG2W              40      10   SVG2W(W2SVG(w)).equivalent(SVG2W(staff.svg))
    TestObjectContract       15       5   see §2.3 above

2.5 Steps to success

[Figure: Lab 2 Programming Procedure. 2.1 Write a translator from W to svg: W File, TransformW2SVG (with WProgram, Line, Pin), SVG File, test W2SVG. 2.2 Write a translator from svg to W: SVG File, TransformSVG2W (with WProgram, Line, Pin), W File, test W2SVG2W. 2.3 Introducing the Object Contract: TestObjectContract, TestObjectContractBase. 2.4 dom vs. sax parser styles.]

Figure 2.6: Steps to success for §2.


Legend for icon meanings in Figure 1.5.

[Figure: Lab 1 boxes: W File; 1.1 Regular Expression Recognizer; 1.4 Recursive Descent Parser; Waveform and WProgram; 1.3 Pretty Printing. Lab 2 boxes: 2.1 TransformW2SVG; 2.2 TransformSVG2W; SVG File.]

Figure 2.7: How §1 and §2 fit together


Lab 3
Recursive Descent Parsing of F

Compiler Concepts: context-free grammars, ll(1) grammars, predict sets, parse trees, precedence, associativity, commutativity, program equivalence
Programming Concepts: inheritance, polymorphism, dynamic dispatch, type tests, casting, memory safety, composite design pattern, template design pattern, singleton design pattern, recursive functions, recursive structures, higher-order functions

This lab introduces formula language F, which we will use as an intermediate language in our circuit synthesis and simulation tool. In a subsequent lab we will write a translator from V to F.

Compilers usually perform their optimizations on programs in intermediate forms. These intermediate forms are designed to be easier to work with mechanically, at the cost of being less pleasant for people to write large programs in. A program written in F, for example, is just a list of boolean formulae. This is relatively easy to manipulate mechanically. V, by contrast, has conditionals, module structure, etc., which are all of great benefit to the V programmer but are more work for the compiler writer to manipulate. In the next lab we will write a simplifier/optimizer for F programs.

    Program  → Formula+ $$
    Formula  → Var '<=' Expr ';'
    Expr     → Term ('or' Term)*
    Term     → Factor ('and' Factor)*
    Factor   → 'not' Factor | '(' Expr ')' | Var | Constant
    Constant → "'0'" | "'1'"
    Var      → id

Figure 3.1: ll(1) Grammar for F. F is a very simple subset of V, which is in turn a subset of the real vhdl. F includes only the concurrent assignment statement and the boolean operators conjunction (and), disjunction (or), and negation (not). Note that in the concrete syntax the constants '0' and '1' are surrounded by single quotes, whereas no other terminals are.

3.1 A Tale of Three Hierarchies†

There are (at least) three hierarchies involved in understanding this lab, and all of the future labs. In lab1 we met the abstract syntax tree (ast) and the parse tree (the execution trace of a recursive descent recognizer/parser). In this lab we will also meet the class hierarchy of our Java code. In the past labs there wasn't much of interest in the class hierarchy, but now (and for the remainder of the term) it is a central concern.

3.1.1 Parse Tree


Consider the F program X <= A or B;. When we execute the recursive descent recognizer that we are about to write, its call tree will look something like this:

    program()
      formula()
        var()
          id()
            'X'
        '<='
        expr()
          term()
            factor()
              var()
                id()
                  'A'
          'or'
          term()
            factor()
              var()
                id()
                  'B'
        ';'
      EOI

We will use EOI (end of input), EOF (end of file), and $$ interchangeably.

Exercise 3.1.1 Try to draw a few parse trees on paper for other examples such as:
• X <= A and B;
• X <= A or B and C;
• X <= A and B or C;
3.1.2 ast

Recall that 'ast' stands for abstract syntax tree, which holds the important information that we want to remember from the parse tree above. This important information is the structure of the tree and the interesting nodes. Things like term, factor, and parentheses are ways that the input string communicates its structure to the parser: the ast doesn't need to retain those things, just the structure itself.

See the readings, especially Bruce Eckel's free online book Thinking in Java, for more background material on inheritance/sub-classing.

Exercise 3.1.2 For the examples above, also draw the corresponding ast. Both the call tree and the resulting ast depend on the input.

[Object diagram: FProgram1 holds AssignmentStatement1, whose outputVar is X and whose expr is OrExpr1; OrExpr1 has left = VarExpr1 (A) and right = VarExpr2 (B).]

Figure 3.2: Object diagram of ast for F program X <= A or B;
[lab 3] recursive descent parsing of F 55

AST Tree F Example. For the F file

    x <= a or ( b and c );

the corresponding ast is an FProgram whose field formulas (an ImmutableList<AssignmentStatement>) holds a single AssignmentStatement. Its outputVar (a VarExpr) has identifier x, and its expr (an Expr) is an OrExpr whose left is the VarExpr a and whose right is an AndExpr with left VarExpr b and right VarExpr c.

Figure 3.3: Object diagram for example F ast

3.1.3 A peek at the Expr class hierarchy

The class hierarchy, on the other hand, does not depend on the input: it is how the code is organized. The following listing highlights some features of the Expr class hierarchy and also demonstrates polymorphism/dynamic-dispatch.

For this lab you will need the classes AndExpr, OrExpr, NotExpr, VarExpr, and ConstantExpr. You will also need the classes that those depend on, but you will not need classes corresponding to more esoteric logical operators such as exclusive-or. This lab also uses classes outside of the Expr class hierarchy, but you don't need to draw them on your uml class diagram.

Exercise 3.1.3 Find the class common.ast.Expr in Eclipse, right-click on it and select Open Type Hierarchy. This will give you an interactive view of the main class hierarchy.

Exercise 3.1.4 Draw this hierarchy in a uml class diagram, excluding the classes in common.ast that you do not need for this lab.

    abstract class Expr {
        abstract String operator();
    }
    abstract class BinaryExpr extends Expr {
        final Expr left;
        final Expr right;
        BinaryExpr(Expr left, Expr right) { this.left = left; this.right = right; }
        abstract BinaryExpr newBinaryExpr(Expr l, Expr r);
        public String toString() { return left + operator() + right; }
    }
    final class AndExpr extends BinaryExpr {
        AndExpr(Expr l, Expr r) { super(l, r); }
        String operator() { return " and "; }
        BinaryExpr newBinaryExpr(Expr l, Expr r) { return new AndExpr(l, r); }
    }
    final class OrExpr extends BinaryExpr {
        OrExpr(Expr l, Expr r) { super(l, r); }
        String operator() { return " or "; }
        BinaryExpr newBinaryExpr(Expr l, Expr r) { return new OrExpr(l, r); }
    }
    final class Main {
        public static void main(String[] args) {
            BinaryExpr b = new AndExpr(null, null); // why isn't the type of b AndExpr?
            Expr[] a = new Expr[3]; // an empty array of size 3 (every element starts out null)
            // what is the type of the object stored in each element of the array?
            a[0] = b;
            a[1] = new OrExpr(null, null);
            a[2] = b.newBinaryExpr(null, null); // monomorphic call site
            for (int i = 0; i < a.length; i++) {
                Expr e = a[i];
                System.out.println(e.operator()); // polymorphic call site
                System.out.println(e.toString()); // mono or polymorphic?
            }
        }
    }

Figure 3.4: Some highlights from the Expr class hierarchy. (The constructors are needed so that the final fields left and right are initialized; the operands are null because only the operators matter here.)

Exercise 3.1.5 Draw a uml class diagram for this code.

Exercise 3.1.6 Draw an object diagram showing the relationship between the variables and the objects for the main method. (We have drawn this kind of diagram on the board in class, and these kinds of diagrams are also drawn by PythonTutor.com.)

Exercise 3.1.7 Annotate this diagram with the static type of each variable and the dynamic type of each object. In the past we only considered the case where the static type of the variable was the same as the dynamic type of the object referred to by that variable. Now we also consider the case where the dynamic type of the object is a subtype of the variable's static type.

Exercise 3.1.8 Is there any aliasing occurring in the main method? If so, what is it?

3.2 Polymorphism & Dynamic Dispatch†

A monomorphic call site is one where the dynamic dispatch will always resolve to the same target. A polymorphic call site is one where the dynamic dispatch might resolve to a different target each time the call is executed.

Conceptually, the Java compiler/runtime system behaves as if it inserted code like the following pseudocode at each potentially polymorphic call site (real JVMs implement this more efficiently, for example with method tables, but the effect is the same):

    if (a[i] instanceof AndExpr) {
        return AndExpr::operator();
    } else if (a[i] instanceof OrExpr) {
        return OrExpr::operator();
    }

This feature of implicit type tests to determine which method definition to execute is one of the key features of object-oriented programming. Many design patterns are about ways to organize code with this feature that make certain kinds of anticipated changes to the software modular.
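To see what the implicit type tests buy us, here is a minimal runnable sketch that puts an explicit, hand-written type-test dispatcher next to an ordinary virtual call. The nested classes are hypothetical cut-down versions of Figure 3.4, and the hand-written dispatcher only illustrates the idea; it is not how the JVM actually implements dispatch.

```java
/** Hypothetical, cut-down version of Figure 3.4 for comparing two ways of dispatching. */
class DispatchSketch {
    abstract static class Expr {
        abstract String operator();
    }

    static final class AndExpr extends Expr {
        @Override String operator() { return " and "; }
    }

    static final class OrExpr extends Expr {
        @Override String operator() { return " or "; }
    }

    /** Hand-written dispatch: explicit type tests pick which method body to run. */
    static String operatorByTypeTest(Expr e) {
        if (e instanceof AndExpr) {
            return " and "; // body of AndExpr.operator(), copied in by hand
        } else if (e instanceof OrExpr) {
            return " or ";  // body of OrExpr.operator(), copied in by hand
        }
        throw new IllegalArgumentException("no operator() for " + e.getClass().getName());
    }

    /** Dynamic dispatch: the JVM picks the body from the dynamic type of e. */
    static String operatorByDispatch(Expr e) {
        return e.operator();
    }
}
```

Both methods return the same strings today, but adding a new subclass (say, a hypothetical XorExpr) would force an edit to operatorByTypeTest, whereas operatorByDispatch keeps working unchanged. Avoiding that kind of edit is what the design patterns mentioned above exploit.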

3.3 Write a recursive-descent recognizer for F


Sources:
ece351.f.rdescent.FRecursiveDescentRecognizer
Libraries:
ece351.util.Lexer
ece351.util.CommandLine
Tests:
ece351.f.rdescent.TestFRDRecognizerAccept
ece351.f.rdescent.TestFRDRecognizerReject

Figure 3.1 lists a grammar for F. A recognizer merely computes whether a sentence is generated by a grammar: i.e., its output is boolean. A parser, by contrast, also constructs an abstract syntax tree (ast) of the sentence that we can do things with. A recognizer is simpler, so we will write one first.

The idea is simple: for each production in the grammar we make a function in the recognizer. These functions have no arguments and return void. All these functions do, from a computational standpoint, is examine the lexer's current token, advance past the terminals they expect, and call the functions for the nonterminals in their production. If the recognizer manages to push the lexer to the end of the input without encountering an error then it declares success.

Exercise 3.3.1 Can you write a regular expression recognizer for F?
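Here is a minimal sketch of that recipe for the grammar of Figure 3.1. It is not the lab's FRecursiveDescentRecognizer: instead of ece351.util.Lexer it uses a hypothetical regex tokenizer over a list of strings, with "$$" standing for end of input, and it reports errors by throwing IllegalArgumentException. Each procedure mirrors one production, and each (...)* repetition becomes a while loop that peeks at the next token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical recursive descent recognizer for Figure 3.1 (not the lab's class). */
class FRecognizerSketch {
    private static final Pattern TOKEN =
        Pattern.compile("<=|;|\\(|\\)|'[01]'|[A-Za-z_][A-Za-z0-9_]*");
    private static final Set<String> KEYWORDS = Set.of("or", "and", "not");

    private final List<String> toks;
    private int pos = 0;

    private FRecognizerSketch(List<String> toks) { this.toks = toks; }

    /** Returns true iff the whole input is a sentence of F. */
    static boolean recognizes(String input) {
        try {
            new FRecognizerSketch(tokenize(input)).program();
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int i = 0;
        while (true) {
            while (i < input.length() && Character.isWhitespace(input.charAt(i))) i++;
            if (i == input.length()) return out;
            m.region(i, input.length());
            if (!m.lookingAt()) throw new IllegalArgumentException("bad character at " + i);
            out.add(m.group());
            i = m.end();
        }
    }

    private String peek() { return pos < toks.size() ? toks.get(pos) : "$$"; } // $$ = end of input

    private void consume(String expected) {
        if (!peek().equals(expected)) {
            throw new IllegalArgumentException("expected " + expected + " but saw " + peek());
        }
        pos++;
    }

    // Program -> Formula+ $$
    private void program() {
        do { formula(); } while (!peek().equals("$$"));
    }

    // Formula -> Var '<=' Expr ';'
    private void formula() { variable(); consume("<="); expr(); consume(";"); }

    // Expr -> Term ('or' Term)*
    private void expr() {
        term();
        while (peek().equals("or")) { consume("or"); term(); }
    }

    // Term -> Factor ('and' Factor)*
    private void term() {
        factor();
        while (peek().equals("and")) { consume("and"); factor(); }
    }

    // Factor -> 'not' Factor | '(' Expr ')' | Var | Constant
    private void factor() {
        String t = peek();
        if (t.equals("not")) { consume("not"); factor(); }
        else if (t.equals("(")) { consume("("); expr(); consume(")"); }
        else if (t.equals("'0'") || t.equals("'1'")) { pos++; }
        else { variable(); }
    }

    // Var -> id (any identifier that is not a keyword)
    private void variable() {
        String t = peek();
        if (KEYWORDS.contains(t) || !t.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("expected identifier but saw " + t);
        }
        pos++;
    }
}
```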

3.4 Write a pretty-printer for the ast


Sources:
ece351.f.ast.FProgram
ece351.common.ast.AssignmentStatement
ece351.common.ast.ConstantExpr
ece351.common.ast.UnaryExpr
ece351.common.ast.BinaryExpr
Tests:
manual inspection of output
parser tests below

Pretty-printing is the inverse operation of parsing: given an ast, produce a string. (Parsing produces an ast from a string.) In this case the task is easy: implement the toString() methods of the ast classes. When the program invokes toString() on the root node of the ast the result should resemble the original input string.
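One subtlety is parentheses. The ast has none, so the printer must add them wherever dropping them would change how the output re-parses. For example, the naive left + operator() + right of Figure 3.4 prints the ast for (a or b) and c as a or b and c, which re-parses as a or (b and c). A minimal sketch of a precedence-aware printer follows. The records are hypothetical stand-ins for the lab's ast classes, and the precedences (or binds loosest, then and, then not) come from the grammar of Figure 3.1. Because the parse must be left-associative (see §3.9), a right operand must bind strictly tighter than its parent to be printed without parentheses.

```java
/** Hypothetical precedence-aware pretty-printer for F expressions (not the lab's toString()). */
class FPrinterSketch {
    interface Expr {}
    record Var(String id) implements Expr {}
    record Const(boolean value) implements Expr {}
    record Not(Expr e) implements Expr {}
    record And(Expr l, Expr r) implements Expr {}
    record Or(Expr l, Expr r) implements Expr {}

    /** Binding strength from Figure 3.1: or < and < not < atoms. */
    static int prec(Expr e) {
        if (e instanceof Or) return 1;
        if (e instanceof And) return 2;
        if (e instanceof Not) return 3;
        return 4; // Var and Const never need parentheses
    }

    static String print(Expr e) {
        if (e instanceof Var v) return v.id();
        if (e instanceof Const c) return c.value() ? "'1'" : "'0'";
        if (e instanceof Not n) return "not " + wrap(n.e(), 3);
        // The left operand may bind as loosely as the parent; the right operand must bind
        // tighter, because the parser groups a chain of the same operator to the left.
        if (e instanceof And a) return wrap(a.l(), 2) + " and " + wrap(a.r(), 3);
        Or o = (Or) e;
        return wrap(o.l(), 1) + " or " + wrap(o.r(), 2);
    }

    /** Prints e, parenthesized if it binds more loosely than minPrec. */
    private static String wrap(Expr e, int minPrec) {
        String s = print(e);
        return prec(e) >= minPrec ? s : "(" + s + ")";
    }
}
```

A simpler alternative is to parenthesize every binary expression; the output is noisier, but it always re-parses to the same ast.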
3.5 Equals, Isomorphic, and Equivalent†

We were previously introduced to the object contract and the equals method, and learned that all of the equals methods together are supposed to define an equivalence relation, and hence a partitioning of the objects, which has three properties: reflexivity, symmetry, and transitivity. But we did not talk about the specific semantics of the equals method. For example, always returning true from every equals method would define a partitioning: there would just be one partition and every object would be in it. That's not particularly useful.

Each part of a partitioning is what mathematicians call an equivalence class. We'll try to move towards the term 'partitioning' and away from 'mathematical equivalence class', because we use the words 'equivalence' and 'class' in other programming contexts.

In this lab we will give some more meaningful semantics to the equals method, and we will also define two other partitionings with different semantics: isomorphic and equivalent. Here is an example, written in the concrete syntax of F:

    X <= A or not A;  ←equals→  X <= A or not A;  ←isomorphic→  X <= not A or A;  ←equivalent→  X <= '1';

equals: Two objects are equals if any computation that uses either one will produce identical results.[1] This can only be true if the objects are immutable (i.e., the values stored in their fields do not change).

[1] B. Liskov and J. Guttag. Program Development in Java: Abstraction, Specification, and Object-Oriented Design. Addison-Wesley, 2001.

isomorphic: We will say that two objects are isomorphic if they have the same elements and similar structures. For example, we will consider the expressions X or Y and Y or X to be isomorphic: they are permutations of the same essential structure. Any two objects that are equals are also isomorphic, but isomorphic objects are not necessarily equals. For example, the expressions X or Y and Y or X are isomorphic but not equals. (equals ⇒ isomorphic)

equivalent: We will say that two objects are equivalent if they have the same meaning, but possibly different structures and possibly different elements. For example, the expression '1' is equivalent to the expression X or not X: they have the same meaning but totally different syntax. Any two objects that are isomorphic are also equivalent, but not necessarily vice versa (as in this example). (isomorphic ⇒ equivalent)

You will not be implementing equivalent this term, but you will eventually (not for this lab) need to understand what it means and how we have implemented it for FProgram.
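For small formulas, equivalence can be decided by brute force: two formulas are equivalent if they evaluate to the same value under every assignment of true/false to their variables. The sketch below does exactly that, and confirms, for instance, that X or not X is equivalent to '1'. It is illustrative only: the records are hypothetical stand-ins for the lab's ast classes, the running time is exponential in the number of variables, and the course's own implementation of equivalent for FProgram may work quite differently.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical truth-table check of equivalence (not the lab's implementation). */
class EquivalenceSketch {
    interface Expr {}
    record Var(String id) implements Expr {}
    record Const(boolean value) implements Expr {}
    record Not(Expr e) implements Expr {}
    record And(Expr l, Expr r) implements Expr {}
    record Or(Expr l, Expr r) implements Expr {}

    static boolean eval(Expr e, Map<String, Boolean> env) {
        if (e instanceof Var v) return env.get(v.id());
        if (e instanceof Const c) return c.value();
        if (e instanceof Not n) return !eval(n.e(), env);
        if (e instanceof And a) return eval(a.l(), env) && eval(a.r(), env);
        Or o = (Or) e;
        return eval(o.l(), env) || eval(o.r(), env);
    }

    static void collectVars(Expr e, Set<String> out) {
        if (e instanceof Var v) {
            out.add(v.id());
        } else if (e instanceof Not n) {
            collectVars(n.e(), out);
        } else if (e instanceof And a) {
            collectVars(a.l(), out);
            collectVars(a.r(), out);
        } else if (e instanceof Or o) {
            collectVars(o.l(), out);
            collectVars(o.r(), out);
        }
    }

    /** True iff x and y agree under all 2^n assignments to their n variables. */
    static boolean equivalent(Expr x, Expr y) {
        Set<String> vars = new TreeSet<>();
        collectVars(x, vars);
        collectVars(y, vars);
        List<String> names = List.copyOf(vars);
        for (long mask = 0; mask < (1L << names.size()); mask++) {
            Map<String, Boolean> env = new HashMap<>();
            for (int i = 0; i < names.size(); i++) {
                env.put(names.get(i), ((mask >> i) & 1) == 1);
            }
            if (eval(x, env) != eval(y, env)) return false;
        }
        return true;
    }
}
```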

3.6 Write Equals and Isomorphic for the F ast classes


Sources:
ece351.common.ast.VarExpr
ece351.common.ast.ConstantExpr
ece351.common.ast.AssignmentStatement
ece351.common.ast.UnaryExpr
ece351.common.ast.CommutativeBinaryExpr
ece351.f.ast.FProgram
Libraries:
ece351.util.Examinable
ece351.util.Examiner
ece351.util.ExaminableProperties
Tests:
ece351.f.test.TestObjectContractF

Start with VarExpr and ConstantExpr. You should be able to fill in these skeletons with the knowledge you have learned so far.

For the FProgram and AssignmentStatement classes you will need to make recursive function calls. Understanding why these calls will terminate requires understanding recursive object structures.

For the UnaryExpr and CommutativeBinaryExpr classes you will also need to understand inheritance and polymorphism.

Exercise 3.6.1 Why do we use getClass() for type tests instead of instanceof?

Exercise 3.6.2 Why do we cast to UnaryExpr instead of NotExpr?

Exercise 3.6.3 How does the operator() method work?
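To make the difference between equals and isomorphic concrete, here is a minimal sketch of equals, hashCode, and an isomorphic check on a tiny hypothetical hierarchy. The class names VarExpr and CommBin, and the isomorphic method, are illustrative only: they are not the lab's skeletons or the Examinable API. The equals methods use getClass() type tests, as the lab does; isomorphic for a commutative operator accepts the operands in either order. This sketch does not treat reassociation, such as a or (b or c) versus (a or b) or c, as isomorphic; the lab's definition may be broader.

```java
import java.util.Objects;

/** Hypothetical sketch of equals vs. isomorphic (not the lab's skeletons). */
class IsomorphicSketch {
    abstract static class Expr {
        /** By default only equal objects are isomorphic, so equals implies isomorphic. */
        boolean isomorphic(Expr other) { return equals(other); }
    }

    static final class VarExpr extends Expr {
        final String identifier;

        VarExpr(String identifier) { this.identifier = Objects.requireNonNull(identifier); }

        @Override public boolean equals(Object obj) {
            if (this == obj) return true;
            if (obj == null || getClass() != obj.getClass()) return false; // exact type test
            return identifier.equals(((VarExpr) obj).identifier);
        }

        @Override public int hashCode() { return identifier.hashCode(); }
    }

    /** A commutative binary operator such as "and" or "or". */
    static final class CommBin extends Expr {
        final String op;
        final Expr left;
        final Expr right;

        CommBin(String op, Expr left, Expr right) {
            this.op = Objects.requireNonNull(op);
            this.left = Objects.requireNonNull(left);
            this.right = Objects.requireNonNull(right);
        }

        @Override public boolean equals(Object obj) {
            if (this == obj) return true;
            if (obj == null || getClass() != obj.getClass()) return false;
            CommBin that = (CommBin) obj;
            // the recursion terminates because the object structure is a finite tree
            return op.equals(that.op) && left.equals(that.left) && right.equals(that.right);
        }

        @Override public int hashCode() { return Objects.hash(op, left, right); }

        /** Same operator, and operands isomorphic in the same or in swapped order. */
        @Override boolean isomorphic(Expr other) {
            if (other == null || getClass() != other.getClass()) return false;
            CommBin that = (CommBin) other;
            return op.equals(that.op)
                && ((left.isomorphic(that.left) && right.isomorphic(that.right))
                    || (left.isomorphic(that.right) && right.isomorphic(that.left)));
        }
    }
}
```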

3.7 Write a recursive-descent parser for F


Sources:
ece351.f.rdescent.FRecursiveDescentParser
Tests:
ece351.f.rdescent.TestFRDParserBasic
ece351.f.rdescent.TestFRDParser

Our recursive-descent parser will follow the same structure as our recursive-descent recognizer. The steps to write the parser are the same as for the W parser in lab 1:

• Copy the procedures from the recognizer.
• For each procedure, change its return type from void to one of the ast classes.
• Modify each procedure to construct the appropriate ast object and return it.

It is important that you follow these steps. If you try to implement the parser for F in one big method you will run into a lot of trouble.
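Applying those three steps to the recognizer recipe gives something like the sketch below. It is a minimal sketch, not the lab's FRecursiveDescentParser: the records stand in for the lab's ast classes, the tokenizer is a crude hypothetical one, and the printer fully parenthesizes every binary expression so that its output always re-parses to the same ast. The last lines of the test exercise the testing equation from §3.10 on a small program.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical recursive descent parser for Figure 3.1 (not the lab's classes). */
class FParserSketch {
    interface Expr {}
    record Var(String id) implements Expr {}
    record Const(boolean value) implements Expr {}
    record Not(Expr e) implements Expr {}
    record And(Expr l, Expr r) implements Expr {}
    record Or(Expr l, Expr r) implements Expr {}
    record Assign(Var out, Expr expr) {}

    private static final Set<String> KEYWORDS = Set.of("or", "and", "not");
    private final List<String> toks = new ArrayList<>();
    private int pos = 0;

    static List<Assign> parse(String input) {
        FParserSketch p = new FParserSketch();
        // Crude tokenizer: pad the punctuation with spaces, then split on whitespace.
        String spaced = input.replace("<=", " <= ").replace(";", " ; ")
                             .replace("(", " ( ").replace(")", " ) ").trim();
        if (!spaced.isEmpty()) p.toks.addAll(List.of(spaced.split("\\s+")));
        return p.program();
    }

    private String peek() { return pos < toks.size() ? toks.get(pos) : "$$"; } // $$ = end of input
    private void consume(String t) {
        if (!peek().equals(t)) throw new IllegalArgumentException("expected " + t + " but saw " + peek());
        pos++;
    }

    // Program -> Formula+ $$
    private List<Assign> program() {
        List<Assign> fs = new ArrayList<>();
        do { fs.add(formula()); } while (!peek().equals("$$"));
        return List.copyOf(fs);
    }

    // Formula -> Var '<=' Expr ';'
    private Assign formula() {
        Var out = variable(); consume("<="); Expr e = expr(); consume(";");
        return new Assign(out, e);
    }

    // Expr -> Term ('or' Term)*   (the loop builds a left-associative tree)
    private Expr expr() {
        Expr e = term();
        while (peek().equals("or")) { consume("or"); e = new Or(e, term()); }
        return e;
    }

    // Term -> Factor ('and' Factor)*
    private Expr term() {
        Expr e = factor();
        while (peek().equals("and")) { consume("and"); e = new And(e, factor()); }
        return e;
    }

    // Factor -> 'not' Factor | '(' Expr ')' | Var | Constant
    private Expr factor() {
        String t = peek();
        if (t.equals("not")) { consume("not"); return new Not(factor()); }
        if (t.equals("(")) { consume("("); Expr e = expr(); consume(")"); return e; }
        if (t.equals("'0'") || t.equals("'1'")) { pos++; return new Const(t.equals("'1'")); }
        return variable();
    }

    // Var -> id
    private Var variable() {
        String t = peek();
        if (KEYWORDS.contains(t) || !t.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("expected identifier but saw " + t);
        }
        pos++;
        return new Var(t);
    }

    /** Fully parenthesized pretty-printer: verbose, but always safe to re-parse. */
    static String print(Expr e) {
        if (e instanceof Var v) return v.id();
        if (e instanceof Const c) return c.value() ? "'1'" : "'0'";
        if (e instanceof Not n) return "not " + print(n.e());
        if (e instanceof And a) return "(" + print(a.l()) + " and " + print(a.r()) + ")";
        Or o = (Or) e;
        return "(" + print(o.l()) + " or " + print(o.r()) + ")";
    }

    static String printProgram(List<Assign> program) {
        StringBuilder sb = new StringBuilder();
        for (Assign a : program) {
            sb.append(a.out().id()).append(" <= ").append(print(a.expr())).append(";\n");
        }
        return sb.toString();
    }
}
```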

Exercise 3.7.1 Is the result of pretty-printing an ast always character-by-character identical with the original input program? Why?

Exercise 3.7.2 Write a program in F for which the result of pretty-printing the ast is not identical to the original input string.

3.8 Steps to success

[Figure: Lab 3 Programming Procedure. Boxes: F Files; Lexer; FRecursiveDescentRecognizer (tests FRDRecognizer Accept and Reject); VarExpr, ConstantExpr, UnaryExpr, AssignmentStatement, CommutativeBinaryExpr and FProgram (test ObjectContractF); FRecursiveDescentParser (tests FRDParserBasic and FRDParser); packages f.ast.* and common.ast.*. Steps: write a recursive-descent recognizer for F; write a pretty-printer for the ast; write Equals and Isomorphic for the F ast classes; write a recursive-descent parser for F.]

Figure 3.5: Steps to success for §3.


Legend for icon meanings in Figure 1.5.

3.9 Common Missteps

Not testing your pretty-printer before starting to write the parser. There is no test suite for the pretty-printer. You need to test it manually by looking at the output. If your pretty-printer doesn't work correctly then you will get weird failures in the parser tests.

Not following the methodology for writing recursive descent recognizers and parsers from the grammar. In lab1 you could have got away with this because W is a regular language and so there was no real recursion in the grammar. Because F has nested expressions there is recursion in the grammar: Expr calls Term, which calls Factor, which in turn calls Expr back for a parenthesized subexpression. If you do not follow the methodology, then you will probably get a parser that works for short formulas but not for long ones. And it will probably also have other bugs.

Associativity of BinaryExprs. Since and and or are associative, logically you can parse them either way (left-associative or right-associative). For uniformity of testing, we require the parse to be left-associative: i.e., a or b or c should parse as (a or b) or c. (See TestFRDParserBasic.testLeftAssociativeOr().)
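The difference between the two associativities is easiest to see on a stripped-down grammar containing only identifiers and or. The sketch below, whose names are all hypothetical, parses the same token list twice: once with the loop shape of Expr → Term ('or' Term)* from Figure 3.1, and once with a right-recursive rule Expr → Term 'or' Expr | Term. It assumes a well-formed token list that alternates identifiers and "or".

```java
import java.util.List;

/** Hypothetical sketch: two ways to parse "a or b or c" from a token list. */
class AssociativitySketch {
    interface Expr {}
    record Var(String id) implements Expr {}
    record Or(Expr l, Expr r) implements Expr {}

    /** Loop, as in Expr -> Term ('or' Term)*: each new operand wraps the tree built so far. */
    static Expr parseLeft(List<String> toks) {
        Expr e = new Var(toks.get(0));
        for (int i = 1; i + 1 < toks.size(); i += 2) { // toks.get(i) is "or"
            e = new Or(e, new Var(toks.get(i + 1)));
        }
        return e;
    }

    /** Right recursion, Expr -> Term 'or' Expr | Term: the rest of the input nests on the right. */
    static Expr parseRight(List<String> toks, int i) {
        Var first = new Var(toks.get(i));
        if (i + 1 < toks.size() && toks.get(i + 1).equals("or")) {
            return new Or(first, parseRight(toks, i + 2));
        }
        return first;
    }
}
```

Both trees mean the same thing logically, but they are not equals, so a right-associative parser fails the left-associativity test even though every formula it produces is equivalent to the expected one.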

3.10 Evaluation

The last pushed commit before the deadline is evaluated both on the shared test inputs and on a set of secret test inputs, according to the weights in the table below.
The testing equation for this lab is:
∀ AST | AST.equals(parse(prettyprint(AST)))
At present we have about 80 asts to plug into this equation. We might explore some mechanical techniques to generate more asts.

One of the great benefits of having a testing equation is that we can separate the correctness condition (the equation) from the inputs. Then we can look for other techniques to generate inputs. By contrast, if we test with just specific input/output pairs, the correctness condition is only that we get the expected output for that specific input.

Test                        Current  New
TestFRDRecognizerAccept        5      5
TestFRDRecognizerReject        5      5
TestFRDParserBasic             5      5
TestFRDParser                 30     10
TestObjectContractF           20     10
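The testing equation above can be read as a generic round-trip property. The sketch below rests on stated assumptions. The "AST" is just a list of names, standing in for the real F ast classes. The helpers roundTrips, prettyPrint and parse are hypothetical and are not part of the lab skeleton.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class RoundTrip {
    // Generic form of the testing equation: ast.equals(parse(prettyPrint(ast))).
    static <T> boolean roundTrips(T ast, Function<T, String> prettyPrint,
                                  Function<String, T> parse) {
        return ast.equals(parse.apply(prettyPrint.apply(ast)));
    }

    // Hypothetical toy pretty-printer: names separated by single spaces.
    static String prettyPrint(List<String> names) {
        return String.join(" ", names);
    }

    // Hypothetical toy parser: split on runs of whitespace.
    static List<String> parse(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? List.of() : Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        // The same equation is checked against every input; only the inputs change.
        List<List<String>> inputs = List.of(
                List.of("Larry", "Curly", "Moe"),
                List.of("x"),
                List.of());
        for (List<String> ast : inputs) {
            System.out.println(ast + " round-trips: "
                    + roundTrips(ast, RoundTrip::prettyPrint, RoundTrip::parse));
        }
    }
}
```

A failing input shows what the equation catches. A single name "a b" does not round-trip, because the toy parser splits it into two names.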

3.11 Background & Reading


See also the readings for lab1 above.
Tiger Book
3.0 Parsing
3.1 Context-Free Grammars
3.2.0 Predictive Parsing / Recursive Descent
• including First and Follow Sets
• read Constructing a Predictive Parser on p.50
• skip Eliminating Left Recursion and everything that comes after it, for now (we'll study this for the exam, but not for the labs)
4.1.1 Recursive Descent

Thinking in Java² is a good resource for object-oriented programming in general and Java in particular:
1 Introduction to Objects
6 Reusing Classes
7 Polymorphism

² B. Eckel. Thinking in Java. Prentice-Hall, 2002. http://www.mindview.net/Books/TIJ/

Some topics you should be comfortable with include:

• inheritance / subtyping / subclassing
• polymorphism / dynamic dispatch
• objects vs. variables, dynamic vs. static type
• type tests, casting

If you are not already familiar with these topics, this lab is going to take you longer than five hours. Once you get through these background topics, the rest of the course material should be fairly straightforward.

ECE155 Lecture Notes. In winter 2013 ece155 switched to Java (from C#), and the third lecture covered some of the differences. http://patricklam.ca/ece155/lectures/pdf/L03.pdf
Lab 4

Circuit Optimization: F Simplifier

Compiler Concepts: intermediate languages, identity element, absorbing element, equivalence of logical formulas, term rewriting, termination, confluence, convergence
Programming Concepts: interpreter design pattern, template design pattern, representation invariants

F is an intermediate language for our circuit synthesis and simulation tools, in between vhdl and the final output languages, as was depicted in Figure 1. In a subsequent lab we will write a translator from vhdl to F.
Compilers usually perform their optimizations on programs in an
intermediate language. These intermediate languages are designed
to be easier to work with mechanically, at the cost of being less pleas-
ant for people to write large programs in. A program written in F ,
for example, is just a list of boolean formulas. This is relatively easy
to manipulate mechanically. vhdl, by contrast, has conditionals,
module structure, etc., which are all of great benefit to the vhdl pro-
grammer but are more work for the compiler writer to manipulate.
The simplifier we will develop in this lab will work by term-
rewriting. For example, when it sees a term of the form x + 1 it will
rewrite it to 1. Figure 4.5 lists the algebraic identities that your sim-
plifier should use.
Our simplifier works at a syntactic level: i.e., it does not have a
deep understanding of the formulas that it is manipulating. The
testing framework to determine if the simplifier has produced correct
output does, however, do a deep semantic analysis of the formulas,
as discussed in §4.9.
You might have previously studied semantic techniques for boolean
circuit simplification such as Karnaugh Maps and the Quine-McCluskey
algorithm for computing prime implicants. By completing this lab
you will have the necessary compiler-related knowledge to imple-
ment these more sophisticated optimization algorithms in the future.
64 ece351 lab manual [september 2, 2023]

4.1 Iteration to a Fixed Point and Termination†

A common technique in optimizing compilers is to iterate to a fixed point: that is, to keep applying the optimizations until the program being transformed stops changing. Figure 4.1 shows the code in Expr.simplify() that we will run in this lab to iterate to a fixed point.

We will use this idea of iterating to a fixed point on paper in the course notes, both when analyzing a grammar to determine if it is ll(1) and when doing dataflow analysis.

Figure 4.1: Code listing for Expr.simplify() showing iteration to a fixed point. This is the only implementation of simplify() in the project, and it is provided for you. You will be working on implementing simplifyOnce() for some of the ast classes.

final public Expr simplify() {
    Expr e = this;
    while (true) { // loop forever?
        final Expr simplified = e.simplifyOnce();
        if (simplified.equals(e)) {
            // we're done: nothing changed
            return simplified;
        } else {
            // something changed: keep working
            e = simplified;
        }
    }
}

Notice the loop while (true). Is this an infinite loop? Will this code ever terminate? Maybe. Inside the body of the loop there is a return statement that will exit the loop (and the method) when simplified.equals(e). We call this test, which decides whether to exit the loop, the termination condition. Will this termination condition ever be true? Will it become true for any possible F program that we might optimize?

Yes, this termination condition will become true for any possible F program that we optimize (unless there are other bugs in your code). How do we know this? Because our optimizer applies rewrite rules that make the formula smaller, and there is a limit to how small a formula can get. Therefore, we say that our term rewriting system is terminating: there is no input F program on which it will get stuck in an infinite loop.

From this termination condition we can reason that the simplify method is idempotent: that is, applying it twice gives the same result as applying it once. We first saw this property above in §0.14 in the form f(x) = f(f(x)). In our code here, it reads x.simplify().equals(x.simplify().simplify()). Idempotence is an important general property that can be exploited in testing. It is one of the main general properties that must hold for data synchronizers. Suppose a synchronization is performed, no data then changes on either side, and a second synchronization is performed. The second synchronization should not need to perturb the data. http://en.wikipedia.org/wiki/Idempotence
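The loop in Expr.simplify() is an instance of a generic pattern. The sketch below is minimal and rests on two assumptions. The term is a plain String rather than an ast. A single toy rewrite rule for !!x ↦ x stands in for simplifyOnce(). The names iterateToFixedPoint and removeDoubleNegationOnce are illustrative and are not lab code.

```java
import java.util.function.UnaryOperator;

public class FixedPoint {
    // Same shape as Expr.simplify(): apply one step until nothing changes.
    static <T> T iterateToFixedPoint(T start, UnaryOperator<T> once) {
        T e = start;
        while (true) {
            final T next = once.apply(e);
            if (next.equals(e)) {
                return next; // termination condition: nothing changed
            }
            e = next;
        }
    }

    // Toy rewrite rule !!x -> x on strings: removes the first "!!" it finds.
    // Each step makes the string shorter, so the iteration must terminate.
    static String removeDoubleNegationOnce(String s) {
        int i = s.indexOf("!!");
        return i < 0 ? s : s.substring(0, i) + s.substring(i + 2);
    }

    public static void main(String[] args) {
        String once = iterateToFixedPoint("!!!!!x", FixedPoint::removeDoubleNegationOnce);
        String twice = iterateToFixedPoint(once, FixedPoint::removeDoubleNegationOnce);
        System.out.println(once);               // !x
        System.out.println(once.equals(twice)); // true: idempotent
    }
}
```

Each step removes two characters, so the string strictly shrinks. This is the same "smaller every time" argument used above for termination. The last line checks idempotence.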
[lab 4] circuit optimization: F simplifier 65

4.2 Confluence†

We now know that our term rewriting system is terminating: it will never get stuck in an infinite loop. But will it always compute the same result? What if we apply the rewrite rules in a different order? A term rewriting system that always reaches a unique result for any given input, regardless of the order in which the rules are applied, is called confluent.

Suppose, as a counter-example, that we consider a term rewriting system for the square root operator ($\sqrt{x}$) that has two rewrite rules: one that gives the positive root and another that gives the negative root. This rewrite system is not confluent. For example, $\sqrt{4} \mapsto 2$ and $\sqrt{4} \mapsto -2$, and $2 \neq -2$. It is harder to test code that implements a non-confluent term rewriting system, because there could be different outputs for the same input.

The term rewriting system that we are implementing in this lab is confluent. Consider the example in Figure 4.2. Whichever order we apply the rules in, we eventually get to the same result.

Figure 4.2: Example of two confluent rewrites converging on a common solution

(X or !X) and (Y and !Y)
  ↦ 1 and (Y and !Y)    ↦ 1 and 0
  ↦ (X or !X) and 0     ↦ 1 and 0

convergent = confluent + terminating


Proving that a term rewriting system is confluent is, in the general case, a fairly difficult problem. For our purposes, we will consider a rewrite system to be confluent if there are no two rules that have the same left-hand side. Our counter-example above had two rules with $\sqrt{x}$ as the left-hand side. The rules that we are implementing for this lab all have different left-hand sides.

There are a variety of sources online where you can read more if you are interested. One good one is by Paul Klint: http://www.meta-environment.org/doc/books/extraction-transformation/term-rewriting/term-rewriting.html
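The Figure 4.2 example can be replayed mechanically. The sketch below is minimal and uses string replacement as a stand-in for rewriting ast nodes. The names ruleOr and ruleAnd are hypothetical labels for the rules (X or !X) ↦ 1 and (Y and !Y) ↦ 0. Running it only shows that these two rules reach the same result on this one input in either order. It is not a proof of confluence.

```java
import java.util.function.UnaryOperator;

public class Confluence {
    // Two toy rewrite rules with different left-hand sides.
    static final UnaryOperator<String> ruleOr = s -> s.replace("(X or !X)", "1");
    static final UnaryOperator<String> ruleAnd = s -> s.replace("(Y and !Y)", "0");

    public static void main(String[] args) {
        String start = "(X or !X) and (Y and !Y)";
        String orFirst = ruleAnd.apply(ruleOr.apply(start));   // via "1 and (Y and !Y)"
        String andFirst = ruleOr.apply(ruleAnd.apply(start));  // via "(X or !X) and 0"
        System.out.println(orFirst);                  // 1 and 0
        System.out.println(orFirst.equals(andFirst)); // true
    }
}
```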

4.3 Mathematical Properties of Binary Operators†

These are things you’ve probably learned and forgotten many times.
Turns out they are actually important in this course.
The term binary operators refers to the grammatical fact that these operators take two operands (arguments). It does not refer to the semantic issue that some operators apply to boolean (binary) values. For example, conjunction (logical and) is binary because it takes two operands, not because it operates on true/false values.

4.3.1 Commutativity
Changing the order of the arguments doesn’t matter, e.g.:

x+y = y+x
Addition, multiplication, conjunction (and), disjunction (or) are all
commutative. Subtraction and division are not commutative: the
order of the operands (arguments) matters.

4.3.2 Associativity
Wikipedia says it nicely:¹ (¹ http://en.wikipedia.org/wiki/Associative_property)
Within an expression containing two or more occurrences in a row of
the same associative operator, the order in which the operations are
performed does not matter as long as the sequence of the operands is
not changed. That is, rearranging the parentheses in such an expres-
sion will not change its value. Consider, for instance, the following
equations:

1 + (2 + 3) = (1 + 2) + 3
Addition, multiplication, conjunction (logical and), disjunction (logi-
cal or) are all associative.
Subtraction, division, exponentiation, and vector cross-product are not associative. The terminology can be a little confusing here. On the one hand we say that subtraction is not associative, and on the other hand we say it is left associative because:

x − y − z = (x − y) − z
Exponentiation is right associative:

$x^{y^{z}} = x^{(y^{z})}$
Saying that an operator is not associative means that the placement of parentheses changes the result. The grammar must therefore choose a convention for grouping repeated uses of that operator: left associative or right associative.
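The difference between left and right associativity shows up when we fold a list of operands in each direction. This is a minimal sketch. The helpers foldLeft and foldRight are illustrative and are not part of the lab code.

```java
import java.util.List;
import java.util.function.IntBinaryOperator;

public class Associativity {
    // Left-associative grouping: ((x0 op x1) op x2) op ...
    static int foldLeft(List<Integer> xs, IntBinaryOperator op) {
        int acc = xs.get(0);
        for (int i = 1; i < xs.size(); i++) {
            acc = op.applyAsInt(acc, xs.get(i));
        }
        return acc;
    }

    // Right-associative grouping: x0 op (x1 op (x2 op ...))
    static int foldRight(List<Integer> xs, IntBinaryOperator op) {
        int acc = xs.get(xs.size() - 1);
        for (int i = xs.size() - 2; i >= 0; i--) {
            acc = op.applyAsInt(xs.get(i), acc);
        }
        return acc;
    }

    public static void main(String[] args) {
        List<Integer> xs = List.of(10, 3, 2);
        IntBinaryOperator minus = (a, b) -> a - b;
        System.out.println(foldLeft(xs, minus));  // 5: (10 - 3) - 2
        System.out.println(foldRight(xs, minus)); // 9: 10 - (3 - 2)
        // Addition is associative, so both groupings agree (15).
        System.out.println(foldLeft(xs, Integer::sum) == foldRight(xs, Integer::sum)); // true
        // Right-associative exponentiation: 2^(3^2) = 512, whereas (2^3)^2 = 64.
        System.out.println((long) Math.pow(2, Math.pow(3, 2))); // 512
    }
}
```

For an associative operator the two folds always agree, which is why either parse is acceptable for and and or.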

4.4 Transforming BinaryExprs to NaryExprs

To test your simplifier we'll need to compare the output it computes with the output that the staff simplifier computes. In a previous lab we wrote an isomorphic method that accounted for the commutativity property of many binary operations: e.g., it would consider x + y and y + x to be isomorphic. That's a good start, but we'll need more than this to really evaluate your simplifier mechanically. (Where to implement this idea will be described below.)

We say that an operator is binary because it has two operands. It is just a coincidence that our binary operators operate on boolean values. For example, arithmetic addition is also a binary operator, even though it operates on numbers. We say that an operator is n-ary if it has a variable number of operands, possibly more than two.

Many of the binary operations we consider are also associative. That is, x + (y + z) is equivalent to (x + y) + z. Disjunction and conjunction are both associative, as are addition and multiplication. (Conjunction means 'logical and'; disjunction means 'logical or'.)

Detecting the equivalence of x + (y + z) and (x + y) + z is tricky because these parse trees are not structurally similar (recall that 'structurally similar' is what isomorphic means). We have several options. We could try to implement a really clever equivalent method that detects this equivalence. We could go all out and compute the truth tables for each expression and compare them. Or we could try something else: transforming these trees into a standardized form for which it is easy to check isomorphism.

Figure 4.3 shows four different trees that all represent the same logical expression. The first three trees are binary, and comparing them for equivalence is difficult. However, all three binary trees can be transformed into the sorted n-ary tree fairly easily, and these sorted n-ary trees can be easily compared to each other for isomorphism. Note that this transformation requires that the operator (e.g., '+') be both associative and commutative. Fortunately, both conjunction and disjunction are associative and commutative, and these are the only operators that we are applying this transformation to.

Figure 4.3 (trees written in prefix form):
  right-associative parse of x + y + z:   +(x, +(y, z))
  left-associative parse of x + y + z:    +(+(x, y), z)
  right-associative parse of y + x + z:   +(y, +(x, z))
  sorted n-ary tree of x + y + z:         +(x, y, z)

Figure 4.3: Four trees that represent logically equivalent circuits. The sorted n-ary representation makes equivalence comparisons easier.
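The transformation in Figure 4.3 can be sketched on a toy tree. The sketch makes two assumptions. Node, Var and Plus are hypothetical classes, not the lab's Expr/NaryExpr hierarchy. Leaves are compared by name rather than with Expr.compareTo(). Flattening relies on associativity, and sorting relies on commutativity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class NaryFlatten {
    // Hypothetical toy AST: a leaf variable or a binary '+' node.
    interface Node {}

    static final class Var implements Node {
        final String name;
        Var(String name) { this.name = name; }
    }

    static final class Plus implements Node {
        final Node left;
        final Node right;
        Plus(Node left, Node right) { this.left = left; this.right = right; }
    }

    // Collect the operands of nested '+' nodes (valid because '+' is associative).
    static void collect(Node n, List<String> out) {
        if (n instanceof Plus) {
            Plus p = (Plus) n;
            collect(p.left, out);
            collect(p.right, out);
        } else {
            out.add(((Var) n).name); // in this toy, every non-Plus node is a Var
        }
    }

    // Sorted n-ary form (the sort is valid because '+' is commutative).
    static List<String> toSortedNary(Node n) {
        List<String> operands = new ArrayList<>();
        collect(n, operands);
        Collections.sort(operands);
        return operands;
    }

    public static void main(String[] args) {
        Node x = new Var("x"), y = new Var("y"), z = new Var("z");
        Node rightAssoc = new Plus(x, new Plus(y, z)); // x + (y + z)
        Node leftAssoc = new Plus(new Plus(x, y), z);  // (x + y) + z
        Node reordered = new Plus(y, new Plus(x, z));  // y + (x + z)
        System.out.println(toSortedNary(rightAssoc));  // [x, y, z]
        System.out.println(toSortedNary(leftAssoc).equals(toSortedNary(reordered))); // true
    }
}
```

Once all three binary trees are in this form, comparing them is a plain list equality check.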

4.5 Identity Element & Absorbing Element†

Let I represent the identity element of a binary operation ⊗ on elements of set S, and let A represent the absorbing element of that operation on that set. Then the following equations hold:

∀ x ∈ S | x = x ⊗ I
∀ x ∈ S | A = x ⊗ A

For example, with integer addition and multiplication we have:

∀ x ∈ Z | x = x × 1
∀ x ∈ Z | 0 = x × 0
∀ x ∈ Z | x = x + 0
What are the absorbing and identity elements for conjunction and disjunction in boolean logic?

4.6 Simplify Once

In §4.1 above we discussed how the simplify() method keeps calling simplifyOnce() until there are no changes (i.e., it iterates to a fixed point). Now we turn our attention to simplifyOnce().

Figure 4.5 lists all of the transformations to be implemented within the simplifyOnce() methods. By far the most interesting case is NaryExpr.simplifyOnce(), which is broken down into a number of sub-cases, as shown in Figures 4.4 and 4.5.

Sources:
  ece351.common.ast.AndExpr
  ece351.common.ast.OrExpr
  ece351.common.ast.NotExpr
  ece351.common.ast.NaryExpr
  ....NaryOrExpr.getIdentityElement()
  ....NaryOrExpr.getAbsorbingElement()
  ....NaryAndExpr.getIdentityElement()
  ....NaryAndExpr.getAbsorbingElement()
Libraries:
  NaryExpr.filter()
  NaryExpr.removeAll()
  NaryExpr.contains()
  NaryExpr.getThatClass()
Tests:
  TestSimplifierEquivalence
  TestSimplifier2

Figure 4.4: The steps of NaryExpr.simplifyOnce().

@Override
protected final Expr simplifyOnce() {
    assert repOk();
    final Expr result =
        simplifyChildren().
        mergeGrandchildren().
        foldIdentityElements().
        foldAbsorbingElements().
        foldComplements().
        removeDuplicates().
        simpleAbsorption().
        subsetAbsorption().
        singletonify();
    assert result.repOk();
    return result;
}

You are not obliged to use exactly these steps. These methods have been stubbed with return this, so that the code will run; it just won't do any transformations until you implement them.

All of these methods transform an NaryExpr to a new NaryExpr, except singletonify(), which might return a different kind of Expr. Why? Most of these transformations reduce the number of nodes in the ast. That might result in an NaryExpr with just one child, which doesn't make sense. So singletonify() will replace this malformed NaryExpr with its lone child. Implement singletonify() early.

(In this figure · is conjunction and + is disjunction. B+ is a binary + node and N+ is an n-ary + node.)

Level 0. Convert binary to n-ary
  Code: OrExpr.simplifyOnce(), AndExpr.simplifyOnce(), NaryExpr.mergeGrandchildren()
  Transformation: B+(B+(X, Z), Y) ↦ N+(N+(X, Z), Y) ↦ N+(X, Y, Z)

Level 1. Fold Identity Elements
  Code: NaryExpr.foldIdentityElements()
  Transformation: x·1 ↦ x;  x+0 ↦ x
  Note: Implement NaryExpr.singletonify() before implementing folds. Why?

Level 2. Fold Absorbing Elements
  Code: NaryExpr.foldAbsorbingElements()
  Transformation: x·0 ↦ 0;  x+1 ↦ 1
  Note: Write more concise code by using getIdentityElement() and getAbsorbingElement().

Level 3. Fold Negation
  Code: NotExpr.simplifyOnce()
  Transformation: !0 ↦ 1;  !1 ↦ 0;  !!x ↦ x

Level 4. Fold Complements
  Code: NaryExpr.foldComplements()
  Transformation: x·!x ↦ 0;  x+!x ↦ 1

Level 5. Deduplication
  Code: NaryExpr.removeDuplicates()
  Transformation: x+x ↦ x;  x·x ↦ x

Level 6. Simple Absorption
  Code: NaryExpr.simpleAbsorption()
  Transformation: x+(x·y) ↦ x;  x·(x+y) ↦ x
  Note: Is x a VarExpr or an NaryExpr? The latter case is harder.

Level 7. Subset Absorption
  Code: NaryExpr.subsetAbsorption()
  Transformation: (a+b)+((a+b)·y) ↦ a+b

Figure 4.5: Simplifications for F programs
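Several of the folds in Figure 4.5 can be tried on a toy model before working on the real ast classes. The model below is hypothetical. An n-ary node is an operator flag plus a flat list of literal children such as "x", "!x", "0" and "1". It is not the lab's NaryExpr API, and it applies levels 1, 2, 4 and 5 plus singletonify in one pass, whereas the lab chains them as separate methods.

```java
import java.util.List;
import java.util.TreeSet;

public class NaryFolds {
    // Hypothetical toy model, NOT the lab's NaryExpr: isAnd selects the operator.
    static String identity(boolean isAnd)  { return isAnd ? "1" : "0"; }
    static String absorbing(boolean isAnd) { return isAnd ? "0" : "1"; }

    // Complement of a literal: x <-> !x.
    static String complement(String lit) {
        return lit.startsWith("!") ? lit.substring(1) : "!" + lit;
    }

    static String simplifyOnce(boolean isAnd, List<String> children) {
        // Level 5, deduplication (x + x -> x); the TreeSet also keeps children sorted.
        TreeSet<String> kids = new TreeSet<>(children);
        // Level 1, fold identity elements (x and 1 -> x, x or 0 -> x).
        kids.remove(identity(isAnd));
        // Level 2, fold absorbing elements (x and 0 -> 0, x or 1 -> 1).
        if (kids.contains(absorbing(isAnd))) {
            return absorbing(isAnd);
        }
        // Level 4, fold complements (x and !x -> 0, x or !x -> 1).
        for (String k : kids) {
            if (kids.contains(complement(k))) {
                return absorbing(isAnd);
            }
        }
        // Singletonify: zero or one remaining child is not a well-formed n-ary node.
        if (kids.isEmpty()) {
            return identity(isAnd);
        }
        if (kids.size() == 1) {
            return kids.first();
        }
        return String.join(isAnd ? " and " : " or ", kids);
    }

    public static void main(String[] args) {
        System.out.println(simplifyOnce(false, List.of("x", "0")));       // x
        System.out.println(simplifyOnce(true, List.of("x", "0", "y")));   // 0
        System.out.println(simplifyOnce(false, List.of("x", "!x", "y"))); // 1
        System.out.println(simplifyOnce(true, List.of("y", "x", "y")));   // x and y
    }
}
```

In this toy, complements fold to the operator's absorbing element. That matches the table: x·!x ↦ 0 and x+!x ↦ 1.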

4.7 Object sharing in the simplifier

Figure 4.6 was drawn by some students to help you visualize which
objects are shared between the original ast and the simplified ast.
The general pattern is that the leaves of the tree will be re-used
(shared), whereas the interior nodes will be replaced.

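Figure 4.6 (diagram), "F AST Before & After for Simplification":
Before: x <= a or 1, drawn as a <= node whose children are x and an or node (with children a and 1).
After: x <= 1, drawn as a new <= node whose children are x and 1.
The "Objects in Memory" panel lists x, two distinct <= nodes, a, and 1. The leaves are shared between the two trees, while each tree has its own <= node. Legend: AST path; reference.

Figure 4.6: Object sharing in the F simplifier. The original ast and the simplified ast will share some nodes. This is permissible because all of our ast nodes are immutable.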

4.8 Class Representation Invariants†

http://en.wikipedia.org/wiki/Class_invariant

The invariants of a class are conditions that are expected to hold for all legal objects of that class. For example, a legal FProgram will contain at least one formula and will have exactly one formula for each output pin (i.e., no output pin will be computed by two different formulas). A legal NaryExpr will have more than one child, its children must not be null, and its children must be sorted. By convention, the class representation invariants are checked in a method called repOk().

For mutable objects, invariants are usually expected to hold at all public method boundaries (when the method starts and when it finishes). All of our ast objects are immutable. We take the position that it is possible to construct illegal ast objects (where the invariants do not hold) as temporary values. Consider the code for NaryExpr.simplifyOnce() in Figure 4.4. Each of the helper methods (except singletonify()) is allowed to construct and return an illegal NaryExpr, but at the end we expect to have a legal Expr object.

Figure 4.7: NaryExpr.repOk() class representation invariants for NaryExpr; i.e., the rules that define what is a well-formed NaryExpr object.

public boolean repOk() {
    // programming sanity
    assert this.children != null;
    // should not have a single child: indicates a bug in simplification
    assert this.children.size() > 1 : "should have more than one child, probably a bug in simplification";
    // check that children is sorted
    int i = 0;
    for (int j = 1; j < this.children.size(); i++, j++) {
        final Expr x = this.children.get(i);
        assert x != null : "null children not allowed in NaryExpr";
        final Expr y = this.children.get(j);
        assert y != null : "null children not allowed in NaryExpr";
        assert x.compareTo(y) <= 0 : "NaryExpr.children must be sorted";
    }
    // Note: children might contain duplicates -- not checking for that
    // ... maybe should check for duplicate children ...
    // no problems found
    return true;
}

4.9 Logical equivalence for boolean formulas†

The correctness of your simplifier is determined by checking whether the output is logically equivalent to the input, yet structurally smaller than (or no larger than) the input.

F is a language of boolean formulas. Checking equivalence of boolean formulas is a hard problem. It is co-NP-complete, because its complement is NP-complete: deciding whether two formulas differ on at least one input.

Is the termination condition in Expr.simplify() (Figure 4.1) written in terms of logical equivalence? Why?

To see why checking equivalence of boolean formulas is hard, consider comparing (x + y) + z, x + (y + z) and !(!x · !y · !z) by constructing their truth tables, where f names the output:

 x  y  z | (x + y) + z | x + (y + z) | !(!x · !y · !z)
 0  0  0 |      0      |      0      |        0
 0  0  1 |      1      |      1      |        1
 0  1  0 |      1      |      1      |        1
 0  1  1 |      1      |      1      |        1
 1  0  0 |      1      |      1      |        1
 1  0  1 |      1      |      1      |        1
 1  1  0 |      1      |      1      |        1
 1  1  1 |      1      |      1      |        1
Figure 4.8: Equivalent truth tables (each output column is f for that formula)
We can see from examining the three truth tables that these three formulas are equivalent. Great. But the number of rows in each truth table is an exponential function of the number of variables in the formula: a formula with $n$ variables has $2^n$ rows. For example, these formulas have three variables and eight rows in their truth tables: $2^3 = 8$.
What to do? This is a hard problem. As discussed in the Course Notes §0, we have three main options: implement an NP-complete algorithm ourselves; use a polynomial time approximation; or convert the problem to sat and ask a sat-solver for the answer. If you look at the skeleton code in FProgram.equivalent(), you will see that it uses this last approach. This is the approach that we generally recommend you follow in your future career, unless there are well known and accepted polynomial time approximations for the problem you are trying to solve.

Our translation to sat occurs via an intermediate language named Alloy (http://alloy.mit.edu), which is in turn translated to sat. We found it convenient to work this way, but it would also be easy to translate this problem to sat directly. The sat solver that Alloy is configured to use in this case is sat4j, which is open-source, written in Java, and used by Eclipse to compute plugin dependencies.

For this particular problem of computing the equivalence of boolean formulas there is a fourth option: translate the formulas into a standardized form, and then compare that standardized form. This is the approach that we took above when we converted the binary expressions to sorted n-ary expressions. In the case of boolean formulas, the most common standardized form is reduced, ordered binary decision diagrams (or bdd for short).² A bdd is essentially a compressed form of a truth table. Two equivalent boolean formulas will have the exact same bdd representation, provided both bdds use the same variable ordering. Constructing a bdd for a boolean formula is, in the worst case, an exponential problem, but in practice it usually runs fairly quickly. bdds are commonly used for hardware verification and other tasks that require working with boolean formulas.

² R. E. Bryant. Graph-based algorithms for boolean function manipulation. IEEE Transactions on Computers, C-35(8):677–691, Aug. 1986.

There will be no exam questions on bdds. This material is just here for your interest. Boolean formulas are of fundamental and intrinsic interest both in theory and in practice.
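The truth-table approach can be written as a brute-force check. The sketch below assumes formulas are given as Java lambdas over an array of variable values. The Formula interface and the equivalent method are illustrative only. They are not FProgram.equivalent(), which goes through Alloy and sat4j as described above. The outer loop runs $2^n$ times, which is exactly the exponential cost discussed above.

```java
public class TruthTable {
    // A formula over n boolean variables, evaluated on one row of the truth table.
    interface Formula {
        boolean eval(boolean[] v);
    }

    // Brute-force equivalence check: compare the outputs on all 2^n rows.
    static boolean equivalent(int n, Formula f, Formula g) {
        for (int row = 0; row < (1 << n); row++) {
            boolean[] v = new boolean[n];
            for (int i = 0; i < n; i++) {
                // Variable i takes bit (n - 1 - i) of the row number, so rows are
                // enumerated in the same order as Figure 4.8 (000, 001, ...).
                v[i] = ((row >> (n - 1 - i)) & 1) == 1;
            }
            if (f.eval(v) != g.eval(v)) {
                return false; // found a row where the outputs differ
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Formula leftGrouped  = v -> (v[0] || v[1]) || v[2];     // (x + y) + z
        Formula rightGrouped = v -> v[0] || (v[1] || v[2]);     // x + (y + z)
        Formula deMorgan     = v -> !(!v[0] && !v[1] && !v[2]); // !(!x · !y · !z)
        System.out.println(equivalent(3, leftGrouped, rightGrouped)); // true
        System.out.println(equivalent(3, leftGrouped, deMorgan));     // true
        Formula allAnd = v -> v[0] && v[1] && v[2];
        System.out.println(equivalent(3, leftGrouped, allAnd));       // false
    }
}
```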

4.10 BinaryExpr.examine() is a Higher-Order Function†

Higher-order functions were briefly introduced in [N 0.6].

A higher-order function is a function that takes another function as an argument. BinaryExpr.examine() is an example: the first argument, the Examiner, is essentially another function. The Examiner object has no state: it is just used to call the examine() method.

Figure 4.9: BinaryExpr.examine() is a higher-order function

private boolean examine(final Examiner e, final Object obj) {
    // basics
    if (obj == null) return false;
    if (!this.getClass().equals(obj.getClass())) return false;
    final BinaryExpr be = (BinaryExpr) obj;

    // compare field values
    if (!e.examine(left, be.left)) return false;
    if (!e.examine(right, be.right)) return false;

    // no differences
    return true;
}

Exercise 4.10.1 How many Examiner objects are there in this codebase? What are their names?
Exercise 4.10.2 Is it possible to create any more Examiner objects?
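The same idea can be expressed with a standard Java functional interface. The sketch below uses java.util.function.BiPredicate in place of the lab's Examiner, and a hypothetical Pair class in place of BinaryExpr. Passing a different comparison function changes what "same" means without changing the traversal code.

```java
import java.util.function.BiPredicate;

public class HigherOrderExamine {
    // Hypothetical toy stand-in for BinaryExpr: a node with two fields.
    static final class Pair {
        final Object left;
        final Object right;

        Pair(Object left, Object right) {
            this.left = left;
            this.right = right;
        }

        // Higher-order: the comparison applied to each field is passed in as e.
        boolean examine(BiPredicate<Object, Object> e, Object obj) {
            if (obj == null) return false;
            if (!this.getClass().equals(obj.getClass())) return false;
            final Pair p = (Pair) obj;
            return e.test(left, p.left) && e.test(right, p.right);
        }
    }

    public static void main(String[] args) {
        Pair a = new Pair("x", "Y");
        Pair b = new Pair("x", "y");
        BiPredicate<Object, Object> exact = Object::equals;
        BiPredicate<Object, Object> ignoreCase =
                (u, v) -> u.toString().equalsIgnoreCase(v.toString());
        System.out.println(a.examine(exact, b));      // false
        System.out.println(a.examine(ignoreCase, b)); // true
    }
}
```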

4.11 Choice of Data Structures for NaryExpr.children†

NaryExpr uses an ImmutableList to store its children. Some points of discussion:

• ImmutableList is used widely in this code, so it is familiar.
• We must use linear search with ImmutableList (see NaryExpr.contains()). Perhaps some other data structure would offer better asymptotic complexity. For example, a hash structure would give us expected constant-time lookup. But we are engineers: we care about constant factors. Depending on what the constant factors are, linear search in a small list can be faster in practice than looking up in a hash structure.
• A list is the most compact storage for this purpose. The children of NaryExpr need to be ordered, so a structure like HashMap would not be adequate: LinkedHashMap would be necessary. LinkedHashMap maintains ordering via an internal linked list, in addition to the space allocated for the hash structure.
• ImmutableList.remove() will throw exceptions because ImmutableList is immutable.
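Java's standard library has similar behaviour that is easy to try. The sketch below uses java.util.List.of, which returns an unmodifiable list. It is only an analogue: the lab's ImmutableList class may differ in its details. Instead of removing in place, we build a new list, in the spirit of the NaryExpr.filter() and NaryExpr.removeAll() library methods listed earlier. The helper names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

public class ImmutableChildren {
    // Returns false if list.remove(x) throws UnsupportedOperationException.
    // (On a mutable list this would actually remove x.)
    static boolean removeIsSupported(List<String> list, String x) {
        try {
            list.remove(x);
            return true;
        } catch (UnsupportedOperationException ex) {
            return false;
        }
    }

    // "Removing" from an immutable list means building a new list without the element.
    static List<String> without(List<String> children, String unwanted) {
        return children.stream()
                .filter(c -> !c.equals(unwanted))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> children = List.of("a", "b", "x");
        System.out.println(removeIsSupported(children, "x")); // false
        System.out.println(without(children, "x"));           // [a, b]
        System.out.println(children);                         // [a, b, x] (unchanged)
        System.out.println(children.contains("b"));           // true
    }
}
```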

4.12 Evaluation

The last pushed commit before the deadline is evaluated both on the shared test inputs and on a set of secret test inputs, according to the weights in the table below. The testing equations are as follows. First, the simplifier should always produce an equivalent F program, for all inputs:
originalAST.simplify().equivalent(originalAST)
Second, the simplifier should be idempotent:
originalAST.simplify().equals(originalAST.simplify().simplify())
Finally, for a select set of F programs (opt*.f), the simplifier should perform some specific transformations:
originalAST.simplify().isomorphic(staffSimplifiedAST)

You might, in the unlikely but not impossible event, implement a better simplifier than the staff has. In that case your simplifier might produce spurious failures on the isomorphism equation. Talk to us to rectify the situation and get some bonus marks.

Test                         Shared  Secret
TestSimplifierEquivalence      30      10
TestSimplifier2                40      20

Note that you do not earn any marks for this lab until TestObjectContractF from lab3 passes. As you can see, the testing equations for this lab depend on having the object contract correctly implemented for the F ast classes. If your object contract implementation is buggy, then we cannot trust the results of the testing equations.
Lab 5

Parsing W with Parboiled

Compiler Concepts: parser generators, Parsing Expression Grammars (peg), push-down automata
Programming Concepts: domain specific languages (dsl): internal vs. external, debugging generated code, stacks

Files:
  ece351.w.parboiled.WParboiledRecognizer
  ece351.w.parboiled.WParboiledParser
Tests:
  ece351.w.parboiled.TestWParboiledRecognizerAccept
  ece351.w.parboiled.TestWParboiledRecognizerReject
  ece351.w.parboiled.TestWParboiledParserAccept
  ece351.w.parboiled.TestWParboiledParserReject
Tool Documentation:
  http://parboiled.org

In this lab you will write a new parser for W. In lab1 you wrote a parser for W by hand. In this lab you will write a parser for W using a parser generator tool named Parboiled. A parser generator is a tool that takes a description of a grammar and generates code that recognizes whether input strings conform to that grammar. Many developers in practice choose to use a parser generator rather than write parsers by hand.

There are many different parser generator tools, and they differ along a number of dimensions. The two most important are the theory they are based on and whether they require the programmer to work in an internal or external dsl. The theory behind Parboiled is called Parsing Expression Grammars (peg). Other common theories are ll (e.g., Antlr) and lalr (e.g., JavaCup).

dsl = Domain Specific Language. This terminology is in contrast to a general purpose programming language, such as Java/C/etc.

A dsl is often used in combination with a general purpose programming language, which is sometimes called the host language. For example, you might write an sql query in a string inside a Java program. In this example Java is the host language and sql is an external dsl for database queries. Whether the dsl is internal or external is determined by whether it shares the grammar of the host language, not by the physical location of snippets written in the dsl. In this example the sql snippet is written inline in the Java program, but sql has a different grammar than Java, and so is considered to be an external dsl.

Exercise 5.0.1 Name a few more common dsls.

An internal dsl uses the grammar of the host language. A common way to implement an internal dsl is as a library written in the host language. Not all libraries represent dsls, but almost all internal dsls are implemented as libraries.

Most parser generators require the programmer to work in an external dsl. That is, they require the programmer to learn a new language for specifying a grammar. Parboiled, by contrast, provides an internal dsl. I think that it is easier to learn an internal dsl, and this is why we have chosen to use Parboiled.

The Tiger Book §3.4 discusses two other parser generator tools, JavaCC and SableCC. You can look at that section to get a sense of what their external dsls look like, and why it might be easier to learn Parboiled first. Parboiled did not exist when the Tiger book was written.

The purpose of this lab is for you to learn how to use Parboiled. This lab is specifically structured to facilitate this learning. You are already familiar with the input language W and the host language Java. W is an even simpler language than the one used to teach Parboiled on its website. The only new thing in this lab is Parboiled. Once you learn to use one parser generator tool, it is not too hard to learn another one. Learning your first one is the most challenging.

The other reason why we are using Parboiled is that it is like a pushdown automaton: its main storage mechanism is a stack, and so it reinforces that theoretical concept.

Learning progression: repetition with increasing complexity. In a traditional compilers project without a learning progression, you would have to learn everything simultaneously in one big lab: the source language (W), the host language (Java), and the parser generator's external dsl. Moreover, the source language would be more like our toy subset of vhdl, which we won't see until after midterm week, rather than a simple regular language like W. Even the pedagogical example on the Parboiled website is as complicated as F (a simple context-free language).

5.1 Introducing Parboiled

To write a recognizer or a parser with Parboiled we extend the BaseParser class, which either defines or inherits almost all of the methods that we will use from Parboiled. For our labs we will actually extend BaseParser351, which in turn extends BaseParser and adds some additional utility methods.
We can divide the methods available in our recognizer/parser into a number of groups:

Rule Constructors. These methods are used to describe both the grammar and the lexical specification of the language that we wish to recognize or parse. We can subdivide this group as follows:

EBNF     Parboiled
*        ZeroOrMore()
+        OneOrMore()
?        Optional()
|        FirstOf()       similar but importantly different: FirstOf() is an ordered choice, so the first alternative that matches is taken and the remaining alternatives are not tried
         Sequence()      no explicit character in EBNF

Regex    Parboiled
[ab]     AnyOf("ab")
[^ab]    NoneOf("ab")
a        Ch('a') or just 'a'
[a-z]    CharRange('a', 'z')
         IgnoreCase()    no basic regex operator (some dialects, such as java.util.regex, offer a case-insensitive flag (?i))
         EOI             special char for end of input
         W0()            optional whitespace (zero or more)
         W1()            mandatory whitespace (at least one)

ebnf = Extended Backus Naur Form. This is the name of the notation used to specify the grammars for W (Figure 1.2) and F (Figure 3.1). Regular expressions are often used for lexical specifications.

Recall that in previous labs there was a separate Lexer class that encoded the lexical specification for W (and F). Some parser generator tools have separate dsls for the lexical and syntactic specifications of the input language. In Parboiled, by contrast, we specify the tokenization as part of the grammar.

For this lab we will specify whitespace explicitly. In the next lab we will learn how to specify the whitespace implicitly, which makes the recognizer rules look a bit less cluttered.
Access to input. A recognizer doesn't need to store any of the input that it has already examined. A parser, however, often saves substrings of the input into an ast. The match() method returns the substring that matched the most recently triggered rule. The match() method is in Parboiled's BaseActions class, which is the superclass of BaseParser.

Exercise 5.1.1 Draw a uml class diagram for WParboiledRecognizer and WParboiledParser.
[lab 5] parsing W with parboiled 77

Stack manipulation. A parser builds up an ast. Where do the fragments of this ast get stored while the parser is processing the input? When we wrote the parsers for W and F by hand in the previous labs we used a combination of fields, local variables, and method return values to store ast fragments. None of these storage mechanisms are available to us within Parboiled's dsl. Parboiled gives us one and only one place to store ast fragments: the Parboiled value stack. In this sense, working with Parboiled is very much like programming a pushdown automaton. We can manipulate this stack with the standard operations, including: push(), pop(), peek(), swap(), and dup(). When the parser has processed all of the input we expect to find the completed ast on the top of the stack.

Hewlett-Packard calculators also famously have a value stack for intermediate results.

These stack operations are in Parboiled's BaseActions class, which is the superclass of BaseParser.
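As a rough mental model of these operations, here is a minimal plain-Java stack that supports the same five of them. It is an illustration only, not Parboiled's implementation: the class name ValueStackSketch and its bottomToTop() helper are hypothetical, and in a real Parboiled parser the operations are inherited from BaseActions and run inside parser actions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

/** Hypothetical stand-in for a parser value stack; not Parboiled's own class. */
class ValueStackSketch {
    // The head of the deque is the top of the stack. ArrayDeque rejects null elements.
    private final Deque<Object> items = new ArrayDeque<>();

    void push(Object value) { items.push(value); }

    Object pop() { return items.pop(); }    // throws NoSuchElementException if empty

    Object peek() { return items.peek(); }  // returns null if empty

    /** Exchange the top two elements. */
    void swap() {
        Object top = items.pop();
        Object second = items.pop();
        items.push(top);     // the old top goes underneath ...
        items.push(second);  // ... and the old second element becomes the new top
    }

    /** Push a second reference to the current top element (fails on an empty stack). */
    void dup() { items.push(items.peek()); }

    int size() { return items.size(); }

    /** Snapshot of the stack listed from bottom to top, handy for tracing. */
    List<Object> bottomToTop() {
        List<Object> snapshot = new ArrayList<>(items); // deque iteration runs top to bottom
        Collections.reverse(snapshot);
        return snapshot;
    }
}
```

For example, push("L"), push("N"), then swap() leaves "L" on top with "N" beneath it, which is the same kind of reordering a parser action performs before combining the two values.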
Grammar of the input language. We will add a method to our recognizer/parser for each of the non-terminals in the grammar of the input language. These methods will have return type Rule and will comprise just one statement: a return of the result of some Parboiled rule constructor. See rule constructors above.

We might also add some methods for parts of the lexical specification of the input language.

5.2 Write a recognizer for W using Parboiled

Suppose we want to process files that contain a list of names, such as ‘Larry Curly Moe ’. A recognizer for such a language might include a snippet as in Figure 5.1.

Figure 5.1: Snippet of a recognizer written in Parboiled's dsl.

public Rule List() { return ZeroOrMore(Sequence(Name(), W0())); }
public Rule Name() { return OneOrMore(Letter()); }
public Rule Letter() { return FirstOf(CharRange('A', 'Z'), CharRange('a', 'z')); }

Exercise 5.2.1 Write the ebnf grammar that corresponds to this Parboiled code.
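Because Parboiled's dsl is internal, rule constructors such as ZeroOrMore(), OneOrMore(), FirstOf(), and Sequence() are ordinary Java methods that combine smaller rules into bigger ones. The plain-Java sketch below imitates that idea with hypothetical lambda-based combinators so that Figure 5.1 can be run without Parboiled. It is not Parboiled's API: here a rule simply returns the input position after its match, or -1 on failure, with no parse tree, actions, or error reporting. W0() is approximated as zero or more spaces, and an end-of-input check is added at the top level.

```java
/** Hypothetical miniature PEG combinators in plain Java; not Parboiled's API. */
class MiniPeg {
    /** A rule returns the input position just after its match, or -1 if it fails. */
    interface Rule { int apply(String in, int pos); }

    static Rule charRange(char lo, char hi) {
        return (in, pos) -> pos < in.length() && in.charAt(pos) >= lo && in.charAt(pos) <= hi
                ? pos + 1 : -1;
    }

    static Rule ch(char c) { return charRange(c, c); }

    /** Every sub-rule must match, one after another. */
    static Rule sequence(Rule... rules) {
        return (in, pos) -> {
            int p = pos;
            for (Rule r : rules) {
                p = r.apply(in, p);
                if (p < 0) return -1;
            }
            return p;
        };
    }

    /** Ordered choice: the first sub-rule that matches wins; later ones are never tried. */
    static Rule firstOf(Rule... rules) {
        return (in, pos) -> {
            for (Rule r : rules) {
                int end = r.apply(in, pos);
                if (end >= 0) return end;
            }
            return -1;
        };
    }

    /** Greedy repetition; always succeeds, possibly matching nothing. */
    static Rule zeroOrMore(Rule r) {
        return (in, pos) -> {
            int p = pos;
            while (true) {
                int end = r.apply(in, p);
                if (end < 0 || end == p) return p; // stop on failure or on an empty match
                p = end;
            }
        };
    }

    static Rule oneOrMore(Rule r) { return sequence(r, zeroOrMore(r)); }

    /** Succeeds only at the end of the input. */
    static final Rule EOI = (in, pos) -> pos == in.length() ? pos : -1;

    // Figure 5.1 transcribed into the sketch (W0 approximated as zero or more spaces).
    static final Rule LETTER = firstOf(charRange('A', 'Z'), charRange('a', 'z'));
    static final Rule NAME = oneOrMore(LETTER);
    static final Rule W0 = zeroOrMore(ch(' '));
    static final Rule LIST = zeroOrMore(sequence(NAME, W0));

    static boolean recognize(String input) {
        return sequence(LIST, EOI).apply(input, 0) >= 0;
    }
}
```

With this sketch, recognize("Larry Curly Moe ") succeeds, while recognize("Ren 42") fails because the digits are left unconsumed when the end-of-input check runs.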

5.3 Add actions to recognizer to make a parser

Let's add actions to our recognizer from Figure 5.1 to make a parser. Figure 5.2 lists code for the actions. The general idea is that we wrap every recognizer rule in a Sequence constructor, and then add new clauses to manipulate the stack: i.e., a clause with one of the push, pop, peek, swap, dup, etc., commands.

Figure 5.5 augments the listing of Figure 5.2 with some debugging clauses using the debugmsg() and checkType() methods. If you want to inspect memory while your recognizer or parser is executing then use one of the debugmsg() or checkType() methods and set a breakpoint inside that method. Setting a breakpoint inside a rule constructor will not do what you want.

debugmsg() and checkType() are defined in BaseParser351, which is a superclass of our recognizer/parser classes and a subclass of Parboiled's BaseParser.

Rule constructors are executed once in a grammar analysis phase in order to generate code that will actually process input strings.

public Rule List() {
    return Sequence(                                   // wrap everything in a Sequence
        push(ImmutableList.of()),                      // push empty list on to stack
        ZeroOrMore(Sequence(Name(), W0()))             // rule from recognizer
    );
}
public Rule Name() {
    return Sequence(                                   // wrap everything in a Sequence
        OneOrMore(Letter()),                           // consume a Name, which can be retrieved later by match()
        push(match()),                                 // push the Name on top of the stack
        swap(),                                        // swap the Name and the List on the stack
        push( ((ImmutableList)pop()).append(pop()) )   // make a new list by appending the new Name to the old List
    );
}
public Rule Letter() { return FirstOf(CharRange('A', 'Z'), CharRange('a', 'z')); } // rule from recognizer

Figure 5.2: Snippet of a parser written in Parboiled's dsl, corresponding to the recognizer snippet in Figure 5.1. See this snippet extended with some debugging commands in Figure 5.5.

When the parser reaches the end of the input then we expect to find a list object on the top of the stack, and we expect that list object to contain all of the names given in the input. For example, for the input string ‘Ren Stimpy ’ we would expect the result of the parse to be the list [‘Ren’, ‘Stimpy’]. Similarly, for the input string ‘Larry Curly Moe ’ we would expect the result of the parse to be the list [‘Larry’, ‘Curly’, ‘Moe’]. The result of a parse is the object on the top of the stack when the parser reaches the end of the input. Figure 5.3 illustrates the state of the stack as this example parser processes the input ‘Ren Stimpy ’.

http://www.youtube.com/results?search_query=ren+stimpy

Figure 5.3: Stack of Parboiled parser from Figure 5.2 while processing input ‘Ren Stimpy ’. Time proceeds left to right.

Top:                Ren       []                  Stimpy    [Ren]
Bottom:   []        []        Ren       [Ren]     [Ren]     Stimpy    [Ren, Stimpy]

Exercise 5.3.1 Which state of the stack corresponds to which line of the parser source code?

5.3.1 An alternative Name() rule

Figure 5.4 shows an alternative formulation of the Name() rule from Figure 5.2.

Figure 5.4: An alternative definition of the Name() rule in Figure 5.2

public Rule Name() {
    return Sequence(
        OneOrMore(Letter()),
        push( ((ImmutableList)pop()).append(match()) )
    );
}

Exercise 5.3.2 Draw out the stack that this alternative formulation creates (i.e., the analogue of Figure 5.3).

public Rule List() {
    return Sequence(                                   // wrap everything in a Sequence
        push(ImmutableList.of()),                      // push empty list on to stack
        ZeroOrMore(Sequence(Name(), W0())),            // rule from recognizer
        checkType(peek(), List.class)                  // expect a list on the top of the stack
    );
}
public Rule Name() {
    return Sequence(                                   // wrap everything in a Sequence
        OneOrMore(Letter()),                           // rule from recognizer
        push(match()),                                 // push a single name, so stack is: [List, String]
        debugmsg(peek()),                              // print the matched name to System.err
        checkType(peek(), String.class),               // match always returns a String
        swap(),                                        // swap top two elements on stack: [String, List]
        checkType(peek(), List.class),                 // confirm that the list is on top
        push( ((ImmutableList)pop()).append(pop()) )   // construct and push a new list that contains
                                                       // all of the names on the old list plus this new name
    );
}
public Rule Letter() { return FirstOf(CharRange('A', 'Z'), CharRange('a', 'z')); } // rule from recognizer

Figure 5.5: Snippet of a parser written in Parboiled's dsl (Figure 5.2), extended with debugging commands. Corresponding to the recognizer snippet in Figure 5.1.

5.3.2 The stack for our W parser will have two objects

The stack for our W programs will have two objects once it reaches steady state: a WProgram object and a Waveform object. Periodically the Waveform object will be appended to the WProgram object and a new Waveform object will be instantiated. The WProgram object will usually be closer to the bottom of the stack and the Waveform object will usually be closer to the top.

Exercise 5.3.3 Draw out a picture of what you expect the stack to look like at each point in time of the parser execution before you start programming it. See Figure 5.3 as an example.

Exercise 5.3.4 What is the maximum size that the stack will grow to?

5.4 Parsing Expression Grammars (pegs)†

Parboiled works with Parsing Expression Grammars (pegs). pegs are slightly different from Context Free Grammars (cfgs). The important difference is that pegs cannot represent ambiguity, whereas cfgs can. We never want ambiguity in programming languages, so this ‘feature’ of cfgs is not much of a feature in our context.

cfgs were originally developed by linguists to model natural languages, which often contain grammatical ambiguities. Only after linguists had successfully applied the idea of cfgs in their natural language context did computer scientists try using cfgs for programming languages.

Exercise 5.4.1 What language feature can cfgs represent that pegs cannot?

http://en.wikipedia.org/wiki/Parsing_expression_grammar
http://en.wikipedia.org/wiki/Context-free_grammar

Where does ambiguity arise in a cfg? From the choice (alternation) operator: ‘|’ (the bar). An ambiguity is when two (or more) different alternatives might apply. pegs remove this possibility for ambiguity by treating alternatives in priority order: the first alternative to match is taken as correct; if a subsequent alternative would also have matched it is just ignored. So Parboiled does not have a bar operator; it has the FirstOf method instead, which says ‘use the alternative in this list that matches first.’

cfgs and pegs do not describe exactly the same languages, and the same grammar can mean different things when read as a cfg or as a peg. For example, read as a cfg, the grammar

$S \rightarrow \mathtt{x}\ S\ \mathtt{x} \mid \mathtt{x}$

generates every string with an odd number of x's; read as a peg, where the second alternative is tried only if the first one fails, it matches an entire string of x's only when its length is $2^n - 1$ (1, 3, 7, ...). Similarly, there are some languages that pegs can recognize that cfgs cannot, such as counting three things:

$\{ a^n b^n c^n \mid n \geq 0 \}$
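The effect of ordered choice on the grammar S → x S x | x can be checked by hand-coding its peg reading. In the sketch below (class and method names are hypothetical), the alternatives are tried in priority order, as FirstOf() would try them, and once a recursive call to S has succeeded its result is final: the caller cannot ask it to try a different alternative.

```java
/** Hypothetical hand-coded peg reading of S -> 'x' S 'x' | 'x'. */
class PegOrderedChoice {
    /** Returns the end position of S matched at pos, or -1 if S fails there. */
    static int s(String in, int pos) {
        // First alternative: 'x' S 'x'
        if (pos < in.length() && in.charAt(pos) == 'x') {
            int mid = s(in, pos + 1);
            if (mid >= 0 && mid < in.length() && in.charAt(mid) == 'x') {
                return mid + 1;
            }
        }
        // Second alternative, reached only if the first one failed: 'x'
        if (pos < in.length() && in.charAt(pos) == 'x') {
            return pos + 1;
        }
        return -1;
    }

    /** True if the peg consumes the entire input. */
    static boolean matchesAll(String in) {
        return s(in, 0) == in.length();
    }
}
```

Trying strings of one to seven x's, only the lengths 1, 3, and 7 are matched in full, even though a cfg reading of the same rules accepts lengths 1, 3, 5, and 7.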
5.5 Expressions, Statements, and Side Effects†

Expressions are evaluated to a value. From an idealistic perspective, expressions are mathematical: they comprise variables and operators; they do not change the state of the machine.

Statements are executed to do something (i.e., a side-effect). Statements end with a semicolon (;) in C-like languages. They change the state of the machine (their side-effect). For example, an assignment statement changes the value bound to a variable; a goto statement changes the program counter (the statement which is currently being executed). The Wikipedia definition of statement:

    In computer programming, a statement is the smallest standalone element of an imperative programming language that expresses some action to be carried out. It is an instruction written in a high-level language that commands the computer to perform a specified action. A program written in such a language is formed by a sequence of one or more statements. A statement may have internal components (e.g., expressions).

https://en.wikipedia.org/wiki/Statement_%28computer_science%29

The line between expressions and statements can be a bit fuzzy in modern languages (most languages invented after around 1970; C was one of the first to blur this line a bit). There are some nice articles on StackOverflow that discuss this blurriness in C# and Python. That discussion is beyond the scope of this course, but might be of interest to you.

C#: http://stackoverflow.com/questions/19132/expression-versus-statement
Python: http://stackoverflow.com/questions/4728073/what-is-the-difference-between-an-expression-and-a-st
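Java inherits some of C's blurriness. In the hypothetical method below, an assignment is used as an expression whose value is the assigned value, and x++ both yields a value and updates a variable. Java also limits which expressions may stand alone as statements, which is one place where it draws the line more strictly than C.

```java
/** Hypothetical demo of side effects hidden inside Java expressions. */
class ExprVsStmt {
    static int[] demo() {
        int x;
        int y = (x = 5) + 1;  // the assignment expression has the value 5 and also sets x
        int z = x++;          // x++ has the old value of x (5); afterwards x is 6
        // x + 1;             // would not compile: Java only allows assignments, ++/--,
                              // method calls, and object creation as expression statements
        return new int[] {x, y, z};
    }
}
```

Calling demo() returns {6, 6, 5}: two of the three values were produced by expressions that also changed the state of x.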

5.6 A few debugging tips

You might get a weird error message that Parboiled has difficulty creating the parser class. If so, see if your rule constructors are throwing exceptions. For example, the skeleton code ships with stubs like this:

public Rule Program() {
    // TODO: 1 lines snipped
    throw new ece351.util.Todo351Exception();
}

You need to get rid of all of these stubs and replace them with code that does something sensible, or just return null. If you leave the exception throwing in, even in a rule constructor you don't call, Parboiled will complain.

Also, you should explicitly include EOI in your grammar.
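The reason for EOI is that a rule which matches only a prefix of the input can still succeed, so a recognizer without it may accept inputs with trailing garbage. The plain-Java sketch below (hypothetical names, not Parboiled code) contrasts a Name rule run with and without an end-of-input check.

```java
/** Hypothetical sketch: why a top-level rule should end with an end-of-input check. */
class EoiDemo {
    static boolean isAsciiLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    /** One or more letters starting at pos; returns the end position, or -1 on failure. */
    static int name(String in, int pos) {
        int p = pos;
        while (p < in.length() && isAsciiLetter(in.charAt(p))) p++;
        return p > pos ? p : -1;
    }

    /** Succeeds whenever some prefix of the input is a Name. */
    static boolean acceptsWithoutEoi(String in) { return name(in, 0) >= 0; }

    /** Also requires that nothing is left over, like ending the rule with EOI. */
    static boolean acceptsWithEoi(String in) { return name(in, 0) == in.length(); }
}
```

On the input "Ren42" the first check reports success after consuming only "Ren"; the second correctly rejects it.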

5.7 Evaluation

The last pushed commit before the deadline is evaluated both on the shared test inputs and on a set of secret test inputs according to the weights in the table below.

                                  Current  New
TestWParboiledRecognizerAccept    20       5
TestWParboiledRecognizerReject    5        5
TestWParboiledParserBasic         5        0
TestWParboiledParserAccept        30       15
TestWParboiledParserReject        10       5

You do not earn any marks for the rejection tests until some acceptance tests pass. You do not earn any marks for the main parser tests until the basic parser test passes.

The recognizer tests just run the recognizer to see whether it rejects or accepts. The parser rejection tests work similarly. The parser acceptance tests check two testing equations. Let w name the input W file. Let x = ParboiledParse(w). The two testing equations are:
    x.isomorphic(ParboiledParse(PrettyPrint(x)))
    x.isomorphic(RecursiveDescentParse(w))

5.8 Reading

Parboiled & pegs:
http://parboiled.org
https://github.com/sirthias/parboiled/wiki/Grammar-and-Parser-Debugging
https://github.com/sirthias/parboiled/wiki/Handling-Whitespace
http://www.decodified.com/parboiled/api/java/org/parboiled/BaseParser.html
http://en.wikipedia.org/wiki/Parsing_expression_grammar

JavaCC & SableCC: (two other parser generators)
Tiger Book §3.4

Note that both JavaCC and SableCC require the grammar to be defined in an external dsl (i.e., not in Java).

You will not be tested on specifics of JavaCC and SableCC, but you should know of their existence and that they use an external dsl for specifying the grammar.

Lab 6

Parsing F with Parboiled

Compiler Concepts: parser generators, Parsing Expression Grammars (peg), push-down automata
Programming Concepts: domain specific languages (dsl): internal vs. external, debugging generated code, stacks


Files:
ece351.f.parboiled.FParboiledRecognizer.java
ece351.f.parboiled.FParboiledParser.java

Tests:
ece351.f.parboiled.TestFParboiledRecognizerAccept
ece351.f.parboiled.TestFParboiledRecognizerReject
ece351.f.parboiled.TestFParboiledParserBasic
ece351.f.parboiled.TestFParboiledParser
ece351.f.parboiled.TestFParserComparison

This lab is similar to some previous labs: we will write a recognizer and parser for F, this time using Parboiled.
6.1 Write a recognizer for F using Parboiled
Figure 6.1: ll(1) Grammar for F (reproduced from Figure 3.1)

Program → Formula+ $$
Formula → Var ‘<=’ Expr ‘;’
Expr → Term (‘or’ Term)*
Term → Factor (‘and’ Factor)*
Factor → ‘not’ Factor | ‘(’ Expr ‘)’ | Var | Constant
Constant → ‘0’ | ‘1’
Var → id

Inspect token without consuming it. Parboiled has two rule constructors that allow you to inspect the next token without consuming it: Test() and TestNot(). For example, in a rule for Var one might include TestNot(Keyword()) to ensure that a keyword (e.g., ‘and’, ‘or’) is not considered as a variable name.

The staff solution code does not use the Test() method. The FirstOf() method provides adequate look-ahead functionality for the F grammar.
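Lookahead succeeds or fails without moving the input position. The plain-Java sketch below mimics TestNot(Keyword()) in front of a Var rule. The keyword list, the assumed identifier syntax (a letter followed by letters, digits, or underscores), and all names are hypothetical. As a design choice, the sketch also checks that a keyword is not immediately followed by another identifier character, so that a variable such as orange is not mistaken for the keyword or.

```java
/** Hypothetical sketch of negative lookahead before a Var rule; not Parboiled's API. */
class LookaheadDemo {
    static final String[] KEYWORDS = {"and", "or", "not"};

    static boolean isIdStart(char c) { return Character.isLetter(c); }

    static boolean isIdPart(char c) { return Character.isLetterOrDigit(c) || c == '_'; }

    /** True if a whole keyword starts at pos. Only inspects the input; consumes nothing. */
    static boolean keywordAt(String in, int pos) {
        for (String k : KEYWORDS) {
            if (!in.startsWith(k, pos)) continue;
            int end = pos + k.length();          // safe: startsWith guarantees end <= length
            if (end == in.length() || !isIdPart(in.charAt(end))) return true;
        }
        return false;
    }

    /** Var: TestNot(keyword), then an identifier. Returns the end position, or -1. */
    static int var(String in, int pos) {
        if (keywordAt(in, pos)) return -1;       // lookahead failed; pos was never advanced
        if (pos >= in.length() || !isIdStart(in.charAt(pos))) return -1;
        int p = pos + 1;
        while (p < in.length() && isIdPart(in.charAt(p))) p++;
        return p;
    }
}
```

So var("or b", 0) fails, while var("orange", 0) consumes all six characters.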

Implicit whitespace handling. As we saw in a previous lab, Parboiled incorporates both syntactic and lexical specifications. One of the practical consequences of this is that we cannot delegate whitespace handling to a separate lexer (as we did when writing recursive descent parsers by hand). The explicit mention of whitespace after every literal can make the recognizer/parser code look cluttered. There is a standard trick in Parboiled that if we add a single trailing space to each literal then the match for that literal will include all trailing whitespace. For example, in the rule constructor for Constant we would write: FirstOf("0 ", "1 ") (notice the trailing space after the zero and the one). If you look at the method ConstantExpr.make(s) you will see that it looks only at the first character of its argument s, and so thereby ignores any trailing whitespace. When you call match() the result will contain the trailing whitespace.

If you are interested in how this trick works there is a page devoted to it on the Parboiled website. This is just a trick of the tool and not part of the intellectual content of this course, so you are not expected to understand how it works.
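The observable effect of the trick can be modelled in plain Java: a literal written with one trailing space matches the literal itself and then swallows any run of whitespace after it, possibly none. This is only a model with hypothetical names, not how Parboiled implements it (its own examples do it by overriding the parser's fromStringLiteral() method). The second method imitates the first-character inspection described above for ConstantExpr.make(s).

```java
/** Hypothetical model of the trailing-space literal trick; not Parboiled's implementation. */
class TrailingSpaceLiteral {
    /**
     * Match literal at pos. A single trailing space in the literal stands for
     * "zero or more whitespace characters". Returns the end position, or -1.
     */
    static int match(String literal, String in, int pos) {
        boolean skipWhitespace = literal.endsWith(" ");
        String core = skipWhitespace ? literal.substring(0, literal.length() - 1) : literal;
        if (!in.startsWith(core, pos)) return -1;
        int p = pos + core.length();
        while (skipWhitespace && p < in.length() && Character.isWhitespace(in.charAt(p))) p++;
        return p;
    }

    /** Looks only at the first character, so trailing whitespace in the match is harmless. */
    static boolean constantValue(String matched) { return matched.charAt(0) == '1'; }
}
```

For the input "1   ;", matching the literal "1 " ends at position 4, so the matched text is "1" followed by three spaces, and constantValue still reads it as true.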

6.2 Add actions to recognizer to make a parser

Following our standard design method, once your recognizer is working, copy it to start making the parser. As before, you will introduce a new Sequence at the top level of each rule constructor, and the parser actions will be additional arguments to this Sequence. Your Parboiled F parser should construct BinaryExprs, just as your recursive descent F parser did. We can apply the same set of transformations, developed in a previous lab, to produce NaryExprs.

See Figure 5.3 for an example of a sketch of the state of the stack. We will reuse the F ast classes from the previous lab.

Exercise 6.2.1 Draw the stack as it evolves through a simple parse.

Exercise 6.2.2 How large can the stack grow while parsing F ?

6.3 Composite Design Pattern†

The Composite design pattern is commonly used when we need to construct trees of objects, and we want to work with the leaf nodes and the interior nodes (‘composite’ nodes) via a common interface (‘component’). Figure 6.2 illustrates this idea.

The main place we've seen the composite design pattern in the labs is the F ast classes. The classes representing the leaves of an ast are ConstantExpr and VarExpr. The classes representing the interior (‘composite’) nodes of the ast are AndExpr, OrExpr, NotExpr, etc. The shared interface is Expr (‘component’).

What about classes like BinaryExpr and UnaryExpr?

Figure 6.2: uml class diagram for Composite Design Pattern. [Diagram: Component (+ operation()) is the common interface; Leaf (+ operation()) and Composite (+ operation(), + add(), + remove(), + getChild()) both implement it; each Composite has 0..* child Components, and each child has 1 parent.] From http://en.wikipedia.org/wiki/File:Composite_UML_class_diagram_(fixed).svg, where it is released in the public domain (i.e., free for reuse).

Why didn’t we use the Composite design pattern for the W ast
classes? Because W is a regular language, and so does not have
nested expressions, it will always result in ast’s of a known fixed
height. F , on the other hand, is a context-free language with nested
expressions, so the ast depth will vary from input to input. The
Composite design pattern lets us work with these ast’s of varying
depth in a uniform way.
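A minimal Java sketch of the pattern, with hypothetical classes that are much simpler than the lab's F ast: ExprNode plays the ‘component’ role, VarNode is a leaf, and AndNode is a composite. The height() operation is written once, in the component, and works on trees of any depth, which is what varying-depth asts need.

```java
import java.util.List;

/** 'Component': the interface shared by leaves and composites (hypothetical names). */
abstract class ExprNode {
    abstract List<ExprNode> children();

    /** Height of the tree rooted here; computed the same way for every kind of node. */
    int height() {
        int tallest = 0;
        for (ExprNode child : children()) tallest = Math.max(tallest, child.height());
        return tallest + 1;
    }
}

/** 'Leaf': has no children. */
final class VarNode extends ExprNode {
    final String name;
    VarNode(String name) { this.name = name; }
    List<ExprNode> children() { return List.of(); }
}

/** 'Composite': holds child components, which may themselves be composites. */
final class AndNode extends ExprNode {
    private final List<ExprNode> operands;
    AndNode(ExprNode... operands) { this.operands = List.of(operands); }
    List<ExprNode> children() { return operands; }
}
```

A single VarNode has height 1, and nesting one AndNode inside another (as in a and (b and c)) gives height 3, with no code that distinguishes leaves from composites.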

6.4 Template Method Design Pattern†

A Template Method defines an algorithm in outline, but leaves some primitive operations to be defined by the particular datatypes the algorithm will operate on. For example, most sorting algorithms depend on a comparison operation that would be defined differently for strings or integers.

Typically the template algorithm is implemented in an abstract class that declares abstract method signatures for the primitive operations. The concrete subclasses of that abstract class provide definitions of the primitive operations for their respective data values.

The abstract class does not know the names of its subclasses, nor does it perform any explicit type tests.1 The type tests are implicit in the dynamic dispatch performed when the primitive operations are called.

1 An explicit type test is performed by the instanceof keyword or the getClass() method.

http://en.wikipedia.org/wiki/Template_method_pattern
http://sourcemaking.com/design_patterns/template_method
http://www.oodesign.com/template-method-pattern.html

Figure 6.3: Template Design Pattern illustrated by UML Class Diagram. [Diagram: AbstractClass declares TemplateMethod(), PrimitiveOperation1(), PrimitiveOperation2(), doAbsolutelyThis(), and doSomething(); a note shows TemplateMethod() calling doSomething(), the primitive operations, and doAbsolutelyThis() in turn. ConcreteClass extends AbstractClass and defines PrimitiveOperation1(), PrimitiveOperation2(), and doSomething().] Image from http://en.wikipedia.org/wiki/File:Template_Method_UML.svg. Licensed under GNU Free Documentation Licence.

Expr:
Primitive Op            Implemented By                        Template Method
operator()              every subclass of Expr                toString()
simplifyOnce()          NaryExpr, AndExpr, OrExpr, NotExpr    simplify()

NaryExpr:
Primitive Op            Implemented By                        Template Method
getAbsorbingElement()   NaryAndExpr, NaryOrExpr               simplifyOnce() helpers
getIdentityElement()    NaryAndExpr, NaryOrExpr               simplifyOnce() helpers
getThatClass()          NaryAndExpr, NaryOrExpr               simplifyOnce() helpers

Why?

• No superclass (e.g., NaryExpr) should know what its subclasses (e.g., NaryAndExpr, NaryOrExpr) are.
• Should be able to add a new subclass without perturbing parent
(e.g., NaryExpr) and siblings (e.g., NaryAndExpr, NaryOrExpr).
• Explicit type tests (e.g., instanceof, getClass()) usually make for
fragile code — unless comparing against own type for purpose of
some equality-like operation.
• Use dynamic (polymorphic) dispatch to perform type tests in a
modular manner.
• Elegant and general implementation of overall transformations.
• Prevent subclasses from making radical departures from general
algorithm.
• Reduce code duplication.
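A small Java sketch in the spirit of the NaryExpr table above. The names NarySketch, NaryAndSketch, and NaryOrSketch are hypothetical, and operands are plain strings ("0" and "1" are constants, anything else is a variable name) rather than Expr objects, but the shape follows the description: the abstract class owns the algorithm as a final template method, the subclasses supply only identity/absorbing-element primitives, and no instanceof appears anywhere.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical Template Method sketch loosely modelled on NaryExpr; not the lab's code. */
abstract class NarySketch {
    // Primitive operations, supplied by each concrete subclass.
    abstract String identityElement();
    abstract String absorbingElement();

    /** Template method: one algorithm for every subclass, with no explicit type tests. */
    final List<String> simplify(List<String> operands) {
        if (operands.contains(absorbingElement())) {
            return List.of(absorbingElement());                        // x and 0 = 0;  x or 1 = 1
        }
        List<String> kept = new ArrayList<>();
        for (String operand : operands) {
            if (!operand.equals(identityElement())) kept.add(operand);  // x and 1 = x;  x or 0 = x
        }
        return kept.isEmpty() ? List.of(identityElement()) : kept;
    }
}

final class NaryAndSketch extends NarySketch {
    String identityElement() { return "1"; }
    String absorbingElement() { return "0"; }
}

final class NaryOrSketch extends NarySketch {
    String identityElement() { return "0"; }
    String absorbingElement() { return "1"; }
}
```

Adding, say, an exclusive-or variant would not fit this particular template (xor has no absorbing element), which illustrates the trade-off in the bullet about preventing radical departures from the general algorithm.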

6.5 Singleton Design Pattern†


The singleton design pattern is used when the number of objects to be instantiated for a given class is independent of the program's input. If the singleton objects are immutable then the primary motivation is to conserve memory by not creating duplicate objects. If the singleton objects are mutable then the primary motivation is to create a set of global variables (which is a controversial decision2).

The main use of the singleton pattern in our labs is the ConstantExpr class, which is immutable, so the motivation is to conserve memory by instantiating just one object for True and one for False, regardless of how many times ‘1’ and ‘0’ appear in the F programs manipulated by the compiler. Figure 6.4 shows the creation of the two instances of class ConstantExpr.

http://en.wikipedia.org/wiki/Singleton_pattern

2 W. Wulf and M. Shaw. Global variables considered harmful. ACM SIGPLAN Notices, 8(2):80–86, Feb. 1973
http://c2.com/cgi/wiki?GlobalVariablesConsideredHarm

Figure 6.4: The only two instances of class ConstantExpr

public final class ConstantExpr extends Expr {
    public final Boolean b;

    /** The one true instance. To be shared/aliased wherever necessary. */
    public final static ConstantExpr TrueExpr = new ConstantExpr(true);
    /** The one false instance. To be shared/aliased wherever necessary. */
    public final static ConstantExpr FalseExpr = new ConstantExpr(false);

While Figure 6.4 shows the creation of two objects that have global
names so they can be used by anyone, it does not prevent anyone
from creating more objects of class ConstantExpr. To do that we make
the constructor private, and then provide an alternative method to
return a reference to one of the existing objects, as listed in Figure 6.5.

Figure 6.5: Preventing clients from instantiating class ConstantExpr

/** Private constructor prevents clients from instantiating. */
private ConstantExpr(final Boolean b) { this.b = b; }

/** To be used by clients instead of the constructor.
 * Returns a reference to one of the shared objects. */
public static ConstantExpr make(final Boolean b) {
    if (b) { return TrueExpr; } else { return FalseExpr; }
}

6.6 Evaluation

We evaluate the code that you have committed and pushed before the deadline. The test harnesses are run with the F programs we have released to you and a secret set of staff F programs. The weights for the test harnesses are as follows:

                                  Current  New
TestFParboiledRecognizerAccept    10       5
TestFParboiledRecognizerReject    5        0
TestFParboiledParserBasic         5        0
TestFParboiledParser              40       10
TestFParboiledParserComparison    20       5

You do not get any marks for any other parser tests until TestFParboiledParserBasic and TestObjectContractF pass everything.

TestFParboiledRecognizer just runs the recognizer to see if it crashes. TestFParboiledParser checks the following equation:
    ∀ AST | AST.equals(parse(prettyprint(AST)))
TestFParboiledParserComparison checks that your recursive descent and Parboiled parsers give isomorphic ast's when given the same input file.
Lab 7

Technology Mapping: F → Graphviz

Compiler Concepts: common subexpression elimination
Programming Concepts: hash structures, iteration order, object identity, non-determinism, Visitor design pattern, tree traversals: preorder, postorder, inorder

Files:
ece351.f.techmapper.TechnologyMapper

Libraries:
ece351.f.techmapper.GraphvizToF
ece351.f.analysis.ExtractAllExprs
ece351.common.visitor.PostOrderExprVisitor

Tests:
ece351.f.techmapper.TestTechnologyMapper

In this lab we will produce gate diagrams of our F programs. We will do this by translating F programs into the input language of AT&T Graphviz (http://graphviz.org). Graphviz is an automatic graph layout tool: it reads (one dimensional) textual descriptions of graphs and produces (two dimensional) pictures of those graphs. Graphviz, which is primarily a visualization tool, attempts to place nodes and edges to minimize edge crossings. Tools that are intended to do digital circuit layout optimize the placement of components based on other criteria, but the fundamental idea is the same: transform a (one dimensional) description of the logic into a (two dimensional) plan that can be visualized or fabricated.

Graphviz is a widely used graph layout tool. Circuit layout is often done by specialized tools, but the basic idea is the same as Graphviz: algorithmic placement of interconnected components. What differ are the criteria used for placement and the ways in which things are connected. Graphical layout tools such as Graphviz try to minimize edge crossings, for example, which is important for people looking at pictures but may be less relevant for circuit boards. Graphviz also draws nice Bezier curve lines that are pleasing to look at, whereas circuit boards are typically laid out with orthogonal lines. Technology mapping and circuit layout are studied in ece647.

Figure 7.1 lists a sample Graphviz input file and its rendered output for the simple F program X <= A or B;. The input pins A and B are on the left side of the diagram and the output pin X is on the right hand side of the diagram.

Figure 7.1: Example Graphviz input file and rendered output for F program X <= A or B;

digraph g {
    // header
    rankdir=LR;
    margin=0.01;
    node [shape="plaintext"];
    edge [arrowhead="diamond"];
    // circuit
    or12 [label="or12", image="../../gates/or_noleads.png"];
    var0[label="x"];
    var1[label="a"];
    var2[label="b"];
    var1 -> or12 ;
    var2 -> or12 ;
    or12 -> var0 ;
}

The input file in Figure 7.1 contains a header section that will be
common to all of the Graphviz files that we generate. This header
says that the graph should be rendered left to right (instead of the
default top-down), that the whitespace margin around the diagram
should be 0.01 inches, and that the default look for nodes and edges
should be plain.
The main thing to notice in the lines of the input file in Figure 7.1
that describe the formula is that visual nodes are created for both the
pins (A, B, X) and the gates (there is a single or gate in this example).
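To make the shape of the generated file concrete, here is a plain-Java sketch that prints the Figure 7.1 text for a single two-input gate. It is not the lab's TechnologyMapper, which should produce this text by visiting the ast; the class, the method, and its parameters are hypothetical, and the node names var0, var1, var2, and or12 are simply copied from Figure 7.1.

```java
/** Hypothetical sketch: Graphviz text for one gate with two inputs and one output. */
class DotSketch {
    static String render(String gate, String image, String out, String in1, String in2) {
        StringBuilder sb = new StringBuilder();
        sb.append("digraph g {\n");
        // header: identical for every generated file
        sb.append("    // header\n");
        sb.append("    rankdir=LR;\n");
        sb.append("    margin=0.01;\n");
        sb.append("    node [shape=\"plaintext\"];\n");
        sb.append("    edge [arrowhead=\"diamond\"];\n");
        // circuit: one node for the gate and one node per pin
        sb.append("    // circuit\n");
        sb.append("    ").append(gate).append(" [label=\"").append(gate)
          .append("\", image=\"").append(image).append("\"];\n");
        sb.append("    var0[label=\"").append(out).append("\"];\n");
        sb.append("    var1[label=\"").append(in1).append("\"];\n");
        sb.append("    var2[label=\"").append(in2).append("\"];\n");
        // edges: both inputs feed the gate, and the gate drives the output pin
        sb.append("    var1 -> ").append(gate).append(" ;\n");
        sb.append("    var2 -> ").append(gate).append(" ;\n");
        sb.append("    ").append(gate).append(" -> var0 ;\n");
        sb.append("}\n");
        return sb.toString();
    }
}
```

Saving the string returned by render("or12", "../../gates/or_noleads.png", "x", "a", "b") to a file and running Graphviz's dot tool on it (for example with -Tpng) should give a picture like Figure 7.1, provided the gate image path resolves from where dot is run.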

7.1 Interpreters vs. Compilers†

A programming language interpreter is a special kind of compiler that executes a program directly, instead of translating the program to another language that is then executed. Executing a program through a language interpreter is usually slower than executing the compiled form of that program. It can take less engineering effort to implement a language interpreter than a compiler, so language interpreters are most commonly used when engineering time is more valuable than machine time. Compilers are used when the converse is true: i.e., it is worth the engineering effort to make the code run faster.

7.2 Introducing the Interpreter Design Pattern†

The Interpreter design pattern1 names a particular way to structure code that we have been using in previous labs but have not yet named. Programming language interpreters are often implemented using the interpreter design pattern, which is how the design pattern got its name. When learning about both concepts this name collision can be a bit confusing.

1 E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1995
Figure 7.2 shows a uml class diagram for a subset of the F expression class hierarchy that we've been working with recently.

In §4 we wrote a simplifier/optimizer for F programs. How did we structure that code? We added a method called simplify() to almost every class in this hierarchy, similar to what is depicted in Figure 7.3. This is a non-modular change: the simplifier functionality is scattered across the Expr class hierarchy.
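A compact Java sketch of this Interpreter-style structure, using hypothetical classes IExpr, IConst, IVar, INot, and IAnd rather than the lab's: every class carries its own simplify(), so adding another operation (pretty-printing, conversion to gates, ...) would mean editing every class again. The constants are shared singletons, so they can be compared with ==.

```java
/** Interpreter-style structure: the simplify operation is spread across every class. */
abstract class IExpr {
    abstract IExpr simplify();
}

final class IConst extends IExpr {
    static final IConst TRUE = new IConst(true);
    static final IConst FALSE = new IConst(false);
    final boolean value;
    private IConst(boolean value) { this.value = value; }
    IExpr simplify() { return this; }
    @Override public String toString() { return value ? "1" : "0"; }
}

final class IVar extends IExpr {
    final String name;
    IVar(String name) { this.name = name; }
    IExpr simplify() { return this; }
    @Override public String toString() { return name; }
}

final class INot extends IExpr {
    final IExpr operand;
    INot(IExpr operand) { this.operand = operand; }
    IExpr simplify() {
        IExpr s = operand.simplify();
        if (s == IConst.TRUE) return IConst.FALSE;   // not 1 = 0
        if (s == IConst.FALSE) return IConst.TRUE;   // not 0 = 1
        return new INot(s);
    }
    @Override public String toString() { return "not " + operand; }
}

final class IAnd extends IExpr {
    final IExpr left, right;
    IAnd(IExpr left, IExpr right) { this.left = left; this.right = right; }
    IExpr simplify() {
        IExpr l = left.simplify(), r = right.simplify();
        if (l == IConst.FALSE || r == IConst.FALSE) return IConst.FALSE; // x and 0 = 0
        if (l == IConst.TRUE) return r;                                   // 1 and x = x
        if (r == IConst.TRUE) return l;                                   // x and 1 = x
        return new IAnd(l, r);
    }
    @Override public String toString() { return "(" + left + " and " + right + ")"; }
}
```

For instance, a and (not 0) simplifies to a, and not (x and 0) simplifies to the shared TRUE constant.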
Is it possible to structure this code in another way? Could we add
the simplifier functionality to our compiler in a modular way?

Figure 7.2: A uml class diagram for a subset of F expression class hierarchy. [Diagram: Expr at the root; BinaryExpr, ConstantExpr, VarExpr, and UnaryExpr below it; AndExpr and OrExpr below BinaryExpr; NotExpr below UnaryExpr.]

Figure 7.3: A uml class diagram for a subset of F expression class hierarchy, showing the simplifier functionality scattered across the hierarchy by use of the Interpreter design pattern. Classes modified to add the simplifier functionality are highlighted in colour. [Diagram: the hierarchy of Figure 7.2, with a simplify() method declared in Expr and in several of its subclasses.]

                                     Operation
AST Class     simplify                pretty-printing         conversion to gates
AndExpr       AndExpr (BinaryExpr)    AndExpr (BinaryExpr)    AndExpr (BinaryExpr)
OrExpr        OrExpr (BinaryExpr)     OrExpr (BinaryExpr)     OrExpr (BinaryExpr)
VarExpr       VarExpr                 VarExpr                 VarExpr
ConstantExpr  ConstantExpr            ConstantExpr            ConstantExpr
NotExpr       NotExpr                 NotExpr                 NotExpr
NaryAndExpr   NaryAndExpr (NaryExpr)  NaryAndExpr (NaryExpr)  NaryAndExpr (NaryExpr)
NaryOrExpr    NaryOrExpr (NaryExpr)   NaryOrExpr (NaryExpr)   NaryOrExpr (NaryExpr)

Figure 7.4: Matrix view of Interpreter Design Pattern. Rows for ast leaf classes. Columns for operations to be performed on the ast. Cell values indicate where the code for that class/op pair is located. In some cases we place the operation in an abstract superclass, e.g., BinaryExpr, rather than directly in the leaf class. This location is indicated with the concrete leaf class name followed by the abstract superclass name in parentheses. Notice that the pattern here is that the same value is repeated across each row.

7.3 Introducing the Visitor Design Pattern†


Tiger Book §4.3

The Visitor design pattern2 is an alternative to the interpreter design pattern for structuring code. The Visitor pattern is also widely used in compilers, and is how we will structure our code in this lab and future labs. If we had used the Visitor design pattern for §4 then we could have added the simplifier functionality in a more modular manner, as depicted in Figure 7.5.

2 E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1995. Also widely documented on the web.

Figure 7.5: A uml class diagram for a subset of F expression class hierarchy, showing the simplifier functionality made modular by use of the Visitor design pattern. [Diagram: the unchanged Expr hierarchy (BinaryExpr, ConstantExpr, VarExpr, UnaryExpr; AndExpr, OrExpr, NotExpr) beside a separate Simplifier class.]

Well, it's not quite as simple as Figure 7.5 might lead you to believe: there is some infrastructure that we need to add to make this work properly. Figure 7.6 illustrates that we need to add an accept(Visitor) method to each concrete class.3 Similarly, we add a visit() for each concrete class to the Simplifier class. These accept(Visitor) methods can be reused by any Visitor, whether it is the simplifier or the technology mapper that we will develop in this lab.

3 Notice that in our F Expr class hierarchy the only classes that can be directly instantiated (i.e., are concrete) are the leaves of the hierarchy. This is sometimes known as the abstract superclass rule (i.e., all super-classes should be abstract), and is a good design guideline for you to follow.
With the use of the Visitor design pattern we can add new opera-
tions (e.g., simplification, technology mapping, etc.) to our complex
data structure (ast) in a modular manner, without having to directly
modify the ast classes.
The Visitor design pattern involves two main methods: visit and
accept. The visit methods are written on the Visitor class and the
accept methods are written on the ast classes. The abstract superclass
for the Visitors that we will write is listed in Figure 7.8.
Notice that our visit methods return an object of type Expr. This fea-
ture allows our Visitors to rewrite the ast as they visit it, which is
necessary for transformations such as the simplifier we wrote previ-
ously. The Visitor we write in this lab does not rewrite the ast, and
so all of its visit methods will simply return the ast node that they
are visiting (i.e., return e;).
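To make the division of labour between accept and visit concrete, here is a minimal, self-contained sketch. The classes Expr, VarExpr, NotExpr, OrExpr, Visitor, and NodeNamer are simplified hypothetical stand-ins, not the lab's real ast classes, and the traverse method is a plain recursive post-order walk rather than the lab's ExprVisitor machinery. Because this visitor does not rewrite anything, every visit method returns the node it was given.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for the F ast classes (not the lab's real ones).
abstract class Expr {
    // Each concrete subclass forwards to "its" visit method on the visitor.
    abstract Expr accept(Visitor v);
}

final class VarExpr extends Expr {
    final String name;
    VarExpr(String name) { this.name = name; }
    Expr accept(Visitor v) { return v.visitVar(this); }
}

final class NotExpr extends Expr {
    final Expr expr;
    NotExpr(Expr expr) { this.expr = expr; }
    Expr accept(Visitor v) { return v.visitNot(this); }
}

final class OrExpr extends Expr {
    final Expr left, right;
    OrExpr(Expr left, Expr right) { this.left = left; this.right = right; }
    Expr accept(Visitor v) { return v.visitOr(this); }
}

abstract class Visitor {
    abstract Expr visitVar(VarExpr e);
    abstract Expr visitNot(NotExpr e);
    abstract Expr visitOr(OrExpr e);

    // Post-order walk: children first, then the parent. No rewriting in this sketch.
    Expr traverse(Expr e) {
        if (e instanceof NotExpr) {
            traverse(((NotExpr) e).expr);
        } else if (e instanceof OrExpr) {
            traverse(((OrExpr) e).left);
            traverse(((OrExpr) e).right);
        }
        return e.accept(this);
    }
}

// A non-rewriting visitor: record each node, then return it unchanged.
final class NodeNamer extends Visitor {
    final List<String> visited = new ArrayList<>();
    Expr visitVar(VarExpr e) { visited.add(e.name); return e; }
    Expr visitNot(NotExpr e) { visited.add("not"); return e; }
    Expr visitOr(OrExpr e) { visited.add("or"); return e; }
}

public class VisitorSketch {
    public static void main(String[] args) {
        Expr ast = new NotExpr(new OrExpr(new VarExpr("A"), new VarExpr("B")));
        NodeNamer namer = new NodeNamer();
        Expr result = namer.traverse(ast);
        System.out.println(namer.visited); // [A, B, or, not]
        System.out.println(result == ast); // true: nothing was rewritten
    }
}
```

Adding a second operation would mean writing another Visitor subclass; none of the Expr classes would change.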
[lab 7] technology mapping: F → graphviz 93

Figure 7.6: A uml class diagram for a subset of the F expression class hierarchy, showing the simplifier functionality made modular by use of the Visitor design pattern.
[Diagram: the Simplifier class has visit(AndExpr), visit(OrExpr), visit(NotExpr), visit(ConstantExpr), and visit(VarExpr) methods. Expr declares accept(Visitor). The concrete classes ConstantExpr, VarExpr, AndExpr, OrExpr, and NotExpr each implement accept(Visitor); BinaryExpr and UnaryExpr are the intermediate classes.]

Figure 7.7: Matrix view of the Visitor Design Pattern. Rows for ast leaf classes. Columns for operations to be performed on the ast. Cell values indicate where the code for that class/op pair is located. Notice that the pattern here is that the values are repeated down each column, rather than across each row as we saw above in Figure 7.4 for the Interpreter Design Pattern. In the actual code for our labs we followed the Interpreter Design Pattern for the simplify and pretty-printing operations, so the class names Simplifier and PPrinter are fictitious. We are developing the TechnologyMapper visitor in this lab.

                    Operation
AST Class       simplify      pretty-printing   conversion to gates
AndExpr         Simplifier    PPrinter          TechMapper
OrExpr          Simplifier    PPrinter          TechMapper
VarExpr         Simplifier    PPrinter          TechMapper
ConstantExpr    Simplifier    PPrinter          TechMapper
NotExpr         Simplifier    PPrinter          TechMapper
NaryAndExpr     Simplifier    PPrinter          TechMapper
NaryOrExpr      Simplifier    PPrinter          TechMapper

Figure 7.8: Abstract super class for Visitors for F expression asts. There is a visit method for each concrete F ast class.

public abstract Expr visitConstant(ConstantExpr e);
public abstract Expr visitVar(VarExpr e);
public abstract Expr visitNot(NotExpr e);
public abstract Expr visitAnd(AndExpr e);
public abstract Expr visitOr(OrExpr e);
public abstract Expr visitNaryAnd(NaryAndExpr e);
public abstract Expr visitNaryOr(NaryOrExpr e);

To the FExpr class we add the signature for the accept method,
which is then implemented in each concrete ast class as a call to a
visit method, as shown in Figure 7.9.

Figure 7.9: Signature and implementation of the accept method. The signature belongs in the Expr class and the implementation is put in each concrete subclass of Expr.

// In the abstract Expr class:
public abstract Expr accept(final ExprVisitor exprVisitor);

// In AndExpr; every other concrete subclass calls its own visit method:
public Expr accept(final ExprVisitor v) { return v.visitAnd(this); }

Together the visit and accept methods implement what is known as double dispatch: i.e., select which method to execute based on the polymorphic types of two objects. Languages like Java, C++, C#, and Objective-C are all single dispatch languages: the target of a polymorphic call is determined just by the runtime type of the receiver object. CLOS and Dylan are examples of multiple-dispatch languages, where the target of a polymorphic call is determined by the runtime types of all of the arguments.
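The difference is easiest to see with Java method overloading, where the overload is picked from the static (compile-time) type of the argument. In this hypothetical sketch (Op, AndOp, OrOp, and Describer are illustration-only names), calling the visitor directly on a variable of static type Op selects the generic overload, while going through accept first selects the method that matches the runtime type.

```java
// Hypothetical classes used only to contrast single and double dispatch.
abstract class Op {
    abstract String accept(Describer d);
}

final class AndOp extends Op {
    // Inside accept the static type of 'this' is AndOp, so visit(AndOp) is chosen.
    String accept(Describer d) { return d.visit(this); }
}

final class OrOp extends Op {
    String accept(Describer d) { return d.visit(this); }
}

final class Describer {
    String visit(Op n)    { return "some op"; }
    String visit(AndOp n) { return "and"; }
    String visit(OrOp n)  { return "or"; }
}

public class DoubleDispatchDemo {
    public static void main(String[] args) {
        Describer d = new Describer();
        Op n = new AndOp();              // static type Op, runtime type AndOp
        System.out.println(d.visit(n));  // some op: overload fixed at compile time
        System.out.println(n.accept(d)); // and: accept is chosen by n's runtime type
    }
}
```

The lab's visitors use distinct names (visitAnd, visitOr, ...) instead of overloads, which sidesteps overload resolution entirely, but the principle is the same: accept recovers the runtime type of the ast node, and the call on the visitor selects the operation.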
One of the main points of variation in how the Visitor pattern is implemented is where the traversal code goes: in the visit methods? an iterator? somewhere else? All of these options are used in practice. We have decided to put the traversal code in a set of traverse methods in the Visitor classes (Figures 7.10 and 7.11).

The point of a Visitor is to traverse, or ‘walk over’, the nodes in an ast.
We can write Visitors that perform a number of useful tasks for
this lab. For example, Figure 7.12 lists a Visitor that builds a set of all
the nodes in an F expression ast.

7.3.1 Is Visitor always better than Interpreter?†


From the discussion above you might think that it is always better to use the Visitor pattern than the Interpreter pattern. This is not correct. Which pattern to choose depends on what changes you anticipate happening to the software in the future.

Almost all software evolves or dies. Donald Knuth’s TeX is a notable counter-example: it does not grow new features, it does not have significant bugs, it was conceived complete. Consequently, the version number for TeX does not increment, it converges: as the (minor) bugs are removed the version number approximates π by another decimal place. This reflects convergence to an original vision, with the acknowledgement that perfection will never truly be achieved within the finite time available to us here on earth.

Consider this: we have a complex structure (e.g., an ast), and a set of complex operations that we would like to apply to that structure (e.g., simplification, technology mapping, etc.). In ece250 you studied some relatively simple structures, such as lists, and some relatively simple operations, such as searching or sorting. Those structures were simple enough that they could be defined in one or two classes, and those operations were simple enough that they could be defined in one or two methods. Now we have a complex structure that is defined across a dozen or more classes, and operations that might similarly be defined across a dozen or more methods.
The choice of how we organize this code (Visitor or Interpreter)
has consequences, and those consequences are in how difficult the
code will be to change in the future.
If we expect the grammar of our input language (e.g., W or F) to be stable, and if we expect to add more transformations (operations) to the ast in future labs, then it is better to use the Visitor pattern, because the Visitor pattern allows us to add new transformations in a modular fashion.

The references below are for your broader educational enrichment. You will not be tested on them specifically. You are expected to understand the trade-offs between the Visitor and Interpreter design patterns.
If, on the other hand, we expect the grammar of our input language to change and the set of transformations we might want to perform is small and known a priori, then it is better to use Interpreter. Interpreter lets us change the ast class hierarchy in a modular fashion, which is what we would need to do if the grammar of the input language were to change significantly.

What if we anticipate both kinds of change will happen in the future? Then we have a problem, because there is no widely accepted programming language that lets us structure our code to facilitate both kinds of change. This is known as the Expression Problem in programming language design. The Expression Problem was named by Wadler in the late nineties[4], although the idea goes back at least as far as Reynolds’ work in the mid seventies[5]. More recent research papers have proposed some solutions in Java[6] or Scala[7], but none of these proposals is widely accepted and all have drawbacks.

[4] P. Wadler. The expression problem, Nov. 1998. Email to Java Generics list.
[5] J. C. Reynolds. User-defined types and procedural data as complementary approaches to data abstraction. In S. A. Schuman, editor, New Directions in Algorithmic Languages: IFIP Working Group 2.1 on Algol. INRIA, 1975.
[6] M. Torgersen. The Expression Problem Revisited: Four new solutions using generics. In M. Odersky, editor, Proc. 18th ECOOP, volume 3344 of LNCS, Oslo, Norway, June 2004. Springer-Verlag.
[7] M. Zenger and M. Odersky. Independently extensible solutions to the expression problem. In Proc. 12th Workshop on Foundations of Object-Oriented Languages, 2005.

Figure 7.10: The traverse methods of ExprVisitor

public final Expr traverseExpr(final Expr e) {
if (e instanceof NaryExpr) {
return traverseNaryExpr( (NaryExpr) e );
} else if (e instanceof BinaryExpr) {
return traverseBinaryExpr( (BinaryExpr) e );
} else if (e instanceof UnaryExpr) {
return traverseUnaryExpr( (UnaryExpr) e );
} else {
return e.accept(this);
}
}

public abstract Expr traverseNaryExpr(final NaryExpr e);


public abstract Expr traverseBinaryExpr(final BinaryExpr e);
public abstract Expr traverseUnaryExpr(final UnaryExpr e);

/**
* Visit/rewrite all of the exprs in this FProgram.
* @param p input FProgram
* @return a new FProgram with changes applied
*/
public FProgram traverseFProgram(final FProgram p) {
FProgram result = new FProgram();
for (final AssignmentStatement astmt : p.formulas) {
result = result.append(traverseAssignmentStatement(astmt));
}
return result;
}

/**
* Visit/rewrite the expr in this AssignmentStatement
* @param astmt the AssignmentStatement to be visited/rewritten
* @return a new AssignmentStatement with changes applied
*/
public AssignmentStatement traverseAssignmentStatement(final AssignmentStatement astmt) {
final Expr e = traverseExpr(astmt.expr);
if (e == astmt.expr) {
// no change
return astmt;
} else {
// rewrite occurred
return astmt.varyExpr(e);
}
}
}

Figure 7.11: Implementation of PostOrderExprVisitor. Any Visitors that extend this class will visit the nodes of an F expression ast in post-order (i.e., parents after children).

/**
* This visitor rewrites the AST from the bottom up.
* Optimized to only create new parent nodes if children have changed.
*/
public abstract class PostOrderExprVisitor extends ExprVisitor {

@Override
public final Expr traverseUnaryExpr(UnaryExpr u) {
// child first
final Expr child = traverseExpr(u.expr);
// only rewrite if something has changed
if (child != u.expr) {
u = u.newUnaryExpr(child);
}
// now parent
return u.accept(this);
}

@Override
public final Expr traverseBinaryExpr(BinaryExpr b) {
// children first
final Expr left = traverseExpr(b.left);
final Expr right = traverseExpr(b.right);
// only rewrite if something has changed
if (left != b.left || right != b.right) {
b = b.newBinaryExpr(left, right);
}
// now parent
return b.accept(this);
}

@Override
public final Expr traverseNaryExpr(NaryExpr e) {
// children first
ImmutableList<Expr> children = ImmutableList.of();
boolean change = false;
for (final Expr c1 : e.children) {
final Expr c2 = traverseExpr(c1);
children = children.append(c2);
if (c2 != c1) { change = true; }
}
// only rewrite if something changed
if (change) {
e = e.newNaryExpr(children);
}
// now parent
return e.accept(this);
}
}

Figure 7.12: Implementation of ExtractAllExprs

/**
* Returns a set of all Expr objects in a given FProgram or AssignmentStatement.
* The result is returned in an IdentityHashSet, which defines object identity
* by memory address. A regular HashSet defines object identity by the equals()
* method. Consider two VarExpr objects, X1 and X2, both naming variable X. If
* we tried to add both of these to a regular HashSet the second add would fail
* because the regular HashSet would say that it already held a VarExpr for X.
* The IdentityHashSet, on the other hand, will hold both X1 and X2.
*/
public final class ExtractAllExprs extends PostOrderExprVisitor {

private final IdentityHashSet<Expr> exprs = new IdentityHashSet<Expr>();

private ExtractAllExprs(final Expr e) { traverseExpr(e); }

public static IdentityHashSet<Expr> allExprs(final AssignmentStatement f) {


final ExtractAllExprs cae = new ExtractAllExprs(f.expr);
return cae.exprs;
}

public static IdentityHashSet<Expr> allExprs(final FProgram p) {


final IdentityHashSet<Expr> allExprs = new IdentityHashSet<Expr>();
for (final AssignmentStatement f : p.formulas) {
allExprs.add(f.outputVar);
allExprs.addAll(ExtractAllExprs.allExprs(f));
}
return allExprs;
}

@Override public Expr visitConstant(ConstantExpr e) { exprs.add(e); return e; }


@Override public Expr visitVar(VarExpr e) { exprs.add(e); return e; }
@Override public Expr visitNot(NotExpr e) { exprs.add(e); return e; }
@Override public Expr visitAnd(AndExpr e) { exprs.add(e); return e; }
@Override public Expr visitOr(OrExpr e) { exprs.add(e); return e; }
@Override public Expr visitXOr(XOrExpr e) { exprs.add(e); return e; }
@Override public Expr visitNAnd(NAndExpr e) { exprs.add(e); return e; }
@Override public Expr visitNOr(NOrExpr e) { exprs.add(e); return e; }
@Override public Expr visitXNOr(XNOrExpr e) { exprs.add(e); return e; }
@Override public Expr visitEqual(EqualExpr e) { exprs.add(e); return e; }
@Override public Expr visitNaryAnd(NaryAndExpr e) { exprs.add(e); return e; }
@Override public Expr visitNaryOr(NaryOrExpr e) { exprs.add(e); return e; }
}

7.4 Hash Structures, Iteration Order, and Object Identity†

When you iterate over the elements in a List you get them in the order that they were added to the List. When you iterate over the elements in a TreeSet you get them in sorted order (their natural ordering, or the order defined by a Comparator if one was supplied). When you iterate over the elements in a HashSet or HashMap, what order do you get them in? Unspecified, unknown, and potentially non-deterministic: the order could change the next time you iterate, and may change the next time the program executes.

TreeSet, List, HashSet, and HashMap are all part of the standard JDK Collections classes.

import java.util.*;

List<Integer> list = new ArrayList<>();      list.add(3); list.add(1); list.add(2);
SortedSet<Integer> tset = new TreeSet<>();   tset.add(3); tset.add(1); tset.add(2);
Set<Integer> hset = new HashSet<>();         hset.add(3); hset.add(1); hset.add(2);
System.out.println(list);   // [3, 1, 2]: insertion order
System.out.println(tset);   // [1, 2, 3]: sorted order
System.out.println(hset);   // order unspecified by the JDK

Figure 7.13: Iteration order for different data structures
Why might the iteration order change with hash structures? Because the slot into which an element gets stored in a hash structure is a function of that element’s hash value and the size of the hash table. As more elements are added, a hash structure will resize itself and rehash all of its existing elements, and they’ll go into new slots in the new (larger) table. If the same data value always produces the same hash value and the table never grows then it is possible to get deterministic iteration order from a hash structure: that order will still be nonsense, but it will be deterministically repeatable nonsense. But these assumptions often do not hold. For example, if a class does not override the equals() and hashCode() methods then it inherits an identity-based hashCode() (commonly described as its memory address). The next execution of the program is highly likely to give that same data value a different identity hash code, and therefore a different slot.
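A small model of slot selection shows why growth reshuffles elements. The spreading step below mirrors what OpenJDK’s java.util.HashMap has done since Java 8 (mix the high bits of the hash into the low bits, then mask by a power-of-two table size); treat that detail as an assumption about one implementation, not a JDK guarantee. HashSlotDemo and slot are hypothetical names.

```java
public class HashSlotDemo {
    // Model of bucket selection; tableSize must be a power of two.
    static int slot(Object key, int tableSize) {
        int h = key.hashCode();
        int spread = h ^ (h >>> 16);     // fold high bits into low bits
        return spread & (tableSize - 1); // keep only the low bits
    }

    public static void main(String[] args) {
        // Integer.hashCode() is the int value itself.
        System.out.println(slot(17, 16)); // 1
        System.out.println(slot(1, 16));  // 1: shares a slot with 17 in the small table
        System.out.println(slot(17, 32)); // 17: after the table doubles, 17 moves
        System.out.println(slot(1, 32));  // 1: ...but 1 stays put, so their relative order can change
    }
}
```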
Iterating over the elements in a hash structure is one of the most common ways of unintentionally introducing non-determinism into a Java program. Non-determinism makes testing and debugging difficult because each execution of the program behaves differently. So unless there is some benefit to the non-determinism it should be avoided.

What benefit could there be to non-determinism? Not much directly. But non-determinism is often a consequence of parallel and distributed systems. In these circumstances we sometimes choose to tolerate some non-determinism for the sake of performance, but we still try to control or eliminate some of the non-determinism using mechanisms like locks or database engines.

The JDK Collections classes provide two hash structures with deterministic iteration ordering: LinkedHashSet and LinkedHashMap. These structures also maintain an internal linked list that records the order in which elements were added to the hash structure. You should usually choose LinkedHashMap, LinkedHashSet, TreeMap, or TreeSet over HashMap and HashSet. The linked structures give you elements in their insertion order (as a list would), whereas the tree structures give you elements in sorted order (assuming that the elements have a natural ordering or that a Comparator is supplied).
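A quick demonstration of the two deterministic alternatives, using only JDK classes (DeterministicOrderDemo is a hypothetical name). Both print the same thing on every run, unlike HashSet and HashMap.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class DeterministicOrderDemo {
    public static void main(String[] args) {
        // Linked structures: iteration follows insertion order.
        Set<String> linked = new LinkedHashSet<>();
        linked.add("Y");
        linked.add("X");
        linked.add("Z");
        linked.add("Y");                  // already present: its position does not change
        System.out.println(linked);       // [Y, X, Z]

        // Tree structures: iteration follows the elements' natural ordering.
        Set<String> sorted = new TreeSet<>(linked);
        System.out.println(sorted);       // [X, Y, Z]

        // The same two behaviours for maps, ordered by key.
        Map<String, Integer> linkedMap = new LinkedHashMap<>();
        linkedMap.put("Y", 1);
        linkedMap.put("X", 2);
        System.out.println(linkedMap);                // {Y=1, X=2}
        System.out.println(new TreeMap<>(linkedMap)); // {X=2, Y=1}
    }
}
```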
You could safely use HashMap and HashSet without introduc-
ing non-determinism into your program if you never iterate over their
elements. It’s hard to keep that promise though. Once you’ve got a
data structure you might want to print out its values, or pass it in to
some third party code, etc. So it’s better to just use a structure with a
deterministic iteration ordering.
The skeleton code for this lab makes use of two other hash structures: IdentityHashMap and IdentityHashSet. What distinguishes these structures from the common structures?

IdentityHashMap is from the JDK. IdentityHashSet is from the open-source project named Kodkod. There are a number of open source projects that implement an IdentityHashSet due to its omission in the JDK. See Java bug report #4479578.

The definition of a set is that it does not contain duplicate elements. How is ‘duplicate’ determined? We work with four different definitions of ‘duplicate’ in these labs:

x == y              x and y have the same physical memory address.
x.equals(y)         any computation that uses x or y will have no observable differences.
x.isomorphic(y)     x and y might have some minor structural differences, but are essentially the same.
x.equivalent(y)     x and y are semantically equivalent and might not have any structural similarities.

Set<VarExpr> hset = new HashSet<>();
Set<VarExpr> lhset = new LinkedHashSet<>();
Set<VarExpr> iset = new IdentityHashSet<>();   // from Kodkod, not the JDK
for (Set<VarExpr> set : Arrays.asList(hset, lhset, iset)) {
    set.add(new VarExpr("Y"));
    set.add(new VarExpr("X"));
    set.add(new VarExpr("X"));   // a distinct object that equals() the first X
}
System.out.println(hset);    // Y and X once each, in unspecified order
System.out.println(lhset);   // Y then X: the second X is a duplicate under equals()
System.out.println(iset);    // three elements: the two X objects are different objects

Figure 7.14: Object identity for different data structures

The common structures (HashSet, TreeSet, etc.) use the equals()
method to determine duplicates, whereas the IdentityHashSet and
IdentityHashMap use the memory address (==) to determine du-
plicates. In this lab we want to ensure that our substitution table
contains an entry for every single FExpr object (physical memory ad-
dress), so we use IdentityHashSet and IdentityHashMap. Notice that
ExtractAllExprs also returns an IdentityHashSet.
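The JDK itself has no IdentityHashSet class, but an identity-based Set can be built from the JDK’s IdentityHashMap with Collections.newSetFromMap. The sketch below uses that to contrast the two notions of ‘duplicate’; Var is a hypothetical stand-in for VarExpr whose equals() compares variable names, and IdentitySetDemo is likewise illustrative.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for VarExpr: two Vars are equal when their names are equal.
final class Var {
    final String name;
    Var(String name) { this.name = name; }
    @Override public boolean equals(Object o) {
        return o instanceof Var && ((Var) o).name.equals(name);
    }
    @Override public int hashCode() { return Objects.hash(name); }
    @Override public String toString() { return name; }
}

public class IdentitySetDemo {
    public static void main(String[] args) {
        Var x1 = new Var("X");
        Var x2 = new Var("X"); // equals(x1), but a different object

        Set<Var> byEquals = new HashSet<>();
        byEquals.add(x1);
        byEquals.add(x2);                     // ignored: an equal element is already present
        System.out.println(byEquals.size());  // 1

        // Duplicates decided by == rather than equals(). Like HashSet,
        // its iteration order is unspecified.
        Set<Var> byIdentity = Collections.newSetFromMap(new IdentityHashMap<Var, Boolean>());
        byIdentity.add(x1);
        byIdentity.add(x2);                   // kept: x2 != x1
        System.out.println(byIdentity.size()); // 2
    }
}
```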
The skeleton code is careful about its use of IdentityHashSet/Map,
LinkedHashSet/Map, and TreeSet/Map. You should think about
the concept of duplicate and the iteration ordering of each data
structure used in this lab, including your temporary variables. The
GraphvizToF converter requires that the edges are printed according to a post-order traversal of the ast, so that it can reconstruct the ast bottom-up.

7.5 Non-determinism: Angelic, Demonic, and Arbitrary†

We previously saw non-determinism when studying finite automata. In that case we focused on angelic non-determinism: the automaton would ‘magically’ (non-deterministically) choose the right path that will lead to success (acceptance).

In theory one also considers demonic non-determinism, where the machine always chooses a path that leads to the worst possible outcome.

In this lab we’re seeing arbitrary non-determinism: the machine chooses some path arbitrarily, and that path may lead to a good result or a bad result.
As engineers, we want to avoid non-determinism. It makes testing
unnecessarily difficult. We transform non-deterministic machines
into deterministic ones. We are careful to avoid unnecessary sources
of non-determinism in our programs, such as iterating over hash
structures or unregulated concurrency.

7.6 Translate one formula at a time

Flesh out the skeleton synthesizer in the provided TechnologyMapper


class. The TechnologyMapper extends one of the Visitor classes, and
in its visit methods it will create edges from the child nodes in the
ast to their parents.
The TechnologyMapper class contains a field called substitutions
that maps FExprs to FExprs. For now populate this data structure
by mapping every FExpr ast node to itself. Populating this data
structure in a more sophisticated manner is the next task.
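As a rough picture of what those visit methods produce, here is a standalone sketch. Node, EdgeSketch, the naming scheme (label plus a counter), and the bare `child -> parent` edge text are all hypothetical: the real edge format is whatever the skeleton’s utility methods emit and GraphvizToF expects. The substitution table is an IdentityHashMap that, for now, maps every node to itself, and edges go into a LinkedHashSet in post-order so the output is repeatable.

```java
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical mini-ast node: an operator ("and", "or", "not") or a variable name, plus children.
final class Node {
    final String label;
    final List<Node> children;
    Node(String label, Node... children) {
        this.label = label;
        this.children = Arrays.asList(children);
    }
}

public class EdgeSketch {
    // Substitution table keyed by object identity; for now every node maps to itself.
    final Map<Node, Node> substitutions = new IdentityHashMap<>();
    // One graph-node name per node object, e.g. "or1", assigned on first use.
    final Map<Node, String> names = new IdentityHashMap<>();
    // Edges in the order produced (post-order); duplicates are dropped.
    final Set<String> edges = new LinkedHashSet<>();

    String name(Node n) {
        String existing = names.get(n);
        if (existing == null) {
            existing = n.label + names.size();
            names.put(n, existing);
        }
        return existing;
    }

    // Post-order: handle the children first, then add the child -> parent edges.
    void visit(Node n) {
        substitutions.putIfAbsent(n, n);
        for (Node c : n.children) {
            visit(c);
        }
        Node parent = substitutions.get(n);
        for (Node c : n.children) {
            edges.add(name(substitutions.get(c)) + " -> " + name(parent));
        }
    }

    String toDot() {
        StringBuilder sb = new StringBuilder("digraph g {\n");
        for (String e : edges) {
            sb.append("  ").append(e).append(";\n");
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        // X <= D and (A or B);
        Node ast = new Node("and", new Node("D"), new Node("or", new Node("A"), new Node("B")));
        EdgeSketch sketch = new EdgeSketch();
        sketch.visit(ast);
        System.out.println(sketch.toDot());
    }
}
```

Because every edge is looked up through the substitution table, replacing the identity mapping with a smarter one (the next task) changes which gates are drawn without changing the traversal.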

7.7 Common Subexpression Elimination

An F program may contain common subexpressions. A smart synthesizer will synthesize shared hardware for these common subexpressions. Consider the following simple F program where the subexpression A or B appears twice:

X <= D and (A or B);
Y <= E and (A or B);

Figure 7.15 shows the circuit produced by a synthesizer that does not perform common subexpression elimination: there are two or gates that both compute A or B. A more sophisticated synthesizer that does perform common subexpression elimination would synthesize a circuit like the one in Figure 7.16 in which there is only a single or gate. It is also possible to have common subexpressions within a single formula, for example: Z <= (A or B) and !(A or B);

We do not have to worry about variables changing values in F because all expressions in F are referentially transparent: i.e., they always have the same value no matter where they occur in the program. Referential transparency is a common property of functional programming languages such as Scheme, Lisp, Haskell, ML, and F. Consider the following program fragment written in an imperative language:

a = 1;
b = 2;
x = a + b; // = 3
a = 3;
y = a + b; // = 5

The subexpression a + b occurs twice in this fragment but cannot be eliminated because the value of a has changed. A dataflow analysis is required to determine if the values of the variables have changed.

7.8 Designing a Common Subexpression Eliminator for F

The key question that the eliminator needs to answer is, for each sub-
expression, whether that sub-expression should be used, or whether
it should be eliminated in favour of another sub-expression.
What measure of ‘sameness’ should be used for this decision?
What concept of identity should we use? There are four options:
physical memory address (==), equals, isomorphic, and equivalent.
Memory address is (hopefully obviously) not a useful choice: if our input asts are already sharing physical objects for common subexpressions then the elimination has already been done. Equivalence is, in some sense, the ideal measure. But deciding equivalence of F formulas is a coNP-complete problem, and we want to stick with polynomial complexity. That leaves equals and isomorphic. The equals method is perhaps easier to work with, since it is already used by the standard Java collections classes. Isomorphic will give us better results, at the cost of slightly increased design complexity, but with no significant change in computational complexity. Possible designs include:

a. Just use equals via the standard Java collections classes. Easy to design and implement. Results not as good as they could be if isomorphic were used.
b. Convert all sub-expressions to a canonical form first, then use equals as above. We used this kind of approach in lab 4: NaryExprs have a canonical form because they sort their children lexicographically. We will never have an NaryOrExpr like B or A, because it will be sorted to A or B. Once we have all sub-expressions in a canonical form, then we can use equals as above.
c. Implement a customized data structure based on isomorphic. If hashCode is consistent with isomorphic (isomorphic expressions have equal hash codes) then it can be used as a first step in the search for substitutable sub-expression objects.
d. Write some wrapper code that compares each sub-expression to every other one using isomorphic, and builds a substitution table based on physical memory address. This is not necessarily the most elegant design, but it is the one described in some detail below.

These alternatives came out in discussion with the class of the summer of 2017.

There is another design decision about whether the asts get rebuilt
with the substitutions, or whether the substitution table is just used
in the printing process.
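Design (b) can be sketched without the lab code. Everything here is hypothetical: Term stands in for the F ast, and canon() builds a canonical string in which the children of the commutative operators and/or are sorted, so that plain String equals() inside a LinkedHashMap finds common subexpressions. The result is a substitution table keyed by object identity. In the lab the NaryExpr classes already sort their children, so the real canonicalization step may be much simpler.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical mini-ast term: an operator ("and", "or", "not") or a variable name, plus children.
final class Term {
    final String op;
    final List<Term> kids;
    Term(String op, Term... kids) {
        this.op = op;
        this.kids = Arrays.asList(kids);
    }
}

public class CanonicalCse {
    // Canonical text: children of "and"/"or" are sorted, so (B or A) and (A or B) agree.
    // Recomputed on every call for simplicity (quadratic in depth; fine for a sketch).
    static String canon(Term t) {
        if (t.kids.isEmpty()) {
            return t.op;
        }
        List<String> parts = new ArrayList<>();
        for (Term k : t.kids) {
            parts.add(canon(k));
        }
        if (t.op.equals("and") || t.op.equals("or")) {
            parts.sort(null); // natural (lexicographic) String order
        }
        return t.op + "(" + String.join(",", parts) + ")";
    }

    // Map every term object (by identity) to the first term seen with the same canonical form.
    static Map<Term, Term> substitutions(List<Term> roots) {
        Map<String, Term> representative = new LinkedHashMap<>(); // equals() on canonical strings
        Map<Term, Term> table = new IdentityHashMap<>();
        for (Term r : roots) {
            fill(r, representative, table);
        }
        return table;
    }

    // Post-order: children are entered before their parent.
    private static void fill(Term t, Map<String, Term> representative, Map<Term, Term> table) {
        for (Term k : t.kids) {
            fill(k, representative, table);
        }
        table.put(t, representative.computeIfAbsent(canon(t), key -> t));
    }

    public static void main(String[] args) {
        Term or1 = new Term("or", new Term("A"), new Term("B"));
        Term or2 = new Term("or", new Term("B"), new Term("A"));
        Term x = new Term("and", new Term("D"), or1); // X <= D and (A or B);
        Term y = new Term("and", new Term("E"), or2); // Y <= E and (B or A);
        Map<Term, Term> table = substitutions(Arrays.asList(x, y));
        System.out.println(table.get(or2) == or1);    // true: one shared or gate
        System.out.println(table.get(y) == y);        // true: nothing else matches
    }
}
```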

7.8.1 An n² design using isomorphic and memory address

This is the solution in the staff code, which is hinted at in the skeleton. You are welcome to delete those parts of the skeleton and go with another design. Some alternatives are described above.

a. Build up a table of FExpr substitutions. Suppose our F program has three occurrences of A or B: (A or B)₁, (A or B)₂, and (A or B)₃. These three expressions are in the same equivalence class according to our isomorphic() method. We select one representative from this group, say (A or B)₂, and use that to represent all of the isomorphic expressions. So our substitution table will contain the tuples ⟨(A or B)₁, (A or B)₂⟩, ⟨(A or B)₂, (A or B)₂⟩, ⟨(A or B)₃, (A or B)₂⟩. In other words, we’ll use (A or B)₂ in place of (A or B)₁ and (A or B)₃. This substitution table can be built in the following way. First, compare every FExpr in the program with every other FExpr with the isomorphic() method to discover the equivalence classes. For each equivalence class pick a representative (it doesn’t matter which one), and then make an entry in the substitution table mapping each FExpr in the equivalence class to the representative.
b. Visit the ast and produce edges from children FExprs to parent FExprs using the substitution table. See the provided utility methods in the skeleton TechnologyMapper class.

Why do we use the isomorphic() method here instead of the equivalent() method? Because the isomorphic() method runs in polynomial time whereas the equivalent() method runs in exponential time. Our common subexpression elimination only makes engineering sense if it costs polynomial time. If we were to spend exponential time then we could do more sophisticated circuit minimization, such as computing the prime implicants with the Quine-McCluskey algorithm.
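Step (a) can be sketched generically so that it runs without the lab code. SubstitutionTable and build are hypothetical names; in the lab, T would presumably be FExpr and the predicate would call isomorphic(). Here distinct String objects compared with equals() stand in. The sketch compares each expression against the representatives found so far (still O(n²) in the worst case), which assumes the predicate behaves like an equivalence relation. Feeding it the expressions in a deterministic order (e.g., post-order in a List) keeps the choice of representative repeatable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

public class SubstitutionTable {
    // Map each object (by identity) to the first member of its equivalence class.
    // At most n*(n-1)/2 predicate calls, when every expression is in its own class.
    static <T> Map<T, T> build(List<T> exprs, BiPredicate<T, T> isomorphic) {
        List<T> representatives = new ArrayList<>();
        Map<T, T> table = new IdentityHashMap<>();
        for (T e : exprs) {
            T rep = null;
            for (T r : representatives) {
                if (isomorphic.test(e, r)) {
                    rep = r;
                    break;
                }
            }
            if (rep == null) {         // e starts a new equivalence class
                representatives.add(e);
                rep = e;
            }
            table.put(e, rep);         // the representative maps to itself too
        }
        return table;
    }

    public static void main(String[] args) {
        // Three separate "A or B" objects plus one unrelated expression.
        String ab1 = new String("A or B");
        String ab2 = new String("A or B");
        String ab3 = new String("A or B");
        String d = new String("D");
        Map<String, String> table = build(Arrays.asList(ab1, ab2, ab3, d), String::equals);
        System.out.println(table.get(ab3) == ab1); // true
        System.out.println(table.get(d) == d);     // true
        System.out.println(table.size());          // 4: one entry per object
    }
}
```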

Figure 7.15: Synthesized circuit without common subexpression elimination for F program X <= D and (A or B); Y <= E and (A or B). There are two gates synthesized that both compute A or B.
[Circuit diagram not reproduced.]

Figure 7.16: Synthesized circuit with common subexpression elimination for F program X <= D and (A or B); Y <= E and (A or B). There is only one gate that computes the value A or B, and the output of this gate is fed to two places.
[Circuit diagram not reproduced: inputs A, B, D, E; outputs X, Y.]

7.9 Evaluation

The last pushed commit before the deadline is evaluated both on the
shared test inputs and on a set of secret test inputs according to the
weights in the table below.
The first testing equation for this lab is as follows. Let f be the ast
produced by your F parser.
f.equivalent(GraphVizToF(TechnologyMapper(simplify(f))))
In other words, take the output of your F parser, simplify it, convert
it to dot (this is the part that you are doing, the technology mapper),
then convert that output back to an F program (we provide this code
for you), and compare this result to the original ast produced by the
parser.
The second testing equation for this lab compares your output with the staff output (dot file). Let s name the staff dot file, and let f name the ast produced by your F parser.
simplify(GraphVizToF(s)).equivalent(GraphVizToF(TechnologyMapper(simplify(f))))
The two equations above just check that you produced an equivalent
circuit, but they do not measure the number of gates used in that cir-
cuit. A third component of the evaluation is comparing the number
of gates in your circuit versus the staff circuit.

TestTechnologyMapper    Shared    Secret
correctness             50        20
gate count              20        10

The baseline gate count marks are 16/20 and 8/10. If you produce fewer gates than the staff solution you bump up to 20/20 and 10/10. If you produce more gates than the staff solution then for each extra gate we subtract one point from the baseline. Some strategies that might produce fewer gates but probably take too much time to implement include:

• Using DeMorgan’s Laws or other algebraic identities.
• Implementing the Quine-McCluskey or espresso algorithms to compute a minimal (or near-minimal) form of the circuit (a much more advanced version of our simplifier).
• Translating our FExprs to Binary Decision Diagrams (bdds).[8] bdds are used in practice for functional technology mapping and circuit equivalence checking.

The Quine-McCluskey algorithm is not used in practice because it is exponential/NP-complete, which means it is too expensive for most real circuits. We can afford the price for the small circuits considered in this course. The espresso algorithm is efficient enough to be used in practice but is not guaranteed to find the global minimum.
http://en.wikipedia.org/wiki/Espresso_heuristic_logic_minimizer
http://en.wikipedia.org/wiki/Quine-McCluskey_algorithm

Moral: Reducing the gate count is an np-complete problem. Our simplifier and common subexpression eliminator are both polynomial. Unless P = NP, no combination of polynomial algorithms is ever going to be a perfect solution to an np-complete problem. The espresso algorithm is an example of a very good polynomial approximation for this np-complete problem.

[8] R. E. Bryant. Graph-based algorithms for boolean function manipulation. IEEE Transactions on Computers, C-35(8):677–691, Aug. 1986.
Lab 8
Simulation: F → Java

Compiler Concepts: program generation, name capture
Programming Concepts:

Files: ece351.f.simgen.SimulatorGenerator
Libraries: ece351.f.analysis.DetermineInputVars
Tests: ece351.f.simgen.TestSimulatorGenerator

Consider the F program X <= A or B; A simulator for this F program would read an input W file describing the values of A and B at each time step, and would write an output W file describing the values of X at each time step. Such a simulator is listed in Figure 8.1. Your task is to write a program that, given an input F program, will produce a Java program like the one in Figure 8.1. In other words, you are going to write a simulator generator. The generated simulators will perform the following steps:

a. Read the input W program.
b. Allocate storage for the output W program.
c. Loop over the time steps and compute the outputs.
d. Write the output W program.

An alternative to generating a simulator would be to write an F interpreter. What we are doing here is writing an F compiler. We call it a ‘simulator generator’ because the term ‘compile’ is not clear in the context of F: in the last lab we ‘compiled’ F to a circuit; here we ‘compile’ it to a simulator.
The generated simulator will have a method to compute the value of each output pin. In our example F program listed above the output pin is X, and so the generated simulator in Figure 8.1 has a method named X. The body of an output pin method is generated by performing a pre-order traversal of the corresponding F expression ast. F expressions are written with operators in-order: that is, the operators appear between the variables. For example, in the F program we have the in-order expression A or B, while in the Java translation we have the pre-order expression or(A, B).

An interpreter evaluates a program with an input, but does not transform the program to another language. A compiler, by contrast, is almost the opposite: it translates a program to another language, but does not evaluate that program on any input. It is usually the case that it takes less effort to write an interpreter, but that the target program’s runtime is faster if it is compiled first.
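The in-order to pre-order translation can be sketched as a small recursive function. FNode and PreOrderCodegen are hypothetical stand-ins, and only the text of a Java expression is produced. The assumption that inputs are referenced with the in_ prefix follows the parameter names in Figure 8.1. Helpers named or(...), and(...), and not(...) would presumably come from the import static ece351.util.Boolean351.* line there, but that is not confirmed here.

```java
// Hypothetical mini-ast for F expressions: leaves hold a variable name,
// inner nodes hold an operator name ("and", "or", "not") and their operands.
final class FNode {
    final String op;
    final FNode[] kids;
    FNode(String op, FNode... kids) {
        this.op = op;
        this.kids = kids;
    }
}

public class PreOrderCodegen {
    // Pre-order: write the operator before its operands, so the in-order
    // F text "A or B" becomes the Java call text "or(in_A, in_B)".
    static String gen(FNode e) {
        if (e.kids.length == 0) {
            return "in_" + e.op; // assumption: input pins carry the in_ prefix
        }
        StringBuilder sb = new StringBuilder(e.op).append('(');
        for (int i = 0; i < e.kids.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(gen(e.kids[i]));
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        FNode aOrB = new FNode("or", new FNode("A"), new FNode("B"));
        System.out.println(gen(aOrB)); // or(in_A, in_B)
        FNode z = new FNode("and", new FNode("D"), new FNode("not", aOrB));
        System.out.println(gen(z));    // and(in_D, not(or(in_A, in_B)))
    }
}
```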

8.1 Name Collision/Capture†

When we generate code in another language we need to be careful that the variable names we generate based on our input program are legal in the target language, and that they do not collide with (capture) an already existing name. All legal W and F variable names are also legal Java variable names, so we don’t need to worry about the first point here. On the second point, notice that the generated Java variable names in Figure 8.1 are prefixed with ‘in_’ or ‘out_’, and that none of the boilerplate names have such prefixes.
A name capture problem occurred in some of the utility code. The FProgram.equivalent() method checks the equivalence of two F programs by translating them to sat. For convenience, that process first translates the F program to Alloy, and then the Alloy is translated to sat. An older version of this translation did not take care to avoid name capture, so if the input F program had variables corresponding to Alloy keywords (e.g., ‘this’, ‘int’) then the Alloy to sat translation would fail.
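A minimal sketch of the prefixing convention, with hypothetical names (JavaNames, inputName, outputName). Without a prefix, an F input named output or time would collide with the output map or the loop counter time in Figure 8.1; with the prefix it becomes in_output, which cannot match any boilerplate name because none of them start with in_ or out_.

```java
import java.util.Set;

/** Sketch of the prefixing convention from Figure 8.1 (class and method names are hypothetical). */
final class JavaNames {
    // Local names declared by the boilerplate in Figure 8.1; none starts with "in_" or "out_".
    static final Set<String> BOILERPLATE = Set.of(
            "args", "s", "cmd", "input", "wprogram", "output",
            "timeCount", "time", "f", "pw", "e");

    static String inputName(String fVar)  { return "in_" + fVar; }
    static String outputName(String fVar) { return "out_" + fVar; }

    /** True if a generated local name would clash with a boilerplate local. */
    static boolean collidesWithBoilerplate(String generated) {
        return BOILERPLATE.contains(generated);
    }
}
```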

Figure 8.1: Simulator for F program x <= a or b;

import java.util.*;
import ece351.w.ast.*;
import ece351.w.parboiled.*;
import static ece351.util.Boolean351.*;
import ece351.util.CommandLine;
import java.io.File;
import java.io.FileWriter;
import java.io.StringWriter;
import java.io.PrintWriter;
import java.io.IOException;
import ece351.util.Debug;

public final class Simulator_ex11 {

    public static void main(final String[] args) {
        final String s = File.separator;
        // read input WProgram
        final CommandLine cmd = new CommandLine(args);
        final String input = cmd.readInputSpec();
        final WProgram wprogram = WParboiledParser.parse(input);
        // construct storage for output
        final Map<String,StringBuilder> output = new LinkedHashMap<String,StringBuilder>();
        output.put("x", new StringBuilder());
        // loop over each time step
        final int timeCount = wprogram.timeCount();
        for (int time = 0; time < timeCount; time++) {
            // values of input variables at this time step
            final boolean in_a = wprogram.valueAtTime("a", time);
            final boolean in_b = wprogram.valueAtTime("b", time);
            // values of output variables at this time step
            final String out_x = x(in_a, in_b) ? "1 " : "0 ";
            // store outputs
            output.get("x").append(out_x);
        }
        try {
            final File f = cmd.getOutputFile();
            f.getParentFile().mkdirs();
            final PrintWriter pw = new PrintWriter(new FileWriter(f));
            // write the input
            System.out.println(wprogram.toString());
            pw.println(wprogram.toString());
            // write the output
            System.out.println(f.getAbsolutePath());
            for (final Map.Entry<String,StringBuilder> e : output.entrySet()) {
                System.out.println(e.getKey() + ":" + e.getValue().toString()+ ";");
                pw.write(e.getKey() + ":" + e.getValue().toString()+ ";\n");
            }
            pw.close();
        } catch (final IOException e) {
            Debug.barf(e.getMessage());
        }
    }
    // methods to compute values for output pins
    public static boolean x(final boolean a, final boolean b) { return or(a, b) ; }
}

Figure 8.2 lists a visitor that determines all of the input variables
used by an F expression ast. This code is listed here for your ref-
erence. It is also in your repository. You will need to call this code
when writing your simulator generator.

Figure 8.2: Implementation of DetermineInputVars

public final class DetermineInputVars extends PostOrderExprVisitor {
    private final Set<String> inputVars = new LinkedHashSet<String>();
    private DetermineInputVars(final AssignmentStatement f) { traverseExpr(f.expr); }
    /** Input variables of an AssignmentStatement, in order of occurrence. */
    public static Set<String> inputVars(final AssignmentStatement f) {
        final DetermineInputVars div = new DetermineInputVars(f);
        return Collections.unmodifiableSet(div.inputVars);
    }
    /** Input variables of an FProgram, sorted lexicographically. */
    public static SortedSet<String> inputVars(final FProgram p) {
        final SortedSet<String> vars = new TreeSet<String>();
        for (final AssignmentStatement f : p.formulas) {
            vars.addAll(DetermineInputVars.inputVars(f));
        }
        return vars;
    }
    @Override public Expr visitConstant(final ConstantExpr e) { return e; }
    @Override public Expr visitVar(final VarExpr e) { inputVars.add(e.identifier); return e; }
    @Override public Expr visitNot(final NotExpr e) { return e; }
    @Override public Expr visitAnd(final AndExpr e) { return e; }
    @Override public Expr visitOr(final OrExpr e) { return e; }
    @Override public Expr visitNaryAnd(final NaryAndExpr e) { return e; }
    @Override public Expr visitNaryOr(final NaryOrExpr e) { return e; }
    @Override public Expr visitXOr(final XOrExpr e) { return e; }
    @Override public Expr visitNAnd(final NAndExpr e) { return e; }
    @Override public Expr visitNOr(final NOrExpr e) { return e; }
    @Override public Expr visitXNOr(final XNOrExpr e) { return e; }
    @Override public Expr visitEqual(final EqualExpr e) { return e; }
}
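A sketch of how the input-variable set from DetermineInputVars might be used to emit an output-pin method and its call site in the same shape as Figure 8.1. PinMethodGen and its methods are hypothetical; in the lab, the set would come from DetermineInputVars.inputVars(stmt) (order of occurrence) and the body from a pre-order walk of the statement's expression.

```java
import java.util.Set;
import java.util.StringJoiner;

/** Sketch: emit an output-pin method and its call site in the style of Figure 8.1. */
final class PinMethodGen {
    /** Parameters follow the iteration order of inputVars. */
    static String pinMethod(String outputVar, Set<String> inputVars, String body) {
        StringJoiner params = new StringJoiner(", ");
        for (String v : inputVars) {
            params.add("final boolean " + v);
        }
        return "public static boolean " + outputVar + "(" + params + ") { return " + body + " ; }";
    }

    /** The call inside the time-step loop passes the in_-prefixed locals. */
    static String pinCall(String outputVar, Set<String> inputVars) {
        StringJoiner args = new StringJoiner(", ");
        for (String v : inputVars) {
            args.add("in_" + v);
        }
        return outputVar + "(" + args + ")";
    }
}
```

With inputs a and b and body or(a, b), pinMethod reproduces the last method of Figure 8.1 character for character, and pinCall produces x(in_a, in_b).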

8.2 Testing a code generator†

In this lab you are writing a code generator: that is, the code that you write produces code as its output, as shown in Figure 8.3.

Figure 8.3: Dataflow of lab8: F → YourLab8 → GenJavaCode; W_in → GenJavaCode → W_out.

How can the course staff evaluate your lab8 code?

a. Reading your hand-written source code.
• Hard to mechanize. Doesn’t scale.
• Easy to make mistakes in evaluation.
b. Mechanically comparing your hand-written source code to staff code.
• Syntactic comparison: many legitimate differences.
• Semantic comparison: impossible in theory, since it is as hard as the Halting Problem (Java is Turing complete).
c. Comparing your generated code to staff generated code.
• Same problems as above.
d. Comparing the W output produced by your generated code with the W output from the staff generated code.
• W is regular, so we can compare easily.
• Engineer’s Induction problem: how do we know if we did enough tests? Test Suite Adequacy.

This is one of the main ways that compilers are tested in industry: by running the code that they generate, and examining its output.
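Because W is regular, comparing two W outputs takes little more than a regular expression. The sketch below (hypothetical class WCompare) parses each statement of the form name: values; into a map and compares the maps, so pin order and whitespace are ignored. It assumes W identifiers start with a letter and contain only letters, digits, or underscores; the lab's own isomorphic() check may differ in detail.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: compare two W outputs by content, not by layout. */
final class WCompare {
    // One W statement: an identifier, a colon, zero or more 0/1 values, a semicolon.
    private static final Pattern STMT =
            Pattern.compile("([A-Za-z][A-Za-z0-9_]*)\\s*:\\s*([01\\s]*);");

    /** Parse W text into pin name -> list of values. */
    static Map<String, List<String>> parse(String w) {
        Map<String, List<String>> pins = new LinkedHashMap<>();
        Matcher m = STMT.matcher(w);
        while (m.find()) {
            List<String> values = new ArrayList<>();
            for (String tok : m.group(2).trim().split("\\s+")) {
                if (!tok.isEmpty()) {
                    values.add(tok);
                }
            }
            pins.put(m.group(1), values);
        }
        return pins;
    }

    /** Same pins with the same waveforms; Map.equals ignores insertion order. */
    static boolean sameSignals(String w1, String w2) {
        return parse(w1).equals(parse(w2));
    }
}
```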

8.3 Evaluation

The last pushed commit before the deadline is evaluated both on the shared test inputs and on a set of secret test inputs according to the weights in the table below. Let ‘f’ name the F program we want to simulate, let ‘w’ name an input W program for it, and let ‘wStaff’ name the output of the staff’s simulator for this F/W combination. The testing equation is:
gen(f).simulate(w).isomorphic(wStaff)

                          Shared (current)   Secret (new)
TestSimulatorGenerator    70                 30
Lab 9
Simulation: F → x64 Assembly

This lab was developed by past ece351 students Nicholas Champion, Akshay Joshi, and Jackson Prange.

In §8 you generated Java code from an F program. That Java code would read and write W files, simulating the F program for each clock tick. Now we will do a similar thing, but instead of Java code we will generate x64 (i.e., x86_64) assembly.

9.1 Reading the W Program

In §8, your generated Java code called your W parser to read in the W input, so the same generated Java code could work with different input W programs.
Calling between Java and native assembly code is possible, but it is annoying and time-consuming to set up, so we do not want to do it in this lab.
Instead, we will hardcode the W program into the x64 assembly code. This means that the generated assembly will not read any input when it executes, and also that the generated assembly code will not have to interact with Java code.
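Since the W program is hardcoded, the generator can read each input waveform at generation time and emit, for every time step, one immediate load per input variable followed by a call to the output pin's function, mirroring the movq $0, %r8 / call out_x pair in Figure 9.1. A minimal sketch with hypothetical names; the register map is assumed to be computed elsewhere, and the assembly is emitted without the figure's comments or indentation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Sketch: emit the hardcoded input loads and pin call for one time step. */
final class HardcodedInputs {
    /**
     * regOf maps each input variable to its register (e.g. a -> r8);
     * waveforms maps each input variable to its values from the W program.
     */
    static List<String> loadAndCall(Map<String, String> regOf, Map<String, boolean[]> waveforms,
                                    int time, String outputPin) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> e : regOf.entrySet()) {
            int bit = waveforms.get(e.getKey())[time] ? 1 : 0;
            lines.add("movq $" + bit + ", %" + e.getValue());
        }
        lines.add("call out_" + outputPin);
        return lines;
    }
}
```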

9.2 Register Allocation†

The programmer may use as many variables as they want, but the
machine has only a finite number of registers. Levels of register allo-
cation:

a. Every variable gets a register.

b. Variables that are not live at the same time can share a register.

If there are too many variables live at the same time, then some of
them need to be spilled to main memory.
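Level b is what liveness-based allocators do. The sketch below is a simplified linear-scan allocator under stated assumptions: each variable's liveness is a single interval [start, end] of instruction indices computed elsewhere, and when no register is free the current variable is simply marked as spilled (standard linear scan instead spills whichever active interval ends last). This lab does not require it; the names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified linear-scan register allocation (a sketch). */
final class LinearScan {
    /** A variable is live from instruction index start through end, inclusive. */
    record Interval(String var, int start, int end) {}

    static Map<String, String> allocate(List<Interval> intervals, List<String> registers) {
        List<Interval> byStart = new ArrayList<>(intervals);
        byStart.sort(Comparator.comparingInt(Interval::start));
        Deque<String> free = new ArrayDeque<>(registers);
        List<Interval> active = new ArrayList<>();
        Map<String, String> assignment = new LinkedHashMap<>();
        for (Interval cur : byStart) {
            // Variables whose live range ended before cur starts give their register back.
            Iterator<Interval> it = active.iterator();
            while (it.hasNext()) {
                Interval old = it.next();
                if (old.end() < cur.start()) {
                    free.addFirst(assignment.get(old.var()));
                    it.remove();
                }
            }
            if (free.isEmpty()) {
                assignment.put(cur.var(), "spill");   // simplified: spill the newcomer
            } else {
                assignment.put(cur.var(), free.removeFirst());
                active.add(cur);
            }
        }
        return assignment;
    }
}
```

With two registers and intervals a[0,2], b[1,3], c[3,5], variables a and c never overlap, so c reuses the register freed by a.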

9.2.1 Register Allocation in This Lab

We need to store the values for the input variables in registers. We will use the eight registers r8–r15 for the input variables. Conceptual levels of register allocation, in order of increasing sophistication:

a. Every input variable gets its own register. OK if the total number of input variables is at most 8.

b. Each AssignmentStatement (formula) in the input F program might use only a subset of the input variables. So we could allocate the registers differently for each AssignmentStatement. OK so long as no individual AssignmentStatement uses more than 8 variables (now the F program as a whole can use as many variables as it wants). This is the strategy used in the skeleton code.
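Strategy b can be sketched as follows (hypothetical class PerFormulaRegisters): registers are handed out afresh for each AssignmentStatement, in the order its input variables occur, and generation fails only if a single formula needs more than eight. This illustrates the idea and is not the skeleton code's implementation.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch of strategy b: registers r8-r15 allocated afresh for each AssignmentStatement. */
final class PerFormulaRegisters {
    static final List<String> INPUT_REGS =
            List.of("r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15");

    /** inputVarsOfFormula: the input variables one formula uses, in order of occurrence. */
    static Map<String, String> allocate(Set<String> inputVarsOfFormula) {
        if (inputVarsOfFormula.size() > INPUT_REGS.size()) {
            throw new IllegalStateException("formula uses " + inputVarsOfFormula.size()
                    + " input variables but only " + INPUT_REGS.size() + " registers are reserved");
        }
        Map<String, String> regOf = new LinkedHashMap<>();
        int i = 0;
        for (String v : inputVarsOfFormula) {
            regOf.put(v, INPUT_REGS.get(i++));
        }
        return regOf;
    }
}
```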

9.3 Assembling & Linking

The test harness uses gcc for assembling and linking. It assumes that you have gcc installed and available in your path.
You can set the path environment variable in the Eclipse run configuration for the test harness to specify where gcc is, if necessary. gcc is the GNU Compiler Collection, which is the standard compiler on GNU/Linux systems, and is one of the most commonly used C/C++ compilers in the world.

9.4 Evaluation

The last pushed commit before the deadline is evaluated both on the shared test inputs and on a set of secret test inputs according to the weights in the table below.
Let ‘f’ name the F program we want to simulate, let ‘w’ name an input W program for it, and let ‘wStaff’ name the output of the staff’s simulator for this F/W combination. The testing equation is:
genX64(f).simulate(w).isomorphic(wStaff)

Shared Tests Secret Tests


TestSimulatorGenerator_x86_64 90 10

9.4.1 Bonus Marks


As with all labs, bonus marks are available for improving the lab. A number of such opportunities were discussed in class with respect to this lab, such as:

• improving register allocation
• improving register usage
• improving tests
• switching to ASMJIT for assembling instead of gcc
• etc.

You might have other creative ideas of how to improve this lab or other labs. Bonus marks are assessed by the instructor.

Figure 9.1: x64 assembly for x <= a;


.text

.globl out_x
.type out_x, @function
out_x:
pushq %rbp // set up a stack frame; rbp is frame pointer
movq %rsp, %rbp // set up a stack frame; rbp is frame pointer; rsp is stack pointer
subq $24, %rsp // making space on stack
pushq %r8 // value of input variable "a" is in r8
popq %rax // return value goes in rax
leave
ret

.globl main
.type main, @function
main:
pushq %rbp
movq %rsp, %rbp
movq $120, %rdi // put char ’x’ in rdi
call putchar // print contents of rdi to console
movq $58, %rdi // put char ’:’ in rdi
call putchar // print contents of rdi to console
movq $32, %rdi // put char ’ ’ in rdi
call putchar // print contents of rdi to console
movq $0, %r8 // load 0 into r8 (value of "a" at first time step)
call out_x // evaluate output pin "x" at first time step
movq %rax, %rdi // move x’s computed value to rdi
add $48, %rdi // convert int to ascii (48 is ASCII value ’0’)
call putchar // print contents of rdi to console
movq $59, %rdi // put char ’;’ in rdi
call putchar // print contents of rdi to console
movq $10, %rdi // put newline char in rdi
call putchar // print contents of rdi to console

popq %rax
leave
ret
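Figure 9.1 prints every character with the same two-instruction pattern: load its ASCII code into rdi, then call putchar. A generator can produce that pattern mechanically. The sketch below uses hypothetical names and omits the figure's comments and indentation.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: emit the putchar calling pattern used throughout Figure 9.1. */
final class PutcharEmitter {
    /** Each character becomes: load its ASCII code into rdi, then call putchar. */
    static List<String> printString(String text) {
        List<String> lines = new ArrayList<>();
        for (char c : text.toCharArray()) {
            lines.add("movq $" + (int) c + ", %rdi");
            lines.add("call putchar");
        }
        return lines;
    }

    /** Print the 0 or 1 that an output-pin function left in rax. */
    static List<String> printPinResult() {
        return List.of(
                "movq %rax, %rdi",
                "add $48, %rdi",   // 48 is the ASCII code of '0', so 0 -> '0' and 1 -> '1'
                "call putchar");
    }
}
```

For example, printString("x: ") yields the same three movq/call pairs (codes 120, 58, 32) that open main in Figure 9.1, and printString(";\n") yields the closing pairs with codes 59 and 10.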
Compiler Concepts:
Programming Concepts:

Lab 10
vhdl Recognizer & Parser

Files:
ece351.v.VRecognizer.java
ece351.v.VParser.java
Tests:
ece351.v.test.TestVRecognizerAccept
ece351.v.test.TestVRecognizerReject
ece351.v.test.TestVParser

In this lab we will build a recognizer and a parser using Parboiled for a very small and simple subset of vhdl. The grammar for this subset of vhdl is listed in Figure 10.1. Restrictions on our subset of vhdl include:

• only bit (boolean) variables
• no stdlogic
• no nested ifs
• no aggregate assignment
• no combinational loops
• no arrays
• no generate loops
• no case
• no when (switch)
• inside process: either multiple assignment statements or multiple if statements; inside an if there can be multiple assignment statements
• no wait
• no timing
• no postponed
• no malformed syntax (or, no good error messages)
• identifiers are case sensitive
• one architecture per entity, and that single architecture must occur immediately after the entity

In w2013 the median time to complete this lab was about 9 hours. Therefore, we have made three changes to this lab for s2013 to hopefully bring this time down to the five-hour budget. First, we added append methods to the vhdl ast classes (as we also did for F this term). This means that your parser does not need to place ImmutableList objects on the stack directly, which many students found difficult to manage (and certainly the syntax for the casting is complex). This also means that you can construct your ast in a top-down manner, which many students seem to find easier. Both W and F were changed to this style this term as well, so this is what you are already familiar with. Second, we provided implementations of the object contract methods on the vhdl ast classes for you, instead of requiring you to fill them in. Third, we removed the desugarer and define-before-use parts of this lab. So now it’s just the parser.

Figure 10.1: Grammar for vhdl.

Program → (DesignUnit)*
DesignUnit → EntityDecl ArchBody
EntityDecl → ‘entity’ Id ‘is’ ‘port’ ‘(’ IdList ‘:’ ‘in’ ‘bit’ ‘;’ IdList ‘:’ ‘out’ ‘bit’ ‘)’ ‘;’ ‘end’ ( ‘entity’ | Id ) ‘;’
IdList → Id (‘,’ Id)*
ArchBody → ‘architecture’ Id ‘of’ Id ‘is’ (‘signal’ IdList ‘:’ ‘bit’ ‘;’)? ‘begin’ (CompInst)* ( ProcessStmts | SigAssnStmts ) ‘end’ Id ‘;’
SigAssnStmts → (SigAssnStmt)+
SigAssnStmt → Id ‘<=’ Expr ‘;’
ProcessStmts → (ProcessStmt)+
ProcessStmt → ‘process’ ‘(’ IdList ‘)’ ‘begin’ ( IfElseStmts | SigAssnStmts ) ‘end’ ‘process’ ‘;’
IfElseStmts → (IfElseStmt)+
IfElseStmt → ‘if’ Expr ‘then’ SigAssnStmts ‘else’ SigAssnStmts ‘end’ ‘if’ (Id)? ‘;’
CompInst → Id ‘:’ ‘entity’ ‘work.’ Id ‘port’ ‘map’ ‘(’ IdList ‘)’ ‘;’
Expr → XnorTerm (‘xnor’ XnorTerm)*
XnorTerm → XorTerm (‘xor’ XorTerm)*
XorTerm → NorTerm (‘nor’ NorTerm)*
NorTerm → NandTerm (‘nand’ NandTerm)*
NandTerm → OrTerm (‘or’ OrTerm)*
OrTerm → AndTerm (‘and’ AndTerm)*
AndTerm → EqTerm (‘=’ EqTerm)*
EqTerm → ‘not’ EqTerm | ‘(’ Expr ‘)’ | Var | Constant
Constant → ‘0’ | ‘1’
Var → Id
Id → Char ( Char | Digit | ‘_’ )*
Char → [A-Za-z]
Digit → [0-9]

This grammar has gone through a number of evolutions. First, David Gao and Rui Kong (3b) downloaded a complete vhdl grammar implementation for the antlr tool. They found that grammar was too complicated to use, so they started writing their own with antlr. Then Alex Duanmu and Luke Li (1b) reimplemented their grammar in txl and simplified it. Aman Muthrej (3a) reimplemented it in Parboiled and simplified it further. Wallace Wu (ta) refactored Aman’s code. Michael Thiessen (3a) helped make the expression sub-grammar more consistent with the grammar of F. There will no doubt be improvements and simplifications that you think of, and we would be happy to incorporate them into the lab.

Figure 10.2: Object diagram for example vhdl ast.

VHDL file:
entity XNOR_test is port(
    x, y: in bit;
    F: out bit
);
end XNOR_test;

architecture XNOR_test_arch of XNOR_test is
begin
    process(x, y)
    begin
        F <= x xnor y;
    end process;
end XNOR_test_arch;

AST:
VProgram
  designUnits: [DesignUnit]
    DesignUnit
      entity: Entity
        identifier: XNOR_test
        input: [x,y]
        output: [F]
      arch: Architecture
        entityName: XNOR_test
        architectureName: XNOR_test_arch
        signals: []
        components: []
        statements: [processStatement]
          processStatement
            sensitivityList: [x,y]
            sequentialStatements: [AssignmentStatement]
              AssignmentStatement
                outputVar: VarExpr (identifier: F)
                expr: XNOrExpr
                  left: VarExpr (identifier: x)
                  right: VarExpr (identifier: y)

10.1 Keywords and Whitespaces

To make the code for the recognizer and parser a little more legible (and easier to implement), classes VRecognizer and VParser inherit from a common base class called VBase,1 which contains a set of rules that match keywords found in the language in a case-independent manner (since vhdl is case-insensitive2: e.g., both ‘signal’ and ‘SIGNAL’ represent the same keyword in the language).

1 VBase itself extends BaseParser351, which provides extra utility and debugging methods over Parboiled’s BaseParser class.
2 In some vhdl compilers there may be exceptions, where case sensitivity matters in a handful of grammar productions.

Figure 10.3 is an example of one of the rules found in VBase. Here, the rule matches the keyword ‘signal’. Since the rule does no whitespace handling, your code should ensure that at least one whitespace character exists between this keyword and the next token/string that is matched.

Figure 10.3: Implementation of rule used to match the keyword ‘signal’

Rule NOR() {
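The rule in Figure 10.3 is Parboiled code. As a library-free illustration of the same behaviour, the sketch below matches a keyword case-insensitively with String.regionMatches and then insists on at least one whitespace character after it. The class and method names are hypothetical. One design caution: the example programs in this manual write port( with no space, so a real grammar should demand trailing whitespace only after keywords that need it, such as signal.

```java
/** Library-free illustration of case-insensitive keyword matching (not the VBase code). */
final class KeywordMatch {
    /**
     * If keyword occurs at pos, ignoring case, and is followed by at least one
     * whitespace character, return the index just past that whitespace; otherwise -1.
     */
    static int matchKeyword(String input, int pos, String keyword) {
        if (!input.regionMatches(true, pos, keyword, 0, keyword.length())) {
            return -1;
        }
        int afterKeyword = pos + keyword.length();
        int i = afterKeyword;
        while (i < input.length() && Character.isWhitespace(input.charAt(i))) {
            i++;
        }
        return i > afterKeyword ? i : -1;
    }
}
```

Note that "signals" is rejected: the keyword matches, but no whitespace follows it, so the input is an identifier rather than the keyword.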

10.2 vhdl Recognizer

Sources:
ece351.v.VRecognizer
Tests:
ece351.v.test.TestVRecognizerAccept
ece351.v.test.TestVRecognizerReject

Write a recognizer using Parboiled for the vhdl grammar defined in Figure 10.1. Follow the methodology that we have been developing this term:

• Make a method for every non-terminal [§1.3]
• Convert ebnf multiplicities in the grammar into Parboiled library calls [§5.1]
• Convert the lexical specification into Parboiled library calls [§5.1]
• Remember to explicitly check for EOI

You should be able to copy and paste the code from your F Parboiled recognizer with minimal modifications, since F is a subset of our vhdl grammar.
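The same methodology applies without Parboiled. The plain-Java recursive-descent recognizer below is a sketch, not the expected Parboiled solution, and covers only the Expr sub-grammar of Figure 10.1: one method per non-terminal, each (op Term)* multiplicity as a while loop, keywords compared case-insensitively, and an explicit end-of-input check. It assumes constants are written as the VHDL bit literals '0' and '1', as in Figure 12.1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.BooleanSupplier;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Plain-Java recursive-descent recognizer for the Expr sub-grammar of Figure 10.1 (a sketch). */
final class VExprRecognizer {
    // Tokens: identifiers/keywords, the bit literals '0' and '1', and ( ) =.
    private static final Pattern TOKEN =
            Pattern.compile("\\s*([A-Za-z][A-Za-z0-9_]*|'[01]'|[()=])");
    private static final Set<String> KEYWORDS =
            Set.of("not", "and", "or", "nand", "nor", "xor", "xnor");

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private VExprRecognizer(String input) {
        Matcher m = TOKEN.matcher(input);
        int i = 0;
        while (i < input.length()) {
            m.region(i, input.length());
            if (!m.lookingAt()) {
                // Trailing blanks are fine; anything else becomes a token no rule accepts.
                if (!input.substring(i).isBlank()) tokens.add("<bad>");
                break;
            }
            tokens.add(m.group(1));
            i = m.end();
        }
    }

    /** Accept only if Expr matches and every token is consumed (the explicit EOI check). */
    static boolean accepts(String input) {
        VExprRecognizer r = new VExprRecognizer(input);
        return r.expr() && r.pos == r.tokens.size();
    }

    // One method per non-terminal; each binary level has the shape X -> Y (op Y)*.
    private boolean expr()     { return chain("xnor", this::xnorTerm); }
    private boolean xnorTerm() { return chain("xor", this::xorTerm); }
    private boolean xorTerm()  { return chain("nor", this::norTerm); }
    private boolean norTerm()  { return chain("nand", this::nandTerm); }
    private boolean nandTerm() { return chain("or", this::orTerm); }
    private boolean orTerm()   { return chain("and", this::andTerm); }
    private boolean andTerm()  { return chain("=", this::eqTerm); }

    /** The EBNF (op operand)* multiplicity becomes a while loop. */
    private boolean chain(String op, BooleanSupplier operand) {
        if (!operand.getAsBoolean()) return false;
        while (peekIs(op)) {
            pos++;
            if (!operand.getAsBoolean()) return false;
        }
        return true;
    }

    /** EqTerm -> 'not' EqTerm | '(' Expr ')' | Var | Constant */
    private boolean eqTerm() {
        if (peekIs("not")) {
            pos++;
            return eqTerm();
        }
        if (peekIs("(")) {
            pos++;
            if (!expr() || !peekIs(")")) return false;
            pos++;
            return true;
        }
        if (pos < tokens.size()) {
            String t = tokens.get(pos);
            boolean constant = t.equals("'0'") || t.equals("'1'");
            boolean var = t.matches("[A-Za-z][A-Za-z0-9_]*") && !KEYWORDS.contains(t.toLowerCase());
            if (constant || var) {
                pos++;
                return true;
            }
        }
        return false;
    }

    /** Keywords and punctuation compare case-insensitively; identifiers are checked in eqTerm. */
    private boolean peekIs(String s) {
        return pos < tokens.size() && tokens.get(pos).equalsIgnoreCase(s);
    }
}
```

Because each level calls only the next-tighter level, the grammar itself encodes precedence: = binds tightest and xnor loosest.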

10.3 vhdl Parser

Sources:
ece351.v.VParser
Libraries:
ece351.v.ast.*
ece351.common.ast.*
Tests:
ece351.v.test.TestVParser

Write a parser using Parboiled for the vhdl grammar defined in Figure 10.1. You do not need to write a pretty-printer nor the object contract methods: these are provided for you in the new vhdl ast classes, and you already wrote them in the shared ast classes (ece351.common.ast). You do not need to edit the ast classes. Just write the parser. Follow the steps we learned earlier in the term:

• Draw out the evolution of the stack for a simple input [Figure 5.3].
• Print out our vhdl grammar.
• Annotate the printed copy of the grammar with where stack manipulations occur according to your diagram.
• Calculate the maximum and expected size of the stack for different points in the grammar.
• Copy your recognizer code into the parser class.
• Add some (but not all) actions (push, pop, etc.).
• Add some checkType() and debugmsg() calls [Figure 5.5].
• Run the TestVParser harness to see that it passes.
• Add more actions, checkType() calls, etc.
• Repeat until complete.
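For the stack-size step, consider one level of the grammar, Expr → XnorTerm (‘xnor’ XnorTerm)*. If each operand pushes one value and each completed pair pops two values and pushes one combined node, then, relative to where this level starts, the stack never holds more than two values and ends holding exactly one. The sketch below simulates that discipline with an explicit ArrayDeque on a flat chain of identifiers; it stands in for Parboiled's value stack and is not Parboiled code. All names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Simulates value-stack pushes and pops for one left-associative grammar level. */
final class StackEvolution {
    record Result(String ast, int maxDepth) {}

    /** input alternates operand, op, operand, ... for example "x xnor y xnor z". */
    static Result buildChain(String input, String op, String nodeName) {
        String[] toks = input.trim().split("\\s+");
        if (toks.length % 2 == 0) {
            throw new IllegalArgumentException("dangling operator in: " + input);
        }
        Deque<String> stack = new ArrayDeque<>();
        stack.push(toks[0]);                       // first operand: push
        int maxDepth = stack.size();
        for (int i = 1; i < toks.length; i += 2) {
            if (!toks[i].equalsIgnoreCase(op)) {
                throw new IllegalArgumentException("expected " + op + " but saw " + toks[i]);
            }
            stack.push(toks[i + 1]);               // next operand: push
            maxDepth = Math.max(maxDepth, stack.size());
            String right = stack.pop();            // then pop both operands...
            String left = stack.pop();
            stack.push(nodeName + "(" + left + ", " + right + ")");  // ...and push one node
        }
        if (stack.size() != 1) {
            throw new IllegalStateException("expected one value on the stack, found " + stack.size());
        }
        return new Result(stack.pop(), maxDepth);
    }
}
```

On "x xnor y xnor z" this produces XNOr(XNOr(x, y), z) with a maximum depth of 2, which also shows that popping and pushing immediately after each operand yields left associativity.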

When you implement the vhdl parser, you may need to rewrite a few of the grammar productions shown in Figure 10.1 in a different way so that you can instantiate all of the objects required to construct the AST of the program being parsed.

10.4 Engineering a Grammar†

For simple languages like W and F we can write a ‘perfect’ recognizer: that is, a recognizer that accepts all grammatical strings and rejects all ungrammatical strings. For more complicated languages, such as vhdl, it is common to design a grammar for a particular task, and that task may be something other than being an ideal recognizer. For example, the grammar might accept some strings that should be rejected.
Industrial programming environments such as Eclipse often in-
clude multiple parsers for the same language. Eclipse’s Java de-
velopment tools include three parsers: one that gives feedback on
code while the programmer is editing it (and so the code is proba-
bly ungrammatical); one that gives feedback on the code when the
programmer saves the file (when the code is expected to be gram-
matical); and one that is part of the compiler. As you may surmise
from this example, the task of giving good feedback to a programmer
about ungrammatical code can be quite different from the task of
translating grammatical code.
The TestVRecognizerReject harness tests for some obvious cases
that should be rejected. Do not invest substantial effort into trying to
reject more cases.

10.5 Evaluation

The last pushed commit before the deadline is evaluated both on the shared test inputs and on a set of secret test inputs according to the weights in the table below.

                                  Shared (current)   Secret (new)
TestVParboiledRecognizerAccept    20                 5
TestVParboiledRecognizerReject    5                  0
TestVParboiledParser              45                 25

You do not get any marks for any other parser tests until TestObjectContractF passes everything. You do not get any marks for rejection tests until some acceptance tests pass.

TestVParboiledRecognizer just runs the recognizer to see if it crashes. TestVParboiledParser checks the following equation:
∀ v : *.vhd | parse(v).isomorphic(parse(prettyprint(parse(v))))
Compiler Concepts: desugaring, function
inlining
Programming Concepts:

Lab 11
vhdl → vhdl: Desugaring & Elaboration

In this lab, you will be writing two vhdl to vhdl transformers us-
ing the Visitor design pattern. The first, and simpler one, will simply
rewrite expressions to replace operators like xor with their equiva-
lent formulation in terms of and, or, and not. We call this desug-
aring, where operators like xor are considered to be syntactic sugar:
things that might be convenient for the user, but do not enlarge the
set of computations that can be described by the language.
The second, and more sophisticated, transformer will expand and inline any component instance declared within an architecture of a design unit if the component is defined within the same vhdl source file. The procedure that this transformer performs is known as elaboration. Elaboration is essential for us to process the four-bit ripple-carry adder, for example, because it comprises four one-bit full adders.

11.1 vhdl → vhdl: Desugaring

Sources:
ece351.v.DeSugarer
Tests:
ece351.v.tests.TestDeSugarer

Languages often provide redundant syntactic constructs to make code easier for the programmer to write, read, and maintain. These redundant constructs provide ‘syntactic sugar’. In this part of the lab, you will transform vhdl programs that you parse into valid vhdl source code that ‘desugars’ the expressions in the signal assignment statements. In other words, the desugaring process reduces the source code to a smaller set of (i.e., more basic) constructs.
In vhdl, desugaring may be useful in cases where the types of available logic gates are limited by the programmable hardware you might be using to run your vhdl code. For example, if the programmable hardware comprises only nand gates, a vhdl synthesizer will be required to rewrite all of the logical expressions in your code using nand operators.
For this part of the lab, write a ‘desugarer’ that traverses an ast corresponding to an input vhdl program and rewrites all expressions so that every expression in the transformed ast consists only of the and, or, and not logical operators. This means that expressions containing xor, nand, nor, xnor, and = operators must be rewritten. For example, the expression x ⊕ y (xor) is equivalent to:

$$x \oplus y \equiv x \cdot \lnot y + \lnot x \cdot y \qquad (11.1)$$

Table 11.1 is the truth table for the ‘=’ operator. By inspection, this truth table is identical to that of xnor.

Table 11.1: Truth table for the ‘=’ operator. Equivalent to xnor.

x  y  x=y
0  0  1
0  1  0
1  0  0
1  1  1
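A minimal desugarer sketch over a hypothetical miniature AST (the lab's real DeSugarer is a Visitor over the ece351 ast classes). It rewrites children first, uses Equation 11.1 for xor, treats xnor and ‘=’ as not(xor) following Table 11.1, and treats nand and nor as not(and) and not(or). The eval method gives every node its usual meaning, so the rewrite can be checked exhaustively against the truth tables.

```java
import java.util.Map;

/** Sketch of a desugarer over a hypothetical miniature AST (not the lab's ece351 classes). */
final class Desugar {
    interface Expr {}
    record Var(String name) implements Expr {}
    record Not(Expr operand) implements Expr {}
    record And(Expr left, Expr right) implements Expr {}
    record Or(Expr left, Expr right) implements Expr {}
    record Xor(Expr left, Expr right) implements Expr {}
    record Xnor(Expr left, Expr right) implements Expr {}
    record Nand(Expr left, Expr right) implements Expr {}
    record Nor(Expr left, Expr right) implements Expr {}
    record Eq(Expr left, Expr right) implements Expr {}

    /** Rewrite children first, then the node, so the result uses only Var, Not, And, Or. */
    static Expr desugar(Expr e) {
        if (e instanceof Var) return e;
        if (e instanceof Not n) return new Not(desugar(n.operand()));
        if (e instanceof And a) return new And(desugar(a.left()), desugar(a.right()));
        if (e instanceof Or o) return new Or(desugar(o.left()), desugar(o.right()));
        if (e instanceof Xor x) return xor(desugar(x.left()), desugar(x.right()));
        if (e instanceof Xnor x) return new Not(xor(desugar(x.left()), desugar(x.right())));
        // '=' has the same truth table as xnor (Table 11.1).
        if (e instanceof Eq q) return new Not(xor(desugar(q.left()), desugar(q.right())));
        if (e instanceof Nand n) return new Not(new And(desugar(n.left()), desugar(n.right())));
        if (e instanceof Nor n) return new Not(new Or(desugar(n.left()), desugar(n.right())));
        throw new IllegalArgumentException("unknown node: " + e);
    }

    /** Equation 11.1: x xor y == x.!y + !x.y (the operand subtrees are shared, not copied). */
    private static Expr xor(Expr x, Expr y) {
        return new Or(new And(x, new Not(y)), new And(new Not(x), y));
    }

    /** Reference semantics for every node kind, used to check the rewrite. */
    static boolean eval(Expr e, Map<String, Boolean> env) {
        if (e instanceof Var v) return env.get(v.name());
        if (e instanceof Not n) return !eval(n.operand(), env);
        if (e instanceof And a) return eval(a.left(), env) && eval(a.right(), env);
        if (e instanceof Or o) return eval(o.left(), env) || eval(o.right(), env);
        if (e instanceof Xor x) return eval(x.left(), env) ^ eval(x.right(), env);
        if (e instanceof Xnor x) return eval(x.left(), env) == eval(x.right(), env);
        if (e instanceof Eq q) return eval(q.left(), env) == eval(q.right(), env);
        if (e instanceof Nand n) return !(eval(n.left(), env) && eval(n.right(), env));
        if (e instanceof Nor n) return !(eval(n.left(), env) || eval(n.right(), env));
        throw new IllegalArgumentException("unknown node: " + e);
    }

    /** True if only Var, Not, And, Or appear. */
    static boolean onlyBasic(Expr e) {
        if (e instanceof Var) return true;
        if (e instanceof Not n) return onlyBasic(n.operand());
        if (e instanceof And a) return onlyBasic(a.left()) && onlyBasic(a.right());
        if (e instanceof Or o) return onlyBasic(o.left()) && onlyBasic(o.right());
        return false;
    }
}
```

Each xor rewrite mentions both operands twice, so desugaring nested xor expressions makes the printed expression grow quickly, which is one reason Figure 12.2's condition is so long.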

11.2 Elaboration

Sources:
ece351.v.Elaborator
Tests:
ece351.v.test.TestElaborator

The following sections describe the expected behaviour of the elaborator.

11.2.1 Inlining Components without Signal List in Architecture

Consider the vhdl source shown in Figure 11.1. Here, we have two design units, OR_gate_2 and four_port_OR_gate_2, where the architecture body corresponding to four_port_OR_gate_2 instantiates two instances of OR_gate_2, namely OR1 and OR2 (lines 20 and 21).
When the elaborator processes this program, it will check the design units sequentially. In this example, OR_gate_2 is checked first. The architecture body corresponding to OR_gate_2 does not instantiate any components, so the elaborator does not do anything to this design unit and moves on to the next design unit. In the architecture body four_port_structure, we see that two components are instantiated. Since there are components within this architecture body, the elaborator should then proceed to inline the behaviour of the components into four_port_structure and make the appropriate parameter/signal substitutions.
Consider the component OR1. The elaborator will search through the list of design units that make up the program and determine the design unit that the component is instantiating. In this example, OR1 is an instance of the design unit OR_gate_2 (see the string immediately following “work.” on line 20). Then the elaborator proceeds to determine how the signals used in the port map relate to the ports defined in the entity declaration of OR_gate_2. OR1 maps the signals

Figure 11.1: vhdl program used to illustrate elaboration.

1 entity OR_gate_2 is port (
2 x , y: in bit;
3 F: out bit
4 );
5 end OR_gate_2;
6
7 architecture OR_gate_arch of OR_gate_2 is begin
8 F <= x or y;
9 end OR_gate_arch;
10
11 entity four_port_OR_gate_2 is port (
12 a,b,c,d : in bit;
13 result : out bit
14 );
15 end four_port_OR_gate_2;
16
17 architecture four_port_structure of four_port_OR_gate_2 is
18 signal e, f : bit;
19 begin
20 OR1: entity work.OR_gate_2 port map(a,b,e);
21 OR2: entity work.OR_gate_2 port map(c,d,f);
22 result <= e or f;
23 end four_port_structure;

a, b, and e to the ports x, y, and F of the entity OR_gate_2, respectively. The private member current_map in the Elaborator class will help you with the signal substitutions that occur when a component is inlined. After the mapping is established, the elaborator replaces OR1 by inlining the architecture body corresponding to the entity OR_gate_2 into four_port_structure.
The same procedure is carried out for the component OR2, and we now have an equivalent architecture body as shown in Figure 11.2. The assignments to e and f in Figure 11.2 correspond to the inlining of OR1 and OR2 (lines 20 and 21 of Figure 11.1).
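The substitution step can be sketched as a single simultaneous pass over the text of the inlined statements. The names portMap and substitute are hypothetical, and the lab's Elaborator works on the AST (with current_map) rather than on text. The point is that every identifier is looked up exactly once, so a replacement is never replaced again, even when the map swaps two names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch of port-map substitution on statement text (the lab works on the AST instead). */
final class PortSubstitution {
    private static final Pattern IDENT = Pattern.compile("[A-Za-z][A-Za-z0-9_]*");

    /** Pair the entity's formal ports with the instance's port-map actuals, by position. */
    static Map<String, String> portMap(List<String> formalPorts, List<String> actuals) {
        if (formalPorts.size() != actuals.size()) {
            throw new IllegalArgumentException("port map has " + actuals.size()
                    + " signals but the entity declares " + formalPorts.size() + " ports");
        }
        Map<String, String> map = new LinkedHashMap<>();
        for (int i = 0; i < formalPorts.size(); i++) {
            map.put(formalPorts.get(i), actuals.get(i));
        }
        return map;
    }

    /** Replace each mapped identifier in one left-to-right pass; words like "or" stay as-is. */
    static String substitute(String text, Map<String, String> map) {
        Matcher m = IDENT.matcher(text);
        return m.replaceAll(r -> Matcher.quoteReplacement(map.getOrDefault(r.group(), r.group())));
    }
}
```

For OR1 in Figure 11.1, the map is x → a, y → b, F → e, which turns F <= x or y; into e <= a or b;.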

Figure 11.2: Elaborated architecture body, four_port_structure.

1 begin
2 result <= ( e or f );
3 e <= ( a or b );
4 f <= ( c or d );
5 end four_port_structure;
11.2.2 Inlining Components with Signal List in Architecture

In addition to substituting the input and output pins for a port map, you will also encounter situations where signals are declared in the architecture body that you are inlining into the design unit the elaborator is currently processing.
For example, consider the vhdl source in Figure 11.3, which extends the program in Figure 11.1.
The two components in eight_port_structure are instances of
four_port_OR_gate_2; the architecture of four_port_OR_gate_2 de-
fines signals e and f. Now, if we elaborate, say, OR1, we determine the
mapping as before for the input and output pins, but we also need
to consider the signals defined within the architecture. If we simply
add e and f to the list of signals of eight_port_structure, we will
run into problems of multiply defined signals when we elaborate OR2;
we will obtain a signal list with two e’s and two f’s. Furthermore, we
will change the logical behaviour defined in eight_port_structure.
To address this issue, all internal signals that are added as a result of elaboration will be prefixed with ‘comp<num>_’, where <num> is a unique identifier1 used to ensure that the elaboration does not change the logical behaviour of the program. The result of elaborating eight_port_OR_gate_2 is shown in Figure 11.4.

1 This number starts at 1 and increments for each component that is instantiated in the vhdl program. <num> is never reset.
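The comp<num>_ renaming needs only a counter that starts at 1 and is never reset. In the sketch below (hypothetical names), renameFor is called once per inlined component instance, including instances with no internal signals, because the footnote says every instantiated component consumes a number. Under that reading, OR1 and OR2 inside four_port_OR_gate_2 take 1 and 2, which is consistent with the comp3_ and comp4_ prefixes in Figure 11.4.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: fresh comp<num>_ prefixes for internal signals of inlined components. */
final class InternalSignalRenamer {
    // Starts at 1 and is never reset, as the footnote specifies.
    private int nextComponent = 1;

    /** Call once per component instance being inlined; returns old name -> new name. */
    Map<String, String> renameFor(List<String> internalSignals) {
        int num = nextComponent++;
        Map<String, String> renamed = new LinkedHashMap<>();
        for (String s : internalSignals) {
            renamed.put(s, "comp" + num + "_" + s);
        }
        return renamed;
    }
}
```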

Figure 11.3: Extension of the vhdl program shown in Figure 11.1.

1 entity eight_port_OR_gate_2 is port (
2 x0, x1, x2, x3, x4, x5, x6, x7 : in bit;
3 y : out bit
4 );
5 end eight_port_OR_gate_2;
6
7 architecture eight_port_structure of eight_port_OR_gate_2 is
8 signal result1, result2 : bit;
9 begin
10 OR1: entity work.four_port_OR_gate_2 port map(x0, x1, x2, x3, result1);
11 OR2: entity work.four_port_OR_gate_2 port map(x4, x5, x6, x7, result2);
12 y <= result1 or result2;
13 end eight_port_structure;

Figure 11.4: Elaborated architecture body, eight_port_structure.

1 entity eight_port_OR_gate_2 is port(
2 x0, x1, x2, x3, x4, x5, x6, x7 : in bit;
3 y : out bit
4 );
5 end eight_port_OR_gate_2;
6 architecture eight_port_structure of eight_port_OR_gate_2 is
7 signal result1, result2, comp3_e, comp3_f, comp4_e, comp4_f : bit;
8 begin
9 y <= ( result1 or result2 );
10 result1 <= ( comp3_e or comp3_f );
11 comp3_e <= ( x0 or x1 );
12 comp3_f <= ( x2 or x3 );
13 result2 <= ( comp4_e or comp4_f );
14 comp4_e <= ( x4 or x5 );
15 comp4_f <= ( x6 or x7 );
16 end eight_port_structure;

11.2.3 Inlining Components with Processes in Architecture


The previous examples demonstrated the behaviour of inlining com-
ponents with architectures that only contain signal assignment state-
ments. When the elaborator encounters processes in the inlining, a
similar procedure is performed. The main difference in the proce-
dure for processes is to make the appropriate signal substitutions in
sensitivity lists and if-else statement conditions.

11.2.4 Notes
• The elaborator will only expand a component when its corresponding design unit is also defined in the same file.
• The elaborator processes the design units in sequential order. We
assume that the vhdl programs we are transforming are written
so that you do not encounter cases where the architecture that you
are inlining contains components that have not yet been elabo-
rated.
• We will assume that the vhdl programs being elaborated will
not result in architecture bodies with a mixture of parallel signal
assignment statements and process statements (so that the parser
from Lab 10 can parse the transformed programs).

11.3 Evaluation

The last pushed commit before the deadline is evaluated both on the
shared test inputs and on a set of secret test inputs according to the
weights in the table below. The testing equation for desugaring is:
parse(v).desugar().isomorphic(parse(vStaff))
Similarly, the testing equation for elaboration is:
parse(v).elaborate().isomorphic(parse(vStaff))

Shared Secret
TestDeSugarer 20 10
TestElaborator 50 20
Choice: either do this lab or do the lab
marked as lab12. Do not do both.

Lab 12
vhdl Process Splitting & Combinational Synthesis

Files:
ece351.v.Splitter.java
ece351.v.Synthesizer.java
Tests:
ece351.v.test.TestSplitter
ece351.v.test.TestSynthesizer

For this lab, you will be writing code to perform two other transformations on vhdl programs. The first transformation is a vhdl to vhdl transformation, which we call process splitting. Process splitting involves breaking up if/else statements where multiple signals are being assigned new values in the if and else clauses.
In the second part of this lab, you will be translating vhdl to F, which we call combinational synthesis. The combinational synthesizer will take the vhdl program output from the process splitter and generate a valid F program from it.

12.1 Process Splitter

The process splitter will perform the necessary transformations to vhdl program ASTs so that exactly one signal is being assigned a value in a process. Consider the vhdl code shown in Figure 12.1. Here, we have a process in the architecture behv1 of the entity Mux that requires splitting because in the if and else clauses, there are two signals that are being assigned new values: O0 and O1. The splitter should observe this and proceed to replace the current process with two new processes: one to handle the output signal O0 and the other to handle O1. Figure 12.2 shows the splitter’s output when the code in Figure 12.1 is processed. Note that the sensitivity lists for the two new processes only contain the set of signals that may cause a change to the output signals.
You might also notice that the condition becomes longer and more complicated from Figure 12.1 to Figure 12.2. That is not related to process splitting: it’s the result of desugaring the ‘=’ in the condition. Recall that ‘=’ does not exist in F, so it needs to be expressed in terms of the operators that do exist in F: and, or, not. Table 11.1 shows that ‘=’ is equivalent to xnor.
This desugaring highlights the importance of syntactic sugar in terms of usability for the programmer, and the importance of desugaring for the compiler engineer. Imagine if the programmer had to write the condition in Figure 12.2: unusable! But conversely, imagine if the compiler engineer had to enhance the entire F toolchain to deal with so many more operators: too much work; too many opportunities for error.
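As a quick sanity check of this desugaring, the following Java sketch compares vhdl’s ‘=’ on bits with the and/or/not form used in Figure 12.2 over all four input combinations. The class and method names are hypothetical and are not part of the skeleton.

```java
public class XnorCheck {
    // vhdl '=' on two bits: true exactly when the bits are equal
    static boolean eq(boolean a, boolean b) {
        return a == b;
    }

    // the desugared form from Figure 12.2: not ((a and not b) or (not a and b)),
    // i.e. xnor written using only and, or, not
    static boolean desugared(boolean a, boolean b) {
        return !((a && !b) || (!a && b));
    }

    // true if the two forms agree on every input combination
    static boolean equivalent() {
        boolean[] bits = {false, true};
        for (boolean a : bits) {
            for (boolean b : bits) {
                if (eq(a, b) != desugared(a, b)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(equivalent()); // prints true
    }
}
```

An exhaustive truth table is practical here because a two-input bit function has only four cases.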
Figure 12.1: Example vhdl program used to illustrate process splitting.

entity Mux is port(
    I3,I2,I1,I0,S: in bit;
    O0,O1: out bit
);
end Mux;

architecture behv1 of Mux is
begin
    process(I3,I2,I1,I0,S)
    begin
        if (S = '0') then
            O0 <= I0;
            O1 <= I2;
        else
            O0 <= I1;
            O1 <= I3;
        end if;
    end process;

end behv1;

Figure 12.2: The resulting vhdl program of Figure 12.1 after process splitting.

entity Mux is port(
    I3, I2, I1, I0, S : in bit;
    O0, O1 : out bit
);
end Mux;
architecture behv1 of Mux is

begin
    process ( S, I0, I1 )
    begin
        if ( ( not ( ( ( S and ( not ( '0' ) ) ) or ( ( not ( S ) ) and '0' ) ) ) ) ) then
            O0 <= I0;
        else
            O0 <= I1;
        end if;
    end process;
    process ( S, I2, I3 )
    begin
        if ( ( not ( ( ( S and ( not ( '0' ) ) ) or ( ( not ( S ) ) and '0' ) ) ) ) ) then
            O1 <= I2;
        else
            O1 <= I3;
        end if;
    end process;
end behv1;

12.2 Splitter Notes

• Assume that, for each output signal, there is exactly one assignment statement in the if-body and exactly one assignment statement in the else-body that write to it. This implies that you do not need to handle the case where latches are inferred. Making this assumption should reduce the amount of code you need to write for this lab.

• The private variable usedVarsInExpr is used to store the variables/signals that are used in a vhdl expression. This is helpful when you are trying to create sensitivity lists for new processes.
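The splitting step can be sketched in Java as follows. The record types are hypothetical stand-ins, not the skeleton’s ast classes. `usedVars` only imitates the role that usedVarsInExpr plays. The sketch assumes Java 17 or later and relies on the one-assignment-per-branch assumption from the notes above. An insertion-ordered set keeps the condition’s signals first, which matches the sensitivity lists in Figure 12.2.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SplitterSketch {
    // Hypothetical, minimal stand-ins for the skeleton's vhdl ast classes.
    interface Expr {}
    record Var(String name) implements Expr {}
    record Const(char bit) implements Expr {}
    record Op(String op, List<Expr> args) implements Expr {}
    record Assign(String target, Expr value) {}
    record IfElse(Expr cond, List<Assign> thenBody, List<Assign> elseBody) {}
    record Process(List<String> sensitivity, IfElse body) {}

    // Collects the signals an expression reads, in first-seen order.
    static void usedVars(Expr e, Set<String> acc) {
        if (e instanceof Var v) {
            acc.add(v.name());
        } else if (e instanceof Op o) {
            for (Expr arg : o.args()) {
                usedVars(arg, acc);
            }
        }
        // a Const reads no signals
    }

    // Replaces one process that assigns several signals with one process per signal.
    static List<Process> split(Process p) {
        IfElse ie = p.body();
        List<Process> result = new ArrayList<>();
        for (Assign thenA : ie.thenBody()) {
            Assign elseA = ie.elseBody().stream()
                    .filter(a -> a.target().equals(thenA.target()))
                    .findFirst()
                    .orElseThrow(() -> new IllegalStateException(
                            "no else assignment for " + thenA.target()));
            // the new sensitivity list holds only signals that can change this output
            Set<String> sens = new LinkedHashSet<>();
            usedVars(ie.cond(), sens);
            usedVars(thenA.value(), sens);
            usedVars(elseA.value(), sens);
            IfElse body = new IfElse(ie.cond(), List.of(thenA), List.of(elseA));
            result.add(new Process(new ArrayList<>(sens), body));
        }
        return result;
    }

    // The process from Figure 12.1.
    static Process figure12_1() {
        Expr cond = new Op("=", List.of(new Var("S"), new Const('0')));
        IfElse ie = new IfElse(cond,
                List.of(new Assign("O0", new Var("I0")), new Assign("O1", new Var("I2"))),
                List.of(new Assign("O0", new Var("I1")), new Assign("O1", new Var("I3"))));
        return new Process(List.of("I3", "I2", "I1", "I0", "S"), ie);
    }

    public static void main(String[] args) {
        for (Process q : split(figure12_1())) {
            String target = q.body().thenBody().get(0).target();
            System.out.println(target + ": " + q.sensitivity());
        }
        // prints:
        // O0: [S, I0, I1]
        // O1: [S, I2, I3]
    }
}
```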

12.3 Synthesizer

After parsing the input vhdl source file and performing all of the
transformations (i.e., desugaring, elaborating, and process splitting),
the synthesizer will traverse the (transformed) ast (of the vhdl
program), extract all expressions in the program, and generate an F
program containing these expressions.
In §8 we generated a Java program by constructing a string. We noted that there was no guarantee that the string we generated would be a legal Java program. One way to ensure that we generate a legal program is to build up an ast instead of building up a string. Of course, to do that we need to have ast classes, which we do not have for Java, but do have for F.
Because the F grammar and ast classes are a subset of those for vhdl, we can simply reuse many of the expressions from the input vhdl program with only minor modification to rename the variables to avoid name collision.

12.3.1 If/Else Statements


Whenever an if/else statement is encountered, first extract the condition and add it to the F program. Because there is no assignment that occurs in the condition expression, generate an output variable for the condition when you output the condition to the F program. The output variable will be of the form ‘condition<num>’, where <num> is 1 for the first condition, 2 for the second, and so on: <num> is incremented each time you encounter a condition in the vhdl program and is never reset.
After appending the condition of the if/else statement to the F program, construct an assignment for the output variable. Given an if/else statement of the form:

if ( vexpr ) then
    output <= vexpr1;
else
    output <= vexpr2;
end if;

the formulas that should be appended to the F program are:

condition<num> <= vexpr;
output <= ( condition<num> and (vexpr1) or (not condition<num>) and (vexpr2) );

Process splitting is useful here because we will only have one signal assignment statement in each of the if- and else-bodies, and both statements assign their expressions to the same output signal.
Let’s refer to the variable named conditionX as an intermediate variable because it is not intended as part of the circuit’s final output, but is used in computing the final output pins. Our F simulator generator and technology mapper do not support these intermediate variables. Removing them is not hard: we just inline their definition wherever they are used. We will provide you with code to do that later.
The synthesizer here is not the only place where these intermediate variables are created. They also occur in elaboration. They are helpful for debugging, so it’s worth keeping them here and removing them later.

12.3.2 Example Synthesizing Signal Assignment Statements


Consider the vhdl program shown in Figure 12.3. When the synthesizer processes this program (after desugaring, elaborating, and splitting), it generates an F program consisting of a single formula, which corresponds to the program’s one signal assignment statement (see Figure 12.4).

Figure 12.3: Example used to illustrate synthesizing assignment statements.

entity NOR_gate is port(
    x,y: in bit;
    F: out bit
);
end NOR_gate;

architecture NOR_gate_arch of NOR_gate is
begin

    F <= x nor y;

end NOR_gate_arch;

Figure 12.4: Synthesized output of the program in Figure 12.3.

NOR_gateF <= ( not ( NOR_gatex or NOR_gatey ) );
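Figure 12.4 suggests that each signal name is prefixed with the name of its entity (x becomes NOR_gatex). Assuming that is the renaming rule, it can be sketched in Java as a bottom-up rebuild of the expression. The classes here are hypothetical and cover only the node types this example needs. The sketch assumes Java 17 or later.

```java
public class RenameSketch {
    // Hypothetical minimal F ast with F-style printing.
    interface Expr {}
    record Var(String name) implements Expr {
        @Override public String toString() { return name; }
    }
    record Not(Expr e) implements Expr {
        @Override public String toString() { return "( not " + e + " )"; }
    }
    record Or(Expr l, Expr r) implements Expr {
        @Override public String toString() { return "( " + l + " or " + r + " )"; }
    }

    // Rebuilds the expression bottom-up, prefixing every signal name with the entity name.
    static Expr rename(Expr e, String prefix) {
        if (e instanceof Var v) {
            return new Var(prefix + v.name());
        }
        if (e instanceof Not n) {
            return new Not(rename(n.e(), prefix));
        }
        Or o = (Or) e; // only Var, Not and Or exist in this sketch
        return new Or(rename(o.l(), prefix), rename(o.r(), prefix));
    }

    public static void main(String[] args) {
        // x nor y, desugared to not (x or y), from Figure 12.3
        Expr rhs = new Not(new Or(new Var("x"), new Var("y")));
        System.out.println(rename(new Var("F"), "NOR_gate") + " <= " + rename(rhs, "NOR_gate") + ";");
        // prints: NOR_gateF <= ( not ( NOR_gatex or NOR_gatey ) );
    }
}
```

Because the nodes are immutable, renaming produces a new tree and leaves the vhdl expression it came from untouched.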

12.3.3 Example Synthesizing If/Else Statements


Consider the vhdl program shown in Figure 12.5. After performing all of the vhdl transformations that you have written, the synthesizer will generate the F program shown in Figure 12.6.
In Figure 12.6, observe that the synthesizer translates the if/else
statement in Figure 12.5 into two F formulae. The first formula that
is generated corresponds to the condition that is checked in the state-
ment (i.e., x=‘0’ and y=‘0’). The second formula combines the ex-
pressions found in the if and else bodies so that the expression in
the if-body is assigned to F if the condition is true; otherwise, the
expression in the else-body is assigned to F.
Figure 12.5: Example used to illustrate synthesizing if/else statements.

entity NOR_gate is port(
    x, y: in bit;
    F: out bit
);
end NOR_gate;

architecture NOR_gate_arch of NOR_gate is
begin
    process(x, y)
    begin
        if (x='0' and y='0') then
            F <= '1';
        else
            F <= '0';
        end if;
    end process;
end NOR_gate_arch;

Figure 12.6: Synthesized output of the program in Figure 12.5.

condition1 <= ( ( not ( ( NOR_gatex and ( not '0' ) ) or ( ( not NOR_gatex ) and '0' ) ) ) and ( not ( ( NOR_gatey and ( not '0' ) ) or ( ( not NOR_gatey ) and '0' ) ) ) );
NOR_gateF <= ( condition1 and ( '1' ) ) or ( ( not condition1 ) and ( '0' ) );

12.4 Evaluation

The last pushed commit before the deadline is evaluated both on the
shared test inputs and on a set of secret test inputs according to the
weights in the table below.

                 Current  New
TestSplitter       40      10
TestSynthesizer    40      10
Lab 13
Simulation: F → Assembly

This lab is by Nicholas Champion.
Choice: You can do either this lab or the original lab11. Don’t do both.
There is no official instructional support for this lab. Do not attempt this lab if you are not comfortable learning assembly on your own and configuring your system to work with the assembler that you have chosen.

In §8 you generated Java code from an F program. That Java code would read and write W files, simulating the F program for each clock tick. In this lab you will make a variation of the simulator generator from §8. The main loop is still the same, with the call to the W parser, etc. The only parts that will be different are the methods you generate to compute the values of the output pins. For example, in §8 you might have generated a method like the one shown in Figure 13.1. The point of this lab is to generate these methods in assembly, and have the main body (still in Java) call the assembly versions.

Figure 13.1: Generate this method in assembly rather than in Java.

// methods to compute values for output pins
public static boolean x(final boolean a, final boolean b) { return or(a, b); }

13.1 Which assembler to use?

Options include:

• masm32. Requires a 32-bit jdk. This is the option supported by the description below and the skeleton code. The other options have no instructional support.

• gcc with inline assembly.

• JWasm http://www.japheth.de/JWasm.html. JWasm is an open-source fork of the Watcom assembler (Wasm). It is written in C and is masm compatible. The ‘J’ here apparently stands for the maintainer (Japheth), and not for Java.
Watcom, as you know, is from the University of Waterloo. UW has a long and famous history with compilers, which you can read a little bit about at http://www.openwatcom.com/index.php/Project_History

• MIPS Assembler and Runtime Simulator (MARS). A Java implementation of the risc machine described in the Hennessy & Patterson computer architecture textbook. http://courses.missouristate.edu/kenvollmar/mars/ Because this is a simulator written in Java, it might avoid a lot of the systems issues of going to x86 assembly. Also, it might be easier to generate MIPS assembly than x86 (RISC instruction sets are usually simpler than CISC ones).
There might be some bonus marks available if you develop a skeleton and some instructional support for an assembler such as mars that has no system dependencies outside of a regular jvm.

13.2 masm32 Configuration

• MASM32 requires Windows (any version should work). MASM32 includes an old version of Microsoft’s assembler ml.exe. Newer versions are included with Visual Studio releases and support newer instructions.
• Install the 32-bit JDK and configure your Eclipse installation to use this JDK. JNI and Windows require that a program load only dynamic libraries built for the same architecture: 64-bit programs must use 64-bit libraries, and 32-bit programs must use 32-bit libraries. MASM32 generates 32-bit libraries, so a 64-bit JVM cannot load them.
• A possible multi-platform alternative to experiment with, which is NOT supported by the skeleton or test harness, is JWASM.
• MASM32 can be downloaded from http://www.masm32.com/
• It is recommended that you install MASM32 to C:\masm32
• Add (<Install Path>\bin) to your system path environment variable (e.g., C:\masm32\bin)
• To check if it was added to the path, open a new command window and type "ml.exe". It should output "Microsoft (R) Macro Assembler Version 6.14.8444...".

13.3 masm32 JNI and x86 Assembly

The skeleton and test harness created for solving this lab using MASM32 are based on Lab 8, and so approximately a third of the redacted skeleton code can be taken from a lab 8 solution. The main difference from lab 8 is the means by which the F-statements are evaluated. In lab 8, methods were created and defined in Java for each statement by walking the tree using pre-order. In this lab, these methods are to be implemented in x86 assembly. It is not possible to include inline assembly in Java, because Java is executed in the JVM. Therefore the x86 is assembled and linked into a shared library (a dynamic link library (DLL) on Windows). This library must be loaded by the generated java code using the Java Native Interface (JNI). JNI allows native functions to be added to the class which loads the library. The library load in the skeleton is static, while the native methods are not static. Hence the main method will instantiate its own class and make calls to its non-static native methods. JNI also imposes certain naming conventions on the signatures of functions declared in the shared library. For example, the native method Evalx, declared in the class Simulator_opt1_or_true1 as
public native boolean Evalx(final boolean a);
translates to the C shared library export signature
JNIEXPORT jboolean JNICALL Java_Simulator_1opt1_1or_1true1_Evalx(JNIEnv *, jobject, jboolean);
This C export translates to the x86 assembly signature (for the purposes of this lab)
Java_Simulator_1opt1_1or_1true1_Evalx proc JNIEnv:DWORD, jobject:DWORD, in_a:BYTE
It is notable that JNI requires including pointers to the Java environment and the Java class instance. It is also notable that the exported name is Java_, then the class name, then an underscore, then the method name. The underscores in the class name and the method name must be replaced with _1. Most of the function prototype details are already included in the skeleton.

You can use the JavaH tool in order to see the function signatures. It is included with the JDK to generate the C header for a java class (javah was removed in JDK 10; newer JDKs generate the same headers with javac -h). You can modify TestSimulatorGeneratorMasm32.java to generate the header files by uncommenting the lines which write the batch file. You will need to change the path to point to your JDK. To see how java types map to standard C types (and hence ASM), see jni.h (in the JDK).
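The naming rule can be sketched in Java. It follows the escape sequences in the JNI specification. An underscore becomes `_1`, `;` becomes `_2`, `[` becomes `_3`, any other non-alphanumeric or non-ASCII character becomes `_0xxxx`, and a package separator becomes `_`. Overloaded native methods additionally get a `__` plus mangled-signature suffix, which this sketch omits. The class and method names are hypothetical.

```java
public class JniNames {
    // Escapes one class or method name following the JNI naming rules.
    static String mangle(String name) {
        StringBuilder sb = new StringBuilder();
        for (char ch : name.toCharArray()) {
            if (ch == '.' || ch == '/') {
                sb.append('_');            // package separator
            } else if (ch == '_') {
                sb.append("_1");           // a literal underscore
            } else if (ch == ';') {
                sb.append("_2");
            } else if (ch == '[') {
                sb.append("_3");
            } else if (ch < 128 && Character.isLetterOrDigit(ch)) {
                sb.append(ch);             // ASCII letters and digits are kept as-is
            } else {
                sb.append(String.format("_0%04x", (int) ch)); // any other character
            }
        }
        return sb.toString();
    }

    // Exported symbol for a non-overloaded native method (no signature suffix).
    static String jniName(String className, String methodName) {
        return "Java_" + mangle(className) + "_" + mangle(methodName);
    }

    public static void main(String[] args) {
        System.out.println(jniName("Simulator_opt1_or_true1", "Evalx"));
        // prints Java_Simulator_1opt1_1or_1true1_Evalx
    }
}
```

A generator that computes these names, instead of hard-coding them, keeps the assembly `proc` names, the PROTO lines and the definitions file consistent.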
Figure 13.2 is a simple example of the generated code for a shared library for an F program.

Figure 13.2: Example assembly for an F Program

.386
.model flat,stdcall
option casemap:none

Java_Simulator_1opt1_1or_1true1_Evalx PROTO :DWORD, :DWORD, :BYTE

.code
LibMain proc hInstance:DWORD, reason:DWORD, reserved1:DWORD
    mov eax, 1
    ret
LibMain endp

Java_Simulator_1opt1_1or_1true1_Evalx proc JNIEnv:DWORD, jobject:DWORD, in_a:BYTE

    ; (a, true)or

    mov EAX, 0
    mov EBX, 0
    mov AL,in_a
    mov EBX,1
    or EAX, EBX
    ret
Java_Simulator_1opt1_1or_1true1_Evalx endp
End LibMain

As Figure 13.3 shows, the linker for MASM32 uses a definitions file to determine which symbols should be exported. Note that the library name does not require replacing the underscores. Additional exports are added on separate lines.

Figure 13.3: Example definitions file for a dynamic link library (DLL)

LIBRARY Simulator_opt1_or_true1
EXPORTS Java_Simulator_1opt1_1or_1true1_Evalx

Edit SimulatorGeneratorMasm32.java and run TestSimulatorGeneratorMasm32.java.

This lab requires generating the instructions for the F-statement. Most of the other code is generated by the skeleton. To generate the instructions, the ast should be walked in post order as a simple topological sort and stored in the operations list. It is then necessary to assign signals and intermediate signals to registers and write instructions for each ast object. For example, a VarExpr or ConstantExpr can be converted into an 8-bit mov instruction (immediate-mode, in the case of a constant). The other operators can be implemented using the various Boolean operator instructions. It is possible to implement each expression, including a NotExpr, with a single instruction and no conditionals. The result of the last computation for the statement should be stored in EAX as the return value for the function. The four 32-bit general purpose registers EAX, EBX, ECX, and EDX also support 16-bit and 8-bit access modes. A helper class is provided which contains the names that correspond to the other access modes (e.g., AL references the low byte of EAX). Since there are only four such registers (or fewer if you remove some from the Register.General list for debugging purposes), it is possible that large assignment statements will require the use of memory. A simple FIFO (registerQueue) strategy is used to determine which register to save to memory and reassign before performing an operation. Memory can be allocated using the LOCAL keyword: LOCAL m0 :BYTE. These allocation statements should immediately follow the function signature, before any instructions. In order to track which memory is in use, the hash map memoryMap maps the name of the memory allocation to a Boolean value of whether it is used. For convenience, the IdentityHashMap storageMap maps an Expr to the name of either the memory or register currently storing its output value.
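The post-order walk and register assignment can be sketched in Java as follows. This is a hypothetical simplification, not the skeleton’s design. It uses its own minimal ast and the low bytes AL, BL, CL and DL. It assumes a tree with no shared nodes. It also omits the FIFO spilling to LOCAL memory (registerQueue, memoryMap), so it simply fails when it runs out of registers. The sketch assumes Java 17 or later.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class AsmGenSketch {
    // Hypothetical minimal F ast; the skeleton's Expr classes differ.
    interface Expr {}
    record Var(String name) implements Expr {}
    record Const(boolean value) implements Expr {}
    record Not(Expr e) implements Expr {}
    record Bin(String op, Expr l, Expr r) implements Expr {} // op is "and" or "or"

    // Post-order walk: every operand is listed before the operator that uses it.
    static void postOrder(Expr e, List<Expr> ops) {
        if (e instanceof Not n) {
            postOrder(n.e(), ops);
        } else if (e instanceof Bin b) {
            postOrder(b.l(), ops);
            postOrder(b.r(), ops);
        }
        ops.add(e);
    }

    // Emits the instruction body for one output pin (no spilling in this sketch).
    // Note: EBX is callee-saved in the 32-bit Windows calling conventions, so a real
    // procedure that uses BL should preserve EBX (e.g. with MASM's "uses ebx").
    static List<String> generate(Expr root) {
        List<Expr> ops = new ArrayList<>();
        postOrder(root, ops);
        Deque<String> free = new ArrayDeque<>(List.of("AL", "BL", "CL", "DL"));
        Map<Expr, String> storage = new IdentityHashMap<>(); // plays the role of storageMap
        List<String> asm = new ArrayList<>();
        for (Expr e : ops) {
            if (e instanceof Var v) {
                String r = take(free);
                asm.add("mov " + r + ", in_" + v.name()); // load the BYTE argument
                storage.put(e, r);
            } else if (e instanceof Const c) {
                String r = take(free);
                asm.add("mov " + r + ", " + (c.value() ? 1 : 0)); // 8-bit immediate
                storage.put(e, r);
            } else if (e instanceof Not n) {
                String r = storage.get(n.e());
                asm.add("xor " + r + ", 1"); // 0 <-> 1 in a single instruction
                storage.put(e, r);
            } else if (e instanceof Bin b) {
                String l = storage.get(b.l());
                String r = storage.get(b.r());
                asm.add(b.op() + " " + l + ", " + r); // result overwrites the left register
                free.addLast(r);                      // the right register is free again
                storage.put(e, l);
            }
        }
        asm.add("movzx EAX, " + storage.get(root)); // return value, zero-extended into EAX
        return asm;
    }

    static String take(Deque<String> free) {
        if (free.isEmpty()) {
            throw new IllegalStateException("out of registers; spilling is not sketched here");
        }
        return free.removeFirst();
    }

    public static void main(String[] args) {
        // x <= a or '1';  as in Figure 13.2
        generate(new Bin("or", new Var("a"), new Const(true))).forEach(System.out::println);
    }
}
```

Freeing the right operand’s register after each binary operation keeps the number of live registers small for left-leaning expressions. Deeply right-nested expressions are the ones that need spilling.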

13.4 Evaluation

There is no automated evaluation for this lab due to the variety of possible system dependencies. You will have to meet with the course staff to be evaluated, which will include you answering questions about how your design works. Your generated assembly code should pass all of the tests from §8.
There are potentially bonus marks if you find a cross-platform
assembler with minimal system dependencies and develop a bit of a
skeleton and instructional support for it. The mars mips simulator
mentioned above is the best possibility for this that we are aware of.
Compiler Concepts: code generation
Programming Concepts: assembly

Lab 14
Simulation: F → JVM

14.1 How javac compiles boolean logical operators†

The Java operand stack treats booleans as integers because each slot on
the operand stack is 32 bits. Zero is false and one is true (as in C).
Boolean logic operators are translated into if comparisons against
zero. There are no jvm opcodes for boolean logic.
Also note the short-circuiting semantics of boolean logic opera-
tors in Java (as in C): operands are only evaluated if necessary. For
example, if the first operand of a conjunction (and) is false, then the
second operand is not evaluated. Similarly, if the first operand of a
disjunction (or) is true, the second operand is not evaluated.
public class javac_code {
boolean or (boolean x, boolean y) { return x || y; }
boolean and(boolean x, boolean y) { return x && y; }
boolean not(boolean x) { return ! x; }
}
Compiled from "javac_code.java"
public class javac_code {
  public javac_code();
    Code:
       0: aload_0
       1: invokespecial #1  // Method java/lang/Object."<init>":()V
       4: return

  boolean or(boolean, boolean); // x || y
    Code:
       0: iload_1    // push x
       1: ifne 8     // if (x != 0) goto instruction 8
       4: iload_2    // push y
       5: ifeq 12    // if (y == 0) goto instruction 12
       8: iconst_1   // push true (one)
       9: goto 13    // goto return true
      12: iconst_0   // push false (zero)
      13: ireturn    // return top of stack

  boolean and(boolean, boolean); // x && y
    Code:
       0: iload_1    // push x
       1: ifeq 12    // if (x == 0) goto instruction 12
       4: iload_2    // push y
       5: ifeq 12    // if (y == 0) goto instruction 12
       8: iconst_1   // push true (one)
       9: goto 13    // goto return true
      12: iconst_0   // push false (zero)
      13: ireturn    // return top of stack

  boolean not(boolean); // ! x
    Code:
       0: iload_1    // push x
       1: ifne 8     // if (x != 0) goto instruction 8
       4: iconst_1   // push true (one)
       5: goto 9     // goto return true
       8: iconst_0   // push false (zero)
       9: ireturn    // return top of stack
}
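Short-circuiting only matters when evaluating an operand has a visible effect, and F expressions have none. The following Java sketch (hypothetical class name) makes the difference visible by counting operand evaluations with `||` versus Java’s non-short-circuiting `|` on booleans.

```java
public class ShortCircuitDemo {
    static int calls = 0;

    // an operand with a visible side effect, so we can tell whether it ran
    static boolean operand(boolean value) {
        calls++;
        return value;
    }

    // x || y with x true: y is never evaluated
    static int callsForShortCircuitOr() {
        calls = 0;
        boolean result = operand(true) || operand(false);
        return calls;
    }

    // x | y on booleans: both operands are always evaluated
    static int callsForEagerOr() {
        calls = 0;
        boolean result = operand(true) | operand(false);
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(callsForShortCircuitOr()); // prints 1
        System.out.println(callsForEagerOr());        // prints 2
    }
}
```

Both versions compute the same boolean. That is why a code generator for a side-effect-free language like F is free to evaluate every operand.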
14.2 How we will generate code

Files:
ece351.f.SimulatorGeneratorASM
ece351.f.DepthCounter
Tests:
ece351.f.TestSimulatorGeneratorASM

We do not need to respect the short-circuiting semantics of boolean logic operators in Java/C, because our input language is not Java/C. Our input language is F, which does not have short-circuiting semantics. So we can compute disjunction (or) with integer addition and conjunction (and) with integer multiplication.
We can keep all of our temporary values on the jvm operand stack, so we do not need to use any locals/registers beyond what is necessary to hold the method arguments. A simple post-order traversal of the F ast produces the opcodes we need. In other words, we are converting the F ast from infix form to postfix (suffix) form (i.e., rpn: Reverse Polish Notation). In lab8 we converted the F ast from infix form to prefix form (i.e., Polish Notation). At the end of our generated methods we will convert our computed value to either zero (false) or one (true) before returning it.

Infix: a + b
Prefix: + a b
Postfix: a b +
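The post-order traversal can be sketched in Java. The ast records, method names and argument-slot map are hypothetical, not the skeleton’s SimulatorGeneratorASM or DepthCounter. The sketch emits the same opcodes as the ex06 listing below and assumes Java 17 or later. Negation is left out here because it needs the special treatment described in §14.3.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class JvmPostfixSketch {
    // Hypothetical minimal F ast without negation.
    interface Expr {}
    record Var(String name) implements Expr {}
    record Const(boolean value) implements Expr {}
    record Or(Expr l, Expr r) implements Expr {}
    record And(Expr l, Expr r) implements Expr {}

    // Post-order: operands first, then the operator (postfix / rpn).
    static void emit(Expr e, Map<String, Integer> slot, List<String> code) {
        if (e instanceof Var v) {
            int i = slot.get(v.name());
            code.add(i <= 3 ? "iload_" + i : "iload " + i); // iload_<n> exists only for n = 0..3
        } else if (e instanceof Const c) {
            code.add(c.value() ? "iconst_1" : "iconst_0");
        } else if (e instanceof Or o) {
            emit(o.l(), slot, code);
            emit(o.r(), slot, code);
            code.add("iadd"); // or as addition: any non-zero sum means true
        } else if (e instanceof And a) {
            emit(a.l(), slot, code);
            emit(a.r(), slot, code);
            code.add("imul"); // and as multiplication
        }
    }

    // The expression followed by the tail that turns non-zero into 1 and zero into 0.
    static List<String> methodBody(Expr e, Map<String, Integer> slot) {
        List<String> code = new ArrayList<>();
        emit(e, slot, code);
        code.addAll(List.of("ifeq False", "True:", "iconst_1", "ireturn",
                "False:", "iconst_0", "ireturn"));
        return code;
    }

    // Depth of the ast: an upper bound on the operand stack needed (.limit stack).
    static int depth(Expr e) {
        if (e instanceof Or o) return 1 + Math.max(depth(o.l()), depth(o.r()));
        if (e instanceof And a) return 1 + Math.max(depth(a.l()), depth(a.r()));
        return 1;
    }

    public static void main(String[] args) {
        // x <= c and ( a or b );  with arguments c, a, b in slots 0, 1, 2
        Expr x = new And(new Var("c"), new Or(new Var("a"), new Var("b")));
        Map<String, Integer> slots = Map.of("c", 0, "a", 1, "b", 2);
        System.out.println(".limit stack " + depth(x)); // prints .limit stack 3
        methodBody(x, slots).forEach(System.out::println);
    }
}
```

The ast depth is a safe bound because post-order evaluation never holds more values on the stack than the longest root-to-leaf path.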
x <= '0';

.class public Simulator_ex00_asm
.super java/lang/Object
.method public <init>()V
    aload_0
    invokespecial java/lang/Object.<init>()V
    return
.end method
.method public static x()Z
    .limit locals 0 ; one for each argument
    .limit stack 1  ; depth of the AST
    iconst_0        ; push false
    ifeq False      ; if top of stack is zero, then jump to False
True:               ; the true case
    iconst_1        ; push true
    ireturn         ; return top of stack (true)
False:              ; label we can jump to
    iconst_0        ; push false
    ireturn         ; return top of stack (false)
.end method

In some renderings of these listings a space appears inside Object.<init>()V. That is a phantom space inserted by the LaTeX package that formats the source code. There is no space in the underlying source file, and you should not have that space in your generated code.
x <= '1';

.class public Simulator_ex01_asm
.super java/lang/Object
.method public <init>()V
    aload_0
    invokespecial java/lang/Object.<init>()V
    return
.end method
.method public static x()Z
    .limit locals 0 ; one for each argument
    .limit stack 1  ; depth of the AST
    iconst_1        ; push true
    ifeq False      ; if top of stack is zero, then jump to False
True:               ; the true case
    iconst_1        ; push true
    ireturn         ; return top of stack (true)
False:              ; label we can jump to
    iconst_0        ; push false
    ireturn         ; return top of stack (false)
.end method

x <= a;

.class public Simulator_ex02_asm
.super java/lang/Object
.method public <init>()V
    aload_0
    invokespecial java/lang/Object.<init>()V
    return
.end method
.method public static x(Z)Z
    .limit locals 1 ; one for each argument
    .limit stack 1  ; depth of the AST
    iload_0         ; a
    ifeq False      ; if top of stack is zero, then jump to False
True:               ; the true case
    iconst_1        ; push true
    ireturn         ; return top of stack (true)
False:              ; label we can jump to
    iconst_0        ; push false
    ireturn         ; return top of stack (false)
.end method
x <= a or b;

.class public Simulator_ex05_asm
.super java/lang/Object
.method public <init>()V
    aload_0
    invokespecial java/lang/Object.<init>()V
    return
.end method
.method public static x(ZZ)Z
    .limit locals 2 ; one for each argument
    .limit stack 2  ; depth of the AST
    iload_0         ; a
    iload_1         ; b
    iadd            ; (a or b)
    ifeq False      ; if top of stack is zero, then jump to False
True:               ; the true case
    iconst_1        ; push true
    ireturn         ; return top of stack (true)
False:              ; label we can jump to
    iconst_0        ; push false
    ireturn         ; return top of stack (false)
.end method

x <= c and ( a or b );

.class public Simulator_ex06_asm
.super java/lang/Object
.method public <init>()V
    aload_0
    invokespecial java/lang/Object.<init>()V
    return
.end method
.method public static x(ZZZ)Z
    .limit locals 3 ; one for each argument
    .limit stack 3  ; depth of the AST
    iload_0         ; c
    iload_1         ; a
    iload_2         ; b
    iadd            ; (a or b)
    imul            ; (c and (a or b))
    ifeq False      ; if top of stack is zero, then jump to False
True:               ; the true case
    iconst_1        ; push true
    ireturn         ; return top of stack (true)
False:              ; label we can jump to
    iconst_0        ; push false
    ireturn         ; return top of stack (false)
.end method
14.3 Negation

Negation (not) needs special consideration in this encoding. There are multiple ways it can be done.
One possibility is to generate conditionals (ifs) to test if the integer on the top of the stack is zero (i.e., false) or non-zero (i.e., true).
There is another possibility that does not introduce any conditionals into the generated code. Conditionals are relatively expensive compared to integer operations. Bitwise xor (exclusive or) with one toggles the lowest bit of an integer, which is exactly what logical negation does to the values zero and one. For this to work, we first need a normalization function that reduces non-zero integers to one, yet still leaves zero as zero. Here is such a function, using integer division, that works for non-negative integers up to half-max-int (beyond that, 2x overflows):

$$ f(x) = \left\lfloor \frac{2x}{1+x} \right\rfloor $$

For x = 0 this is 0. For x ≥ 1 we have 1+x ≤ 2x < 2(1+x), so the quotient is exactly 1. Negation is then f(x) xor 1.
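The normalize-then-xor approach can be checked with a small Java sketch. `normalize` and `not` follow the formula above. `NOT_OPCODES` is one possible opcode sequence of our own, not taken from the skeleton, that computes the same thing on the operand stack. `run` is a tiny hypothetical interpreter used only to check that sequence. The sketch assumes Java 14 or later for the arrow-style switch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class NegationSketch {
    // 0 -> 0, any k >= 1 -> 1, valid while 2k does not overflow.
    static int normalize(int x) {
        return (2 * x) / (1 + x);
    }

    // Logical negation without conditionals: normalize, then flip the low bit.
    static int not(int x) {
        return normalize(x) ^ 1;
    }

    // One possible opcode sequence for not(x), with x on top of the stack.
    // Stack contents are shown bottom-to-top after each opcode.
    static final List<String> NOT_OPCODES = List.of(
            "dup",       // x x
            "iconst_1",  // x x 1
            "iadd",      // x (x+1)
            "swap",      // (x+1) x
            "iconst_2",  // (x+1) x 2
            "imul",      // (x+1) 2x
            "swap",      // 2x (x+1)
            "idiv",      // 2x/(x+1), i.e. 0 or 1
            "iconst_1",  // n 1
            "ixor");     // n xor 1

    // Minimal operand-stack interpreter, just enough to check NOT_OPCODES.
    static int run(List<String> opcodes, int x) {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(x);
        for (String op : opcodes) {
            switch (op) {
                case "dup" -> stack.push(stack.peek());
                case "iconst_1" -> stack.push(1);
                case "iconst_2" -> stack.push(2);
                case "swap" -> { int top = stack.pop(); int below = stack.pop(); stack.push(top); stack.push(below); }
                case "iadd" -> { int b = stack.pop(); int a = stack.pop(); stack.push(a + b); }
                case "imul" -> { int b = stack.pop(); int a = stack.pop(); stack.push(a * b); }
                case "idiv" -> { int b = stack.pop(); int a = stack.pop(); stack.push(a / b); }
                case "ixor" -> { int b = stack.pop(); int a = stack.pop(); stack.push(a ^ b); }
                default -> throw new IllegalArgumentException(op);
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        for (int x : new int[] {0, 1, 2, 7}) {
            System.out.println("not(" + x + ") = " + not(x) + ", via opcodes: " + run(NOT_OPCODES, x));
        }
    }
}
```

This sequence needs two stack slots beyond the operand, so `.limit stack` has to account for that wherever a negation appears.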

14.4 Evaluation

The last pushed commit before the deadline is evaluated both on the shared test inputs and on a set of secret test inputs according to the weights in the table below. Let ‘f’ name the F program we want to simulate, let ‘w’ name an input W program for it, and let ‘wStaff’ name the output of the staff’s simulator for this F/W combination. The testing equation is:
genASM(f).simulate(w).isomorphic(wStaff)

                          Regular Cases  Negation
TestSimulatorGenerator         70           30

The previously mentioned z_challenge.f input file turns out to be not that challenging: specifically, it is not triggering overflow in anyone’s implementation. So it did not meet its design objective. One of the students this term pointed out that the handling of negation previously proposed was not fully correct. So the new ‘challenge’ is to use one of the two correct ways of handling negation now described in this manual (or come up with your own correct way).

14.5 Bonus Lab: F to x86/x64/etc

See Lab 13. This is completely bonus. In past terms only one or two students have attempted it. It is hard. It is time consuming. It is intellectually rewarding.
Appendix A
Extra vhdl Lab Exercises

A.1 Define-Before-Use Checker

Compilers generally perform some form of semantic analysis on the AST of the program after the input program is parsed. The analysis might include checking that all variables/signals are defined before they are used. In this part of the lab, write a define-before-use checker that traverses a vhdl AST and determines whether all signals that are used in the signal assignment statements are defined before they are used. In addition, for all signal assignment statements, a signal that is being assigned (left-hand side) to an expression (right-hand side) must not be an input signal. Driving an input signal simultaneously from two sources would cause undefined behaviour at run time.
Within a design unit, a signal may be defined in the entity declaration as an input bit or output bit; it may also be defined in the optional signal declaration list within the body of the corresponding architecture. For example, if we consider the vhdl program shown in Figure A.1, the following signals are defined in this entity-architecture pair: a0, b0, a1, b1, a2, b2, a3, b3, Cin, sum0, sum1, sum2, sum3, Cout, V, c1, c2, c3, and c4.
The define-before-use checker should throw an exception if it checks the program shown in Figure A.1. This figure illustrates the two violations your checker should detect:

a. The assignment statement in line 16 uses a signal called ‘c’, which is undefined in this program.
b. The assignment statement in line 17 tries to assign an expression to an input pin/signal (‘Cin’).

The checker should throw a RuntimeException upon the first violation that is encountered.
All of the code that you will write for the checker should be in the class DefBeforeUseChecker in the package ece351.v. TestDefBeforeUseCheckerAccept and TestDefBeforeUseCheckerReject are JUnit tests available for testing the checker. These two classes are found in the package ece351.v.test.

A.1.1 Some Tips


• For vhdl programs that have multiple design units, apply the
violation checks per design unit (i.e., treat them separately).
• Maintain the set of declared signals in such a way that it is easy to
identify where the signals are declared in the design unit.

Figure A.1: vhdl program used to illustrate signal declarations and the use of undefined signals in signal assignment statements.

1 entity four_bit_ripple_carry_adder is port (
2     a0, b0, a1, b1, a2, b2, a3, b3, Cin : in bit;
3     sum0, sum1, sum2, sum3,Cout, V: out bit
4 );
5 end four_bit_ripple_carry_adder;
6
7
8 architecture fouradder_structure of four_bit_ripple_carry_adder is
9     signal c1, c2, c3, c4: bit;
10 begin
11     FA0 : entity work.full_adder port map(a0,b0,Cin,sum0,c1);
12     FA1 : entity work.full_adder port map(a1,b1,c1,sum1,c2);
13     FA2 : entity work.full_adder port map(a2,b2,c2,sum2,c3);
14     FA3 : entity work.full_adder port map(a3,b3,c3,sum3,c4);
15
16     V <= c xor c4;
17     Cin <= c4;
18 end fouradder_structure;
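The two checks can be sketched in Java for a single design unit. The `Assignment` record is a hypothetical stand-in for a signal assignment statement whose used signals have already been collected. It is not the skeleton’s ast. The port map instantiations on lines 11 to 14 are not signal assignment statements, so this sketch does not examine them. The sketch assumes Java 16 or later for records.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DefBeforeUseSketch {
    // Hypothetical stand-in for "target <= expr", with the signals expr reads.
    record Assignment(String target, List<String> usedSignals) {}

    // Checks one design unit and throws on the first violation.
    static void check(Set<String> inputs, Set<String> outputs, Set<String> localSignals,
                      List<Assignment> assignments) {
        Set<String> defined = new HashSet<>(inputs);
        defined.addAll(outputs);
        defined.addAll(localSignals);
        for (Assignment a : assignments) {
            if (!defined.contains(a.target())) {
                throw new RuntimeException("undefined signal: " + a.target());
            }
            for (String s : a.usedSignals()) {
                if (!defined.contains(s)) {
                    throw new RuntimeException("undefined signal: " + s);
                }
            }
            if (inputs.contains(a.target())) {
                throw new RuntimeException("assignment to input signal: " + a.target());
            }
        }
    }

    public static void main(String[] args) {
        // declarations and signal assignment statements from Figure A.1
        Set<String> inputs = Set.of("a0", "b0", "a1", "b1", "a2", "b2", "a3", "b3", "Cin");
        Set<String> outputs = Set.of("sum0", "sum1", "sum2", "sum3", "Cout", "V");
        Set<String> locals = Set.of("c1", "c2", "c3", "c4");
        List<Assignment> stmts = List.of(
                new Assignment("V", List.of("c", "c4")),   // line 16
                new Assignment("Cin", List.of("c4")));     // line 17
        try {
            check(inputs, outputs, locals, stmts);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage()); // prints undefined signal: c
        }
    }
}
```

The defined signals are collected per design unit, following the first tip above. Keeping inputs in their own set makes the second check a single lookup.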

A.2 Inline F intermediate variables

X <= A or B;
Y <= X or C;
...
Y <= (A or B) or C;
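Inlining can be sketched in Java as a recursive substitution. Every use of an intermediate variable is replaced by its own inlined definition. The ast records are hypothetical. The sketch assumes the definitions are acyclic, as they are in a combinational circuit, and assumes Java 17 or later.

```java
import java.util.Map;

public class InlineSketch {
    // Hypothetical minimal F ast with fully parenthesized printing.
    interface Expr {}
    record Var(String name) implements Expr {
        @Override public String toString() { return name; }
    }
    record Bin(String op, Expr l, Expr r) implements Expr {
        @Override public String toString() { return "(" + l + " " + op + " " + r + ")"; }
    }
    record Not(Expr e) implements Expr {
        @Override public String toString() { return "(not " + e + ")"; }
    }

    // Replaces each use of an intermediate variable by its (recursively inlined) definition.
    static Expr inline(Expr e, Map<String, Expr> defs) {
        if (e instanceof Var v) {
            Expr def = defs.get(v.name());
            return def == null ? v : inline(def, defs);
        }
        if (e instanceof Not n) {
            return new Not(inline(n.e(), defs));
        }
        Bin b = (Bin) e;
        return new Bin(b.op(), inline(b.l(), defs), inline(b.r(), defs));
    }

    public static void main(String[] args) {
        // X <= A or B;  (X is an intermediate variable)
        Map<String, Expr> defs = Map.of("X", new Bin("or", new Var("A"), new Var("B")));
        // Y <= X or C;
        Expr y = new Bin("or", new Var("X"), new Var("C"));
        System.out.println("Y <= " + inline(y, defs) + ";"); // prints Y <= ((A or B) or C);
    }
}
```

Inlining can duplicate a shared subexpression once per use, which is one reason the intermediate variables are worth keeping while debugging.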
Appendix B
Advanced Programming Concepts

These are some advanced programming ideas that we explored in the labs.

B.1 Immutability
ast classes
Immutability is not a ‘design pattern’ in the sense that it is not in the design patterns book (E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1995). It is, however, perhaps the most important general design guideline in this section. Immutability has a number of important advantages:

• Immutable objects can be shared and reused freely. There is no danger of race conditions or confusing aliasing. Those kinds of bugs are subtle and difficult to debug and fix.

• No need for defensive copies. Some object-oriented programming guidelines advocate for making defensive copies of objects in order to avoid aliasing and race condition bugs.

• Representation invariants need to be checked only at object construction time, since there is no other time the object is mutated.

There are also a number of possible disadvantages:

• Object trees must be constructed bottom-up. This discipline can be useful for understanding the program, but sometimes requires a bit of adjustment for programmers who are not familiar with this discipline.

• Changing small parts of large complex object graphs can be difficult. If the object graphs are trees then the visitor design pattern can be used to make the rewriting code fairly simple.

• Sometimes data in the problem domain are mutable, and so are best modelled by mutable objects. A bank account balance is a classic example.
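Here is a hypothetical illustration of these points in Java. The class has final fields and no setters, its invariant is checked once in the constructor, and ‘changing’ it builds a new object. The leaves are plain strings only to keep the sketch short. Java records give the same immutability with less code.

```java
import java.util.Objects;

public class ImmutableSketch {
    // A hypothetical immutable node: final class, final fields, no setters.
    public static final class OrExpr {
        private final String left;
        private final String right;

        public OrExpr(String left, String right) {
            // the representation invariant is checked once, here, at construction
            this.left = Objects.requireNonNull(left, "left");
            this.right = Objects.requireNonNull(right, "right");
        }

        public String left() { return left; }
        public String right() { return right; }

        // "Changing" a node builds a new one; the original is untouched, so it can
        // be shared freely without defensive copies.
        public OrExpr withLeft(String newLeft) {
            return new OrExpr(newLeft, right);
        }

        @Override
        public String toString() {
            return "(" + left + " or " + right + ")";
        }
    }

    public static void main(String[] args) {
        OrExpr e1 = new OrExpr("a", "b");
        OrExpr e2 = e1.withLeft("c");
        System.out.println(e1 + " " + e2); // prints (a or b) (c or b)
    }
}
```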

B.2 Representation Invariants


Tiger: repOk() methods
Representation invariants are general rules that should be true of all
objects of a particular class. For example:

• LinkedList.head != null
• LinkedList.size is the number of nodes in the list
• that the values in a certain field are sorted

These rules are often written in a repOk() method, which can be called at the end of every method that mutates the object (for immutable objects these methods are just the constructors). It is a good idea to check these rules right after mutations are made so that the failure (bad behaviour) of the software occurs close to the fault (error in the source code). If these rules are not checked right away, then the failure might not occur until sometime later when the execution is in some other part of the program: in those circumstances it can be difficult to find and fix the fault.
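A repOk() method for the ‘size is the number of nodes’ rule might look like the following hypothetical Java sketch. It is called right after each mutation. This repOk() walks the whole list, so checks like it are often enabled only while debugging.

```java
public class RepOkSketch {
    // A hypothetical mutable singly linked list of ints that caches its size.
    private static final class Node {
        final int value;
        Node next;

        Node(int value, Node next) {
            this.value = value;
            this.next = next;
        }
    }

    private Node head;
    private int size;

    // Representation invariant: size equals the number of nodes reachable from head.
    public boolean repOk() {
        int count = 0;
        for (Node n = head; n != null; n = n.next) {
            count++;
        }
        return count == size;
    }

    public void addFirst(int value) {
        head = new Node(value, head);
        size++;
        // checked right after the mutation, so a failure shows up close to the fault
        if (!repOk()) {
            throw new IllegalStateException("representation invariant violated");
        }
    }

    public int size() {
        return size;
    }

    public static void main(String[] args) {
        RepOkSketch list = new RepOkSketch();
        list.addFirst(1);
        list.addFirst(2);
        System.out.println(list.size() + " " + list.repOk()); // prints 2 true
    }
}
```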

B.3 Functional Programming

Some ideas from functional programming that are useful in the object-oriented programming context:

• immutability
• recursion
• lazy computation
• higher-order functions
• computation as a transformation from inputs to outputs

In w2013 we’ve talked about immutability, and the code has embod-
ied both immutability and computation as a transformation from
inputs to outputs.
Appendix C
Design Patterns

The material in this section is mostly copied from SourceMaking.com. See the Design Patterns book or Wikipedia or SourceMaking.com.

Design patterns represent a standard set of ideas and terms used by object-oriented programmers. If you get a job writing object-oriented code, these are concepts and terms you will be expected to know.
Design patterns (E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1995) are classified into three groups:
• creational
• structural
• behavioural

C.1 Iterator

• Behavioural
• Provide a way to access the elements of an aggregate object se-
quentially without exposing its underlying representation.
• Enables decoupling of collection classes and algorithms.
• Polymorphic traversal
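A minimal Java sketch of Iterator: the hypothetical Bag class hides its private array and exposes only an Iterator, and the hypothetical Algorithms.totalLength works unchanged on any Iterable, which is the decoupling of collection and algorithm described above.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Aggregate whose storage (a private array) is hidden from clients. */
final class Bag implements Iterable<String> {
    private final String[] items;

    Bag(String... items) {
        this.items = items.clone(); // defensive copy
    }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private int next = 0; // index of the next element to return

            @Override
            public boolean hasNext() {
                return next < items.length;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return items[next++];
            }
        };
    }
}

final class Algorithms {
    /** Works for ANY Iterable: the algorithm never sees the underlying representation. */
    static int totalLength(Iterable<String> xs) {
        int total = 0;
        for (String s : xs) {
            total += s.length();
        }
        return total;
    }
}
```

The same totalLength call accepts a Bag or a java.util.List: that is polymorphic traversal.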

C.2 Singleton
ConstantExpr
• Creational
• Ensure a class has only a predetermined, small number of named
instances, and provide a global point of access to them.
• Encapsulated ‘lazy’ (‘just-in-time’) initialization.
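A sketch of this "small number of named instances" form of Singleton, written in the spirit of ConstantExpr but not copied from it: BoolConst and its members are hypothetical. The private constructor guarantees that TRUE and FALSE are the only instances, so clients may compare them with ==.

```java
/** Hypothetical boolean constant class with exactly two named instances. */
final class BoolConst {
    static final BoolConst TRUE = new BoolConst(true);
    static final BoolConst FALSE = new BoolConst(false);

    private final boolean value;

    /** Private constructor: no code outside this class can create instances. */
    private BoolConst(boolean value) {
        this.value = value;
    }

    /** Global point of access: always returns one of the two shared instances. */
    static BoolConst of(boolean b) {
        return b ? TRUE : FALSE;
    }

    boolean value() {
        return value;
    }
}
```

The JVM creates TRUE and FALSE when BoolConst is first initialized, which happens on first use of the class. A Java enum with two constants gives the same guarantee and is often the more idiomatic choice.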

C.3 Composite
Expr class hierarchy
• Structural
• Compose objects into tree structures to represent whole-part hier-
archies. Composite lets clients treat individual objects and compo-
sitions of objects uniformly.
• Recursive composition
• e.g., ‘Directories contain entries, each of which could be a file or a
directory.’
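The directories example above can be sketched in Java as follows. FsEntry, FileNode, and DirNode are hypothetical names, not the lab's Expr hierarchy; the point is that a client calls size() the same way on a single file and on a whole directory tree.

```java
import java.util.ArrayList;
import java.util.List;

/** Component: clients use this one interface for files and directories alike. */
interface FsEntry {
    long size();
}

/** Leaf: a file has no children. */
final class FileNode implements FsEntry {
    private final long bytes;

    FileNode(long bytes) {
        this.bytes = bytes;
    }

    @Override
    public long size() {
        return bytes;
    }
}

/** Composite: a directory contains entries, each a file or another directory. */
final class DirNode implements FsEntry {
    private final List<FsEntry> children = new ArrayList<>();

    DirNode add(FsEntry child) {
        children.add(child);
        return this; // returns this so calls can be chained
    }

    /** Recursive composition: a directory's size is the sum of its children's sizes. */
    @Override
    public long size() {
        long total = 0;
        for (FsEntry child : children) {
            total += child.size(); // same call whether child is a leaf or a composite
        }
        return total;
    }
}
```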
150 ece351 lab manual [september 2, 2023]

C.4 Template Method


BinaryExpr.simplifySelf()
• Behavioural
• Define the skeleton of an algorithm in an operation, deferring
some steps to client subclasses. Template Method lets subclasses
redefine certain steps of an algorithm without changing the algo-
rithm’s structure.
• Base class declares algorithm ‘placeholders’, and derived classes
implement the placeholders.
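A hypothetical Java sketch of Template Method, loosely inspired by simplification rules such as "x AND false = false" and "x OR true = true" (it is not the lab's BinaryExpr.simplifySelf()). The final method evaluate() fixes the skeleton; AndOp and OrOp only fill in the two placeholders.

```java
/** Base class holding the template method; subclasses fill in the placeholders. */
abstract class BinaryBoolOp {
    /**
     * Template method: the skeleton is fixed here and declared final, so
     * subclasses can redefine steps but not the structure of the algorithm.
     */
    final boolean evaluate(boolean left, boolean right) {
        // Shared step: if either operand is the operator's dominating value
        // (false for AND, true for OR), that value is the result.
        boolean d = dominatingValue();
        if (left == d || right == d) {
            return d;
        }
        // Subclass-specific step.
        return combine(left, right);
    }

    /** Placeholder: the operand value that decides the result on its own. */
    protected abstract boolean dominatingValue();

    /** Placeholder: how to combine operands when neither is dominating. */
    protected abstract boolean combine(boolean left, boolean right);
}

final class AndOp extends BinaryBoolOp {
    @Override
    protected boolean dominatingValue() {
        return false;
    }

    @Override
    protected boolean combine(boolean left, boolean right) {
        return left && right;
    }
}

final class OrOp extends BinaryBoolOp {
    @Override
    protected boolean dominatingValue() {
        return true;
    }

    @Override
    protected boolean combine(boolean left, boolean right) {
        return left || right;
    }
}
```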

C.5 Interpreter
Expr.simplify() [Tiger: §4.3 + lab manual]
• Behavioural
• Given a language, define a representation for its grammar along
with an interpreter that uses the representation to interpret
sentences in the language.
• Map a domain to a language, the language to a grammar, and the
grammar to a hierarchical object-oriented design.
• In other words, add a method to the AST for the desired opera-
tion. Then implement that method for every leaf AST class.
• An alternative to Visitor. Interpreter makes it easier to modify the
grammar of the language than Visitor does, but with Visitor it is
easier to add new operations.
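A minimal Java sketch of Interpreter for a tiny, hypothetical boolean-expression language (one class per grammar rule; BExpr, BVar, BNot, and BAnd are illustrative names, not lab classes). The operation eval() is declared on the AST root and implemented in every concrete AST class, as described above.

```java
import java.util.Map;

/** AST root: the operation eval() is declared once, here. */
abstract class BExpr {
    /** Interprets this sentence under a variable assignment. */
    abstract boolean eval(Map<String, Boolean> env);
}

/** Grammar rule: Expr ::= identifier */
final class BVar extends BExpr {
    private final String name;

    BVar(String name) {
        this.name = name;
    }

    @Override
    boolean eval(Map<String, Boolean> env) {
        Boolean v = env.get(name);
        if (v == null) {
            throw new IllegalArgumentException("unbound variable: " + name);
        }
        return v;
    }
}

/** Grammar rule: Expr ::= 'not' Expr */
final class BNot extends BExpr {
    private final BExpr operand;

    BNot(BExpr operand) {
        this.operand = operand;
    }

    @Override
    boolean eval(Map<String, Boolean> env) {
        return !operand.eval(env);
    }
}

/** Grammar rule: Expr ::= Expr 'and' Expr */
final class BAnd extends BExpr {
    private final BExpr left;
    private final BExpr right;

    BAnd(BExpr left, BExpr right) {
        this.left = left;
        this.right = right;
    }

    @Override
    boolean eval(Map<String, Boolean> env) {
        return left.eval(env) && right.eval(env);
    }
}
```

Adding a new grammar rule here means adding one new class; adding a new operation (say, pretty-printing) means editing every class. That is the trade-off against Visitor noted above.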

C.6 Visitor
Expr class hierarchy
• Behavioural
• Represent an operation to be performed on the elements of an
object structure. Visitor lets you define a new operation without
changing the classes of the elements on which it operates.
• Choose the right code based on the runtime types of two objects
(the element being visited and the visitor).
• Double dispatch
• An alternative to Interpreter.
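A hypothetical Java sketch of Visitor over the same kind of tiny boolean language (ExprE, VarE, NotE, AndE, Printer, and NodeCounter are illustrative names, not lab classes). The call e.accept(v) dispatches first on the element's runtime type and then, inside accept, on the visitor's type: double dispatch. Printer and NodeCounter are two operations added without touching any ExprE class.

```java
/** One visit method per concrete element class. */
interface ExprVisitor<R> {
    R visitVar(VarE v);
    R visitNot(NotE n);
    R visitAnd(AndE a);
}

/** Element classes only need accept(); new operations never modify them. */
abstract class ExprE {
    abstract <R> R accept(ExprVisitor<R> v);
}

final class VarE extends ExprE {
    final String name;
    VarE(String name) { this.name = name; }
    @Override <R> R accept(ExprVisitor<R> v) { return v.visitVar(this); }
}

final class NotE extends ExprE {
    final ExprE operand;
    NotE(ExprE operand) { this.operand = operand; }
    @Override <R> R accept(ExprVisitor<R> v) { return v.visitNot(this); }
}

final class AndE extends ExprE {
    final ExprE left;
    final ExprE right;
    AndE(ExprE left, ExprE right) { this.left = left; this.right = right; }
    @Override <R> R accept(ExprVisitor<R> v) { return v.visitAnd(this); }
}

/** Operation 1: render the expression as text. */
final class Printer implements ExprVisitor<String> {
    @Override public String visitVar(VarE v) { return v.name; }
    @Override public String visitNot(NotE n) { return "not " + n.operand.accept(this); }
    @Override public String visitAnd(AndE a) {
        return "(" + a.left.accept(this) + " and " + a.right.accept(this) + ")";
    }
}

/** Operation 2: count the nodes in the tree. */
final class NodeCounter implements ExprVisitor<Integer> {
    @Override public Integer visitVar(VarE v) { return 1; }
    @Override public Integer visitNot(NotE n) { return 1 + n.operand.accept(this); }
    @Override public Integer visitAnd(AndE a) {
        return 1 + a.left.accept(this) + a.right.accept(this);
    }
}
```

Adding a third operation means writing one new visitor class, but adding a new element class means adding a method to ExprVisitor and to every existing visitor.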
Appendix D Lab Instructor Notes (GitLab)

Term names follow the UW convention: cyym, where c is the century (0 for
20th, 1 for 21st); yy is the last two digits of the year; and m is the
number of the month at the start of the term (either 1, 5, or 9). So
1151 is the term that starts in
January of 2015.
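The term-name convention can be decoded mechanically. The Java below is a minimal sketch; TermCode and parse() are hypothetical names, and rejecting months other than 1, 5, and 9 is an assumption based on the convention as stated.

```java
/** Hypothetical helper that decodes a UW term code of the form cyym. */
final class TermCode {
    final int year;  // e.g., 2015
    final int month; // 1 (January), 5 (May), or 9 (September)

    private TermCode(int year, int month) {
        this.year = year;
        this.month = month;
    }

    static TermCode parse(String code) {
        // c in {0,1}, two year digits, m in {1,5,9}
        if (!code.matches("[01]\\d\\d[159]")) {
            throw new IllegalArgumentException("not a cyym term code: " + code);
        }
        int c = code.charAt(0) - '0';                    // 0 = 20th century, 1 = 21st
        int yy = Integer.parseInt(code.substring(1, 3)); // last two digits of the year
        int m = code.charAt(3) - '0';                    // month the term starts
        return new TermCode(1900 + 100 * c + yy, m);
    }
}
```

For example, parse("1151") gives year 2015 and month 1, and the t=1181 used in fork.sh decodes to the term starting January 2018.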

D.1 Repository URLs

Staff: Same repositories used over time.

• https://git.uwaterloo.ca/ece351-notes/ece351-tex (source for Lab Manual + Course Notes)
• [email protected]:drayside/ece351-code (staff source code)

Staff: Metadata for each offering (create fresh; §E.2):

• [email protected]:ece351/term/metadata (offering metadata)

Students: Need to be created fresh each term (see §E.2).

• [email protected]:ece351/term/ece351-notes (compiled version of lab manual, etc.)
• [email protected]:ece351/term/lib (shared libraries)
• [email protected]:ece351/term/skeleton (code to be distributed to students)
• [email protected]:ece351/term/student/labs (each student’s individual repo)

D.2 Setting Up

D.2.1 Creating the Wildcard Repo ece351/term


These steps are done by the Gitolite administrator (Gitolite documentation:
http://gitolite.com/gitolite/conf.html):

a. git clone [email protected]:gitolite-admin
b. cp conf/courses/ece351-last.conf conf/courses/ece351-term.conf
c. edit conf/courses/ece351-term.conf
   • define staff group @ece351-term-staff
   • define student group @ece351-term-students
     (student group should include staff IDs)
d. edit conf/gitolite.conf
   • add include for conf/courses/ece351-term.conf
e. git add conf/gitolite.conf conf/courses/ece351-term.conf
f. git commit -m 'creating new wildcard repo for ECE351 term term'
g. git push

The post-commit hooks might take a long time to run (maybe an hour or
more), but the permissions changes take effect almost immediately. What
takes so long is updating all of the web pages: the hooks check which
repositories are visible to each user.

The subsequent steps can be done by the course lab instructor.

D.2.2 Notes/PDF Repository


• git clone [email protected]:ece351/term/ece351-notes (to create the repo)
• outline.pdf
• lab-manual.pdf
• course-notes.pdf
• practice-questions/
• ssh [email protected] perms ece351/term/ece351-notes + READERS @ece351-term-students
• ssh [email protected] perms ece351/term/ece351-notes + WRITERS @ece351-term-staff

D.2.3 Lib Repository


• git clone [email protected]:ece351/term/lib (to create the repo)
• README.txt as first commit to create master branch
• jar files
• ssh [email protected] perms ece351/term/lib + READERS @ece351-term-students
• ssh [email protected] perms ece351/term/lib + WRITERS @ece351-term-staff

D.2.4 Course Offering Metadata Repository


• git clone [email protected]:ece351/term/metadata (to create the repo)
• README.txt as first commit to create master branch
• ssh [email protected] perms ece351/term/metadata + WRITERS @ece351-term-staff
• download roster from Quest
• extract student userids
• cd ece351-code (change to the staff solution repo)
• cd marking
• git submodule add [email protected]:ece351/term/metadata metadata-term

D.2.5 Skeleton Repository


• git clone [email protected]:ece351/term/skeleton (to create the repo)
• README.txt as first commit to create master branch
• git submodule add [email protected]:ece351/term/lib
• meta/
• .project + .classpath
• .gitignore
• build.xml
• prelab: src/ece351/util/* [Note: See §E.6]
• ssh [email protected] perms ece351/term/skeleton + READERS @ece351-term-students
• ssh [email protected] perms ece351/term/skeleton + WRITERS @ece351-term-staff

Fork student repos (see §E.3 below). Then add:

• TestPrelab*.java
• TestImports.java

Students will have to pull from skeleton during the prelab exercise to
get these test files.

D.3 Forking Student Repos


See scripts/fork.sh

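#!/bin/bash
# Fork each student's labs repo from the skeleton.
# Safety guard: remove these two lines after customizing your copy.
echo "copy this script to the appropriate location and customize it"
exit 1

t=1181   # term code in cyym form (1181 = term starting January 2018)

# students.txt lists one student userid per line
for s in $(cat students.txt)
do
    echo "********************************************************************"
    echo "$s"
    # clone the student's labs repo into a directory named after the student
    git clone [email protected]:ece351/$t/$s/labs "$s"
    cd "$s" || exit 1
    # populate the student repo from the skeleton and push it back
    git remote add skeleton [email protected]:ece351/$t/skeleton
    git pull skeleton master
    git push
    # grant write access to the student and to the course staff group
    ssh [email protected] perms ece351/$t/$s/labs + WRITERS "$s"
    ssh [email protected] perms ece351/$t/$s/labs + WRITERS @ece351-$t-staff
    # to list the current permissions instead:
    #ssh [email protected] perms ece351/$t/$s/labs -l
    cd ..
done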

D.4 Marking Script: Build.xml

The students also have a copy of this.

D.5 Web User Interface for Gitolite Server

Written with Django/Python. Source available at:

https://github.com/eyolfson/site-ecegit

It depends on two Django apps:

https://github.com/eyolfson/django-gitolite
https://github.com/eyolfson/django-ssh

D.6 Exporting to Skeleton: export.sh


Use the export.sh script to prepare the source files for the skeleton.
Never copy source files directly from the staff repo to the skeleton. The
export.sh script performs two functions: (1) it snips out the parts that
the students are supposed to write, and (2) it inserts the intellectual
property header. Don’t forget to also copy the test input files into the
skeleton!

export.sh makes a clone of your local copy of the instructor code repo and
works from that, so any uncommitted changes in your local copy of the repo
will not be included in the export.

If export.sh runs successfully you should see many lines of output that
start with ‘dos2unix’. If you get a message that the temporary directory
exists and the script is exiting, delete that old temporary directory
first and then run export.sh again.

After running export.sh, use a visual diff tool such as meld to compare
the snipped source code with the skeleton, and merge manually. Sometimes
changes are made to the skeleton manually that should not be overwritten
by a future export.

D.6.1 Update header.txt at the start of term


At the beginning of each term, update the file src/ece351/header.txt
to have the current term date and number. This file is prepended to
each source file by the export script.
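The real export.sh is a shell script that is not reproduced in this manual. Purely as a hypothetical sketch of its two functions, the Java below snips out regions between assumed marker comments and prepends a header. The markers "// BEGIN-SOLUTION" and "// END-SOLUTION", the placeholder line, and the Exporter class are all invented for illustration; the actual script's conventions may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of export.sh's two jobs, NOT the real script. */
final class Exporter {
    static final String BEGIN = "// BEGIN-SOLUTION"; // assumed marker
    static final String END = "// END-SOLUTION";     // assumed marker

    /** (1) snips marked solution regions; (2) prepends the header lines. */
    static List<String> export(List<String> header, List<String> source) {
        List<String> out = new ArrayList<>(header); // header goes first
        boolean inSolution = false;
        for (String line : source) {
            String t = line.trim();
            if (t.equals(BEGIN)) {
                inSolution = true;
                out.add("// TODO: replace this comment with your code"); // placeholder
            } else if (t.equals(END)) {
                inSolution = false;
            } else if (!inSolution) {
                out.add(line); // keep lines outside solution regions unchanged
            }
        }
        if (inSolution) {
            throw new IllegalStateException("unterminated " + BEGIN);
        }
        return out;
    }
}
```

Failing on an unterminated marker is a deliberate choice in this sketch: silently dropping the rest of a file would be easy to miss when comparing the export against the skeleton.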

D.6.2 Export on an as-needed basis


Experience has shown that it’s best to do four major releases during
the term: prelab, W, F, and vhdl. Exporting all of the code at the
beginning is overwhelming for many students. Also, the gradual
export strategy makes it easier for the staff to revise later labs after
the term has started.

D.6.3 Exporting to skeleton dev branch


Sometimes it is a good idea to make a dev branch in the skeleton
repo and export to that first. This way, staff and advanced students
can test the code before it is released to the class as a whole.
Usually the code for lab k + 1 is released to dev before the deadline
for lab k; after the deadline for lab k, dev is merged into skeleton
master. Releasing code for lab k + 1 to all students before the
deadline for lab k can cause a lot of confusion, especially if that
code is mixed with critical patches for lab k.

D.6.4 Files to Release for Each Lab

0. update header.txt before populating skeleton (§E.6.1)
   README.txt
   lib submodule
   .gitignore
   .project
   .classpath
   build.xml
   meta/hours.txt + meta/collaboration.txt
   src/ece351/util/*.java
   After the base files, fork student repos. Then add TestImports.java
   and TestPrelab.java.
1. tests/wave/*.wave
   src/ece351/w/ast/*
   src/ece351/w/rdescent/*
   src/ece351/w/regex/*
   Manually remove Todo351Exception from TestWRegexSimpleData.
2. tests/wave/staff.out/*
   src/ece351/objectcontract/*
   src/ece351/w/svg/*
3. tests/f/* including ungrammatical/, gates/ and staff.out/
   (excluding tests/f/secret!)
   src/ece351/common/ast/*
   src/ece351/common/visitor/* (for compilation dependencies)
   src/ece351/f/FParser.java
   src/ece351/f/analysis/* (for compilation dependencies)
   src/ece351/f/ast/*
   src/ece351/f/rdescent/*
   src/ece351/f/test/*
4. src/ece351/f/simplifier/*
5. src/ece351/w/parboiled/*
6. src/ece351/f/parboiled/*
7. src/ece351/f/techmapper/*
   tests/f/staff.out/graph/*
8. tests/f/staff.out/simulator/*
   src/ece351/f/simgen/TestSimulatorGenerator.java
   src/ece351/f/simgen/SimulatorGenerator.java
9. tests/v/*
   src/ece351/v/VBase.java
   src/ece351/v/ast/*
   src/ece351/v/test/*
   src/ece351/v/test/VRecognizer.java (or class)
   src/ece351/v/test/VParser.java (or class)
   src/ece351/v/test/TestArchitectureEquivalence
10. src/ece351/v/PostOrderVVisitor.java
   src/ece351/v/DeSugarer.java
   src/ece351/v/Elaborator.java
   src/ece351/v/test/TestDeSugarer.java
   src/ece351/v/test/TestElaborator.java
Appendix E Lab Instructor Notes (gitolite)

Term names follow the UW convention: cyym, where c is the century (0 for
20th, 1 for 21st); yy is the last two digits of the year; and m is the
number of the month at the start of the term (either 1, 5, or 9). So
1151 is the term that starts in January of 2015.

E.1 Repository URLs

Staff: Same repositories used over time.

• https://git.uwaterloo.ca/ece351-notes/ece351-tex (source for Lab Manual + Course Notes)
• [email protected]:drayside/ece351-code (staff source code)

Staff: Metadata for each offering (create fresh; §E.2):

• [email protected]:ece351/term/metadata (offering metadata)

Students: Need to be created fresh each term (see §E.2).

• [email protected]:ece351/term/ece351-notes (compiled version of lab manual, etc.)
• [email protected]:ece351/term/lib (shared libraries)
• [email protected]:ece351/term/skeleton (code to be distributed to students)
• [email protected]:ece351/term/student/labs (each student’s individual repo)

E.2 Setting Up

E.2.1 Creating the Wildcard Repo ece351/term


These steps are done by the Gitolite administrator (Gitolite documentation:
http://gitolite.com/gitolite/conf.html):

a. git clone [email protected]:gitolite-admin
b. cp conf/courses/ece351-last.conf conf/courses/ece351-term.conf
c. edit conf/courses/ece351-term.conf
   • define staff group @ece351-term-staff
   • define student group @ece351-term-students
     (student group should include staff IDs)
d. edit conf/gitolite.conf
   • add include for conf/courses/ece351-term.conf
e. git add conf/gitolite.conf conf/courses/ece351-term.conf
f. git commit -m 'creating new wildcard repo for ECE351 term term'
g. git push

The post-commit hooks might take a long time to run (maybe an hour or
more), but the permissions changes take effect almost immediately. What
takes so long is updating all of the web pages: the hooks check which
repositories are visible to each user.

The subsequent steps can be done by the course lab instructor.

E.2.2 Notes/PDF Repository


• git clone [email protected]:ece351/term/ece351-notes (to create the repo)
• outline.pdf
• lab-manual.pdf
• course-notes.pdf
• practice-questions/
• ssh [email protected] perms ece351/term/ece351-notes + READERS @ece351-term-students
• ssh [email protected] perms ece351/term/ece351-notes + WRITERS @ece351-term-staff

E.2.3 Lib Repository


• git clone [email protected]:ece351/term/lib (to create the repo)
• README.txt as first commit to create master branch
• jar files
• ssh [email protected] perms ece351/term/lib + READERS @ece351-term-students
• ssh [email protected] perms ece351/term/lib + WRITERS @ece351-term-staff

E.2.4 Course Offering Metadata Repository


• git clone [email protected]:ece351/term/metadata (to create the repo)
• README.txt as first commit to create master branch
• ssh [email protected] perms ece351/term/metadata + WRITERS @ece351-term-staff
• download roster from Quest
• extract student userids
• cd ece351-code (change to the staff solution repo)
• cd marking
• git submodule add [email protected]:ece351/term/metadata metadata-term

E.2.5 Skeleton Repository


• git clone [email protected]:ece351/term/skeleton (to create the repo)
• README.txt as first commit to create master branch
• git submodule add [email protected]:ece351/term/lib
• meta/
• .project + .classpath
• .gitignore
• build.xml
• prelab: src/ece351/util/* [Note: See §E.6]
• ssh [email protected] perms ece351/term/skeleton + READERS @ece351-term-students
• ssh [email protected] perms ece351/term/skeleton + WRITERS @ece351-term-staff

Fork student repos (see §E.3 below). Then add:

• TestPrelab*.java
• TestImports.java

Students will have to pull from skeleton during the prelab exercise to
get these test files.

E.3 Forking Student Repos


See scripts/fork.sh

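#!/bin/bash
# Fork each student's labs repo from the skeleton.
# Safety guard: remove these two lines after customizing your copy.
echo "copy this script to the appropriate location and customize it"
exit 1

t=1181   # term code in cyym form (1181 = term starting January 2018)

# students.txt lists one student userid per line
for s in $(cat students.txt)
do
    echo "********************************************************************"
    echo "$s"
    # clone the student's labs repo into a directory named after the student
    git clone [email protected]:ece351/$t/$s/labs "$s"
    cd "$s" || exit 1
    # populate the student repo from the skeleton and push it back
    git remote add skeleton [email protected]:ece351/$t/skeleton
    git pull skeleton master
    git push
    # grant write access to the student and to the course staff group
    ssh [email protected] perms ece351/$t/$s/labs + WRITERS "$s"
    ssh [email protected] perms ece351/$t/$s/labs + WRITERS @ece351-$t-staff
    # to list the current permissions instead:
    #ssh [email protected] perms ece351/$t/$s/labs -l
    cd ..
done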

E.4 Marking Script: Build.xml

The students also have a copy of this.

E.5 Web User Interface for Gitolite Server

Written with Django/Python. Source available at:

https://github.com/eyolfson/site-ecegit

It depends on two Django apps:

https://github.com/eyolfson/django-gitolite
https://github.com/eyolfson/django-ssh

E.6 Exporting to Skeleton: export.sh


Use the export.sh script to prepare the source files for the skeleton.
Never copy source files directly from the staff repo to the skeleton. The
export.sh script performs two functions: (1) it snips out the parts that
the students are supposed to write, and (2) it inserts the intellectual
property header. Don’t forget to also copy the test input files into the
skeleton!

export.sh makes a clone of your local copy of the instructor code repo and
works from that, so any uncommitted changes in your local copy of the repo
will not be included in the export.

If export.sh runs successfully you should see many lines of output that
start with ‘dos2unix’. If you get a message that the temporary directory
exists and the script is exiting, delete that old temporary directory
first and then run export.sh again.

After running export.sh, use a visual diff tool such as meld to compare
the snipped source code with the skeleton, and merge manually. Sometimes
changes are made to the skeleton manually that should not be overwritten
by a future export.

E.6.1 Update header.txt at the start of term


At the beginning of each term, update the file src/ece351/header.txt
to have the current term date and number. This file is prepended to
each source file by the export script.

E.6.2 Export on an as-needed basis


Experience has shown that it’s best to do four major releases during
the term: prelab, W, F, and vhdl. Exporting all of the code at the
beginning is overwhelming for many students. Also, the gradual
export strategy makes it easier for the staff to revise later labs after
the term has started.

E.6.3 Exporting to skeleton dev branch


Sometimes it is a good idea to make a dev branch in the skeleton
repo and export to that first. This way, staff and advanced students
can test the code before it is released to the class as a whole.
Usually the code for lab k + 1 is released to dev before the deadline
for lab k; after the deadline for lab k, dev is merged into skeleton
master. Releasing code for lab k + 1 to all students before the
deadline for lab k can cause a lot of confusion, especially if that
code is mixed with critical patches for lab k.

E.6.4 Files to Release for Each Lab

0. update header.txt before populating skeleton (§E.6.1)
   README.txt
   lib submodule
   .gitignore
   .project
   .classpath
   build.xml
   meta/hours.txt + meta/collaboration.txt
   src/ece351/util/*.java
   After the base files, fork student repos. Then add TestImports.java
   and TestPrelab.java.
1. tests/wave/*.wave
   src/ece351/w/ast/*
   src/ece351/w/rdescent/*
   src/ece351/w/regex/*
   Manually remove Todo351Exception from TestWRegexSimpleData.
2. tests/wave/staff.out/*
   src/ece351/objectcontract/*
   src/ece351/w/svg/*
3. tests/f/* including ungrammatical/, gates/ and staff.out/
   (excluding tests/f/secret!)
   src/ece351/common/ast/*
   src/ece351/common/visitor/* (for compilation dependencies)
   src/ece351/f/FParser.java
   src/ece351/f/analysis/* (for compilation dependencies)
   src/ece351/f/ast/*
   src/ece351/f/rdescent/*
   src/ece351/f/test/*
4. src/ece351/f/simplifier/*
5. src/ece351/w/parboiled/*
6. src/ece351/f/parboiled/*
7. src/ece351/f/techmapper/*
   tests/f/staff.out/graph/*
8. tests/f/staff.out/simulator/*
   src/ece351/f/simgen/TestSimulatorGenerator.java
   src/ece351/f/simgen/SimulatorGenerator.java
9. tests/v/*
   src/ece351/v/VBase.java
   src/ece351/v/ast/*
   src/ece351/v/test/*
   src/ece351/v/test/VRecognizer.java (or class)
   src/ece351/v/test/VParser.java (or class)
   src/ece351/v/test/TestArchitectureEquivalence
10. src/ece351/v/PostOrderVVisitor.java
   src/ece351/v/DeSugarer.java
   src/ece351/v/Elaborator.java
   src/ece351/v/test/TestDeSugarer.java
   src/ece351/v/test/TestElaborator.java
Appendix F
Bibliography

[1] A. W. Appel and J. Palsberg. Modern Compiler Implementation in
Java. Cambridge, 2004.

[2] J. Bloch. Effective Java. Addison-Wesley, 2001.

[3] R. E. Bryant. Graph-based algorithms for boolean function
manipulation. IEEE Transactions on Computers, C-35(8):677–691,
Aug. 1986.

[4] B. Eckel. Thinking in Java. Prentice-Hall, 2002.
http://www.mindview.net/Books/TIJ/.

[5] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns:
Elements of Reusable Object-Oriented Software. Addison-Wesley, 1995.

[6] B. Liskov and J. Guttag. Program Development in Java: Abstraction,
Specification, and Object-Oriented Design. Addison-Wesley, 2001.

[7] J. C. Reynolds. User-defined types and procedural data as
complementary approaches to data abstraction. In S. A. Schuman,
editor, New Directions in Algorithmic Languages: IFIP Working
Group 2.1 on Algol. INRIA, 1975.

[8] M. L. Scott. Programming Language Pragmatics. Morgan Kaufmann,
3rd edition, 2009.

[9] M. Torgersen. The Expression Problem Revisited: Four new solutions
using generics. In M. Odersky, editor, Proc. 18th ECOOP, volume 3344
of LNCS, Oslo, Norway, June 2004. Springer-Verlag.

[10] P. Wadler. The expression problem, Nov. 1998. Email to Java
Generics list.

[11] W. Wulf and M. Shaw. Global variables considered harmful.
ACM SIGPLAN Notices, 8(2):80–86, Feb. 1973.

[12] M. Zenger and M. Odersky. Independently extensible solutions to
the expression problem. In Proc. 12th Workshop on Foundations of
Object-Oriented Languages, 2005.

You might also like