MODULE 3
Semantic analysis: Syntax directed translation, S-attributed and L-attributed grammars, and
Intermediate code forms-AST, Polish notation, three address codes.
Type checking: Type checking, type conversions, equivalence of type expressions,
Overloading of functions and operations. Context sensitive features- Chomsky hierarchy of
languages and recognizers.
----------------------------------------------------------------------------------------------------------------
Semantic analysis is the compiler phase that follows parsing and determines whether a
syntactically correct program is also meaningful. The parser checks only the structure of the
program; semantic analysis enforces the rules that the grammar alone cannot express, taking
into account context, the logical structure of constructs, and the roles played by names and
operators.
Parts of Semantic Analysis
Semantic analysis in a compiler covers two broad kinds of work:
1. Static checking: verifying the context-sensitive rules of the language, for example that every
identifier is declared before it is used and that operators are applied to operands of compatible
types (type checking).
2. Translation: computing the information needed by later phases, for example the attribute
values attached to grammar symbols and the intermediate representation (AST, three-address
code) of the program.
Syntax Directed Translation
Syntax-Directed Translation (SDT) is a method used in compiler design to convert source code
into another form while analyzing its structure. It integrates syntax analysis (parsing) with
semantic rules to produce intermediate code, machine code, or optimized instructions.
In SDT, each grammar rule is linked with semantic actions that define how translation should
occur. These actions help in tasks like evaluating expressions, checking types, generating code,
and handling errors.
SDT ensures a systematic and structured way of translating programs, allowing information to
be processed bottom-up or top-down through the parse tree. This makes translation efficient
and accurate, ensuring that every part of the input program is correctly transformed into its
executable form.
SDT relies on three key elements:
1. Lexical values of nodes (such as variable names or numbers).
2. Constants used in computations.
3. Attributes associated with non-terminals that store intermediate results.
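As an illustrative sketch (the grammar, function names, and attribute name val are invented here, not taken from any particular compiler), the productions E → E + T | T and T → T * F | F with semantic actions such as E.val = E1.val + T.val can be carried out by a recursive-descent evaluator that computes each attribute as it parses:

```python
# Minimal sketch of syntax-directed translation: a recursive-descent
# parser for E -> E + T | T, T -> T * F | F, F -> digit, where each
# grammar rule carries a semantic action computing a .val attribute.

def parse_expression(tokens, pos=0):
    # E -> T { + T }   with action E.val = E.val + T.val
    val, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] == '+':
        rhs, pos = parse_term(tokens, pos + 1)
        val = val + rhs          # semantic action for E -> E + T
    return val, pos

def parse_term(tokens, pos):
    # T -> F { * F }   with action T.val = T.val * F.val
    val, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] == '*':
        rhs, pos = parse_factor(tokens, pos + 1)
        val = val * rhs          # semantic action for T -> T * F
    return val, pos

def parse_factor(tokens, pos):
    # F -> digit       with action F.val = int(digit)
    return int(tokens[pos]), pos + 1

result, _ = parse_expression(list("5+6*7"))
print(result)  # 47
```

Each loop body is the semantic action attached to one production; the attribute val travels upward with the function's return value.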
Attributes
An attribute is a characteristic that determines the value of a grammatical symbol. Semantic
functions, also referred to as attribute computation functions, are functions connected to
grammar productions that compute the values of attributes. Predicate functions are functions
that state a specific grammar's static semantic rules as well as portions of its syntax.
Types of Attributes
Attributes take values from a defined domain and are evaluated as the corresponding grammar
rules are applied during parsing. They can be roughly separated into two groups, depending on
how they receive their values:
1. Synthesized Attributes
2. Inherited Attributes
1. Synthesized Attributes
A synthesized attribute is an attribute of the non-terminal on the left-hand side of a production
whose value is computed from the attributes of its child nodes. For the production P → QR, an
attribute of P that depends on attributes of Q or R is a synthesized attribute.
Example 1
G → PQRS
Here, an attribute of G is synthesized if it receives its values from the child nodes P, Q, R,
and S.
Example 2
G → G * P + Q
In this example, the parent node G takes its values from its child nodes, which are G, P, and Q.
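To make "values flow from child to parent" concrete, here is a small sketch (the Node class and its field names are invented for illustration) that evaluates a synthesized val attribute by a bottom-up, post-order walk of a parse tree for (5 + 6) * 7:

```python
# Sketch: evaluating a synthesized attribute bottom-up.
# Each interior node computes its value only after its children are
# done, which is exactly a post-order (bottom-up) traversal.

class Node:
    def __init__(self, op=None, value=None, children=()):
        self.op = op          # '+' or '*' for interior nodes
        self.value = value    # literal value for leaves
        self.children = list(children)

def synthesize(node):
    if not node.children:               # leaf: attribute is its lexical value
        return node.value
    left, right = (synthesize(c) for c in node.children)
    return left + right if node.op == '+' else left * right

# Parse tree for (5 + 6) * 7
tree = Node('*', children=[Node('+', children=[Node(value=5), Node(value=6)]),
                           Node(value=7)])
print(synthesize(tree))  # 77
```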
2. Inherited Attributes
An inherited attribute belongs to a non-terminal on the right-hand side of a production. These
attributes take their values from the parent or from sibling nodes. They are defined by a
semantic rule attached to the production at the parent, provided that production has the
non-terminal in its body.
Inherited attributes are useful when the structure of the parse tree differs from that of the source
program's abstract syntax tree. Since, in general, they may depend on both left and right
siblings, they cannot always be evaluated by a single pre-order traversal of the parse tree.
Example 1
G → PQRS
Here, P can take values from its parent G and from its siblings Q, R, and S. Similarly, Q can
take values from G, P, R, and S; R from G, P, Q, and S; and finally S from G, P, Q, and R.
Example 2
E→E+S+T
In the above example, the value of S can be determined by E and T. Similarly, the value of T can
be determined with the help of E and S.
S-Attributed SDT
If an SDT uses only synthesized attributes, it is called an S-attributed SDT. In S-attributed
SDTs, the semantic actions are written at the end of the production (after the right-hand side).
Attributes in S-attributed SDTs are evaluated during bottom-up parsing, as the values of the
parent nodes depend upon the values of the child nodes.
L-Attributed SDT
This form of SDT uses both synthesized and inherited attributes, with the restriction that no
attribute may take values from a right sibling.
In L-attributed SDTs, a non-terminal can get values from its parent, its children, and its left
siblings. Consider the production
S → ABC. S can take values from A, B, and C (synthesized). A can take values from S only. B
can take values from S and A. C can get values from S, A, and B. No non-terminal can get values
from the sibling to its right.
Attributes in L-attributed SDTs are evaluated in a depth-first, left-to-right traversal. We may
conclude that if a definition is S-attributed, then it is also L-attributed, as the L-attributed class
encloses the S-attributed class.
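A standard textbook illustration of an L-attributed definition is the declaration grammar D → T L, where the type synthesized at T is passed down to the identifier list L as an inherited attribute. The function below is a hypothetical sketch of that flow (its name and the symbol-table shape are invented):

```python
# Sketch of an inherited attribute in an L-attributed definition.
# For a declaration D -> T L (e.g. "int a, b, c"), the type computed
# at T is passed DOWN to the identifier list L as L.inh = T.type.

def declare(type_name, identifiers, symbol_table):
    inherited_type = type_name          # T.type, inherited by L
    for ident in identifiers:           # left-to-right, as L-attributed requires
        symbol_table[ident] = inherited_type
    return symbol_table

table = declare('int', ['a', 'b', 'c'], {})
print(table)  # {'a': 'int', 'b': 'int', 'c': 'int'}
```

Note that the type flows strictly left to right: each identifier needs only information from its parent and left context, which is exactly the L-attributed restriction.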
Intermediate code forms-AST
Intermediate Code Generation is a stage in the process of compiling a program, where the
compiler translates the source code into an intermediate representation. This representation is not
machine code but is simpler than the original high-level code. Here’s how it works:
Translation: The compiler takes the high-level code (like C or Java) and converts it into
an intermediate form, which can be easier to analyze and manipulate.
Portability: This intermediate code can often run on different types of machines without
needing major changes, making it more versatile.
Optimization: Before turning it into machine code, the compiler can optimize this
intermediate code to make the final program run faster or use less memory.
An abstract syntax tree (AST) is one common intermediate form: a condensed version of the
parse tree in which only the essential operators and operands appear as nodes, while purely
syntactic details such as parentheses and punctuation are dropped.
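Python's standard ast module can make this concrete: parsing a small expression yields an abstract syntax tree whose nodes keep only operators and operands, while purely syntactic details such as the parentheses disappear.

```python
import ast

# Parse (5+6)*7 into an abstract syntax tree using Python's own
# compiler front end; the tree drops the parentheses and keeps only
# the operator/operand structure.
tree = ast.parse("(5+6)*7", mode="eval")

node = tree.body
print(type(node).__name__)        # BinOp  (the top-level '*')
print(type(node.op).__name__)     # Mult
print(type(node.left).__name__)   # BinOp  (the nested '+')
```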
Polish Notation
Polish notation is also known as prefix notation. It helps compilers evaluate mathematical
expressions without parentheses: the position of each operator fully determines the order of
evaluation, so precedence rules such as multiplication before addition are encoded directly in
the written form.
Types of Notation
There are two types of polish notation in compiler design. Let us look into both of these in depth.
Prefix Notation
Prefix notation is also known as polish notation. In this notation, the operators are written before
the operands, not like the infix in which the operators are in-between the operands.
For example, the infix notation (5+6)*7 will be written as *+567 in prefix notation.
Postfix Notation
Postfix notation is also known as reverse polish notation. In this notation, the operators are
written after the operands, not like the infix in which the operator is in-between the operands. For
example, the infix notation (5+6)*7 will be written as 56+7* in postfix notation.
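A minimal sketch of the infix-to-postfix conversion, using the classic operator-precedence (shunting-yard) idea for single-digit operands and the operators + and * only (the function and table names are illustrative):

```python
# Convert infix to postfix (reverse Polish) notation: operands go
# straight to the output; an operator waits on a stack until no
# higher-or-equal precedence operator is on top of it.

PRECEDENCE = {'+': 1, '*': 2}

def to_postfix(infix):
    output, stack = [], []
    for ch in infix:
        if ch.isdigit():
            output.append(ch)
        elif ch == '(':
            stack.append(ch)
        elif ch == ')':
            while stack[-1] != '(':
                output.append(stack.pop())
            stack.pop()                          # discard the '('
        else:                                    # operator
            while (stack and stack[-1] != '('
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[ch]):
                output.append(stack.pop())
            stack.append(ch)
    while stack:
        output.append(stack.pop())
    return ''.join(output)

print(to_postfix("(5+6)*7"))  # 56+7*
```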
Three address code
Three-address code (TAC) is an intermediate representation used by compilers to ease the
process of code generation. Complex expressions are decomposed into simple steps, each
comprising at most three addresses: two operands and one result. Intermediate results in TAC
are stored in temporary variables that the compiler generates. This design makes the ordering
of operations explicit. Because it is simple, TAC lends itself nicely to optimization and to
translation into machine code. TAC represents control flow and data dependencies, and falls
somewhere between high-level source code and machine-level instructions.
Uses of Three-Address Code in Compilers
Optimization: Three-address code is often used as an intermediate representation of
code during the optimization phases of the compilation process. The three-address
code allows the compiler to analyze the code and perform optimizations that can
improve the performance of the generated code.
Code generation: Three address codes can also be used as an intermediate
representation of code during the code generation phase of the compilation process.
The three-address code allows the compiler to generate code that is specific to the
target platform, while also ensuring that the generated code is correct and efficient.
Debugging: Three-address code can be helpful in debugging the code generated by
the compiler. Since it is a simple, low-level representation, it is often easier to read
and understand than the final generated machine code. Developers can use it to
trace the execution of the program and identify errors or issues that may be
present.
Language translation: Three address codes can also be used to translate code from
one programming language to another. By translating code to a common
intermediate representation, it becomes easier to translate the code to multiple
target languages.
Implementation of Three Address Code
There are three representations of three-address code, namely
Quadruple
Triples
Indirect Triples
1. Quadruple: It is a structure which consists of 4 fields, namely op, arg1, arg2 and result.
op denotes the operator, arg1 and arg2 denote the two operands, and result is used to store
the result of the expression.
2. Triples: This representation does not use an extra temporary variable to name the result
of each operation; instead, when a reference to another triple's value is needed, a pointer to
that triple is used. So it consists of only three fields, namely op, arg1 and arg2.
3. Indirect Triples: This representation uses a separately made and stored list of pointers to
all the triples, so the triples themselves never need to move. It is similar in utility to the
quadruple representation but requires less space. Temporaries are implicit, and it is
easier to rearrange code.
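As a hand-traced sketch (the temporary names t1 and t2 are invented, as a compiler would invent them), the statement a = b * c + d becomes the following quadruples:

```python
# Three-address code as quadruples for:  a = b * c + d
# Each quadruple is (op, arg1, arg2, result); this is an illustrative
# hand trace, not a full code generator.

quads = [
    ('*', 'b',  'c', 't1'),   # t1 = b * c
    ('+', 't1', 'd', 't2'),   # t2 = t1 + d
    ('=', 't2', None, 'a'),   # a  = t2
]

for op, arg1, arg2, result in quads:
    if op == '=':
        print(f"{result} = {arg1}")
    else:
        print(f"{result} = {arg1} {op} {arg2}")
```

In the triple representation the same code would drop the result field and refer to earlier rows by position, e.g. row 1 would use (0) in place of t1.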
Type Checking
Type checking is the process of checking and enforcing the constraints of types assigned to
values in a program. A compiler has to check that a source program conforms both to the
syntactic and semantic rules of the language as well as its type rules. The process also helps in
limiting the kinds of types that can be used in certain contexts, assigning types to values, and
then checking that these values are used appropriately.
Object types are checked by the type checker, a separate module in the compiler, which flags a
type error when a value is being used in an inappropriate manner or the rules of its type are
being violated. The type checker reports such errors, and the programmer must then fix the
wrong types. Whichever compiler is used for compilation, the type rules for a language must
be enforced.
Types of Type Checking
There are two kinds of type checking:
Static Type Checking.
Dynamic Type Checking.
Static Type Checking
Static type checking is defined as type checking performed at compile time. It checks the types
of variables at compile time, which means the type of each variable is known before the
program runs.
It generally examines the program text during the translation of the program. Using the type
rules of a system, a compiler can infer from the source text that a function (fun) will be applied
to an operand (a) of the right type each time the expression fun(a) is evaluated.
Dynamic Type Checking
Dynamic type checking is defined as type checking done at run time. In dynamic type
checking, types are associated with values, not variables. Implementations of dynamically
type-checked languages generally attach a type tag to each runtime object, that is, a reference
to a type containing its type information. Dynamic typing is more flexible; a static type system
always restricts what can be conveniently expressed.
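Python itself is dynamically type-checked, so it can serve as a quick demonstration: the type tag travels with the value, and a mismatch is reported only when the offending operation actually runs.

```python
# In Python, type information belongs to values, not variables, so the
# same name may hold differently-typed values, and a type error
# surfaces only at run time, when the bad operation executes.

x = 5          # x currently holds an int
x = "five"     # now it holds a str; no compile-time complaint

try:
    result = x + 1      # str + int is checked when this line runs
except TypeError as exc:
    print("caught at run time:", exc)
```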
Design of the type-checker depends on
Syntactic Structure of language constructs.
The Expressions of languages.
The rules for assigning types to constructs (semantic rules).
The position of the type checker in the compiler: it consumes the syntax tree produced by the
parser and passes a checked tree on to intermediate code generation.
Type conversion :
In type conversion, a data type is automatically converted into another data type by the
compiler at compile time. In this conversion, the destination data type cannot be smaller than
the source data type, which is why it is also called widening conversion. One more important
thing is that it can only be applied between compatible data types.
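Python performs exactly this kind of widening automatically when an int and a float meet in one expression:

```python
# Widening conversion sketch: when an int operand meets a float
# operand, the int is converted to float automatically, and the
# result takes the wider type. Nothing is lost going int -> float here.

total = 3 + 0.5
print(total, type(total).__name__)  # 3.5 float
```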
Equivalence of Type Expressions
If two type expressions are equivalent, the type checker returns a certain type; otherwise it
returns type_error.
Key Ideas:
o The main difficulty arises from the fact that most modern languages allow the
naming of user-defined types.
o For instance, in C and C++ this is achieved by the typedef statement.
o When checking equivalence of named types, we have two possibilities.
Structural Equivalence
Name Equivalence
Structural Equivalence of Type Expressions
Since type expressions are built from basic types and constructors, a natural concept of
equivalence between two type expressions is structural equivalence: two expressions are
structurally equivalent if they are the same basic type, or if they are formed by applying
the same constructor to structurally equivalent types. In other words, once every type name
is replaced by what it stands for, two type expressions are structurally equivalent if and
only if their trees are identical.
For example, the type expression integer is equivalent only to integer because they are
the same basic type.
Similarly, pointer (integer) is equivalent only to pointer (integer) because the two are
formed by applying the same constructor pointer to equivalent types.
The algorithm recursively compares the structure of type expressions without checking
for cycles, so it can be applied to a tree representation. It assumes that the only type
constructors are for arrays, products, pointers, and functions.
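A minimal sketch of the recursive comparison, representing type expressions as nested tuples (this encoding is an illustrative choice, and no cycle check is performed, matching the assumption above):

```python
# Structural equivalence of type expressions represented as tuples:
# ('pointer', T), ('array', T), ('product', T1, T2), ('fn', Targ, Tres),
# or a basic-type string. Two types are structurally equivalent iff
# they are the same basic type, or the same constructor applied to
# structurally equivalent parts.

def struct_eq(t1, t2):
    if isinstance(t1, str) or isinstance(t2, str):
        return t1 == t2                       # basic types: must match
    if t1[0] != t2[0] or len(t1) != len(t2):
        return False                          # different constructor
    return all(struct_eq(a, b) for a, b in zip(t1[1:], t2[1:]))

print(struct_eq(('pointer', 'integer'), ('pointer', 'integer')))  # True
print(struct_eq(('pointer', 'integer'), ('pointer', 'real')))     # False
```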
Overloading Of Functions And Operators
Function Overloading
If a class has multiple functions with the same name but different parameters, they are said to
be overloaded.
Function overloading allows you to use the same name for different functions in the same
class, performing either the same or different tasks.
Ways to overload a function:
1. By changing number of Arguments.
2. By having different types of argument.
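Python does not resolve overloads by full signature the way C++ does, but the standard-library functools.singledispatch decorator selects an implementation by the type of the first argument, which illustrates "same name, different parameter types" (the function name describe is invented for this sketch):

```python
from functools import singledispatch

# singledispatch picks an implementation based on the runtime type
# of the first argument, giving one name several type-specific bodies.

@singledispatch
def describe(value):
    return "something else"

@describe.register
def _(value: int):
    return "an integer"

@describe.register
def _(value: str):
    return "a string"

print(describe(42))     # an integer
print(describe("hi"))   # a string
print(describe(3.5))    # something else
```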
Operator Overloading
Operator overloading is a form of compile-time polymorphism.
As the name indicates, an operator is overloaded to give it a special meaning for a user-defined
data type.
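A minimal sketch in Python, where defining the special method __add__ gives the + operator a user-defined meaning for a hypothetical 2-D vector class:

```python
# Operator overloading: __add__ is Python's hook for the + operator,
# so '+' on two Vec2 values means component-wise addition.

class Vec2:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):
        return Vec2(self.x + other.x, self.y + other.y)

    def __repr__(self):
        return f"Vec2({self.x}, {self.y})"

v = Vec2(1, 2) + Vec2(3, 4)
print(v)  # Vec2(4, 6)
```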
The Chomsky Hierarchy and CSLs
Context-sensitive languages sit at the Type 1 level of the Chomsky hierarchy, between the
context-free and the recursively enumerable languages.
The relationship between different classes of languages is as follows −
Regular Languages ⊂ Context-Free Languages ⊂ Context-Sensitive Languages ⊂
Recursively Enumerable Languages.
CSLs have the added power to describe patterns that CFLs cannot.
According to Chomsky hierarchy, grammar is divided into 4 types as follows:
1. Type 0 is known as unrestricted grammar.
2. Type 1 is known as context-sensitive grammar.
3. Type 2 is known as a context-free grammar.
4. Type 3 Regular Grammar.
Type 0: Unrestricted Grammar
The languages generated by Type 0 grammars are exactly those recognized by Turing
machines. They are also known as the recursively enumerable languages.
The grammar productions for Type 0 are of the form
α —> β
where α is any string containing at least one non-terminal.
Type 1: Context-Sensitive Grammar
The languages generated by Type 1 grammars are exactly those recognized by linear bounded
automata. Context-sensitive grammars generate the context-sensitive languages.
Every context-sensitive grammar is also an unrestricted grammar. The grammar productions
for Type 1 are of the form
α —> β with |α| ≤ |β| (the number of symbols on the LHS must be less than or equal to the RHS)
Type 2: Context-Free Grammar
The languages generated by Type 2 grammars are exactly those recognized by pushdown
automata. Context-free grammars generate the context-free languages.
Every context-free grammar is also a context-sensitive grammar. The grammar productions
for Type 2 are of the form
A —> α
where A is a single non-terminal and α is any string of grammar symbols.
Type 3: Regular Grammar
The languages generated by Type 3 grammars are exactly those recognized by finite automata.
Regular grammars generate the regular languages.
Every regular grammar is also a context-free grammar. The grammar productions for Type 3
(in right-linear form) are given by
V → T*V / T*
that is, a non-terminal may derive a string of terminals optionally followed by a single
non-terminal.
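Because every production consumes terminals and leaves at most one non-terminal, a right-linear regular grammar can be run directly like a finite automaton. Below is a sketch with an invented example grammar S → aS | bF, F → b, which generates strings of a's followed by "bb":

```python
# Running a right-linear regular grammar as a finite automaton.
# Each production is (terminal, next_nonterminal); None means the
# production ends the derivation (an accepting move).

GRAMMAR = {
    'S': [('a', 'S'), ('b', 'F')],   # S -> aS | bF
    'F': [('b', None)],              # F -> b
}

def derives(string, nt='S'):
    if not string:
        return False
    head, rest = string[0], string[1:]
    for symbol, next_nt in GRAMMAR.get(nt, []):
        if symbol != head:
            continue
        if next_nt is None:          # terminal-only production
            if not rest:
                return True
        elif derives(rest, next_nt):
            return True
    return False

print(derives("aabb"))  # True
print(derives("aba"))   # False
```

The non-terminal plays the role of the automaton's current state, which is exactly why Type 3 grammars and finite automata recognize the same languages.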