Compiler Design
Lexeme: In a compiler, a lexeme refers to the smallest meaningful unit in the source code of a
program. It is a sequence of characters in the source code that represents a single, indivisible
element, such as a keyword, identifier, operator, or literal. Lexemes are the building blocks of a
program's syntax and semantics, and they are used by the lexical analyzer (lexer) to generate tokens.
A lexer is a crucial component of the compiler that performs lexical analysis or scanning. Its primary
role is to read the source code character by character, identify lexemes, and categorize them into
tokens, associating each token with a specific type and, in some cases, additional attributes. Tokens
are then passed to the parser for further analysis and processing.
Here are some examples of lexemes and the corresponding tokens they may generate in a
programming language like C:
Lexeme: "while"
Description: The lexeme "while" is recognized as a keyword in C, indicating the start of a while loop.
Lexeme: "count"
Description: The lexeme "count" represents an identifier, which could be a variable or function name.
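Lexeme: "+"
Description: The lexeme "+" is recognized as the addition operator.
Lexeme: "42"
Description: The lexeme "42" represents an integer literal.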
Lexeme: "3.14"
Description: The lexeme "3.14" represents a floating-point literal, a decimal number with a fractional
part.
Lexeme: "("
Description: The lexeme "(" is recognized as a left parenthesis, often used to group expressions.
In summary, a lexeme in a compiler is a sequence of characters in the source code that represents a
single, meaningful element of the program. Lexemes are identified and categorized into tokens
during the lexical analysis phase, and these tokens are then used by subsequent phases of the
compiler for parsing, semantic analysis, and code generation.
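The lexeme-to-token mapping described above can be sketched with a small regex-based scanner. This is a minimal illustration, not how any particular C compiler does it: the token names (KEYWORD, IDENTIFIER, and so on), the keyword list, and the function name tokenize are all assumptions made for the example.

```python
import re

# Token categories are illustrative names, not taken from a real compiler.
TOKEN_SPEC = [
    ("FLOAT_LITERAL", r"\d+\.\d+"),    # listed before INT_LITERAL so "3.14" is not split
    ("INT_LITERAL",   r"\d+"),
    ("IDENTIFIER",    r"[A-Za-z_]\w*"),
    ("PLUS",          r"\+"),
    ("LPAREN",        r"\("),
    ("RPAREN",        r"\)"),
    ("SKIP",          r"\s+"),         # whitespace separates lexemes but produces no token
    ("MISMATCH",      r"."),           # any other character is a lexical error
]
KEYWORDS = {"while", "if", "else", "return"}  # a small assumed subset of C keywords
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (token_type, lexeme) pairs for the given source text."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {lexeme!r}")
        if kind == "IDENTIFIER" and lexeme in KEYWORDS:
            kind = "KEYWORD"           # keywords look like identifiers until checked
        tokens.append((kind, lexeme))
    return tokens


for token in tokenize("while (count + 42 + 3.14)"):
    print(token)
```

Each pair couples a token type with the lexeme that produced it, which is the form a parser would typically consume.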
Type of grammar
CFG Simplification
[Video lecture-11: Knowledge Gate]
In a CFG, not every production rule or symbol may be needed to derive
strings. The grammar may also contain null productions and unit
productions. Eliminating these productions and symbols is called
simplification of the CFG. Simplification essentially comprises the
following steps −
Reduction of CFG
Removal of Unit Productions
Removal of Null Productions
Reduction of CFG
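CFGs are reduced in two phases −
Phase 1 − Derive an equivalent grammar, G’, from the CFG, G, such that
each variable derives some terminal string.
Derivation Procedure −
Step 1 − Include all symbols, W1, that derive some terminal and
initialize i=1.
Step 2 − Include all symbols, Wi+1, that derive Wi.
Step 3 − Increment i and repeat Step 2 until Wi+1 = Wi.
Step 4 − Include all production rules whose symbols are all in Wi or are
terminals.
Phase 2 − Derive an equivalent grammar, G”, from the CFG, G’, such that
each symbol appears in a sentential form.
Derivation Procedure −
Step 1 − Include the start symbol in Y1 and initialize i=1.
Step 2 − Include all symbols, Yi+1, that can be derived from Yi and include
all production rules that have been applied.
Step 3 − Increment i and repeat Step 2 until Yi+1 = Yi.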
Problem
Find a reduced grammar equivalent to the grammar G, having production
rules, P: S → AC | B, A → a, C → c | BC, E → aA | e
Solution
Phase 1 −
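T = { a, c, e }
W1 = { A, C, E } from rules A → a, C → c and E → e
W2 = { A, C, E } ∪ { S } from rule S → AC
W3 = { A, C, E, S } ∪ ∅
Since W3 = W2, the iteration stops. B derives no terminal string, so B and
every rule containing it are dropped:
G’ = { { A, C, E, S }, { a, c, e }, P, {S} }
where P: S → AC, A → a, C → c, E → aA | e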
Phase 2 −
Y1 = { S }
Y2 = { S, A, C } from rule S → AC
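Y3 = { S, A, C, a, c } from rules A → a and C → c
Y4 = { S, A, C, a, c }
Since Y4 = Y3, the iteration stops. E is never reached from S, so it is dropped: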
G” = { { A, C, S }, { a, c }, P, {S}}
where P: S → AC, A → a, C → c
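The two reduction phases can be sketched in Python as follows. The representation is an assumption made for this example: the grammar is a dict from a non-terminal to a list of right-hand sides, every symbol is a single character, and uppercase letters are non-terminals. The function name reduce_cfg is hypothetical.

```python
def reduce_cfg(productions, start):
    """Drop variables that derive no terminal string, then drop unreachable ones."""

    def usable(body, generating):
        # A body is usable if each symbol is a terminal or a generating variable.
        return all(not s.isupper() or s in generating for s in body)

    # Phase 1: grow the set of variables that derive some terminal string.
    generating = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in productions.items():
            if head not in generating and any(usable(b, generating) for b in bodies):
                generating.add(head)
                changed = True
    phase1 = {
        head: [b for b in bodies if usable(b, generating)]
        for head, bodies in productions.items()
        if head in generating
    }

    # Phase 2: keep only the variables reachable from the start symbol.
    reachable, frontier = {start}, [start]
    while frontier:
        head = frontier.pop()
        for body in phase1.get(head, []):
            for s in body:
                if s.isupper() and s not in reachable:
                    reachable.add(s)
                    frontier.append(s)
    return {head: bodies for head, bodies in phase1.items() if head in reachable}


grammar = {"S": ["AC", "B"], "A": ["a"], "C": ["c", "BC"], "E": ["aA", "e"]}
print(reduce_cfg(grammar, "S"))
```

On the grammar from the problem above, the sketch keeps only S → AC, A → a and C → c, which agrees with G”.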
Removal of Unit Productions
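Any production rule of the form A → B, where A and B are non-terminals, is
called a unit production.
Removal Procedure −
Step 1 − To remove A → B, add the production A → x to the grammar
whenever B → x occurs in the grammar, where x is a terminal string (x can
be null).
Step 2 − Delete A → B from the grammar.
Step 3 − Repeat from step 1 until all unit productions are removed.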
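Problem
Remove the unit productions from the following grammar −
S → XY, X → a, Y → Z | b, Z → M, M → N, N → a
Solution −
There are three unit productions in the grammar −
Y → Z, Z → M, and M → N
First we remove M → N. Since N → a, we add M → a −
S → XY, X → a, Y → Z | b, Z → M, M → a, N → a
Next we remove Z → M. Since M → a, we add Z → a −
S → XY, X → a, Y → Z | b, Z → a, M → a, N → a
Next we remove Y → Z. Since Z → a, we add Y → a −
S → XY, X → a, Y → a | b, Z → a, M → a, N → a
Now Z, M, and N are unreachable from S, so we remove them. The final grammar
is free of unit productions −
S → XY, X → a, Y → a | b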
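A possible Python sketch of unit-production removal is shown below. Note that it swaps in the unit-closure formulation (compute, for each variable, every variable reachable through unit productions alone) rather than the one-at-a-time substitution of the procedure; both give the same grammar here. The dict representation, single-character symbols, and the name remove_unit_productions are assumptions of the example.

```python
def remove_unit_productions(productions, start):
    """Replace unit productions A -> B with the non-unit bodies B can reach."""
    variables = set(productions)
    result = {}
    for head in productions:
        # Collect every variable reachable from head via unit productions only.
        closure, frontier = {head}, [head]
        while frontier:
            current = frontier.pop()
            for body in productions[current]:
                if body in variables and body not in closure:
                    closure.add(body)
                    frontier.append(body)
        # head inherits each non-unit body of the variables in its closure.
        bodies = []
        for var in sorted(closure):
            for body in productions[var]:
                if body not in variables and body not in bodies:
                    bodies.append(body)
        result[head] = bodies

    # Drop variables that can no longer be reached from the start symbol.
    reachable, frontier = {start}, [start]
    while frontier:
        for body in result[frontier.pop()]:
            for s in body:
                if s in variables and s not in reachable:
                    reachable.add(s)
                    frontier.append(s)
    return {head: bodies for head, bodies in result.items() if head in reachable}


grammar = {"S": ["XY"], "X": ["a"], "Y": ["Z", "b"], "Z": ["M"], "M": ["N"], "N": ["a"]}
print(remove_unit_productions(grammar, "S"))
```

For the grammar in the problem, the result contains S → XY, X → a and Y → a | b, with Z, M and N removed as unreachable.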
Removal of Null Productions
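In a CFG, a non-terminal symbol ‘A’ is a nullable variable if there is a
production A → ε or there is a derivation that starts at A and finally
yields ε: A → ... → ε
Removal Procedure
Step 1 − Find the nullable non-terminal variables, i.e. those that derive ε.
Step 2 − For each production A → a, construct all productions A → x, where
x is obtained from ‘a’ by removing one or more of the nullable
non-terminals found in step 1.
Step 3 − Combine the original productions with the result of step 2 and
remove ε-productions.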
Problem
Remove the null productions from the following grammar −
S → ASA | aB | b, A → B, B → b | ε
Solution −
There are two nullable variables, A and B. First we remove B → ε, which gives
S → ASA | aB | b | a, A → B | b | ε, B → b
Now we will remove A → ε, which gives
S → ASA | aB | b | a | SA | AS | S, A → B | b, B → b
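The null-production removal can be sketched in Python as below. The representation is assumed for the example: bodies are strings of single-character symbols and the empty string stands for ε. The name remove_null_productions is hypothetical, and the sketch does not handle the case where the start symbol itself is nullable.

```python
from itertools import product


def remove_null_productions(productions):
    """Return an equivalent grammar (apart from ε itself) with no ε-productions."""
    # Step 1: find the nullable variables by iterating to a fixed point.
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in productions.items():
            if head not in nullable and any(
                all(s in nullable for s in body) for body in bodies
            ):
                nullable.add(head)
                changed = True

    # Steps 2 and 3: for each body, emit every variant that keeps or drops
    # each nullable symbol, then discard the empty body.
    result = {}
    for head, bodies in productions.items():
        new_bodies = []
        for body in bodies:
            choices = [(s, "") if s in nullable else (s,) for s in body]
            for combo in product(*choices):
                candidate = "".join(combo)
                if candidate and candidate not in new_bodies:
                    new_bodies.append(candidate)
        result[head] = new_bodies
    return result


grammar = {"S": ["ASA", "aB", "b"], "A": ["B"], "B": ["b", ""]}
print(remove_null_productions(grammar))
```

For S this yields the same seven alternatives as the worked solution. The worked solution also lists A → b, which is already derivable through A → B and B → b; the sketch leaves it out.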
Question: In general, how many normal forms are there for a CFG (Context-Free Grammar)?
Ans. 2: Chomsky Normal Form and Greibach Normal Form
Important terms
The most general phrase-structure grammar is?
Answer: a. Context Sensitive Grammar
Explanation: Context-sensitive grammar is more general than context-free
and regular grammars because both the left-hand side and the right-hand
side of a production may contain terminals and non-terminals.
In the compiler, the function of using intermediate code is:
Which of the following parsers is the most powerful?
a. Operator Precedence
b. SLR
c. Canonical LR
d. LALR
Answer: c. Canonical LR
Explanation: Canonical LR (CLR) is more powerful than LALR and SLR.
The value of which variable is updated inside the loop by a loop-invariant value?
a. loop
b. strength
c. induction
d. invariable
Answer: c. induction
Explanation: The value of the induction variable is updated inside the loop by a
loop-invariant value.
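As a small illustration of an induction variable, the sketch below contrasts a loop that recomputes i * stride with one that keeps a running offset updated by the loop-invariant value stride. The second form is what strength reduction typically produces. The function names and the stride parameter are made up for the example.

```python
def offsets_multiply(n, stride):
    out = []
    for i in range(n):            # i is a basic induction variable: +1 per iteration
        out.append(i * stride)    # i * stride is a derived induction variable
    return out


def offsets_strength_reduced(n, stride):
    out = []
    offset = 0
    for _ in range(n):
        out.append(offset)
        offset += stride          # updated inside the loop by a loop-invariant value
    return out


print(offsets_multiply(5, 4))
print(offsets_strength_reduced(5, 4))
```

Both functions return the same list; the second replaces a multiplication per iteration with an addition.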
Explanation: A compiler does not make the program take more time to execute, so
longer execution time is not a characteristic of a compiler.
Which method merges multiple loops into a single one?
a. Constant Folding
b. Loop rolling
c. Loop fusion or jamming
d. None of the above
Answer: c. Loop fusion or Loop jamming
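Loop fusion can be illustrated with a hand-written before and after. This is only a sketch of the transformation a compiler might apply; the function names are hypothetical, and fusion is valid here because the two loops cover the same range and neither depends on the other's results.

```python
def separate_loops(values):
    squares = []
    for v in values:              # first pass over the data
        squares.append(v * v)
    doubles = []
    for v in values:              # second pass over the same range
        doubles.append(2 * v)
    return squares, doubles


def fused_loop(values):
    squares, doubles = [], []
    for v in values:              # one pass does the work of both loops
        squares.append(v * v)
        doubles.append(2 * v)
    return squares, doubles


print(fused_loop([1, 2, 3]))
```

The fused version traverses the data once, which reduces loop overhead and can improve cache behaviour.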
a. Bottom-up parser
b. Top-down parser
c. Both Top-down and bottom-up
d. None of the Above