Lexical_Syntax_Semantic_Analyzers_Latest
▶ Breaking down text: The lexical analyzer reads the source code
character by character and groups the characters into meaningful
sequences called lexemes, each of which it emits as a token.
▶ Tagging Tokens: Each token is classified into categories such as
keyword, identifier, literal, operator.
▶ Detecting illegal tokens: The lexical analyzer detects and reports any
character sequence that does not match the pattern of any token,
helping in early diagnosis of lexical errors.
▶ Removing Comments: Lexical analyzer removes comments that are
not needed for syntax analysis.
▶ Removing Whitespaces: Lexical analyzer removes any whitespace,
tabs, and newline characters that are not needed for syntax analysis.
▶ Stream of Tokens: The output from lexical analyzer is a stream of
tokens, which is input for the syntax analyzer.
▶ ϵ-closure(s): Set of NFA states reachable from the NFA state s on
ϵ-transitions alone.
▶ ϵ-closure(T): Set of NFA states reachable from some NFA state s ∈ T
on ϵ-transitions alone.
▶ move(T,a): Set of NFA states to which there is a transition on input
symbol a from some state s in T.
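These operations drive the subset construction (NFA → DFA). A Python sketch, with the NFA encoded as nested dicts mapping state → symbol → set of successor states, and "" standing in for ϵ (a representation choice made here):

```python
def eps_closure(T, transitions):
    """All NFA states reachable from states in T on ϵ-transitions alone."""
    stack, closure = list(T), set(T)
    while stack:
        s = stack.pop()
        for t in transitions.get(s, {}).get("", set()):   # "" encodes ϵ
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

def move(T, a, transitions):
    """All NFA states reachable from some state in T on input symbol a."""
    return {t for s in T for t in transitions.get(s, {}).get(a, set())}
```

In the subset construction, each DFA state is a set T of NFA states, and its successor on symbol a is `eps_closure(move(T, a, transitions), transitions)`.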
▶ yylex repeatedly reads the input stream, identifies the longest prefix
that matches any of the specified regular expression patterns,
executes the corresponding action, and continues until there are no
more tokens.
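The longest-prefix rule can be sketched as a plain matching loop in Python. The pattern names below are made-up examples, and ties are broken by listing order, as in lex:

```python
import re

# Illustrative patterns; None as the action means "discard", like a
# lex rule whose action produces no token.
PATTERNS = [
    (re.compile(r"[0-9]+"), "NUMBER"),
    (re.compile(r"<="), "LE"),
    (re.compile(r"<"), "LT"),
    (re.compile(r"[ \t\n]+"), None),
]

def yylex(text):
    """Repeatedly take the longest prefix matching any pattern (maximal munch)."""
    pos, out = 0, []
    while pos < len(text):
        best_len, best_action = 0, None
        for pattern, action in PATTERNS:
            m = pattern.match(text, pos)
            if m and len(m.group()) > best_len:   # strictly longer wins,
                best_len = len(m.group())         # so first-listed wins ties
                best_action = action
        if best_len == 0:
            raise SyntaxError(f"no pattern matches at position {pos}")
        if best_action is not None:
            out.append((best_action, text[pos:pos + best_len]))
        pos += best_len
    return out
```

On input `"12 <= 3"` the prefix `<=` is reported as one LE token, not as LT followed by something else, because the longer match wins.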
Tokenization: The input data needs to be converted into a format that the
parser can understand (i.e., by breaking the input into meaningful
elements called tokens).
▶ Parser checks the syntax of the given input data and verifies whether
it can be derived from the rules of a specific grammar.
▶ Errors: When the input data cannot be derived from the grammar, the
parser is expected to report these errors effectively. It should provide
meaningful error messages that help fix the issue.
▶ No-Error: Parser builds a parse tree or an abstract syntax tree which
represents the grammatical structure of the input.
4 Repeat until the end of input is reached and the stack is reduced to
the start symbol.
E → E + T | T
T → T ∗ F | F
F → id
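For this grammar the shift-reduce loop above can be sketched directly in Python. The shift-vs-reduce decision used below (never reduce a T up to E while ∗ is the next input symbol) is worked out by hand for this one grammar; it is not a general table-driven algorithm:

```python
# Productions, tried in this order; longer bodies come first so that
# T → T*F is preferred over T → F when both fit the top of the stack.
RULES = [("F", ["id"]), ("T", ["T", "*", "F"]), ("T", ["F"]),
         ("E", ["E", "+", "T"]), ("E", ["T"])]

def parse(tokens):
    """Return True iff tokens (e.g. ["id", "+", "id"]) derive from E."""
    stack, i = [], 0
    tokens = tokens + ["$"]                 # end-of-input marker
    while True:
        for head, body in RULES:
            if stack[len(stack) - len(body):] == body:
                # hand-derived lookahead check: keep T on the stack
                # while '*' follows, so T → T*F can still be built
                if head == "E" and tokens[i] == "*":
                    continue
                stack[len(stack) - len(body):] = [head]   # reduce
                break
        else:
            if tokens[i] == "$":
                # accept iff the stack is reduced to the start symbol
                return stack == ["E"]
            stack.append(tokens[i])                       # shift
            i += 1
```

On `id + id * id` the stack passes through `[E, +, T, *, F]`, reduces by T → T∗F and then E → E+T, and ends as `[E]`, so the input is accepted.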
Apply the following rules until no more terminals or ϵ can be added to any
FIRST set.
Apply the following rules until no more terminals can be added to any
FOLLOW set.
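Both rule sets are fixed-point computations: keep applying the rules until no set changes. A Python sketch over the expression grammar above, with a grammar encoded as {head: list of bodies} and "" marking ϵ (a representation chosen for this sketch):

```python
def first_sets(grammar):
    """FIRST for every nonterminal; a terminal's FIRST is itself."""
    first = {A: set() for A in grammar}
    def f(X):
        return first[X] if X in grammar else {X}
    changed = True
    while changed:                        # until no terminal or ϵ can be added
        changed = False
        for A, bodies in grammar.items():
            for body in bodies:
                add, nullable = set(), True
                for X in body:
                    add |= f(X) - {""}
                    if "" not in f(X):
                        nullable = False
                        break
                if nullable:
                    add.add("")           # A → ϵ, or every symbol derives ϵ
                if not add <= first[A]:
                    first[A] |= add
                    changed = True
    return first

def follow_sets(grammar, start, first):
    follow = {A: set() for A in grammar}
    follow[start].add("$")                # $ follows the start symbol
    def f(X):
        return first[X] if X in grammar else {X}
    changed = True
    while changed:                        # until no terminal can be added
        changed = False
        for A, bodies in grammar.items():
            for body in bodies:
                for i, B in enumerate(body):
                    if B not in grammar:
                        continue
                    trailer, nullable = set(), True
                    for X in body[i + 1:]:
                        trailer |= f(X) - {""}   # FIRST of what follows B
                        if "" not in f(X):
                            nullable = False
                            break
                    if nullable:
                        trailer |= follow[A]     # B can end A's body
                    if not trailer <= follow[B]:
                        follow[B] |= trailer
                        changed = True
    return follow

GRAMMAR = {"E": [["E", "+", "T"], ["T"]],
           "T": [["T", "*", "F"], ["F"]],
           "F": [["id"]]}
```

For this grammar the fixed point gives FIRST(E) = FIRST(T) = FIRST(F) = {id}, FOLLOW(E) = {+, $}, and FOLLOW(T) = FOLLOW(F) = {+, ∗, $}.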
S → iEtS | iEtSeS
E → b
S → iEtSS′
S′ → eS | ϵ
E → b
Algorithm 1 CLOSURE(I)
1: J = I
2: repeat
3:   for (each item A → α.Bβ in J)
4:     for (each production B → γ of G)
5:       if (B → .γ is not in J)
6:         add B → .γ to J;
7: until no more items are added to J in one round
8: return J;
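Algorithm 1 transcribes almost line for line into Python. An item A → α.Bβ is encoded here as a tuple (head, body, dot), with dot the index of the dot within the body (an encoding chosen for this sketch):

```python
def closure(items, grammar):
    """CLOSURE(I) for LR(0) items; grammar is {head: list of bodies}."""
    J = set(items)                        # J = I
    changed = True
    while changed:                        # repeat ... until nothing is added
        changed = False
        for head, body, dot in list(J):
            if dot < len(body):           # item is A → α.Bβ with B = body[dot]
                B = body[dot]
                for gamma in grammar.get(B, []):
                    item = (B, tuple(gamma), 0)    # the item B → .γ
                    if item not in J:
                        J.add(item)
                        changed = True
    return J
```

Starting from the single augmented item S′ → .S, this produces exactly the kernel plus the added .γ items that make up state I0.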
Grammar
S → L = R | R
L → ∗R | id
R → L
Augmented Grammar
S′ → S
S → L = R | R
L → ∗R | id
R → L
FIRST(S) = FIRST(L) = FIRST(R) = {∗, id}
FOLLOW(S) = {$}
FOLLOW(L) = {$, =}
FOLLOW(R) = {$, =}
State I0:
S′ → .S
S → .L = R | .R
L → .∗R | .id
R → .L
From I0 on S we get state I1:
S′ → S.
From I0 on L we get state I2:
S → L. = R
R → L.
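These state transitions are computed by a GOTO function: move the dot over X in every item of I that allows it, then take the CLOSURE of the result. A sketch (CLOSURE is repeated here so the block runs on its own; S' stands for the augmented start symbol S′):

```python
def closure(items, grammar):
    """CLOSURE(I) for LR(0) items, as in Algorithm 1."""
    J = set(items)
    changed = True
    while changed:
        changed = False
        for head, body, dot in list(J):
            if dot < len(body):
                for gamma in grammar.get(body[dot], []):
                    item = (body[dot], tuple(gamma), 0)
                    if item not in J:
                        J.add(item)
                        changed = True
    return J

def goto(I, X, grammar):
    """GOTO(I, X): advance the dot over X, then close the result."""
    moved = {(h, b, d + 1) for h, b, d in I if d < len(b) and b[d] == X}
    return closure(moved, grammar)

# The grammar from these notes, as {head: list of bodies}.
GRAMMAR = {"S'": [["S"]], "S": [["L", "=", "R"], ["R"]],
           "L": [["*", "R"], ["id"]], "R": [["L"]]}
I0 = closure({("S'", ("S",), 0)}, GRAMMAR)
I2 = goto(I0, "L", GRAMMAR)    # the two items S → L.=R and R → L.
```

Iterating `goto` over all grammar symbols from I0 onward enumerates the full canonical collection of LR(0) states.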
Algorithm 3 CLOSURE(I)
1: J = I
2: repeat
3:   for (each item [A → α.Bβ, a] in J)
4:     for (each production B → γ in G′)
5:       for (each terminal b in FIRST(βa))
6:         add [B → .γ, b] to J;
7: until no more items are added to J in one round
8: return J;
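Algorithm 3 transcribes the same way, now carrying lookaheads: an item [A → α.Bβ, a] becomes (head, body, dot, a). The FIRST(βa) shortcut below assumes no symbol of β derives ϵ, which holds for the grammar in these notes:

```python
def closure_lr1(items, grammar, first):
    """CLOSURE(I) for LR(1) items; first gives FIRST of each nonterminal."""
    def first_of_string(beta, a):
        # FIRST(βa) under the no-ϵ assumption: the FIRST of β's leading
        # symbol if β is nonempty, otherwise {a}
        if beta:
            X = beta[0]
            return first[X] if X in grammar else {X}
        return {a}

    J = set(items)
    changed = True
    while changed:                        # until no more items are added
        changed = False
        for head, body, dot, a in list(J):
            if dot < len(body) and body[dot] in grammar:
                B, beta = body[dot], body[dot + 1:]
                for gamma in grammar[B]:
                    for b in first_of_string(beta, a):
                        item = (B, tuple(gamma), 0, b)   # [B → .γ, b]
                        if item not in J:
                            J.add(item)
                            changed = True
    return J
```

Closing the single item [S′ → .S, $] over the grammar above yields the LR(1) version of I0, in which the L-items appear with both = and $ as lookaheads.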