2 Chomsky, Lexical Analysis and Parsing

The document discusses programming language syntax and grammars. It covers: - Chomsky hierarchy including regular, context-free, and context-sensitive grammars - Lexical and syntactic analysis in programming languages - Backus-Naur Form (BNF) and Extended BNF (EBNF) which are used to define the concrete syntax of programming languages - Parse trees which represent the structure of a program based on the grammar - How grammars define operator precedence and associativity through parse tree structure - Ambiguous grammars which allow multiple parse trees for a string

Uploaded by

sania ejaz
Programming Languages

Lexical and Syntactic Analysis


• Chomsky Grammar Hierarchy
• Lexical Analysis – Tokenizing
• Syntactic Analysis – Parsing
• Hmm Concrete Syntax
• Hmm Abstract Syntax

[Photo: Noam Chomsky]

Dr. Philip Cannata 1


Chomsky Hierarchy

• Regular grammar – used for tokenizing
• Context-free grammar (BNF) – used for parsing
• Context-sensitive grammar – not really used for programming languages



Regular Grammar
• Simplest; least powerful
• Equivalent to:
– Regular expression (think of perl)
– Finite-state automaton
• Right regular grammar:
ω ∈ Terminal*,
A and B ∈ Nonterminal
A → ωB
A → ω
• Example:
Integer → 0 Integer | 1 Integer | ... | 9 Integer |
0 | 1 | ... | 9
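The Integer grammar above can be read directly as a recognizer. The following is a minimal sketch, not taken from the slides: `derives_integer` mirrors the two right-regular productions, and `matches_integer` uses what should be the equivalent regular expression `[0-9]+`. Both function names are hypothetical.

```python
import re

DIGITS = "0123456789"


def derives_integer(s: str) -> bool:
    """Check whether s can be derived from Integer -> d Integer | d."""
    if not s or s[0] not in DIGITS:
        return False  # every production starts with a digit
    if len(s) == 1:
        return True  # Integer -> d
    return derives_integer(s[1:])  # Integer -> d Integer


# The same language written as a regular expression.
INTEGER_RE = re.compile(r"[0-9]+")


def matches_integer(s: str) -> bool:
    return INTEGER_RE.fullmatch(s) is not None
```

On any input the two functions should agree, which illustrates the equivalence between regular grammars and regular expressions listed above.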



Regular Grammar

• Less powerful than context-free grammars
• The following is not a regular language:
{ aⁿ bⁿ | n ≥ 1 }
i.e., a regular grammar cannot balance ( ), { }, begin end
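To see what a finite automaton is missing, here is a small sketch of a recognizer for { aⁿ bⁿ | n ≥ 1 }. It relies on an unbounded counter, which is exactly the kind of memory a fixed, finite set of states cannot provide. The function name `is_anbn` is an illustrative choice, not something from the slides.

```python
def is_anbn(s: str) -> bool:
    """Return True if s consists of n 'a's followed by n 'b's, with n >= 1."""
    count = 0
    i = 0
    # Count the leading a's.
    while i < len(s) and s[i] == "a":
        count += 1
        i += 1
    if count == 0:
        return False  # n must be at least 1
    # Cancel one a for each b that follows.
    while i < len(s) and s[i] == "b":
        count -= 1
        i += 1
    # Accept only if the whole input was consumed and the counts matched.
    return i == len(s) and count == 0
```

The same counting idea is what matching ( ), { } or begin/end pairs requires, which is why that job is left to the context-free parser rather than the tokenizer.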



Regular Expressions

x          a character x
\x         an escaped character, e.g., \n
{ name }   a reference to a name
M | N      M or N
M N        M followed by N
M*         zero or more occurrences of M
M+         one or more occurrences of M
M?         zero or one occurrence of M
[aeiou]    the set of vowels
[0-9]      the set of digits
.          any single character
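As a rough illustration of how these operators are used for tokenizing, the sketch below builds a tiny lexer with Python's `re` module. The token names and the choice of token classes are assumptions made for the example; they are not the Hmm lexer. Note that the `{ name }` reference form in the table belongs to lex-style tools and is not Python regex syntax.

```python
import re

# (token name, regular expression) pairs; earlier entries win on ties.
TOKEN_SPEC = [
    ("INT", r"[0-9]+"),               # one or more digits
    ("ID", r"[A-Za-z][A-Za-z0-9]*"),  # a letter, then letters or digits
    ("OP", r"[-+*/%=]"),              # a single operator character
    ("PUNCT", r"[();{}]"),            # punctuation
    ("SKIP", r"[ \t\n]+"),            # whitespace, discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Split text into (kind, lexeme) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

For example, `tokenize("z = x + 2 * y;")` yields an ID, an OP, an ID, and so on, ending with the PUNCT token for the semicolon.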



Finite State Automaton for Identifiers

(S, a2i$) ├ (I, 2i$)
          ├ (I, i$)
          ├ (I, $)
          ├ (F, )

Thus: (S, a2i$) ├* (F, )
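The trace above can be reproduced with a small table-driven simulation. The state diagram itself did not survive in the slide text, so the transition table here is an assumption inferred from the trace: S moves to I on a letter, I loops on letters and digits, and I moves to the final state F on the end marker `$`.

```python
def classify(ch: str) -> str:
    """Map an input character to the symbol class used by the automaton."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    if ch == "$":
        return "end"
    return "other"


# Assumed transition function: (state, symbol class) -> next state.
DELTA = {
    ("S", "letter"): "I",
    ("I", "letter"): "I",
    ("I", "digit"): "I",
    ("I", "end"): "F",
}


def run(text: str):
    """Return the list of (state, remaining input) configurations and an accept flag."""
    state, rest = "S", text
    trace = [(state, rest)]
    while rest:
        key = (state, classify(rest[0]))
        if key not in DELTA:
            return trace, False  # no transition defined: reject
        state, rest = DELTA[key], rest[1:]
        trace.append((state, rest))
    return trace, state == "F"
```

Calling `run("a2i$")` walks through the same five configurations as the slide and accepts, while an input such as `2a$` is rejected at the first character.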



Deterministic Finite State Automaton Examples



Context-Free Grammar

Production:
α → β
α ∈ Nonterminal
β ∈ (Nonterminal ∪ Terminal)*
i.e., the left-hand side is a single nonterminal, and the right-hand
side is a string of nonterminals and/or terminals
(possibly empty).



Context-Sensitive Grammar

Production:
α → β, where |α| ≤ |β|
α, β ∈ (Nonterminal ∪ Terminal)*
i.e., the left-hand side can be a string of terminals and
nonterminals, and a production never shortens the string.



Syntax

• The syntax of a programming language is a precise
description of all its grammatically correct programs.
• Precise syntax was first used with Algol 60, and has been
used ever since.
• Three levels:
– Lexical syntax - all the basic symbols of the language
(names, values, operators, etc.)
– Concrete syntax - rules for writing expressions,
statements and programs.
– Abstract syntax - internal representation of the program,
favoring content over form.



Grammars
Grammars: Metalanguages used to define the concrete syntax of a
language.

Backus Normal Form – Backus Naur Form (BNF)
• Stylized version of a context-free grammar (cf. Chomsky hierarchy)
• First used to define syntax of Algol 60
• Now used to define syntax of most major languages
Production:
α → β
α ∈ Nonterminal
β ∈ (Nonterminal ∪ Terminal)*
i.e., the left-hand side is a single nonterminal, and β is a string of nonterminals
and/or terminals (possibly empty).
• Example
Integer → Digit | Integer Digit
Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9



Extended BNF (EBNF)

Additional metacharacters
{ } a series of zero or more
( ) must pick one from a list
[ ] pick none or one from a list

Example
Expression -> Term { ( + | - ) Term }
IfStatement -> if ( Expression ) Statement [ else Statement ]

EBNF is no more powerful than BNF, but its production rules are often simpler
and clearer.

JavaCC EBNF
( … )* a series of zero or more
( … )+ a series of one or more
[ … ] optional
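One reason EBNF rules are often clearer is that they map almost directly onto code: the `{ ... }` repetition becomes a loop. The sketch below evaluates the rule `Expression -> Term { ( + | - ) Term }` from the example above. It assumes, purely for illustration, that a Term is a single digit and that the input is already a list of one-character tokens; the function names are hypothetical.

```python
def parse_term(tokens, pos):
    """Term is assumed to be a single digit. Returns (value, next position)."""
    if pos < len(tokens) and tokens[pos].isdigit():
        return int(tokens[pos]), pos + 1
    raise SyntaxError(f"expected a digit at position {pos}")


def parse_expression(tokens):
    """Expression -> Term { ( + | - ) Term }, evaluated left to right."""
    value, pos = parse_term(tokens, 0)
    # The EBNF braces { ... } become a while loop.
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        rhs, pos = parse_term(tokens, pos + 1)
        value = value + rhs if op == "+" else value - rhs
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return value
```

For instance, `parse_expression(list("5-4+3"))` folds the terms from the left, so it computes (5 - 4) + 3.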
For more details, see Chapter 2 of
Michael L. Scott, “Programming Language Pragmatics,” Third Edition.



Instance of a Programming Language:
int main ()
{
return 0 ;
}

[Figure: Internal Parse Tree]

Program (abstract syntax):
Function = main; Return type = int
  params =
  Block:
    Return:
      Variable: return#main, LOCAL addr=0
      IntValue: 0

[Figure label: Abstract Syntax]



Now we’ll focus on the internal parse tree



Parse Trees

Integer → Digit | Integer Digit
Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Parse Tree for 352 as an Integer
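Since the tree diagram is not reproduced in the text, the following sketch builds the parse tree for 352 under the grammar above. Because `Integer → Integer Digit` is left-recursive, each new digit wraps the tree built so far, so the tree grows down and to the left. The tuple representation and the function names are illustrative assumptions.

```python
def parse_integer(s: str):
    """Build the parse tree for s under Integer -> Digit | Integer Digit."""
    if not s or any(ch not in "0123456789" for ch in s):
        raise ValueError(f"not an Integer: {s!r}")
    tree = ("Integer", ("Digit", s[0]))  # Integer -> Digit
    for ch in s[1:]:
        tree = ("Integer", tree, ("Digit", ch))  # Integer -> Integer Digit
    return tree


def show(tree, depth=0):
    """Return the tree as a list of indented lines, one node per line."""
    indent = "  " * depth
    if tree[0] == "Digit":
        return [f"{indent}Digit {tree[1]}"]
    lines = [f"{indent}Integer"]
    for child in tree[1:]:
        lines.extend(show(child, depth + 1))
    return lines
```

Printing `"\n".join(show(parse_integer("352")))` shows the digit 3 at the deepest level and the digit 2 directly under the root.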



Arithmetic Expression Grammar

Expr → Expr + Term | Expr - Term | Term
Term → 0 | ... | 9 | ( Expr )

Parse of 5 - 4 + 3



Associativity and Precedence

• A grammar can be used to define associativity and
precedence among the operators in an expression.
E.g., + and - are left-associative operators in mathematics;
* and / have higher precedence than + and - .
• Consider the following grammar:
Expr -> Expr + Term | Expr - Term | Term
Term -> Term * Factor | Term / Factor | Term % Factor | Factor
Factor -> Primary ** Factor | Primary
Primary -> 0 | ... | 9 | ( Expr )



Associativity and Precedence
Parse of 4**2**3 + 5 * 6 + 7



Associativity and Precedence

Precedence   Associativity   Operators
3            right           **
2            left            * / %
1            left            + -

Note: These relationships are shown by the structure
of the parse tree: highest precedence at the bottom,
and left-associativity on the left at each level.
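The way the grammar encodes this table can be sketched as a small recursive-descent evaluator. Each nonterminal becomes one method: the left-recursive rules for Expr and Term are written as loops (giving left associativity), while Factor calls itself on the right of `**` (giving right associativity). This is an illustration under stated assumptions, not the course's parser: the class and function names are invented, `/` is treated as integer division, and the tokenizer silently skips characters it does not recognize.

```python
import re


def tokenize(text):
    # "**" is listed first so it is not split into two "*" tokens.
    return re.findall(r"\*\*|[-+*/%()]|[0-9]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):  # Expr -> Expr + Term | Expr - Term | Term
        value = self.term()
        while self.peek() in ("+", "-"):
            op, rhs = self.advance(), self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):  # Term -> Term * Factor | Term / Factor | Term % Factor | Factor
        value = self.factor()
        while self.peek() in ("*", "/", "%"):
            op, rhs = self.advance(), self.factor()
            if op == "*":
                value *= rhs
            elif op == "/":
                value //= rhs  # assumption: integer division
            else:
                value %= rhs
        return value

    def factor(self):  # Factor -> Primary ** Factor | Primary
        base = self.primary()
        if self.peek() == "**":
            self.advance()
            return base ** self.factor()  # right recursion: right-associative
        return base

    def primary(self):  # Primary -> 0 | ... | 9 | ( Expr )
        tok = self.advance()
        if tok is not None and tok.isdigit():
            return int(tok)
        if tok == "(":
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {tok!r}")


def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

With this sketch, `evaluate("4**2**3 + 5 * 6 + 7")` groups the input as ((4 ** (2 ** 3)) + (5 * 6)) + 7, matching the precedence levels and associativity in the table.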



Ambiguous Grammars

• A grammar is ambiguous if one of its strings has two
or more different parse trees.

• Example:
Expr -> Expr Op Expr | ( Expr ) | Integer
Op -> + | - | * | / | % | **

• Equivalent to the previous grammar, but ambiguous



Ambiguous Grammars

Ambiguous Parse of 5 – 4 + 3
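The two parse trees are not reproduced in the text, but they can be enumerated mechanically. The sketch below lists every tree that the rule `Expr -> Expr Op Expr | Integer` allows for a flat token list, then evaluates each one. Parenthesized sub-expressions are left out to keep the example short, only + and - are evaluated, and the function names are hypothetical.

```python
def all_parses(tokens):
    """Return every parse tree for tokens under Expr -> Expr Op Expr | Integer."""
    if len(tokens) == 1:
        return [tokens[0]]  # Expr -> Integer
    trees = []
    # Any operator (odd index) may serve as the root of the tree.
    for i in range(1, len(tokens), 2):
        for left in all_parses(tokens[:i]):
            for right in all_parses(tokens[i + 1:]):
                trees.append((left, tokens[i], right))
    return trees


def value(tree):
    """Evaluate a tree whose operators are '+' or '-'."""
    if isinstance(tree, int):
        return tree
    left, op, right = tree
    a, b = value(left), value(right)
    return a + b if op == "+" else a - b
```

For `[5, "-", 4, "+", 3]` this finds two trees: one groups the input as 5 - (4 + 3) and the other as (5 - 4) + 3, and the two evaluate to different results. That difference is why ambiguity matters in practice.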



Dangling Else Ambiguous Grammars

IfStatement -> if ( Expression ) Statement |
               if ( Expression ) Statement else Statement
Statement -> Assignment | IfStatement | Block
Block -> { Statements }
Statements -> Statements Statement | Statement

With which ‘if’ does the following ‘else’ associate?

if (x < 0)
if (y < 0) y = y - 1;
else y = 0;
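The two possible readings can be written out explicitly. The sketch below uses Python, where indentation forces a choice, to show that the readings are not equivalent programs. The function names are invented for the example. C, Java and most C-like languages resolve the ambiguity by attaching the else to the nearest unmatched if, which corresponds to the first function.

```python
def else_binds_to_inner_if(x, y):
    """Reading 1: the else belongs to 'if (y < 0)'."""
    if x < 0:
        if y < 0:
            y = y - 1
        else:
            y = 0
    return y


def else_binds_to_outer_if(x, y):
    """Reading 2: the else belongs to 'if (x < 0)'."""
    if x < 0:
        if y < 0:
            y = y - 1
    else:
        y = 0
    return y
```

With x = -1 and y = 5 the first reading sets y to 0 while the second leaves it at 5; with x = 1 and y = 5 the outcomes are reversed.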



Hmm BNF (i.e., Concrete Syntax)

Program : {[ Declaration ]|retType Identifier Function | MyClass | MyObject}
Function : ( ) Block
MyClass: Class Identifier { {retType Identifier Function}Constructor {retType Identifier Function}}
MyObject: Identifier Identifier = create Identifier callArgs
Constructor: Identifier ([{ Parameter } ]) block
Declaration : Type Identifier [ [Literal] ]{ , Identifier [ [ Literal ] ] }
Type : int|bool| float | list |tuple| object | string | void
Statements : { Statement }
Statement : ; | Declaration| Block |ForEach| Assignment |IfStatement|WhileStatement|CallStatement|ReturnStatement
Block : { Statements }
ForEach: for( Expression <- Expression ) Block
Assignment : Identifier [ [ Expression ] ]= Expression ;
Parameter : Type Identifier
IfStatement: if ( Expression ) Block [elseifStatement| Block ]
WhileStatement: while ( Expression ) Block



Hmm BNF (i.e., Concrete Syntax)
Expression : Conjunction {|| Conjunction }
Conjunction : Equality {&&Equality }
Equality : Relation [EquOp Relation ]
EquOp: == | !=
Relation : Addition [RelOp Addition ]
RelOp: <|<= |>|>=
Addition : Term {AddOp Term }
AddOp: + | -
Term : Factor {MulOp Factor }
MulOp: * | / | %
Factor : [UnaryOp]Primary
UnaryOp: - | !
Primary : callOrLambda|IdentifierOrArrayRef| Literal |subExpressionOrTuple|ListOrListComprehension|
ObjFunction
callOrLambda : Identifier callArgs|LambdaDef
callArgs : ([Expression |passFunc { ,Expression |passFunc}] )
passFunc : Identifier (Type Identifier { Type Identifier } )
LambdaDef : (\\ Identifier { ,Identifier } -> Expression)



Hmm BNF (i.e., Concrete Syntax)

IdentifierOrArrayRef : Identifier [ [Expression] ]
subExpressionOrTuple : ([ Expression [,[ Expression { , Expression } ] ] ] )
ListOrListComprehension: [ Expression {, Expression } ] | | Expression[<- Expression ] {, Expression[<- Expression ] } ]
ObjFunction: Identifier . Identifier . Identifier callArgs
Identifier : (a |b|…|z| A | B |…| Z){ (a |b|…|z| A | B |…| Z )|(0 | 1 |…| 9)}
Literal : Integer | True | False | ClFloat | ClString
Integer : Digit { Digit }
ClFloat: 0 | 1 |…| 9 {0 | 1 |…| 9}.{0 | 1 |…| 9}
ClString: " {~["] }"



Associativity and Precedence for Hmm

Clite Operator    Associativity
Unary - !         none
* /               left
+ -               left
< <= > >=         none
== !=             none
&&                left
||                left



Hmm Parse Tree Example
z = x + 2 * y;



Now we’ll focus on the Abstract Syntax



Hmm Parse Tree
z = x + 2 * y;
[Figure: tree with = at the root]



Very Approximate Hmm Abstract Syntax

Assignment = Variable target; Expression source
Expression = VariableRef | Value | Binary | Unary
VariableRef = Variable | ArrayRef
Variable = String id
ArrayRef = String id; Expression index
Value = IntValue | BoolValue | FloatValue | CharValue
Binary = Operator op; Expression term1, term2
Unary = UnaryOp op; Expression term
Operator = ArithmeticOp | RelationalOp | BooleanOp
IntValue = Integer intValue



Hmm Abstract Syntax – Binary Example
z=x+2*y

Binary
├─ Operator: +
├─ Variable: x
└─ Binary
   ├─ Operator: *
   ├─ Value: 2
   └─ Variable: y
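The abstract syntax rules translate naturally into data types. The sketch below models a small subset (Assignment, Variable, IntValue and Binary) as Python dataclasses and builds the tree for `z = x + 2 * y`. It is an approximation for illustration: the field names follow the rules above, operators are plain strings rather than a separate Operator type, and the `evaluate` helper with its `env` dictionary is an added assumption. The tree on the slide shows only the source expression, while the code also wraps it in an Assignment.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Variable:
    id: str


@dataclass
class IntValue:
    int_value: int


@dataclass
class Binary:
    op: str  # simplified: the operator is stored as a string
    term1: "Expression"
    term2: "Expression"


Expression = Union[Variable, IntValue, Binary]


@dataclass
class Assignment:
    target: Variable
    source: Expression


def evaluate(expr: Expression, env: dict) -> int:
    """Evaluate an expression tree; env maps variable names to integers."""
    if isinstance(expr, IntValue):
        return expr.int_value
    if isinstance(expr, Variable):
        return env[expr.id]
    left, right = evaluate(expr.term1, env), evaluate(expr.term2, env)
    if expr.op == "+":
        return left + right
    if expr.op == "*":
        return left * right
    raise ValueError(f"unsupported operator {expr.op!r}")


# Abstract syntax for: z = x + 2 * y
stmt = Assignment(
    target=Variable("z"),
    source=Binary("+", Variable("x"), Binary("*", IntValue(2), Variable("y"))),
)
```

Unlike the parse tree, this structure carries no trace of Term, Factor or Primary: precedence is already captured by the nesting of the Binary nodes, which is what "favoring content over form" refers to.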