Unit 1: Syntax and Semantics (PPL)
The evolution of programming languages can be examined through various principles that have
guided their development over time.
Advantages: Low-level control over hardware, essential for early computer systems.
Advantages: Improved readability, portability, and maintenance. Enabled handling complex tasks
with higher-level constructs.
Advantages: Enhanced code organization, readability, and debugging. Facilitated large-scale software
development.
Explanation: OOP languages like Smalltalk and, later, Java and C++ introduced the concept of
classes and objects, modeling programs as collections of interacting objects and promoting
encapsulation, inheritance, and polymorphism. This facilitated better code organization, reuse,
and maintenance.
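The three OOP ideas mentioned above can be sketched in a few lines of Python; the class names and methods here are purely illustrative:

```python
class Shape:
    """Base class: encapsulates a name and defines a common interface."""
    def __init__(self, name):
        self._name = name  # encapsulated state (private by convention)

    def area(self):
        raise NotImplementedError

class Square(Shape):
    """Inheritance: Square reuses Shape's structure and interface."""
    def __init__(self, side):
        super().__init__("square")
        self._side = side

    def area(self):
        return self._side ** 2

class Circle(Shape):
    def __init__(self, radius):
        super().__init__("circle")
        self._radius = radius

    def area(self):
        return 3.14159 * self._radius ** 2

# Polymorphism: the same area() call works on any Shape subclass.
shapes = [Square(2), Circle(1)]
areas = [s.area() for s in shapes]
```

Each subclass supplies its own area(), so code that works with a Shape never needs to know which concrete class it holds.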
Explanation: Declarative languages like SQL and Prolog shifted the focus to expressing the
desired outcome, leaving the implementation details to the language runtime. This abstraction
simplified coding for specific tasks like database queries and AI.
Advantages: Simplified coding for specific domains like databases and AI. Increased abstraction
and reduced complexity in certain applications.
Explanation: DSLs, such as MATLAB for numerical computing or HTML for web development,
emerged to address particular domains. These languages are designed with features and syntax
optimized for specific tasks, improving efficiency.
Explanation: Functional languages like Haskell and Lisp focus on treating functions as
values, supporting immutability and avoiding side effects. This paradigm facilitates concise,
expressive code and helps in handling complex computations.
Advantages: Concise, expressive code. Facilitated parallel programming and eased reasoning about
complex computations.
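As a rough illustration of this style, here is a short Python sketch (the function names are made up for the example) showing functions treated as values and data kept immutable:

```python
# Functions as values: they can be passed around like any other data.
def square(x):
    return x * x

def compose(f, g):
    """Return a new function that applies g, then f."""
    return lambda x: f(g(x))

inc = lambda x: x + 1
square_then_inc = compose(inc, square)  # a function built from functions

# Immutability: build new data instead of mutating in place.
nums = (1, 2, 3)                    # a tuple cannot be modified
squares = tuple(map(square, nums))  # a fresh tuple; nums is untouched
```

Because nothing is mutated, each expression can be understood (and parallelized) in isolation, which is the property the paragraph above describes.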
Explanation: With the rise of multicore processors and distributed systems, languages like
Erlang, Go, and Rust incorporate features to simplify concurrent programming and enhance
performance in parallel environments.
Explanation: Modern languages like Python, Ruby, and Swift focus on readability and
simplicity, aiming to make code more understandable and maintainable. They often incorporate
features that enhance expressiveness and reduce boilerplate code.
Advantages: Reduced boilerplate code, enhanced developer productivity, and improved code
maintainability. Attracted a broader audience to programming.
SYNTAX
Syntax and semantics are important terms in any computer programming language.
What is Syntax?
● A grammar is a set of rewriting rules whose aim is to recognize and generate programs.
● Grammar does not rely on the computation model but rather on the description of the
language's structure.
● The grammar includes a finite number of grammatical categories (including noun
phrases, articles, nouns, verb phrases, etc.), single words (alphabet elements), and
well-formed rules that govern the order in which the grammatical categories may appear.
Techniques of Syntax:
There are several formal and informal techniques that may help to understand the syntax of a
computer programming language.
1. Lexical Syntax
It describes the basic symbols of the language, such as identifiers, literals, operators, and
separators.
2. Concrete Syntax
It describes the real representation of programs utilizing lexical symbols such as their
alphabet.
3. Abstract Syntax
It describes the internal structure of programs as trees, ignoring representation details such
as punctuation.
Types of Grammars:
1. Context-free Grammar
2. Regular Expressions
3. Attribute Grammars
The syntax elements of a programming language (variable names, keywords, operators, etc.) are
coloured in a text editor or IDE that supports syntax highlighting, making the code easier to
understand.
● Syntax errors occur when a command is typed incorrectly in a command line or when a bug
is discovered in a program or script.
● The command or code must follow the language's grammar exactly; otherwise the compiler
or interpreter reports a syntax error.
Example:
Example of a "for loop" in Python:

numbers = [1, 2, 3, 4, 5]
for num in numbers:
    squared = num ** 2
    print(f"The square of {num} is: {squared}")
● numbers = [1, 2, 3, 4, 5]: This line creates a list called numbers containing five integers.
● for num in numbers: This is the beginning of a "for loop," iterating through each element
in the numbers list.
● squared = num ** 2: This line calculates the square of the current number in the loop.
● print(f"The square of {num} is: {squared}"): It prints a formatted string displaying the
original number and its square.
● This example demonstrates the syntax of a for loop in Python, iterating through a list of
numbers and calculating their squares.
Grammar:
● A context-free grammar (CFG) is a type of formal grammar that can describe the syntax
or structure of a formal language.
● It is defined as a four-tuple G = (V, T, P, S),
Where,
V is a finite set of variables (non-terminals), T is a finite set of terminals, P is a finite
set of production rules, and S ∈ V is the start symbol.
The left-hand side of each production rule can only be a variable (non-terminal); it cannot be
a terminal. The right-hand side can be a variable, a terminal, or any combination of variables
and terminals.
Production rules:
● S → aSa
● S → bSb
● S→c
Now check whether the string abbcbba can be derived from the given CFG:
S ⇒ aSa
S ⇒ abSba
S ⇒ abbSbba
S ⇒ abbcbba
By applying the productions S → aSa and S → bSb recursively, and finally applying the
production S → c, we get the string abbcbba.
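The derivation above can also be checked mechanically. A minimal recursive recognizer for this grammar (S → aSa | bSb | c), written in Python purely for illustration, might look like this:

```python
def derives(s):
    """Return True if s can be derived from the grammar S -> aSa | bSb | c."""
    if s == "c":
        return True           # base production S -> c
    if len(s) >= 3 and s[0] == s[-1] and s[0] in "ab":
        # first and last symbols match S -> aSa or S -> bSb;
        # recurse on the middle of the string
        return derives(s[1:-1])
    return False
```

For example, derives("abbcbba") peels off the matching outer a's and b's exactly as the derivation steps above do, until only "c" remains.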
Limitations of CFG:
● CFGs have limited expressive power; neither natural languages like English nor the full
rules of a programming language can be expressed using a context-free grammar alone.
● A context-free grammar can be ambiguous, meaning multiple parse trees can be generated
for the same input.
● For some grammars, parsing with a CFG can be inefficient because of exponential time
complexity.
● Error reporting is less precise, as a CFG-based error-reporting system cannot by itself
give detailed error messages and information.
Applications of CFG:
● Context-free grammars are used in compilers (like GCC) for parsing; in this step, the
parser takes a program (a sequence of tokens) as input.
● Context-free grammars are used to define the high-level structure of a programming
language.
● Every context-free grammar can be converted to a parser, a component of a compiler
that identifies the structure of a program and converts the program into a parse tree.
● The Document Type Definition (DTD) in XML is a context-free grammar that describes
the HTML tags and the rules for using the tags in a nested fashion.
ATTRIBUTE GRAMMAR
Attributes:
Basis of Attributes:
Name: The name of these variables can be altered whenever a sub-program is called.
Components: Data objects built from other data items are called components. A pointer is used
to represent this binding, which is then modified.
Attribute Grammar:
An attribute grammar is a context-free grammar augmented with attributes and semantic rules
that attach meaning to its productions.
Example:
E → E + T { E.value = E.value + T.value }
● The right part contains the semantic rules that specify how the grammar should be
interpreted. Here, the values of the non-terminals E and T are added together and the
result is copied to the non-terminal E on the left.
● Semantic attributes may be assigned values from their domain at the time of parsing
and evaluated at the time of assignment or conditions. Based on the way the attributes
get their values, they can be broadly divided into two categories: synthesized
attributes and inherited attributes.
Synthesized attributes:
● These attributes get values from the attribute values of their child nodes. To illustrate,
assume the following production:
S → ABC
● If S is taking values from its child nodes (A, B, C), then it is said to be a synthesized
attribute, as the values of A, B, and C are synthesized to S.
● As in our previous example (E → E + T), the parent node E gets its value from its child node.
Synthesized attributes never take values from their parent nodes or any sibling nodes.
Inherited attributes:
● In contrast to synthesized attributes, inherited attributes can take values from parent
and/or siblings. As in the following production,
S → ABC
● A can get values from S, B, and C. B can take values from S, A, and C. Likewise, C can take
values from S, A, and B.
S-attributed SDT:
● This form of SDT uses only synthesized attributes. Semantic actions are placed at the
rightmost end of a production, and attributes are evaluated during bottom-up parsing.
L-attributed SDT:
● This form of SDT uses both synthesized and inherited attributes, with the restriction of
not taking values from right siblings.
● In L-attributed SDTs, a non-terminal can get values from its parent, child, and sibling
nodes. As in the following production,
S → ABC
● S can take values from A, B, and C (synthesized). A can take values from S only. B can take
values from S and A. C can get values from S, A, and B. No non-terminal can get values from
the sibling to its right.
● Attributes in L-attributed SDTs are evaluated in a depth-first, left-to-right parsing
manner.
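The bottom-up flow of synthesized attributes can be sketched in Python; the Node class and the rule below are illustrative, mirroring the E → E + T semantic rule where the values of E and T are added and copied up:

```python
class Node:
    def __init__(self, label, children=None, value=None):
        self.label = label              # grammar symbol, e.g. "E" or "T"
        self.children = children or []
        self.value = value              # the synthesized attribute

def evaluate(node):
    """Compute each node's value from its children (synthesized attributes)."""
    for child in node.children:
        evaluate(child)                 # children first: bottom-up order
    if node.label == "E" and len(node.children) == 3:
        # semantic rule for E -> E + T: E.value = E1.value + T.value
        node.value = node.children[0].value + node.children[2].value
    elif node.children:
        # unit productions simply copy the child's value upward
        node.value = node.children[0].value
    return node.value

# Parse tree for "2 + 3": E -> E + T, with leaf values already attached.
tree = Node("E", [Node("E", [Node("T", value=2)]),
                  Node("+"),
                  Node("T", value=3)])
```

Calling evaluate(tree) visits the children before the parent, which is exactly the order in which synthesized attributes must be computed.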
SEMANTICS
Techniques of Semantics:
There are several techniques that may help to understand the semantics of a computer programming
language. Some of those techniques are as follows:
1. Algebraic semantics
It analyzes the program by specifying an algebra of its operations.
2. Operational Semantics
After mapping the language onto an abstract machine, it evaluates the program as a series
of state transitions.
3. Translational Semantics
It mainly concentrates on the methods that are utilized for translating a program into
another computer language.
4. Axiomatic Semantics
It defines the meaning of a program through logical assertions (preconditions and
postconditions) about the program state.
5. Denotational Semantics
It represents the meaning of the program by a set of functions that operate on the program state.
Types of Semantics:
1. Formal Semantics
● Words and meanings are analyzed philosophically or mathematically in formal semantics.
● It constructs models to help define the truth behind words instead of considering only
real-world instances.
2. Lexical Semantics
● It is the most well-known sort of semantics.
● It searches for the meaning of single words by taking into consideration the context and
text surrounding them.
3. Conceptual Semantics
● The dictionary definition of the word is analyzed before any context is applied in
conceptual semantics.
● After examination of the definition, the context is explored by searching for
linking terms, how meaning is assigned, and how meaning may change over time.
● It can be referred to as a sign that the word conveys context.
Example:
a = 5
b = 3
c = a + b
Syntactically these are valid assignment statements; semantically, c is bound to the value 8.
LEXICAL ANALYSIS
Terminologies:
● Token
● Pattern
● Lexeme
Token: It is a sequence of characters that represents a unit of information in the source code.
Example of tokens: keywords (if, while), identifiers, operators (+, -), literals (42, "hello"),
and punctuation symbols (;, {, }).
Example of Non-Tokens: comments, whitespace, and preprocessor directives.
Pattern: The rule or description (often a regular expression) that the lexemes of a token must
match.
Lexeme: A sequence of characters in the source code that matches the pattern of a token is known
as a lexeme. It is also called an instance of a token.
● The lexical analyzer is responsible for removing the white spaces and comments from the
source program.
● It correlates error messages with positions in the source program.
● It helps to identify the tokens.
● The input characters are read by the lexical analyzer from the source code.
How Lexical Analyzer works:
Input preprocessing:
● This stage involves cleaning up the input text and preparing it for lexical analysis.
● This may include removing comments, whitespace, and other non-essential characters
from the input text.
Tokenization:
● This is the process of breaking the input text into a sequence of tokens.
● This is usually done by matching the characters in the input text against a set of patterns
or regular expressions that define the different types of tokens.
Token classification:
● In this stage, each token is assigned a category, such as keyword, identifier, operator,
or literal, based on the pattern it matched.
Token validation:
● In this stage, the lexer checks that each token is valid according to the rules of the
programming language.
● For example, it might check that a variable name is a valid identifier, or that an
operator has the correct syntax.
Output generation:
● In this final stage, the lexer generates the output of the lexical analysis process, which is
typically a list of tokens.
● This list of tokens can then be passed to the next stage of compilation or interpretation.
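The stages above can be sketched with a toy tokenizer in Python; the token categories and patterns here are illustrative, not those of any real compiler:

```python
import re

# Token classification: each pattern is tagged with a category name.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),   # whitespace is recognized but never emitted
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Break the input text into (category, lexeme) pairs."""
    tokens = []
    for m in MASTER.finditer(text):
        if m.lastgroup != "SKIP":          # drop whitespace (preprocessing)
            tokens.append((m.lastgroup, m.group()))
    return tokens
```

For instance, tokenize("x = 42 + y") yields IDENT, OP, NUMBER, OP, IDENT pairs, the list of tokens that output generation hands to the parser.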
Advantages of lexical analysis:
● Lexical analysis helps the browsers to format and display a web page with the help of parsed
data.
● It is responsible for creating a compiled binary executable code.
● It helps to create a more efficient and specialised processor for the task.
Disadvantages of lexical analysis:
● It requires additional runtime overhead to generate the lexer table and construct the
tokens.
● It requires much effort to debug and develop the lexer and its token descriptions.
● Significant time is required to read the source code and partition it into tokens.
PARSING
What is a parser?
● The parser is the phase of the compiler which takes a token string as input and, with the
help of the existing grammar, converts it into the corresponding Intermediate
Representation (IR). The parser is also known as the Syntax Analyzer.
● The process of transforming data from one format to another is called parsing. This
process is carried out by the parser.
Types of Parsing:
Top-Down Parser:
● The top-down parser is the parser that generates the parse tree for the given input string
with the help of grammar productions by expanding the non-terminals.
● It starts from the start symbol and ends on the terminals. It uses the leftmost derivation.
● Further, the top-down parser is classified into 2 types: the recursive descent parser and
the non-recursive descent (predictive) parser.
1. Recursive descent parser:
● The recursive descent parser is also known as the brute force parser or the backtracking
parser.
● It basically generates the parse tree by using brute force and backtracking.
Bottom-up Parser:
● The bottom-up parser is the parser that generates the parse tree for the given input string
with the help of grammar productions by reducing (compressing) the terminals.
● It starts from the terminals and ends on the start symbol. It uses the reverse of the
rightmost derivation.
● Further, the bottom-up parser is classified into two types: the LR parser and the operator
precedence parser.
1. LR Parser:
● The LR parser is the bottom-up parser that generates the parse tree for the given string by
using unambiguous grammar.
● It follows the reverse of the rightmost derivation.
● LR parsers are of four types:
(a) LR(0)
(b) SLR(1)
(c) LALR(1)
(d) CLR(1)
Operator grammar: A grammar is said to be an operator grammar if no production rule has, on its
right-hand side:
1. ε (epsilon), or
2. two non-terminals appearing consecutively, that is, without any terminal between them.
Operator precedence parsing is not a simple technique to apply to most language constructs, but
it becomes an easy technique to implement where a suitable grammar can be produced.
● It means that if one derivation of a production fails, the syntax analyzer restarts the
process using different rules of the same production.
● This technique may process the input string more than once to determine the right
production.
● A top-down parser starts from the root node (start symbol) and matches the input string
against production rules to replace them (if matched).
Benefits of Recursive descent parser:
1. Ease of use: Because recursive descent parsing closely mimics the grammar rules of the
language being parsed, it is simple to comprehend and use.
2. Readability: The parsing code is usually set up in a structured and modular way, which makes
it easier to read and maintain.
3. Error reporting: Recursive descent parsers can produce descriptive error messages,
which make it simpler to find and detect syntax mistakes in the input.
4. Predictability: The predictable behavior of recursive descent parsers makes the parsing
process deterministic and clear.
Drawbacks of Recursive descent parser:
1. Left recursion: RDP cannot handle left-recursive grammars directly; such productions must
be rewritten before parsing.
2. Inefficiency: In some cases, RDP can suffer from efficiency issues when processing large
grammars or complex input text, especially when compared to other parsing techniques such as
LR parsing.
3. Limitations in error recovery: Since RDP reports syntax errors immediately upon
encountering an unexpected token, it can be less effective than other parsing techniques at
recovering from errors and continuing parsing.
4. Difficulty handling ambiguous grammars: RDP may struggle to handle grammars that are
inherently ambiguous or have multiple valid parse trees.
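As a sketch of the idea, a recursive descent parser for a small illustrative grammar, E → T ('+' T)* and T → NUMBER (chosen here for the example, not taken from the notes), can be written with one function per non-terminal:

```python
def parse_expr(tokens):
    """E -> T ('+' T)* : parse and evaluate a sum of numbers."""
    value, rest = parse_term(tokens)
    while rest and rest[0] == "+":     # keep consuming '+' T
        rhs, rest = parse_term(rest[1:])
        value += rhs
    return value, rest

def parse_term(tokens):
    """T -> NUMBER : each non-terminal gets its own function."""
    if not tokens or not tokens[0].isdigit():
        raise SyntaxError(f"expected a number, got {tokens[:1]}")
    return int(tokens[0]), tokens[1:]
```

Note how the code mirrors the grammar rule by rule, which is exactly the "ease of use" benefit listed above; the grammar was also written without left recursion, sidestepping drawback 1.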
Example:
E → T
T → T * F | F
F → id
● Shift-reduce parsing is a process of reducing a string to the start symbol of a grammar.
● Shift-reduce parsing uses a stack to hold the grammar symbols and an input tape to hold
the string.
● Shift-reduce parsing performs two actions: shift and reduce. That is why it is known as
shift-reduce parsing.
● At the shift action, the current symbol in the input string is pushed onto the stack.
● At each reduction, the symbols are replaced by a non-terminal. The symbols are the
right side of the production, and the non-terminal is the left side of the production.
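The shift and reduce actions can be traced with a toy recognizer for the grammar E → T, T → T * F | F, F → id; this is only a sketch of the idea (a real LR parser drives these choices from a parse table rather than hard-coded checks):

```python
def recognize(tokens):
    """Shift-reduce recognition for E -> T; T -> T * F | F; F -> id."""
    stack = []
    tokens = list(tokens) + ["$"]      # end-of-input marker
    while tokens:
        lookahead = tokens[0]
        # reduce whenever a handle sits on top of the stack
        if stack[-1:] == ["id"]:
            stack[-1:] = ["F"]                 # F -> id
        elif stack[-3:] == ["T", "*", "F"]:
            stack[-3:] = ["T"]                 # T -> T * F
        elif stack[-1:] == ["F"]:
            stack[-1:] = ["T"]                 # T -> F
        elif stack[-1:] == ["T"] and lookahead == "$":
            stack[-1:] = ["E"]                 # E -> T (only at end of input)
        elif lookahead != "$":
            stack.append(tokens.pop(0))        # shift the next input symbol
        else:
            break                              # no action possible: stop
    return stack == ["E"]
```

On input id * id the trace is: shift id, reduce to F then T, shift *, shift id, reduce to F, reduce T * F to T, and finally reduce T to E, accepting the string.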
Operator Precedence Parser:
• The operator precedence parser generates the parse tree from the given grammar and
string, but the only condition is that two consecutive non-terminals and epsilon never
appear on the right-hand side of any production.
• The operator precedence parsing techniques can be applied to operator grammars.