PROGRAMMING LANGUAGE

SA 1 PROGRAMMING LANGUAGE

FA 1 PROGRAMMING LANG – 13/15

1. FORTRAN was originally developed by IBM in the 1950s for scientific and engineering applications.

True

2. A program that converts high-level language to assembly language

COMPILER

3. Knowledge of programming language concepts helps you understand, in C, unions, arrays & pointers, separate compilation, var args, and catch and throw.

True
4. An interpreter, like a compiler, translates high-level language into low-level machine language

True

5. A preprocessor, generally considered a part of the compiler, is a tool that produces input for compilers.

True

6. Which languages owe their success to the backing of a powerful sponsor?

(COBOL, PL/1, Ada, Visual Basic)

7. Pascal and Java are examples of languages that achieved wide dissemination at minimal cost.

True

8. An Interpreter reads the whole source code at once, creates tokens, checks semantics, generates intermediate code,
executes the whole program and may involve many passes.

False (COMPILER)

9. A compiler hides further steps; a pre-processor does not hide steps.

False

10. The recursive rules known as parsing define the ways in which these constituents combine.

False (context-free grammar)

11. The main C# compiler produces .NET Common Intermediate Language (CIL), which is then translated into machine code immediately prior to execution.

True

12. The interpreter can be figured out at compile time.

False (COMPILER)

13. Parsing organizes tokens into a parse tree that represents higher-level constructs in terms of their constituents.

True

14. The __________Analyzer breaks the sentence into tokens.

LINEAR

15. Compilation is translation from one language into another, with full analysis of the meaning of the input. TRUE
FA 2 PROGRAMMING LANG – 15/15
FA 3 PROGRAMMING LANGUAGE
PROGRAMMING LANGUAGES

Module 1: INTRODUCTION TO LANGUAGES
What makes a language successful?
• easy to learn (BASIC, Pascal, LOGO, Scheme)
• easy to express things, easy to use once fluent, "powerful" (C, Common Lisp, APL, Algol-68, Perl)
• easy to implement (BASIC, Forth)
• possible to compile to very good (fast/small) code (Fortran)
• backing of a powerful sponsor (COBOL, PL/1, Ada, Visual Basic)
• wide dissemination at minimal cost (Pascal, Turing, Java)
Why do we have programming languages? What is a language for?
• way of thinking -- way of expressing algorithms (languages from the user's point of view)
• abstraction of virtual machine -- way of specifying what you want the hardware to do without getting down into the bits (languages from the implementor's point of view)
Help you choose a language.

• C vs. Modula-3 vs. C++ for systems programming


• Fortran vs. APL vs. Ada for numerical computations
• Ada vs. Modula-2 for embedded systems
• Common Lisp vs. Scheme vs. ML for symbolic data
manipulation
• Java vs. C/CORBA for networked PC programs
Ada is a modern programming language designed for large, long-lived
applications – and embedded systems in particular – where reliability and
efficiency are essential. It was originally developed in the early 1980s (this
version is generally known as Ada 83).

Fortran (formerly FORTRAN, derived from "Formula Translation") is a general-purpose, imperative programming language that is especially suited to numeric computation and scientific computing. It was originally developed by IBM in the 1950s for scientific and engineering applications.
Modula-3 is a programming language conceived as a successor to an
upgraded version of Modula-2 known as Modula-2+. While it has been
influential in research circles (influencing the designs of languages such
as Java, C#, and Python) it has not been adopted widely in industry.

Pascal is an imperative and procedural programming language, which Niklaus Wirth designed in 1968–69 and published in 1970, as a small, efficient language intended to encourage good programming practices using structured programming and data structuring.
Scheme is a functional programming language and one of the two
main dialects of the programming language Lisp.

Cobra is a general-purpose, object-oriented programming language. It supports both static and dynamic typing.

ALGOL 68 (short for ALGOrithmic Language 1968) is an imperative computer programming language that was conceived as a successor to the ALGOL 60 programming language, designed with the goal of a much wider scope of application and more rigorously defined syntax and semantics.
Common Lisp (CL) is a dialect of the Lisp programming language, published as an ANSI standard.

ML is a general-purpose functional programming language developed by Robin Milner and others in the early 1970s at the University of Edinburgh. ML stands for MetaLanguage.
Make it easier to learn new languages: some languages are similar, and it is easy to walk down the family tree.
• concepts have even more similarity; if you think in terms of
iteration, recursion, abstraction (for example), you will find it
easier to assimilate the syntax and semantic details of a new
language than if you try to pick it up in a vacuum. Think of an
analogy to human languages: good grasp of grammar makes
it easier to pick up new languages.
Help you make better use of whatever
language you use
• understand unintelligible features:
• In C, help you understand unions, arrays & pointers, separate compilation, var
args, catch and throw
• In Common Lisp, help you understand first-class functions/closures, streams,
catch and throw, symbol internals
Help to make better use of whatever language you use
• understand implementation costs: choose between alternative
ways of doing things, based on knowledge of what will be done
underneath:
• use simple arithmetic equivalents (e.g., x*x instead of x**2)
• avoid call by value with large data items in Pascal
• avoid the use of call by name in Algol 60
• choose between computation and table lookup (e.g. for cardinality
operator in C or C++)
Help you make better use of whatever language you use
• figure out how to do things in languages that don't support
them clearly:
• lack of suitable control structures in Fortran
• use comments and programmer discipline for control
structures
• lack of recursion in Fortran
Help to make better use of whatever language you use
• figure out how to do things in languages that don't support them
clearly:
• lack of named constants and enumerations in Fortran
• use variables that are initialized once, then never changed
• lack of modules in C and Pascal: use comments and programmer discipline
A Compiler is a program that converts high-level language to assembly
language.
An Assembler is a program that converts the assembly language to
machine-level language. An assembler translates assembly language
programs into machine code. The output of an assembler is called an object
file, which contains a combination of machine instructions as well as the data
required to place these instructions in memory.
A Preprocessor, generally considered a part of the compiler, is a tool that produces input for compilers. It deals with macro-processing, augmentation, file inclusion, language extension, etc.
An interpreter, like a compiler, translates high-level language into low-level
machine language.
The difference lies in the way they read the source code or input.
A compiler reads the whole source code at once, creates tokens, checks
semantics, generates intermediate code, executes the whole program and
may involve many passes.
In contrast, an interpreter reads a statement from the input, converts it to
an intermediate code, executes it, then takes the next statement in
sequence. If an error occurs, an interpreter stops execution and reports it;
whereas a compiler reads the whole program even if it encounters several
errors.
A Linker is a computer program that links and merges various object files together in order to make an executable file.

All these files might have been compiled by separate assemblers.

The major task of a linker is to search and locate referenced modules/routines in a program and to determine the memory location where these codes will be loaded, making the program instructions have absolute references.
Pure Interpretation
• Interpreter stays around for the execution of the program
• Interpreter is the point of control during execution
Interpretation:
• Greater flexibility
• Better diagnostics (error messages)

Compilation
• Better performance
Most language implementations include a mixture of
both compilation and interpretation
Note that compilation does NOT have to produce machine
language for some sort of hardware
Compilation is translation from one language into another,
with full analysis of the meaning of the input
Compilation entails semantic understanding of what is being
processed;
A pre-processor will often let errors through. A compiler
hides further steps; a pre-processor does not
Implementation strategies:
• Preprocessor
• Looks for compiler directives (e.g., #include )
• Removes comments and white space
• Groups characters into tokens (keywords, identifiers,
numbers, symbols)
• Identifies higher-level syntactic structures (loops,
subroutines)
Implementation strategies:
• Library of Routines and Linking
• Compiler uses a linker program to merge the appropriate library of subroutines (e.g., math functions such as sin, cos, log, etc.) into the final program.
Compilation refers to the processing of source code files (.c, .cc, or .cpp)
and the creation of an 'object' file.
This step doesn't create anything the user can actually run. Instead, the
compiler merely produces the machine language instructions that
correspond to the source code file that was compiled.
For instance, if you compile (but don't link) three separate files, you will have
three object files created as output, each with the name <filename>.o or
<filename>.obj (the extension will depend on your compiler).
Each of these files contains a translation of your source code file into a machine language
file -- but you can't run them yet! You need to turn them into executables your operating
system can use. That's where the linker comes in.
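A hedged sketch of that flow (the file names and GCC commands here are illustrative assumptions, not part of the notes):

/* main.c -- references a routine defined in another translation unit.
   A typical build compiles each file to an object, then links:
       gcc -c main.c             (produces main.o)
       gcc -c util.c             (produces util.o)
       gcc main.o util.o -o prog (linker resolves add_one, makes executable)
   Neither .o file can run on its own. */
int add_one(int x);   /* declared here, defined in util.c */

int main(void) { return add_one(41) - 42; }

/* util.c -- the separately compiled definition:
   int add_one(int x) { return x + 1; }
*/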
Implementation strategies:
• Post-compilation Assembly
• Facilitates debugging (assembly language easier for
people to read)
• Isolates the compiler from changes in the format of machine language files (only the assembler must be changed, and it is shared by many compilers)
Implementation strategies:
• The C Preprocessor (conditional compilation)
• Preprocessor deletes portions of code, which allows
several versions of a program to be built from the same
source
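For example (an illustrative sketch): compiling the file below with -DDEBUG keeps the logging line, while compiling without it makes the preprocessor delete that portion, so two versions of the program come from one source file:

/* Sketch: conditional compilation with the C preprocessor. */
#include <stdio.h>

int main(void) {
#ifdef DEBUG
    fprintf(stderr, "debug build: starting up\n");   /* kept only with -DDEBUG */
#endif
    printf("hello\n");
    return 0;
}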
Implementation strategies:
• DYNAMIC AND JUST-IN-TIME COMPILATION
• In some cases a programming system may deliberately delay compilation
until the last possible moment.
• The Java language definition defines a machine-independent
intermediate form known as byte code. Byte code is the standard
format for distribution of Java programs.

• The main C# compiler produces .NET Common Intermediate Language (CIL), which is then translated into machine code immediately prior to execution.
Implementation strategies:
• The assembly-level instruction set is not implemented in hardware; it runs on an interpreter.
• The interpreter is written in low-level instructions (microcode or firmware), which are stored in read-only memory and executed by the hardware.
TOOLS

[Figure: analysis of language 1 followed by synthesis of language 2. The analysis phases are lexical analysis, syntax analysis, and semantic analysis; the synthesis phases are intermediate code generation, code optimization, and target code generation.]
Overview:
Scanning:
• divides the program into "tokens", which are the smallest meaningful
units; this saves time, since character-by-character processing is slow
• we can tune the scanner better if its job is simple; it also saves
complexity (lots of it) for later stages
• you can design a parser to take characters instead of tokens as input,
but it isn't pretty
• scanning is recognition of a regular language, e.g., via DFA
Parsing is recognition of a context-free language, e.g.,
via PDA
• Parsing discovers the "context free" structure of the
program
• Informally, it finds the structure you can describe with
syntax diagrams (the "circles and arrows" in a Pascal
manual)
The Lexical Analyzer or Linear Analyzer breaks the sentence into tokens. For example, the following assignment statement:
position = initial + rate * 60
Would be grouped into the following tokens:
1. The identifier position.
2. The assignment symbol =.
3. The identifier initial.
4. The plus sign.
5. The identifier rate.
6. The multiplication sign.
7. The number 60
SYMBOL TABLE:
position → id1 & attributes
initial → id2 & attributes
rate → id3 & attributes

An expression of the form:
position = initial + 60 * rate
gets converted to → id1 = id2 + 60 * id3

So the Lexical Analyzer converts symbols to an array of easy-to-use symbolic constants (TOKENS). It also removes spaces and other unnecessary things like comments.
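A minimal scanner sketch in C (an illustration, not the module's actual lexer) that splits the assignment statement above into identifier, number, and operator tokens, skipping white space as described:

/* Sketch: character-by-character scanning of "position = initial + rate * 60"
   into tokens. Identifiers follow letter(letter|digit)*, numbers are digit+. */
#include <ctype.h>
#include <stdio.h>

int main(void) {
    const char *p = "position = initial + rate * 60";
    char lexeme[64];

    while (*p) {
        if (isspace((unsigned char)*p)) { p++; continue; }   /* remove spaces */
        int n = 0;
        if (isalpha((unsigned char)*p)) {                    /* identifier */
            while (isalnum((unsigned char)*p)) lexeme[n++] = *p++;
            lexeme[n] = '\0';
            printf("IDENTIFIER  %s\n", lexeme);
        } else if (isdigit((unsigned char)*p)) {             /* number */
            while (isdigit((unsigned char)*p)) lexeme[n++] = *p++;
            lexeme[n] = '\0';
            printf("NUMBER      %s\n", lexeme);
        } else {                                             /* operator symbol */
            printf("OPERATOR    %c\n", *p++);
        }
    }
    return 0;
}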
Semantic analysis is the discovery of meaning in the
program
• The compiler actually does what is called STATIC semantic
analysis. That's the meaning that can be figured out at
compile time
• Some things (e.g., array subscript out of bounds) can't be
figured out until run time. Things like that are part of the
program's DYNAMIC semantics
Intermediate form (IF) done after semantic analysis (if the
program passes all checks)
• IFs are often chosen for machine independence, ease of
optimization, or compactness (these are somewhat
contradictory)
• They often resemble machine code for some imaginary
idealized machine; e.g. a stack machine, or a machine with
arbitrarily many registers
• Many compilers actually move the code through more than
one IF
Optimization takes an intermediate-code program and
produces another one that does the same thing faster, or
in less space
• The optimization phase is optional
The code generation phase produces assembly language or (sometimes) relocatable machine language
Certain machine-specific optimizations (use of special
instructions or addressing modes, etc.) may be performed
during or after target code generation
Symbol table: all phases rely on a symbol table that keeps
track of all the identifiers in the program and what the
compiler knows about them
• This symbol table may be retained (in some form) for use by
a debugger, even after compilation has completed
Lexical and Syntax Analysis
• GCD Program (Pascal)
Lexical and Syntax Analysis
• GCD Program Tokens
• Scanning (lexical analysis) and parsing recognize the structure of the program, grouping characters into tokens, the smallest meaningful units of the program
Lexical and Syntax Analysis
• Context-Free Grammar and Parsing
• Parsing organizes tokens into a parse tree that represents
higher-level constructs in terms of their constituents
• Potentially recursive rules known as context-free grammar
define the ways in which these constituents combine
PROGRAMMING LANGUAGES

Module 2: LEXICAL AND SYNTAX ANALYSIS
• The word "lexical" in the traditional sense means "pertaining to words". In terms of programming languages, words are objects like variable names, numbers, keywords etc.
• Such words are traditionally called tokens.
• A lexical analyser, or lexer for short, takes as its input a string of individual characters and divides this string into tokens.
• Additionally, it filters out whatever separates the tokens (the so-called white-space), i.e., lay-out characters (spaces, newlines etc.) and comments.
• Lexical analyzer: reads input characters and produces a
sequence of tokens as output (nexttoken()).
– Trying to understand each element in a program.
– Token: a group of characters having a collective meaning.
const pi = 3.14159;
Token 1: (const)
Token 2: (identifier, 'pi')
Token 3: (=)
Token 4: (realnumber, 3.14159)
Token 5: (;)
[Figure: the lexical analyzer reads the source program and hands tokens to the parser on request (nexttoken()); both components consult the symbol table.]
– Token: a group of characters having a collective
meaning.
– A lexeme is a particular instance of a token.
• E.g. token: identifier, lexeme: pi, etc.
– pattern: the rule describing how a token can be
formed.
• E.g: identifier: ([a-z]|[A-Z]) ([a-z]|[A-Z]|[0-9])*
• Lexical analyzer does not have to be an individual
phase. But having a separate phase simplifies the
design and improves the efficiency and portability.
– How to specify tokens (patterns)?
– How to recognize the tokens giving a token specification (how to
implement the nexttoken() routine)?
• How to specify tokens:
– all the basic elements in a language must be
tokens so that they can be recognized.
• Token types: constant, identifier, reserved word,
operator and misc. symbol.

– Tokens are specified by regular expressions.


For lexical analysis, specifications are traditionally written using
regular expressions:
Regular Expressions is an algebraic notation for describing sets of
strings. The generated lexers are in a class of extremely simple
programs called finite automata.
Regular expressions have the capability to express finite languages by
defining a pattern for finite strings of symbols. The grammar defined by
regular expressions is known as regular grammar. The language
defined by regular grammar is known as regular language.
Examples of regular definitions for letters, digits and identifiers
There are a number of algebraic laws that are obeyed by regular
expressions, which can be used to manipulate regular expressions into
equivalent forms.
1. Operations:
The various operations on languages are:
• Union of two languages L and M is written as
L U M = {s | s is in L or s is in M}
• Concatenation of two languages L and M is written as LM = {st | s is in
L and t is in M}
• The Kleene Closure of a language L is written as
L* = Zero or more occurrence of language L.
2. Notations
If r and s are regular expressions denoting the languages L(r) and L(s),
then
✓ Union : (r)|(s) is a regular expression denoting L(r) U L(s)
✓ Concatenation : (r)(s) is a regular expression denoting L(r)L(s)
✓ Kleene closure : (r)* is a regular expression denoting (L(r))*
✓ (r) is a regular expression denoting L(r)
3. Precedence and Associativity
✓ *, concatenation (.), and | (pipe sign) are left associative
✓ * has the highest precedence
✓ Concatenation (.) has the second highest precedence.
✓ | (pipe sign) has the lowest precedence of all.
4. Representing valid tokens of a language in regular expression
If x is a regular expression, then:
• x* means zero or more occurrences of x,
i.e., it can generate { ε, x, xx, xxx, xxxx, … }
• x+ means one or more occurrences of x,
i.e., it can generate { x, xx, xxx, xxxx, … } or x.x*
• x? means at most one occurrence of x,
i.e., it can generate either {x} or {ε}.
• A|B| · · · |Z|a|b| · · · |z or [a-z] or [A-Z] is all lower-case alphabets of
English language.
• 0|1| · · · |9 or [0-9] is all natural digits used in mathematics.
• ident = letter(letter|digit)∗
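As an aside (a sketch, not part of the slides), the identifier pattern letter(letter|digit)* can be tested directly with the POSIX <regex.h> facility in C; the bracket-expression pattern below is one way to spell it:

/* Sketch: checking candidate lexemes against ^[a-zA-Z][a-zA-Z0-9]*$ ,
   the POSIX spelling of letter(letter|digit)*. */
#include <regex.h>
#include <stdio.h>

int main(void) {
    regex_t re;
    if (regcomp(&re, "^[a-zA-Z][a-zA-Z0-9]*$", REG_EXTENDED) != 0)
        return 1;                       /* pattern failed to compile */

    const char *candidates[] = { "rate", "x1", "60", "init-val" };
    for (int i = 0; i < 4; i++) {
        int ok = (regexec(&re, candidates[i], 0, NULL, 0) == 0);
        printf("%-8s %s\n", candidates[i], ok ? "identifier" : "not an identifier");
    }
    regfree(&re);
    return 0;
}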
The only problem left with the lexical analyzer is how to verify the validity
of a regular expression used in specifying the patterns of keywords of a
language. A well-accepted solution is to use finite automata for
verification.
Finite Automata is a state machine that takes a string of symbols as input and changes its state accordingly.
• A finite automaton is a recognizer for regular expressions.
• When a regular expression string is fed into a finite automaton, it changes its state for each literal. If all input is processed and the automaton reaches its final state, the input is accepted,
• i.e., the string just fed was a valid token of the language in hand.
The mathematical model of finite automata consists of:
✓ Finite set of states (Q) (q0, q1, q2…)
✓ Finite set of input symbols (Σ)
✓ One Start state (q0)
✓ Set of final states (qf)
✓ Transition function (δ)
Vertices – represents the states
Edges – represents transitions
Labels on vertices – name of the states
Labels on edges – are current values of the input symbol
A DFA is written M = (Q, Σ, δ, q0, qf).
Examples:
1. We assume the FA accepts any three-digit binary value ending in digit 1.
FA = {Q(q0, qf), Σ(0,1), q0, qf, δ}
Derive the transition table (?)
2. Strings ending with 100
M = {Q, Σ, δ, q0, qf}
q0 = initial state
q1 = strings ending with 1
q2 = strings ending with 10
q3 = strings ending with 100
L = { 100, 10100, 111001100, 00011101100, … }
(draw the DFA diagram and the transition table)
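One possible table-driven realization of example 2 in C (a sketch; the transition table encodes the four states named above, with q3 accepting):

/* Sketch: DFA for binary strings ending in 100.
   delta[state][symbol]: state 0..3 = q0..q3, symbol is the bit read. */
#include <stdio.h>

int main(void) {
    static const int delta[4][2] = {
        /* q0: */ {0, 1},   /* on 0 stay in q0, on 1 go to q1 */
        /* q1: */ {2, 1},   /* "ends with 1"                  */
        /* q2: */ {3, 1},   /* "ends with 10"                 */
        /* q3: */ {0, 1},   /* "ends with 100" (final state)  */
    };
    const char *tests[] = { "100", "10100", "111001100", "00011101100", "1011" };
    for (int i = 0; i < 5; i++) {
        int state = 0;                        /* start in q0 */
        for (const char *p = tests[i]; *p; p++)
            state = delta[state][*p - '0'];
        printf("%-12s %s\n", tests[i], state == 3 ? "accepted" : "rejected");
    }
    return 0;
}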
3. DFA that accepts a's and b's as input, where aa will be a substring.
L = {ababaab, abaab, aabb, bbaaab, …}

4. {w │ w has even length}

5. The language 0* 1* 0+ with three states.
– alphabet : a finite set of symbols. E.g. {a, b, c}
– A string over an alphabet is a finite sequence of
symbols drawn from that alphabet (sometimes a
string is also called a sentence or a word).
– A language is a set of strings over an alphabet.
– Operation on languages (a set):
union of L and M, L U M = {s|s is in L or s is in M}
A sentence is a noun phrase, a verb, and a noun phrase:  <S> ::= <NP> <V> <NP>
A noun phrase is an article and a noun:  <NP> ::= <A> <N>
A verb is…  <V> ::= loves | hates | eats
An article is…  <A> ::= a | the
A noun is…  <N> ::= dog | cat | rat

• The grammar is a set of rules that say how to build a tree—
a parse tree
• You put <S> at the root of the tree
• The grammar’s rules say how children can be added at any
point in the tree
• For instance, the rule <S> ::= <NP> <V> <NP> says you can add nodes <NP>, <V>, and <NP>, in that order, as children of <S>
Parse Tree:

               <S>
     <NP>      <V>      <NP>
   <A>  <N>    loves   <A>  <N>
   the  dog            the  cat


<exp> ::= <exp> + <exp> | <exp> * <exp> | ( <exp> ) | a | b | c

• An expression can be the sum of two expressions, or the product of two expressions, or a parenthesized subexpression
• Or it can be one of the variables a, b or c
Parse Tree for ((a+b)*c):

              <exp>
           (  <exp>  )
         <exp>  *  <exp>
       (  <exp>  )      c
        <exp> + <exp>
          a        b
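As a sketch (not from the slides): the grammar above is left-recursive, so it cannot be coded directly as recursive procedures; the recognizer below uses the standard equivalent formulation with one procedure per precedence level (exp ::= term { + term }, term ::= factor { * factor }, factor ::= ( exp ) | a | b | c):

/* Sketch: recursive-descent recognizer for the expression grammar. */
#include <stdio.h>
#include <stdlib.h>

static const char *p;   /* current position in the input */

static void expr(void);

static void fail(void) { printf("syntax error\n"); exit(1); }

static void factor(void) {          /* factor ::= ( exp ) | a | b | c */
    if (*p == '(') {
        p++;
        expr();
        if (*p != ')') fail();
        p++;
    } else if (*p == 'a' || *p == 'b' || *p == 'c') {
        p++;
    } else {
        fail();
    }
}

static void term(void) {            /* term ::= factor { * factor } */
    factor();
    while (*p == '*') { p++; factor(); }
}

static void expr(void) {            /* exp ::= term { + term } */
    term();
    while (*p == '+') { p++; term(); }
}

int main(void) {
    p = "((a+b)*c)";
    expr();
    if (*p != '\0') fail();
    printf("parsed OK\n");
    return 0;
}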
In the grammar below, <S> is the start symbol; each rule such as <NP> ::= <A> <N> is a production; <S>, <NP>, <V>, <A>, and <N> are non-terminal symbols; the words on the right-hand sides (loves, a, dog, …) are tokens.

<S> ::= <NP> <V> <NP>
<NP> ::= <A> <N>
<V> ::= loves | hates | eats
<A> ::= a | the
<N> ::= dog | cat | rat
Syntax is the grammar of a programming language: the set of rules that define which constructions are valid in that language.
A syntax analyzer or parser takes the input from a
lexical analyzer in the form of token streams.
The parser analyzes the source code (token stream)
against the production rules to detect any errors in
the code. The output of this phase is a parse tree.
A syntax analyzer or parser is a program that groups
sequences of tokens from the lexical analysis phase
into phrases each with an associated phrase type.
A phrase is a logical unit with respect to the rules of
the source language.
For example, consider:

a := x * y + z

After lexical analysis, this statement has the structure:

id1 assign id2 binop1 id3 binop2 id4

This way, the parser accomplishes two tasks, i.e., parsing the code, looking for errors, and
generating a parse tree as the output of the phase.
Parsers are expected to parse the whole code even if some errors exist in the program.
Parsers use error recovering strategies.
• A parser should be able to detect and report any error in the program.
• It is expected that when an error is encountered, the parser should be able to handle it and carry on parsing the rest of the input.
• Mostly the parser is expected to check for errors, but errors may be encountered at various stages of the compilation process.
A program may have the following kinds of errors at various
stages:
Lexical: name of some identifier typed incorrectly
Syntactical: missing semicolon or unbalanced parenthesis
Semantical: incompatible value assignment
Logical: code not reachable, infinite loop

ERROR RECOVERY STRATEGIES

Panic mode
When a parser encounters an error anywhere in the statement, it ignores the rest of the statement by not processing input from the erroneous input to a delimiter, such as a semicolon. This is the easiest way of error recovery and also, it prevents the parser from developing infinite loops.
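A minimal sketch of panic mode (the toy token stream and the statement form id = num ; are invented for illustration):

/* Sketch: on a syntax error, skip tokens up to the ';' delimiter and
   resume with the next statement. */
#include <stdio.h>

typedef enum { TOK_ID, TOK_ASSIGN, TOK_NUM, TOK_SEMI, TOK_EOF } TokenKind;

static const TokenKind stream[] = {
    TOK_ID, TOK_ASSIGN, TOK_NUM, TOK_SEMI,   /* well-formed statement */
    TOK_ID, TOK_NUM, TOK_NUM, TOK_SEMI,      /* erroneous statement   */
    TOK_ID, TOK_ASSIGN, TOK_NUM, TOK_SEMI,   /* parsing resumes here  */
    TOK_EOF
};
static int pos = 0;
static TokenKind next(void) { return stream[pos++]; }

static int parse_statement(void) {           /* statement ::= id = num ; */
    return next() == TOK_ID && next() == TOK_ASSIGN &&
           next() == TOK_NUM && next() == TOK_SEMI;
}

int main(void) {
    while (stream[pos] != TOK_EOF) {
        int start = pos;
        if (parse_statement()) {
            printf("statement at token %d parsed\n", start);
        } else {
            printf("syntax error in statement at token %d\n", start);
            pos = start;                     /* panic mode: skip to delimiter */
            while (stream[pos] != TOK_SEMI && stream[pos] != TOK_EOF) pos++;
            if (stream[pos] == TOK_SEMI) pos++;
        }
    }
    return 0;
}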
Statement mode
When a parser encounters an error, it tries to take corrective measures so that the rest of the inputs of the statement allow the parser to parse ahead. For example, inserting a missing semicolon, replacing a comma with a semicolon etc. Parser designers have to be careful here because one wrong correction may lead to an infinite loop.
Error productions
Some common errors are known to the compiler designers that may occur in the code. In addition, the designers can create augmented grammar to be used, as productions that generate erroneous constructs when these errors are encountered.
Global correction
The parser considers the program in hand as a whole and tries to figure out what the program is intended to do, and tries to find the closest match for it which is error-free. It chooses a minimal sequence of changes to obtain a global least-cost correction.
One of the major roles of the parser is to produce an
intermediate representation (IR) of the source
program using syntax-directed translation methods
•Possible IR output:
–Abstract syntax trees (ASTs)
A good compiler should assist in identifying and locating errors
– Lexical errors: important, compiler can easily recover
and continue
– Syntax errors: most important for compiler, can almost
always recover
– Static semantic errors: important, can sometimes
recover
– Dynamic semantic errors: hard or impossible to detect
at compile time, runtime checks are required
– Logical errors: hard or impossible to detect
The viable-prefix property of parsers allows
early detection of syntax errors
–Goal: detection of an error as soon as possible without
further consuming unnecessary input
–How: detect an error as soon as the prefix of the input
does not match a prefix of any string in the language
Top-down (C-F grammar with restrictions)
–Recursive descent (predictive parsing)
–LL (Left-to-right, Leftmost derivation) methods
Bottom-up (C-F grammar with restrictions)
–Operator precedence parsing
–LR (Left-to-right, Rightmost derivation)
methods
A CFG is ambiguous if there is at least one string in L(G) which has more than one derivation tree.
PROGRAMMING LANGUAGES

Module 3: NAME, SCOPE AND BINDING
A name is a mnemonic character string representing
something else
Name: Identifiers that allow us to refer to variables,
constants, functions, types, operations, and so on
x, sin, f, prog1, null? are names
• Most names are identifiers
Names enable programmers to refer
to variables, constants, operations, and types using
identifier names rather than low-level hardware components
A binding is an association between two things, such as a
name and the thing it names
• e.g. Name of an object and the object
Many properties of a programming language are defined
during its creation.
For instance:
• the meaning of key words such as while or for in C,
• the size of the integer data type in Java, are properties
defined at language design time.
Binding:
• A BINDING is an association, such as between an attribute and an entity or between an operation and a symbol
• Name and memory location (for variables)
• Name and function
• Typically a binding is between a name and the object it refers to.
The scope of a binding is the part of the program or time interval(s) in the program's execution during which the binding is active.
Binding Time is the point at which a binding is created or, more generally, the point at which any implementation decision is made. The time at which a binding takes place is called binding time.

• language design time: the design of specific program constructs (syntax), primitive types, and meaning (semantics)
• language implementation time: fixation of implementation constants such as numeric precision, max identifier name length, run-time memory sizes
• program writing time: the programmer's choice of algorithms and data structures
• compile time: the time of translation of high-level constructs to machine code and choice of memory layout for objects
• link time: the time at which multiple object codes (machine code files) and libraries are combined into one executable
• load time: the time at which the operating system loads the executable in memory; choice of physical addresses
• run time: the time during which a program executes (runs)
Binding Time Samples (language feature: binding time):
• Syntax, e.g. if (a>0) b:=a; in C or if a>0 then b:=a end if in Ada: language design
• Keywords, e.g. class in C++ and Java: language design
• Reserved words, e.g. main in C and writeln in Pascal: language design
• Meaning of operators, e.g. + (add): language design
• Primitive types, e.g. float and struct in C: language design
• Internal representation of literals, e.g. 3.1 and "foo bar": language implementation
• The specific type of a variable in a C or Pascal declaration: compile time
• Storage allocation method for a variable: language design, language implementation, and/or compile time
• Linking calls to static library routines, e.g. printf in C: linker
• Merging multiple object codes into one executable: linker
• Loading executable in memory and adjusting absolute addresses: loader (OS)
• Nonstatic allocation of space for variable: run time
The terms STATIC and DYNAMIC are generally used to refer to things bound before run time and at run time, respectively
• "static" is a coarse term; so is "dynamic"

IT IS DIFFICULT TO OVERSTATE THE IMPORTANCE OF BINDING TIMES IN PROGRAMMING LANGUAGES
BINDING OF ATTRIBUTES TO VARIABLES
• A binding is static if it occurs before run time and remains unchanged throughout program execution.
• If it occurs during run time or can change in the course of program execution, it is called dynamic.
In general, early binding times are associated with greater
efficiency
Later binding times are associated with greater flexibility
Compiled languages tend to have early binding times
Interpreted languages tend to have later binding times
Here we talk about the binding of identifiers to the variables they name.
Importance of Binding Time:
Early binding
• Faster code
• Typical in compiled languages
Late binding
• Greater flexibility
• Typical in interpreted languages
Scope Rules - control bindings
• Fundamental to all programming languages is the ability
to name data, i.e., to refer to data using symbolic
identifiers rather than addresses
• Not all data is named! For example, dynamic storage in
C or Pascal is referenced by pointers, not names
• An object has to be stored in memory during its lifetime
• Static objects have an absolute storage address that is retained throughout the execution of the program
  • Global variables
  • Subroutine code
  • Class method code
• Heap objects may be allocated and deallocated at arbitrary times, but require an expensive storage management algorithm
  • e.g. Java class instances are always stored on the heap
• Stack objects are allocated in last-in first-out order, usually in combination with subroutine calls and returns
  • Local variables of a subroutine
Heap – a large area of memory from which the programmer can allocate blocks as needed, and deallocate them (or allow them to be garbage collected) when no longer needed.

Object – any entity in the program: variables, functions, …

• Object lifetime (or life cycle) of an object is the period between the object's creation and destruction.
• Binding lifetime – the period between the creation and destruction of the binding.
• Usually the binding lifetime is a subset of the object lifetime.
• In many object-oriented languages (OOLs), particularly those that use garbage collection (GC), objects are allocated on the heap.
• The value of a variable holding an object actually corresponds to a reference to the object, not the object itself, and destruction of the variable just destroys the reference, not the underlying object.
Object Destruction
• It is generally the case that after an object is used, it is removed from memory to make room for other programs or objects to take that object's place.
• However, if there is sufficient memory or a program has a short run time, object destruction may not occur, memory simply being deallocated at process termination.
• In some cases object destruction simply consists of deallocating the memory, particularly in garbage-collected languages, or if the "object" is actually a plain old data structure.
The period of time from creation to destruction is called the LIFETIME of a binding
• If the object outlives the binding, it's garbage
• If the binding outlives the object, it's a dangling reference

Key events
• creation of objects
• creation of bindings
• references to variables (which use bindings)
• (temporary) deactivation of bindings
• reactivation of bindings
• destruction of bindings
• destruction of objects
Storage Allocation mechanisms
• Static
• Stack
• Heap
Static allocation for
• code
• globals
• static or own variables
• explicit constants (including strings, sets, etc)
Central stack for
• parameters
• local variables

Why a stack?
• allocate space for recursive routines
(not necessary in FORTRAN – no recursion)
• reuse space
(in all programming languages)
Stack
•very fast access
•don't have to explicitly de-allocate variables
•space is managed efficiently by CPU, memory will not become
fragmented
•local variables only
•limit on stack size (OS-dependent)
•variables cannot be resized
Heap
•variables can be accessed globally
•no limit on memory size
•(relatively) slower access
•no guaranteed efficient use of space, memory may become
fragmented over time as blocks of memory are allocated,
then freed
•you must manage memory (you're in charge of allocating and
freeing variables)
•variables can be resized using realloc()
Local Variables: Stack Allocation
• When we have a declaration of the form “int a;”:
– a variable with identifier “a” and some memory allocated to it
is created in the stack. The attributes of “a” are:
• Name: a
• Data type: int
• Scope: visible only inside the function it is defined,
disappears once we exit the function
• Address: address of the memory location reserved for it.
Note: Memory is allocated in the stack for a even before it
is initialized.
Local Variables: Stack Allocation
• Size: typically 4 bytes
• Value: Will be set once the variable is initialized

•Since the memory allocated for the variable is set in the


beginning itself, we cannot use the stack in cases where
the amount of memory required is not known in advance.
This motivates the need for HEAP
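A short illustration of that motivation (a sketch, not from the slides): a block whose size is only known, or grows, at run time must come from the heap, e.g. via malloc/realloc/free:

/* Sketch: stack variable vs. heap block whose size is decided at run time. */
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int local = 42;                       /* stack: fixed size, freed on return */

    size_t n = 4;
    int *buf = malloc(n * sizeof *buf);   /* heap: size chosen at run time */
    if (buf == NULL) return 1;

    n = 8;                                /* need more room later */
    int *tmp = realloc(buf, n * sizeof *buf);
    if (tmp == NULL) { free(buf); return 1; }
    buf = tmp;

    printf("local = %d, heap block of %zu ints\n", local, n);
    free(buf);                            /* programmer must release heap memory */
    return 0;
}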
Maintenance of the stack is the responsibility of the calling sequence and the subroutine prolog and epilog.
• space is saved by putting as much in the prolog and epilog as possible
The epilogue and prologue of a function are simply the set of instructions
that 'set up' the context for the function when it's called and clean up when it
returns.
The prologue typically performs such tasks as:
•saves any registers that the function might use (that are required by the
platform's standard to be preserved across function calls)
•allocates storage on the stack that the function might require for local
variables
•sets up any pointer (or other linkage) to parameters that might be passed
on the stack
The epilogue generally only needs to restore any saved registers and restore the stack pointer such that any memory reserved by the function for its own use is 'freed'.

The exact mechanisms that might be used in a prologue/epilogue are dependent on the CPU architecture, the platform's standard, the arguments and return values of the function, and the particular calling convention the function might be using.
Heap for dynamic allocation
A scope is a maximal region of the program where no
bindings are destroyed
(e.g., body of a procedure).
In most languages with subroutines, we OPEN a new scope
on subroutine entry:
• create bindings for new local variables,
• make references to variables
A scope in any programming language is a region of the program where a defined variable exists; beyond that region the variable cannot be accessed.
Scope of an identifier is the part of the program where the
identifier may directly be accessible. In C, all identifiers
are lexically (or statically) scoped. C scope rules can be covered
under following two categories.

Global Scope: Can be accessed anywhere in a program.


// filename: file1.c
int a;
int main(void)
{
a = 2;
}
Block Scope: A variable declared in a block is accessible in the block and all inner blocks of that block, but not accessible outside the block.

#include <stdio.h>

int main()
{
    {
        int x = 10, y = 20;
        // The outer block contains declarations of x and y, so
        // the following statement is valid and prints 10 and 20
        printf("x = %d, y = %d\n", x, y);
        {
            // y is declared again, so the outer block's y is not
            // accessible in this block
            int y = 40;
            x++; // Changes the outer block variable x to 11
            y++; // Changes this block's variable y to 41
            printf("x = %d, y = %d\n", x, y);
        }
        // This statement accesses only the outer block's variables
        printf("x = %d, y = %d\n", x, y);
    }
    return 0;
}
Algol 68:
• Algol 68 introduced the term elaboration for the process of creating bindings when entering a scope. Elaboration time is a useful concept.

Ada (re-popularized the term elaboration):
• storage may be allocated, tasks started, even exceptions propagated as a result of the elaboration of declarations
With STATIC (LEXICAL) SCOPE RULES, a scope is defined in terms of
the physical (lexical) structure of the program
• The determination of scopes can be made by the compiler
• Lexical scoping (sometimes known as static scoping ) is a convention
used with many programming languages that sets the scope (range of
functionality) of a variable so that it may only be called (referenced) from
within the block of code in which it is defined.

• The scope is determined when the code is compiled.


Figure:
Programming languages implement
• Static Scoping: active bindings are determined using the text of the
program. The determination of scopes can be made by the compiler.
• Most recent scan of the program from top to bottom
• Most compiled languages employ static scope rules

• Dynamic Scoping: active bindings are determined by the flow of execution at run time (i.e., the call sequence).
• The determination of scopes can NOT be made by the compiler.
• Dynamic scope rules are usually encountered in interpreted
languages; in particular, early LISP dialects assumed dynamic
scope rules.
The key idea in static scope rules is that bindings are
defined by the physical (lexical) structure of the program.
With dynamic scope rules, bindings depend on the current
state of program execution
• cannot always be resolved by examining the program because
they are dependent on calling sequences
• To resolve a reference, we use the most recent, active binding made at run time
Dynamic scope rules are usually encountered in
interpreted languages
• early LISP dialects assumed dynamic scope rules.

Such languages do not normally have type checking at compile time because type determination isn't always possible when dynamic scope rules are in effect.
EXAMPLE: STATIC VS. DYNAMIC
program scopes (input, output );
var a : integer;
procedure first;
begin a := 1; end;
procedure second;
var a : integer;
begin first; end;
begin
a := 2; second; write(a);
end.
EXAMPLE: STATIC VS. DYNAMIC

If static scope rules are in effect (as would be the case in Pascal), the program prints a 1.
If dynamic scope rules are in effect, the program prints a 2.
Why the difference? At issue is whether the assignment to
the variable a in procedure first changes the variable a
declared in the main program or the variable a declared in
procedure second
EXAMPLE: STATIC VS. DYNAMIC
Static scope rules require that the reference resolve to the most
recent, compile-time binding, namely the global variable a
Dynamic scope rules, on the other hand, require that we choose the
most recent, active binding at run time
• Perhaps the most common use of dynamic scope rules is to provide
implicit parameters to subroutines
• This is generally considered bad programming practice nowadays
• Alternative mechanisms exist
• static variables that can be modified by auxiliary routines
• default and optional parameters
EXAMPLE: STATIC VS. DYNAMIC
At run time we create a binding for a when we enter the main program. Then we create another binding for a when we enter procedure second.
• This is the most recent, active binding when procedure first is
executed
• Thus, we modify the variable local to procedure second, not the
global variable
• However, we write the global variable because the variable a local
to procedure second is no longer active
Accessing variables with dynamic scope:
• (1) keep a stack (association list) of all active variables
• When you need to find a variable, hunt down from top
of stack
• This is equivalent to searching the activation records
on the dynamic chain
Accessing variables with dynamic scope:
• (2) keep a central table with one slot for every variable name
• If names cannot be created at run time, the table layout
(and the location of every slot) can be fixed at compile
time
• Otherwise, you'll need a hash function or something to do
lookup
• Every subroutine changes the table entries for its locals at
entry and exit.
(1) gives slow access but fast calls
(2) gives slow calls but fast access
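A minimal sketch of option (1) in C (the binding stack and its push/pop discipline are invented for illustration), mirroring the static-vs-dynamic Pascal example above:

/* Sketch: dynamic-scope lookup via a stack (association list) of active
   bindings; the most recent binding for a name wins. */
#include <stdio.h>
#include <string.h>

struct binding { const char *name; int value; };

static struct binding stack[100];
static int top = 0;

static void bind(const char *name, int value) {  /* on scope entry */
    stack[top].name = name;
    stack[top].value = value;
    top++;
}

static void unbind(void) { top--; }              /* on scope exit */

static int *lookup(const char *name) {           /* hunt down from the top */
    for (int i = top - 1; i >= 0; i--)
        if (strcmp(stack[i].name, name) == 0)
            return &stack[i].value;
    return NULL;
}

int main(void) {
    bind("a", 2);               /* main program's a (a := 2) */
    bind("a", 0);               /* enter procedure second: its local a */
    *lookup("a") = 1;           /* procedure first: a := 1 hits second's a */
    unbind();                   /* second returns */
    printf("%d\n", *lookup("a"));   /* prints 2, as dynamic scope predicts */
    return 0;
}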
In effect, variable lookup in a dynamically-scoped
language corresponds to symbol table lookup in a
statically-scoped language
Because static scope rules tend to be more complicated,
however, the data structure and lookup algorithm also
have to be more complicated
REFERENCING ENVIRONMENT of a statement at run time is the
set of active bindings
A referencing environment corresponds to a collection of scopes
that are examined (in order) to find a binding
SCOPE RULES determine that collection and its order
BINDING RULES determine which instance of a scope should be
used to resolve references when calling a procedure that was
passed as a parameter
• they govern the binding of referencing environments to formal procedures
ALIASING
• What are aliases good for? (consider uses of FORTRAN
equivalence)
• space saving - modern data allocation methods are
better
• multiple representations - unions are better
• linked data structures - legit
• Also, aliases arise in parameter passing as an unfortunate
side effect
• Euclid scope rules are designed to prevent this
OVERLOADING
• some overloading happens in almost all languages
• integer + v. real +
• read and write in Pascal
• function return in Pascal
• some languages get into overloading in a big way
• Ada
• C++
It's worth distinguishing between some closely related concepts

• overloaded functions - two different things with the same name; in C++ you can overload norm:
  int norm (int a) { return a > 0 ? a : -a; }
  complex norm (complex c) { /* ... */ }
• polymorphic functions -- one thing that works in more than one way
  • in Modula-2: function min (A : array of integer); …
  • in Smalltalk
It's worth distinguishing between some closely related
concepts (2)
• generic functions (modules, etc.) - a syntactic template that
can be instantiated in more than one way at compile time
• via macro processors in C++
• built-in in C++
• in Clu
• in Ada
A language that is easy to compile often leads to
• a language that is easy to understand
• more good compilers on more machines (compare Pascal and Ada!)
• better (faster) code
• fewer compiler bugs
• smaller, cheaper, faster compilers
• better diagnostics
