OBJECT ORIENTED PROGRAMMING 3 0 0 3
UNIT I PRINCIPLES OF OBJECT ORIENTED PROGRAMMING 9
Introduction, Tokens, Expressions, Control Structures, Functions in C++, classes and
objects, constructors and destructors, operator overloading and type conversions.
UNIT II ADVANCED OBJECT ORIENTED PROGRAMMING 9
Inheritance, Extending classes, Pointers, Virtual functions and polymorphism, File
Handling, Templates, Exception handling, Manipulating strings.
UNIT III DATA STRUCTURES & ALGORITHMS 9
Algorithm, Analysis, Lists, Stacks and queues, Priority queues-Binary Heap-Application,
Heaps, skew heaps, Binomial hashing-hash tables without linked lists
UNIT IV NONLINEAR DATA STRUCTURES 9
Trees: Binary trees, search tree ADT, AVL trees, splay trees, B-trees, Sets and maps in
standard library, Graph algorithms: Topological sort, shortest path algorithm, network
flow problems, minimum spanning tree, applications of depth-first search, Introduction to
NP-completeness.
UNIT V SORTING AND SEARCHING 9
Sorting: Insertion sort, Shell sort, Heap sort, Merge sort, Quick sort, Indirect sorting,
Bucket sort, External sorting, Disjoint set class, Algorithm Design Techniques: Greedy
algorithm, Divide and Conquer, Dynamic Programming, Randomized Algorithm, Backtracking
algorithm.
TOTAL: 45 PERIODS
TEXT BOOKS:
1. Mark Allen Weiss, Data Structures and Algorithm Analysis in C, 3rd ed,
Pearson Education Asia, 2007.
2. E. Balagurusamy, Object Oriented Programming with C++, McGraw Hill
Company Ltd., 2007.
REFERENCES:
1. Michael T. Goodrich, Data Structures and Algorithm Analysis in C++, Wiley
2. Sahni, Data Structures Using C++, The McGraw-Hill, 2006
3. Sourav Sahay, Object Oriented Programming with C++, Oxford University Press.
4. Seymour, Data Structures, The McGraw-Hill, 2007.
CS2068Unit 1 Notes
Object Oriented Programming (OOP)
OOP is an approach that modularizes programs by creating partitioned memory areas for both
data and functions; these partitions can be used as templates for creating copies of such
modules on demand. The striking features of OOP are:
- Data structures are designed to characterize real-world objects
- Data and the functions that operate on it are tied together
- Data is hidden from external access and is therefore more secure
- Program design follows a bottom-up approach
Advantage of OOP vs Structured Programming
In structured programming, critical data items are often placed as global data. Global data is
vulnerable to inadvertent change by any function, since data moves freely from function to
function. In OOP, data is treated as a critical element and is tied to the set of functions that
operate on it, which protects it from external access. OOP also models the real world better
than structured programming.
Core OOP Features
1. Classes and Objects: Data and functions are put together in a single unit called a class,
which exhibits data hiding. Objects are runtime entities and instances of a class.
2. Inheritance: The process by which objects of one class acquire the properties of another
class, enabling reusability with the possibility of adding new features or redefining
existing features.
3. Polymorphism: An operation exhibiting different behaviors in different instances.
Some of the OOP languages are C++, Java, Objective-C, Object Pascal, Turbo Pascal, etc.
C++
C++ was developed by Bjarne Stroustrup at Bell Laboratories in the early 1980s. The language
was standardized by ANSI/ISO: the final draft was approved in 1997 and published as
ISO/IEC 14882 in 1998. C++ is largely a superset of C, i.e., most C programs are also valid
C++ programs. Therefore C keywords, operators, control structures, etc., are also valid in
C++. C++ source files commonly have the extension .cpp. The structure of a C++ program looks like
Comments
Include files
Class declaration
Member function definitions
Main function
Tokens
The smallest individual units of a program are known as tokens. They are:
Keywords: Reserved words. Eg: class, const, bool, long
Identifiers: Names of variables, functions, arrays, classes. Rules as in C
Constants: Fixed values that do not change during program execution
Operators: In addition to the C operators, C++ provides << >> :: ::* ->* .* new delete
(endl and setw, often listed with these, are stream manipulators rather than operators)
S.K.Vijai Anand cseannauniv.blogspot.com
Data types
C++ adds class, bool and reference to the C data types. The storage requirements are as in C.
The qualifiers short, long, signed and unsigned are also applicable.
Built-in    Derived     User-defined
int         array       structure
char        function    union
void        pointer     enum
float       reference   class
double
bool
Expressions
1. Arithmetic expressions 20+5/2.0 m*n2 x*y/2
2. Pointer expressions ptr++
3. Relational expressions x <=y
4. Logical expressions a>b && a>c
5. Bitwise expressions x <<3
Comments
C++ introduces the single-line comment symbol //. Any text after // up to the end of the line
is treated as a comment.
Variable Declaration
Variables can be declared anywhere in a block, but must be declared before they are used.
Dynamic initialization of variables (initializing with an expression evaluated at run time) is
possible in C++. Variables declared within a block are local variables and are known only to
that block. Variables that are not declared within any block are global variables. A global
variable hidden by a local variable of the same name is accessed using the scope resolution
operator ::. Note that :: reaches only the global variable, not a variable of an enclosing block.
    :: variable
Symbolic Constants
The qualifier const is used to create symbolic constants in C++.
    const datatype varname = value;
    const float pi = 3.14;
Console I/O
The standard output stream is represented by the predefined object cout, and contents are
streamed to it using the insertion operator <<.
    cout << "Sum is " << sum << "\n";
The input stream is represented by the object cin, and input is obtained using the extraction
operator >>. The objects cin and cout are declared in the header file iostream.
    cin >> a >> b;
Type casting
Explicit conversion of a variable or expression from one built-in data type to another is known
as type casting.
    datatype (expression)
    average = sum / float(n);
main() function
    // Simple C++ program
    #include <iostream>
    using namespace std;
    int main()
    {
        int a, b;
        cout << "Enter values of a and b : ";
        cin >> a >> b;
        float avg = float(a+b) / 2;    // cast so that the division is not truncated
        cout << "\nAverage is : " << avg;
        return 0;
    }
Memory management operators
C++ supports the unary operators new and delete, which perform dynamic memory allocation and
free the memory after use. new takes the following general forms:
    ptrvar = new datatype;
    ptrvar = new datatype[size];
    int *p = new int[10];    // Creates an array of 10 integers
When a data object is no longer needed, it is destroyed to release memory space for reuse.
    delete ptrvar;
    delete [] ptrvar;
    delete [] p;    // deletes the entire array
The advantages of using new over malloc are:
- Automatically computes the size of the data object
- Automatically returns the correct pointer type
- new and delete can be overloaded
    int main()
    {
        return 0;
    }
Control Structure
- Sequence
- Selection (if, switch)
- Looping (while, do while, for)
The syntax and usage of if, switch, while, do while and for are the same as in C.
if-else:
    if (condition)
    {
        ...
    }
    else
    {
        ...
    }
switch:
    switch (expression)
    {
        case label1:
            ...
            break;
        ...
        default:
            ...
    }
while:
    while (condition)
    {
        ...
    }
do-while:
    do
    {
        ...
    } while (condition);
for:
    for (init; condition; inc/dec)
    {
        ...
    }
Function Prototyping
Functions enable modularity and make debugging easier. Function prototyping was introduced in
C++ and is mandatory: every function must be declared before it is called. The prototype
describes the function interface (return type, name and argument types) to the compiler.
Reference Variable
A reference variable is an alias for an already defined variable and must be initialized at the
time of declaration itself. Its name is preceded by an & in the declaration.
    datatype &refname = varname;
    float total = 3.7;
    float &sum = total;
    cout << sum;
Passing Parameter
Parameters can be passed to a function in three ways, namely by value, by reference and
through pointers. The default is pass by value, in which any change to the formal arguments
is not reflected in the actual arguments.
    // Parameter passing
    #include <iostream>
    using namespace std;
    void swapval(int, int);
    void swapptr(int *, int *);
    void swapref(int &, int &);
    int main()
    {
        int a=10, b=20;
        swapval(a, b);      // Pass by value: a and b are unchanged
        cout << "\nA = " << a << "\tB = " << b;
        swapptr(&a, &b);    // Pass using pointers: a and b are swapped
        cout << "\nA = " << a << "\tB = " << b;
        swapref(a, b);      // Pass by reference: a and b are swapped back
        cout << "\nA = " << a << "\tB = " << b;
        return 0;
    }
    void swapval(int x, int y)
    {
        int t = x;
        x = y;
        y = t;
    }
    void swapptr(int *x, int *y)
    {
        int t = *x;
        *x = *y;
        *y = t;
    }
    void swapref(int &x, int &y)
    {
        int t = x;
        x = y;
        y = t;
    }
inline functions
inline functions eliminate the cost of function calls for small routines by substituting the
function body at the point of call. An inline function is a function preceded by the keyword
inline and is expanded inline when invoked. It is preferred over a macro, since a macro is
expanded by the preprocessor without type checking.
    #include <iostream>
    using namespace std;
    inline int square(int a)
    {
        return (a*a);
    }
    int main()
    {
        cout << "Square = " << square(5);
        return 0;
    }
The keyword inline is a request to the compiler and may be ignored, for example if the function
contains a loop, goto or switch statement or if it is recursive.
Default arguments
C++ allows a function to be called without specifying all its arguments, by giving default
values for arguments in the prototype. When a function call is made with fewer arguments than
parameters, the compiler checks the prototype and supplies the default values.
    #include <iostream>
    using namespace std;
    void repchar(char='*', int=45);
    int main()
    {
        repchar();            // prints 45 asterisks
        repchar('=');         // prints 45 equal signs
        repchar('+', 30);     // prints 30 plus signs
        return 0;
    }
    void repchar(char ch, int n)
    {
        for (int j=0; j<n; j++)
            cout << ch;
        cout << endl;
    }
It is not mandatory to specify default values for all parameters, but defaults must be assigned
from right to left without gaps. Default values are useful when a parameter takes the same
value in most calls.
Function overloading
C++ permits designing a family of functions with one function name but with different
argument lists; this is known as function overloading or function polymorphism. The lists must
differ in the type or the number of arguments. When a call is made, the function to be invoked
is determined by checking the number and type of arguments.
    #include <iostream>
    using namespace std;
    int volume(int);              // cube volume
    int volume(int, int, int);    // box volume
    float volume(int, int);       // cylinder volume
    const float pi = 3.14;
    int main()
    {
        cout << "Cube volume : " << volume(5) << "\n";
        cout << "Box volume : " << volume(9, 3, 4) << "\n";
        cout << "Cylinder volume : " << volume(5, 6) << "\n";
        return 0;
    }
    int volume(int a)
    {
        return (a*a*a);
    }
    int volume(int l, int b, int h)
    {
        return (l * b * h);
    }
    float volume(int r, int h)
    {
        return (pi * r * r * h);
    }
If there is no exact match for a function call, the compiler tries to find a match by:
- making promotions of the actual arguments (char to int, float to double)
- using built-in conversion routines
If this does not result in a unique match, an ambiguity error is reported.
Classes and Objects
Classes are an extension of C structures. A class is a user-defined data type that binds data
and its associated functions together. Data members are generally declared in the private
section and member functions in the public section. Class members are private by default,
whereas structure members are public by default. Classes support polymorphism and inheritance.
    class classname
    {
    private:
        datamembers;
    public:
        memberfunctions;
    };
The wrapping up of data and functions into a single unit called a class is known as data
encapsulation. Private class members can be accessed only from within the class, whereas
public members can be accessed from outside the class. The fact that data members in the
private section can be accessed only by member functions of that class, and not externally, is
known as data hiding. Member functions can be defined outside the class using the scope
resolution operator ::.
    returntype classname :: functionheader
    {
    }
Objects are instances of a class and are runtime entities. The dot operator . is used by an
object to access public members of its class. Memory for data members is allocated separately
for each object of the class, whereas only one copy of each member function resides in memory.
    classname objectname;
    objectname.publicmember
    // class and object
    #include <iostream>
    using namespace std;
    class Rectangle {
        int x, y;
    public:
        void set_values(int, int);
        int area(void) { return (x*y); }
    };
    void Rectangle::set_values(int a, int b) {
        x = a;
        y = b;
    }
    int main()
    {
        Rectangle rect;
        rect.set_values(3, 4);
        cout << "area: " << rect.area();
        return 0;
    }
Static Data members
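When a data member is qualified static, it is initialized to zero before the program starts
(unless another value is given in its definition), independently of any object being created.
Only one copy of a static data member exists and it is shared by all objects of the class. Its
lifetime is the entire program. Static member functions can directly access only static
members, since they have no this pointer.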
    // static data member
    #include <iostream>
    using namespace std;
    class MyClass {
        static int num;
        int data;
    public:
        void setI(int i) { data = i; num++; }
        void display()
        {
            cout << "\nStatic member : " << num << "\nNon-static : " << data;
        }
    };
    int MyClass::num;    // define num (zero-initialized)
    int main()
    {
        MyClass a, b;
        a.setI(10);
        b.setI(5);
        a.display();    // static member is 2, non-static is 10
        b.display();    // static member is 2, non-static is 5
        return 0;
    }
Friend Functions
C++ permits non-member functions to access the private data members of a class by means of the
keyword friend. Friend functions are normal functions, but are declared within the class
preceded by the keyword friend. Friend functions possess certain special characteristics:
- Not within the scope of the class in which it has been declared as a friend
- Invoked like a normal function, without the help of an object
- Usually takes objects as arguments
- Accesses private/public members through an object using the dot . operator
- A function can be a friend of more than one class
    // friend function
    #include <iostream>
    using namespace std;
    class sample
    {
        int a;
        int b;
    public:
        void setvalue() { a = 25; b = 40; }
        friend float mean(sample s);
    };
    float mean(sample s)
    {
        return float(s.a+s.b) / 2;
    }
    int main()
    {
        sample X;
        X.setvalue();
        cout << "Mean value = " << mean(X);
        return 0;
    }
Constructor
A constructor is a member function whose name is the same as the class name. It is used to
initialize data members and, where needed, to allocate memory dynamically. A constructor is
automatically executed whenever an object is created. Constructors can be classified into the
following types:
- Do-nothing: Provided by the compiler when the class declares no constructor
- Default: Constructor that takes no arguments
- Parameterized: Constructor that takes arguments
- Copy: Constructor that takes an object reference as argument. Used to initialize data
  members with the data members of another object
- Dynamic: Constructor that allocates memory dynamically to data members using new
    // Constructor types
    #include <iostream>
    using namespace std;
    class Distance
    {
        int feet;
        int inch;
    public:
        Distance()    // Default constructor
        {
            feet = 0; inch = 0;
        }
        Distance(int n, int d = 0)    // Parameterized constructor
        {
            feet = n; inch = d;
        }
        Distance(const Distance &a)    // Copy constructor
        {
            feet = a.feet; inch = a.inch;
        }
        void print()
        {
            cout << feet << "' " << inch << "\"\n";
        }
    };
    int main()
    {
        Distance d1, d2(4), d3(22, 7);
        Distance d4(d2);
        d1.print();
        d3.print();
        d4.print();
        return 0;
    }
Some special characteristics of constructors are:
- Should be declared in the public section
- Invoked automatically when objects are created
- No return type, not even void
- Can be overloaded
- Cannot be inherited
- Cannot be virtual
- May make implicit calls to new when memory allocation is required
Destructor
A destructor is used to destroy objects when they go out of scope. Like a constructor, a
destructor has the same name as the class, but preceded by a tilde ~. A destructor never takes
any arguments, nor does it return a value. There can be only one destructor for a class. Any
memory allocated dynamically in a constructor should be released using delete in the destructor.
    // Destructor
    #include <iostream>
    #include <cstring>
    using namespace std;
    class mystring
    {
        char *str;
        int len;
    public:
        mystring()
        { str = 0; len = 0; }
        mystring(const char *s)    // Dynamic constructor
        {
            len = strlen(s);
            str = new char[len+1];
            strcpy(str, s);
            cout << "\nDynamic memory allocation";
        }
        ~mystring()
        {
            delete [] str;    // array form matches new char[]
            cout << "\nMemory released";
        }
    };
    int main()
    {
        mystring s1("SMKFIT");
        {
            mystring s2("BME");
        }    // s2 goes out of scope here
        return 0;
    }    // s1 goes out of scope here
Operator Overloading
Operator overloading is a way of assigning an additional task to an existing operator so that
it can be applied to user-defined types. An operator's semantics can be extended, but its
syntax cannot be altered. Overloading is done with the help of a special operator function.
    returntype classname :: operator op(arguments)
    {
    }
Operators that cannot be overloaded are . :: ?: sizeof .* A friend function cannot be
used to overload the operators = () [] ->
Overloading Unary Operators
A unary operator overloaded using a member function takes no arguments; in this example the
return type is void.
    // Unary operator - using member function
    #include <iostream>
    using namespace std;
    class space
    {
        int x;
        int y;
        int z;
    public:
        space() { x=y=z=0; }
        space(int a, int b, int c) { x=a; y=b; z=c; }
        void display()
        {
            cout << "\n(" << x << ", " << y << ", " << z << ")";
        }
        void operator -();
    };
    void space::operator -()
    {
        x = -x; y = -y; z = -z;
    }
    int main()
    {
        space S(1, 2, -3);
        S.display();
        -S;             // invokes S.operator-()
        S.display();
        return 0;
    }
A unary operator overloaded using a friend function takes an object reference as argument; in
this example the return type is void.
    // Friend function unary overload
    #include <iostream>
    using namespace std;
    class Ratio
    {
        int num;
        int den;
    public:
        Ratio(int n, int d = 1)    // Parameterized constructor
        {
            num = n; den = d;
        }
        void print() { cout << num << "/" << den << endl; }
        friend void operator ++(Ratio &);
    };
    void operator ++(Ratio &r)
    {
        r.num += r.den;
    }
    int main()
    {
        Ratio r1(22, 7);
        r1.print();
        ++r1;
        r1.print();
        return 0;
    }
Overloading Binary Operators
A binary operator overloaded using a member function takes one argument and returns a value.
The left operand is passed implicitly to the operator function and only the right operand is
passed explicitly.
    // Member function binary + < overload
    #include <iostream>
    using namespace std;
    class Distance
    {
        int feet;
        float inches;
    public:
        Distance() { feet = 0; inches = 0; }
        Distance(int ft, float in) { feet = ft; inches = in; }
        void showdist() const    // display distance
        {
            cout << feet << "\'-" << inches << "\"";
        }
        Distance operator + (Distance);
        bool operator < (Distance d2)
        {
            float f1 = feet + inches / 12;
            float f2 = d2.feet + d2.inches / 12;
            return f1 < f2;
        }
    };
    Distance Distance::operator + (Distance d2)
    {
        int f = feet + d2.feet;
        float i = inches + d2.inches;
        if (i >= 12.0)
        {
            i -= 12.0;
            f++;
        }
        return Distance(f, i);
    }
    int main()
    {
        Distance dist1(10, 6.5), dist2(11, 6.25), dist3;
        dist3 = dist1 + dist2;
        cout << "dist1 = "; dist1.showdist(); cout << endl;
        cout << "dist2 = "; dist2.showdist(); cout << endl;
        cout << "dist3 = "; dist3.showdist(); cout << endl;
        if (dist1 < dist2)
            cout << "Destination1 is nearest";
        return 0;
    }
A binary operator overloaded using a friend function takes two arguments and returns a value.
One constraint on member functions is that the left operand must be an object of the class
(a user-defined type) and not a basic type. This restriction is overcome by using a friend
function.
    // Overload operators * >> << using friend
    #include <iostream>
    using std::cin; using std::cout;          // avoids clashing with std::vector and std::size
    using std::istream; using std::ostream;
    const int size = 5;
    class vector
    {
        int arr[size];
    public:
        vector()
        {
            for (int i=0; i<size; i++)
                arr[i] = 0;
        }
        friend vector operator * (int, vector);
        friend istream& operator >> (istream&, vector&);
        friend ostream& operator << (ostream&, vector&);
    };
    istream& operator >> (istream &in, vector &v)
    {
        for (int i=0; i<size; i++)
            in >> v.arr[i];
        return (in);
    }
    vector operator * (int x, vector v)
    {
        vector tmp;
        for (int i=0; i<size; i++)
            tmp.arr[i] = x * v.arr[i];
        return tmp;
    }
    ostream& operator << (ostream &out, vector &v)
    {
        for (int i=0; i<size; i++)
            out << v.arr[i] << " ";
        return (out);
    }
    int main()
    {
        vector v1;
        cout << "V1 = ";
        cin >> v1;
        vector v2;
        v2 = 2 * v1;    // left operand is a basic type
        cout << "2 * V1 = " << v2;
        return 0;
    }
Overloading strings
    // String concatenation using +
    // (no destructor is defined, so the allocated memory is never released)
    #include <iostream>
    #include <cstring>
    using namespace std;
    class mystring
    {
        char *p;
        int len;
    public:
        mystring() { len=0; p=0; }
        mystring(const char *s);
        friend mystring operator +(const mystring &s, const mystring &t);
        void show() { cout << p << "\n"; }
    };
    mystring::mystring(const char *s)
    {
        len = strlen(s);
        p = new char[len+1];
        strcpy(p, s);
    }
    mystring operator +(const mystring &s, const mystring &t)
    {
        mystring temp;
        temp.len = s.len + t.len;
        temp.p = new char[temp.len+1];
        strcpy(temp.p, s.p);
        strcat(temp.p, t.p);
        return (temp);
    }
    int main()
    {
        mystring s1 = "New ";
        mystring s2 = "York";
        mystring s3;
        s3 = s1 + s2;
        s3.show();
        return 0;
    }
Operator Overloading Rules
1. Only existing operators can be overloaded.
2. The overloaded operator must have at least one operand of user-defined type.
3. Overloaded operators follow the syntax of the original operators.
4. In the examples of these notes, unary operator functions return nothing (void) and binary
operator functions return a class type; the language itself permits other return types.
5. Operators that cannot be overloaded are . :: ?: sizeof .*
6. A friend function cannot be used to overload the operators = () [] ->
Basic to User defined and User-defined to Basic Type conversions
The conversion from basic to class type is accomplished using a constructor whereas the
conversion from class type to basic is done using a casting operator function.
    // Basic to class and vice-versa
    #include <iostream>
    using namespace std;
    class mytime
    {
        int hrs;
        int min;
    public:
        mytime() { hrs = min = 0; }
        mytime(int h, int m) { hrs = h; min = m; }
        mytime(int t)    // Basic to class
        {
            hrs = t / 60;
            min = t % 60;
        }
        void display()
        {
            cout << hrs << ':';
            if (min < 10)
                cout << '0';
            cout << min << "\n";
        }
        operator int()    // Class to basic
        {
            int t = hrs * 60;
            t += min;
            return t;
        }
    };
    int main()
    {
        int dur = 65;
        mytime T1;
        T1 = dur;    // Basic to class: displays 1:05
        T1.display();
        mytime T2(2, 15);
        int t = T2;    // Class to basic: t becomes 135
        cout << "Duration = " << t << endl;
        return 0;
    }
Conversion between different Class types
If the conversion routine is placed in the source class, it is implemented as a conversion
operator function; otherwise it is implemented as a one-argument constructor in the
destination class.
    // converts from time24 to time12 with routine in source
    #include <iostream>
    using namespace std;
    class time12
    {
        bool pm;     // true = pm, false = am
        int hrs;     // 1 to 12
        int mins;    // 0 to 59
    public:
        time12()
        {
            pm=true; hrs=0; mins=0;
        }
        time12(bool ap, int h, int m)
        {
            pm=ap; hrs=h; mins=m;
        }
        /* Alternative: routine in destination class
        time12(time24 t24)
        {
            int hrs24 = t24.gethr();
            mins = t24.getmin();
            hrs = (hrs24 < 13) ? hrs24 : hrs24-12;
            pm = hrs24 < 12 ? false : true;    // find am/pm
            if (hrs==0)                        // 00 is 12 a.m.
            { hrs=12; pm=false; }
        } */
        void display() const    // format: 11:59 pm
        {
            cout << hrs << ':';
            if (mins < 10)
                cout << '0';    // extra zero for 01
            cout << mins << " ";
            pm ? cout << "pm" : cout << "am";
        }
    };
    class time24
    {
        int hours;      // 0 to 23
        int minutes;    // 0 to 59
    public:
        time24() { hours = minutes = 0; }    // no-arg constructor
        time24(int h, int m) { hours = h; minutes = m; }
        int gethr() { return hours; }
        int getmin() { return minutes; }
        operator time12() const;    // conversion operator
    };
    time24::operator time12() const    // conversion operator
    {
        int hrs12 = (hours < 13) ? hours : hours-12;
        bool pm = hours < 12 ? false : true;    // find am/pm
        if (hrs12==0)                           // 00 is 12 a.m.
        { hrs12=12; pm=false; }
        return time12(pm, hrs12, minutes);
    }
    int main()
    {
        int h, m;
        cout << "Enter 24-hour time: \n";
        cout << "Hours (0 to 23): "; cin >> h;
        cout << "Minutes (0 to 59): "; cin >> m;
        time24 t24(h, m);      // make a time24
        time12 t12 = t24;      // convert time24 to time12
        cout << "\n12-hour time: ";    // display equivalent time12
        t12.display();
        cout << "\n\n";
        return 0;
    }
CS2068Unit 2 Notes
Inheritance
Inheritance is the process by which a class acquires the features of another class: new classes,
called derived classes, are created from existing or base classes. The derived class inherits
all the capabilities of the base class but can add embellishments and refinements of its own.
Inheritance permits code reusability. Reusing existing code saves time and money and increases
a program's reliability. An example is the ease of distributing class libraries. A derived
class is defined by specifying its relationship with the base class as follows:
    class derived_classname : visibility_mode base_classname
    {
        . . .
    };
The visibility mode can be private, public or protected. It specifies which features of the
base class are available to the derived class and their access status in the derived class.
Private members of the base class are not accessible in the derived class and are therefore
treated as not inheritable. To make data members inheritable while still hiding them from
outside code, the access specifier protected is available.
Base class visibility   Derived class visibility
                        Private derivation   Protected derivation   Public derivation
Private                 Not inheritable      Not inheritable        Not inheritable
Protected               Private              Protected              Protected
Public                  Private              Protected              Public
Private derivation is typically used when the derived class acts as a final class that is not
meant to be derived from further. In general under inheritance, data members are declared as
protected, member functions as public and the mode of derivation is public.
The different types of inheritance are:
- Single: Base -> Derived
- Hierarchical: Base -> Derived1, Derived2
- Multiple: Base1, Base2 -> Derived
- Multi-level: Base -> Derived -> Derived
A combination of hierarchical / multiple / multilevel inheritance is known as hybrid inheritance.
Single Inheritance
This is the simplest form of inheritance. In this inheritance, the derived class inherits
features from the base class and acts as a final class, i.e., inheritance does not proceed
further. In such cases, private derivation is mostly followed.
    // Single inheritance: box from rect
    #include <iostream>
    using namespace std;
    class rect
    {
    protected:
        int length;
        int breadth;
    public:
        void rectdim()
        {
            cout << "Enter length and breadth : ";
            cin >> length >> breadth;
        }
        int rectarea()
        {
            return (length*breadth);
        }
    };
    class box : private rect
    {
        int depth;
    public:
        void getdata()
        {
            rectdim();
            cout << "Enter depth : ";
            cin >> depth;
        }
        void disparea()
        {
            cout << "\nBox area : "
                 << depth*rectarea();
        }
    };
    int main()
    {
        box B;
        B.getdata();
        B.disparea();
        return 0;
    }
Hierarchical Inheritance
In hierarchical inheritance, features of the base class are inherited by more than one derived
class. It provides a powerful way to extend the capabilities of existing classes, and to design
programs using hierarchical relationships.
    class baseclass { . . . };
    class derivedclass1 : visibility baseclass { . . . };
    class derivedclass2 : visibility baseclass { . . . };
An employee possesses certain attributes. Further, in an institution, an employee could be
classified as teaching or non-teaching, and each category has its own unique characteristics.
This classification is depicted below using hierarchical inheritance.
    // Hierarchical inheritance: employee
    #include <iostream>
    using namespace std;
    class employee
    {
    protected:
        char name[30];      // fixed-size buffers so that cin has storage to write into
        int age;
        char gender;
        char qualfn[30];
        long salary;
    public:
        void getdetails() {
            cout << "Enter employee details\n";
            cin >> name >> age >> gender
                >> qualfn >> salary;
        }
        void dispdetails() {
            cout << "\nName : " << name << "\nGender : "
                 << gender << "\nAge : " << age
                 << "\nQualification : " << qualfn
                 << "\nSalary : " << salary << endl;
        }
    };
    class teaching : public employee
    {
    protected:
        char theory[4][30];
        char lab[2][30];
    public:
        void subinfo() {
            cout << "Enter theory subjects: ";
            for (int i=0; i<4; i++)
                cin >> theory[i];
            cout << "Enter lab subjects: ";
            for (int i=0; i<2; i++)
                cin >> lab[i];
        }
        void dispinfo() {
            cout << "\nTheory Subjects Handled\n";
            for (int i=0; i<4; i++)
                cout << theory[i] << " ";
            cout << "\nLab Subjects Handled\n";
            for (int i=0; i<2; i++)
                cout << lab[i] << " ";
        }
    };
    class nonteach : public employee
    {
    protected:
        char skills[5][30];
    public:
        void getskills()
        {
            cout << "Enter technical skills : ";
            for (int i=0; i<5; i++)
                cin >> skills[i];
        }
        void dispskills()
        {
            cout << "\nTechnical Skills\n";
            for (int i=0; i<5; i++)    // display all five skills that were read
                cout << skills[i] << " ";
        }
    };
    int main()
    {
        teaching T;
        T.getdetails();
        T.subinfo();
        T.dispdetails();
        T.dispinfo();
        nonteach NT;
        NT.getdetails();
        NT.getskills();
        NT.dispdetails();
        NT.dispskills();
        return 0;
    }
Multiple Inheritance
In multiple inheritance, a derived class inherits features from two or more base classes. The
base classes are separated by commas in the derivation list.
    class derivedclass : visibility baseclass1, visibility baseclass2, ...
    {
        . . .
    };
The cutoff mark for engineering counseling is computed by considering both the board exam
result and an entrance exam conducted by the university. The result class implements this
using multiple inheritance as follows:
[Figure: members of class nonteach, namely those inherited from employee (name, age, gender, qualfn, salary, getdetails(), dispdetails()) and its own (skills, getskills(), dispskills()). Hierarchy for the cutoff example: result derives from both boardexam and entrance]
// Multiple Inheritance: Cutoff mark
#include <iostream>
using namespace std;
class boardexam
{
protected:
    int math;
    int phy;
    int chem;
public:
    void getmarks()
    {
        cout << "Enter M/P/C marks: ";
        cin >> math >> phy >> chem;
    }
};
class entrance
{
protected:
    int score;
public:
    void getscore()
    {
        cout << "Entrance marks : ";
        cin >> score;
    }
};
class result : public boardexam, public entrance
{
protected:
    float cutoff;
public:
    void process()
    {
        getmarks();
        getscore();
        // floating-point division, so that fractional marks are not truncated
        cutoff = math / 2.0f + phy / 4.0f + chem / 4.0f + score;
    }
    void display()
    {
        cout << "Cut-off marks : " << cutoff;
    }
};
int main()
{
    result R;
    R.process();
    R.display();
    return 0;
}
[Figure: boardexam holds math, phy, chem and getmarks(); entrance holds score and getscore(); result holds cutoff, process() and display() along with all the inherited members]
Multilevel inheritance
Classes can be derived from classes that are themselves derived. There is no limit on the number
of levels. This type of inheritance is known as multilevel inheritance. The sequence of classes
involved in the inheritance forms the inheritance path.
class grandparentclass { . . . };
class parentclass : visibility grandparentclass { . . . };
class childclass : visibility parentclass { . . . };
. . .
// Multilevel inheritance: Classification of Beings
#include <iostream>
using namespace std;
class vertebrate
{
public:
    void eat()
    {
        cout << "\nHave spine and do eat";
    }
};
class mammal : public vertebrate
{
public:
    void suckle()
    {
        cout << "\nFed with milk";
    }
};
class primate : public mammal
{
public:
    void peel()
    {
        cout << "\nCan peel fruit & then eat";
    }
};
class human : public primate
{
public:
    void think()
    {
        cout << "\nUse my sixth sense";
    }
};
int main()
{
    human H;
    H.eat();
    H.suckle();
    H.peel();
    H.think();
    return 0;
}
[Figure: public members accumulated along the inheritance path: vertebrate has eat(); mammal has eat(), suckle(); primate has eat(), suckle(), peel(); human has eat(), suckle(), peel(), think()]
Hybrid Inheritance / Virtual Base class
Consider a special case of hybrid inheritance in which hierarchical, multilevel and multiple
inheritance are all involved, as shown in the figure. The child class (result) has two direct
parent classes, namely hslc and curricular. The child inherits the traits of the grandparent class
(applicant) through two separate paths. The child would therefore hold two copies of the
grandparent's public/protected members, and any reference to them would be ambiguous. This
duplication is avoided by making the common base a virtual base class. The keyword virtual in
the parents' derivation lists causes them to share a single common copy of the grandparent.
class grandparent { . . . };
class parent1 : virtual visibility grandparent { . . . };
class parent2 : virtual visibility grandparent { . . . };
class child : visibility parent1, visibility parent2 { . . . };
[Figure: applicant is the base class of hslc and curricular; result derives from both hslc and curricular]
// Virtual base class: Arts college admission score
#include <iostream>
using namespace std;
class applicant
{
protected:
    int appid;
    char name[30];
public:
    void getdetail()
    {
        cout << "Appl no and name : ";
        cin >> appid >> name;
    }
    void dispdetail()
    {
        cout << "\nApplication No : " << appid
             << "\nName : " << name;
    }
};
class hslc : public virtual applicant
{
protected:
    int sub1;
    int sub2;
    int sub3;
    int sub4;
    int sub5;
    int sub6;
    float percent;
public:
    void getmarks()
    {
        cout << "\nEnter 6 subject marks : ";
        cin >> sub1 >> sub2 >> sub3 >> sub4 >> sub5 >> sub6;
    }
    void dispmarks()
    {
        // total / 12 is a percentage if each subject is out of 200 (assumed here)
        percent = float(sub1 + sub2 + sub3 + sub4 + sub5 + sub6) / 12;
        cout << "\nPercentage : " << percent;
    }
};
class curricular : public virtual applicant
{
protected:
    int sport;
    int ncc;
    int nss;
    int other;
    float extra;
public:
    void getextra()
    {
        cout << "\nEnter score for extra-curricular activities : ";
        cin >> sport >> ncc >> nss >> other;
    }
    void dispextra()
    {
        extra = float(sport + ncc + nss + other) / 4;
        cout << "\nExtra curricular score : " << extra;
    }
};
class result : public hslc, public curricular
{
protected:
    float total;
public:
    void process()
    {
        getdetail();    // unambiguous: only one copy of applicant exists
        getmarks();
        getextra();
    }
    void dispresult()
    {
        dispdetail();
        dispmarks();
        dispextra();
        total = percent + extra;
        cout << "\nTotal score : " << total;
    }
};
int main()
{
    result R;
    R.process();
    R.dispresult();
    return 0;
}
Overriding Member Functions
To redefine a base class member function, declare a member function with the same prototype in
the derived class. This redefinition is also known as overriding. When the member function is
invoked on a derived object, only the derived class version gets executed. The base class
version can still be executed using the scope resolution operator.
// Overriding Base class functions
#include <iostream>
using namespace std;
class base
{
public:
    void display()
    {
        cout << "\nBase class member function";
    }
};
class derived : public base
{
public:
    void display()      // same prototype as base::display()
    {
        cout << "\nDerived class member function";
        // call base::display() here to execute the base member function as well
    }
};
int main()
{
    derived D;
    D.display();
    return 0;
}
Constructors in Base class
Base class constructors need special attention in inheritance. If the base class has a default
constructor, the derived class need not define a constructor. If the base class has only
parameterized constructors, the derived class must define a constructor that passes arguments
to the base class constructor. The base class constructor is executed first.
derivedclassname (argument list) : baseclassname(args)
{
. . .
}
// Constructors in Inheritance
#include <iostream>
#include <cstring>      // strlen, strcpy
using namespace std;
class Person
{
protected:
    char* name;
public:
    Person(const char* s)
    {
        name = new char[strlen(s) + 1];
        strcpy(name, s);
    }
    ~Person() { delete[] name; }
};
class Student : public Person
{
    char* major;
public:
    Student(const char* s, const char* m) : Person(s)   // pass the name to the base constructor
    {
        major = new char[strlen(m) + 1];
        strcpy(major, m);
    }
    void display()
    {
        cout << "\nName : " << name << "\nMajor : " << major;
    }
    ~Student() { delete[] major; }
};
int main()
{
    Student S("Sarah", "Biology");
    S.display();
    return 0;
}
In case of multiple inheritance, the base class constructors are executed in order of their
appearance in the derivation list. In case of multi-level inheritance, parameters have to be passed
up the inheritance chain and the ancestor constructor is executed prior to its successor class.
Class Nesting
Another way of reusing the members of a class is to have an object of that class as a member of
another class. This kind of relationship is known as containership or nesting.
class alpha { . . . };
class beta
{
alpha A;
. . .
};
Pointers to Objects
A class pointer can point to objects of that class. Members are accessed using the -> operator.
Objects can also be created at runtime using the new operator and accessed through a class pointer.
// Pointers to objects
#include <iostream>
using namespace std;
class product
{
    int code;
    float price;
public:
    void getdata()
    {
        cout << "\nEnter product code and price : ";
        cin >> code >> price;
    }
    void display()
    {
        cout << "\nProduct Code : " << code << "\tRate : " << price;
    }
};
int main()
{
    product P1, *ptr;
    ptr = &P1;              // points to an existing object
    ptr->getdata();
    ptr->display();
    ptr = new product;      // object created at runtime
    ptr->getdata();
    ptr->display();
    delete ptr;             // release the object created with new
    return 0;
}
this pointer
this is a pointer passed implicitly to a member function when it is invoked. The member function
uses it to identify the object on which it was invoked, so this acts as an implicit argument to
every non-static member function. A more practical use of this is in returning values from member
functions and overloaded operators. Returning by reference the object of which a function is a
member is better than returning a temporary object created in the member function, and the this
pointer makes it easy.
// this pointer
#include <iostream>
#include <cstring>
using namespace std;
class person
{
    char name[25];
    float exp;
public:
    person(const char *s, float e) {    // const char*, since string literals are passed
        strcpy(name, s);
        exp = e;
    }
    person& greater(person &p)          // return by reference
    {
        if (p.exp > this->exp)
            return p;
        else
            return *this;
    }
    void display()
    {
        cout << "\nName : " << name << "\nExperience : " << exp;
    }
};
int main()
{
    person P1("John", 38);
    person P2("Ram", 30);
    person P3 = P1.greater(P2);
    cout << "\nSeniority";
    P3.display();
    return 0;
}
Pointer to Derived Classes
A base class pointer can point to base class objects as well as derived class objects, because
pointers to objects of a derived class are type compatible with pointers to objects of the base
class. Through such a pointer, only the members declared in the base class can be accessed. When
the pointer points to a base class object, the base member function gets executed. When it points
to a derived class object, the base member function is still executed, even if the derived class
overrides it. The reason is that the compiler ignores the contents of the pointer and chooses the
member function that matches the type of the pointer.
// Pointers to derived class
#include <iostream>
using namespace std;
class base
{
public:
    void show()
    {
        cout << "\nBase class member function : show";
    }
};
class derived : public base
{
public:
    void show()     // Overridden member function
    {
        cout << "\nDerived class member function : show";
    }
};
int main()
{
    base *bptr;
    base B;
    derived D;
    bptr = &B;
    bptr->show();
    bptr = &D;      // points to derived class object
    bptr->show();   // base class member function is still executed
    return 0;
}
Runtime Polymorphism
Polymorphism can be classified as static or dynamic. In static or compile-time polymorphism, the
function code to be executed for a function call is known in advance, i.e., the binding is done
during compilation. Function overloading and operator overloading fall in this category. In
dynamic or run-time polymorphism, the binding of a function to a call is deferred until runtime,
i.e., dynamic or late binding. Runtime polymorphism is achieved through virtual functions.
Virtual functions allow objects of different types to respond differently to the same function
call. The rule that the pointer's statically defined type determines which member function gets
invoked is overruled by declaring the base class member function virtual. When a function is made
virtual, the function to be executed is decided at runtime based on the type of object pointed to
by the base pointer. Thus, by making the base pointer point to different objects, different
versions of the virtual function can be executed.
In a class, there are two types of person, namely student and faculty. Each of them is assessed
using different performance measures. This is implemented using virtual functions.
[Figure: polymorphism is either static (operator overloading, function overloading) or dynamic (virtual functions)]
// Virtual functions with person class: Student / Faculty
#include <iostream>
#include <cctype>
using namespace std;
class person
{
protected:
    char name[40];
    char type;
public:
    void getName()
    {
        cout << " Enter name: "; cin >> name;
    }
    void putName()
    {
        cout << "\n" << name << "\t";
        if (type == 's')
            cout << "Student\t";
        else
            cout << "Faculty\t";
    }
    virtual void getData() = 0;         // pure virtual function
    virtual bool isOutstanding() = 0;   // pure virtual function
    virtual ~person() { }               // lets delete work correctly through a base pointer
};
class student : public person
{
private:
    float gpa;
public:
    void getData()
    {
        getName();
        cout << " Enter student's GPA: "; cin >> gpa;
        type = 's';
    }
    bool isOutstanding()
    {
        return gpa > 8.5;
    }
};
class faculty : public person
{
private:
    float passout;
public:
    void getData()
    {
        getName();
        cout << " Enter passout percentage: ";
        cin >> passout;
        type = 'f';
    }
    bool isOutstanding()
    {
        return passout > 95;
    }
};
int main()
{
    person* persPtr[100];   // Array of pointers
    int n = 0;
    char choice;
    do {
        cout << "Enter student or faculty (s/f) : ";
        cin >> choice;
        switch (tolower(choice))
        {
        case 's':
            persPtr[n] = new student;
            persPtr[n++]->getData();
            break;
        case 'f':
            persPtr[n] = new faculty;
            persPtr[n++]->getData();
            break;
        default:
            cout << "Invalid choice\n";
        }
        cout << " Enter another (y/n)? ";
        cin >> choice;
    } while (tolower(choice) == 'y' && n < 100);    // cycle until not 'y' or the array is full
    cout << "Name\tCategory\tRemarks";
    for (int j = 0; j < n; j++)     // print all persons
    {
        persPtr[j]->putName();
        if (persPtr[j]->isOutstanding())
            cout << "\tOutstanding";
        delete persPtr[j];          // release the object created with new
    }
    return 0;
}
The rules that are applicable to virtual functions are:
1. Must be member of a class
2. Cannot be static
3. Accessed by using object pointers
4. Can be a friend of another class
5. The prototype of the base class and all derived class versions must be identical
6. Constructors cannot be virtual but destructors can be
7. A base pointer can point to a derived object, whereas the vice versa is not true
8. A virtual function need not be redefined in the derived class. In such cases, base class
functions would be executed
Pure Virtual function & Abstract Base class
A virtual function declared in a base class often has no meaningful task to perform there and
serves only as a placeholder for the derived class versions. Such a function can be made a pure
virtual function that does nothing, declared as
virtual returntype functionname (arguments) = 0;
A class that contains one or more pure virtual functions cannot be used to instantiate objects.
Such classes are called abstract base classes. An abstract class is used to provide some traits
to the concrete derived classes and to create a base pointer for runtime polymorphism.
Templates
Templates are a C++ feature that supports generic programming. They make it possible to define
generic classes and functions. The compiler uses the template to generate the code, and the same
template can be used to generate many different instances. This is done by means of template
parameters. As a mechanism for automatic code generation, templates improve programming
efficiency and allow the programmer to defer more of the work to the compiler.
Function Template
Function templates are a direct generalization of function overloading. They avoid code
redundancy among structurally similar families of functions. A function template works like an
outline. The compiler uses the template to generate each version of the function that is
needed. The individual versions are called instances of the function template. A function that is
an instance of a template is also called a template function. The general format is
template <class T>
returntype functionname (arguments of type T)
{
. . .
}
The template keyword signals the compiler that a function template is to be defined. The word
class here means any type, and the template parameter T may be substituted by any type.
One of the common operations performed on a sequence of numbers/characters is sorting. The
simple task is often done by having separate functions, each to sort integer, float and other types.
The only difference is the type of data they operate on. Further, the sorting algorithm also
includes a swapping routine to re-arrange elements which are not in order. This redundancy is
avoided by using a function template as follows:
// Function Template: Bubble Sort
#include <iostream>
using namespace std;
template <class T>      // Interchange template function, defined before Arrange uses it
void Swap(T &a, T &b)
{
    T temp;
    temp = a;
    a = b;
    b = temp;
}
template <class T>      // Sort using template function
void Arrange(T *Array, int count)
{
    for (int i = 0; i < count - 1; i++)
        for (int j = i + 1; j < count; j++)
            if (*(Array + i) > *(Array + j))    // Array processing using pointer
                Swap(*(Array + i), *(Array + j));
}
int main()
{
    int X[6] = {1, -4, 8, 0, 3, 5};
    float Y[6] = {2.2f, -0.4f, 3.5f, 1.9f, 2.7f, 0.7f};
    char Z[7] = {'S', 'M', 'K', 'F', 'I', 'T'};
    Arrange(X, 6);      // Integer sort
    Arrange(Y, 6);      // Float sort
    Arrange(Z, 6);      // Character sort
    cout << "\nSorted Arrays\nInt\tFloat\tChar\n";
    for (int i = 0; i < 6; i++)
        cout << X[i] << "\t" << Y[i] << "\t" << Z[i] << "\n";
    return 0;
}
For each call with a new argument type, the compiler generates the complete function, replacing
the type parameter with the argument type. Calls with the same argument type share one instance.
Class Template
The template concept is extended to classes for defining generic classes. Class templates are
generally used for data storage (container) classes. Class templates are sometimes called
parameterized types.
template <class T>
class classname
{
    class members of generic type T
    . . .
};
Class templates differ from function templates in the way they are instantiated. Classes are
instantiated by defining an object using the template argument. A class created from a class
template is called a template class.
classname <type> objectname(constructor args);
The member functions of a class template are themselves function templates with the same
template header as their class when defined externally.
template <class T>
returntype classname <T> :: functionname (arguments of type T)
{
. . .
}
A stack is a simple data structure that simulates an ordinary stack of objects of the same type
with the restrictions that an object can be inserted into the stack only at the top (push) and an
object can be removed from the stack only at the top (pop). A stack class abstracts this notion by
hiding the implementation of the data structure, allowing access only by means of public
functions that simulate the limited operations described above. A class template for generating
Stack classes of different types using array is given below:
// Class Template: Stack
#include <iostream>
using namespace std;
template <class T>
class Stack             // Generic Stack class of type T
{
    T* Data;
    int Top;
    int Size;
public:
    Stack(int s) {
        Top = -1;
        Size = s;
        Data = new T[Size];
    }
    void Push(T);
    T Pop();
    bool isEmpty() {
        return Top < 0;
    }
    bool isFull() {
        return Top == Size - 1;
    }
    ~Stack() { delete [] Data; }
};
template <class T>
void Stack<T>::Push(T element)      // caller checks isFull() first
{
    Data[++Top] = element;
}
template <class T>
T Stack<T>::Pop()                   // caller checks isEmpty() first
{
    return Data[Top--];
}
int main()
{
    Stack <char> S1(26);    // Character stack
    Stack <int> S2(10);     // Integer stack
    for (char x = 'a'; x <= 'z'; x++)
    {
        if (S1.isFull())
            cout << "\nChar Stack Overflow";
        else
            S1.Push(x);
    }
    cout << "\nCharacter Stack Pop : ";
    for (int i = 0; i < 26; i++) {
        if (S1.isEmpty())
            cout << "\nChar Stack Underflow";
        else
            cout << S1.Pop() << " ";
    }
    for (int i = 5; i < 15; i++) {
        if (S2.isFull())
            cout << "\n\nInt Stack Overflow";
        else
            S2.Push(i);
    }
    cout << "\nInteger Stack Pop : ";
    for (int i = 0; i < 10; i++) {
        if (S2.isEmpty())
            cout << "\nInt Stack Underflow";
        else
            cout << S2.Pop() << " ";
    }
    return 0;
}
Exceptions
Exceptions are runtime anomalies or unusual conditions that a program may encounter while
executing. Exceptions may be synchronous, such as an out-of-range index or division by zero, which
can be handled, or asynchronous, such as a keyboard interrupt, which is beyond the program's control.
Exception Handling Mechanism
The mechanism involves the usage of try, throw and catch. The keyword try is used to preface a
block of statements that may throw an exception. When an exception is detected, it is raised using
a throw statement in the try block. A catch block handles the exception and appropriate action is
taken. The sequence of events when an exception occurs is:
1. Control enters the try block.
2. A statement in the try block causes an error.
3. An exception is thrown.
4. Control transfers to the exception handler (catch block) following the try block.
[Figure: the try block detects and throws the exception; the catch block catches and handles it]
Generally the try block encompasses either part or all of the statements in the main function. The
catch block should immediately follow the try block. Catch blocks can be overloaded, i.e., a try
block can have more than one catch block. If the type thrown matches the argument type of a
catch block, then that catch block is executed and thereafter control goes to the statement
immediately after the last catch block. Exceptions that arise out of functions invoked from the
try block are also caught. If an exception is not caught by any handler because of a type
mismatch, abnormal program termination occurs. Therefore it is recommended to include a generic
catch handler catch(...) { } as the last catch block.
{
.
try
{
. . .
throw (variable);
. . .
}
catch (type1 arg)
{
. . .
}
catch (type2 arg)
{
. . .
}
. . .
}
A simple program demonstrates how exceptions such as division by zero and invalid array access
are handled using try and catch.
// Exceptions
#include <iostream>
#include <cstdlib>      // abs
using namespace std;
const int MAX = 5;
int main()
{
    int a, b, x;
    int arr[MAX] = {1, 2, 3, 4, 5};
    cout << "\nEnter values for a & b : ";
    cin >> a >> b;
    int diff = abs(a - b);
    cout << "\nEnter array index : ";
    cin >> x;
    try
    {
        if (diff == 0)
            throw diff;         // int: handled by catch(int)
        else
            cout << "\nResult = " << float(a) / diff;
        if (x >= 0 && x < MAX)
            cout << "\narr[" << x << "] = " << arr[x];
        else
            throw arr;          // int*: handled by catch(int *)
        throw "xyz";            // no matching typed handler: taken by catch(...)
    }
    catch (int i)
    {
        cout << "\nDivide by zero";
    }
    catch (int *x)
    {
        cout << "\nArray index out of bounds";
    }
    catch (...)                 // Generic handler
    {
        cout << "\nUncaught exception";
    }
    return 0;
}
Rethrowing
A handler may rethrow the exception without processing it by invoking throw without an argument.
This causes the current exception to be thrown to the next enclosing try/catch sequence.
throw;
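Specifying exceptions
The types of exceptions that a function may throw can be explicitly specified as follows:
type function(arg-list) throw (type-list)
{
. . .
}
Such exception specifications were deprecated in C++11 and removed in C++17, so they are accepted
only by older compilers.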
String Class
Standard C++ provides a class called string, available by including the <string> header file. Its
main advantages are that it takes care of memory management, allows the use of overloaded
operators and provides a set of member functions. A string object can be constructed in several ways:
string( )
string(const char *)
string(const string &)
string(int, char)
The operators that are applicable to string class are:
=     assignment
+     concatenation
+=    concatenation & assignment
==    equal to
!=    not equal to
<     less than
<=    less than or equal to
>     greater than
>=    greater than or equal to
[ ]   subscript
<<    output
>>    input (a word only)
Some of the functions supported by string class are:
append( )          appends full/part of a string
empty( )           returns true if the string is empty
erase( )           removes the specified number of characters from the given position
find( )            searches for the occurrence of a substring
insert( )          inserts the specified characters at the given position
length( )          returns length of the string
replace( )         replaces characters of a string with another string/substring
swap( )            swaps the string contents with the invoked string
substr( )          returns a substring of the given string
find_first_of( )   returns the position of first occurrence of the given character
find_last_of( )    returns the position of last occurrence of the given character
// String Manipulation
#include <iostream>
#include <string>
using namespace std;
int main()
{
    string s1("ANSI "), s2("Standard "), s3, s4;
    cout << "Enter string : ";
    getline(cin, s3, '\n');     // read string until enter key is pressed
    s4 = s3;
    if (s3 == s4)
        cout << "\nStrings s3 and s4 are identical";
    s4 = s1 + s2 + s3;
    cout << "\ns4 = s1 + s2 + s3 : " << s4;
    string s5 = "ISO ";
    cout << "\ns2.insert(0, s5) : " << s2.insert(0, s5);
    cout << "\ns2.append(s3) : " << s2.append(s3);
    cout << "\ns4.erase(7, 3) : " << s4.erase(7, 3);
    cout << "\ns2.replace(0, 4, s1) : " << s2.replace(0, 4, s1);
    s1.swap(s3);
    cout << "\ns1 : " << s1 << "\ts3: " << s3;
    cout << "\ns2.find(C++) : " << s2.find("C++");      // string::npos if not found
    cout << "\ns2.substr(7, 3) : " << s2.substr(7, 3);
    cout << "\ns2.find_first_of('a') : " << s2.find_first_of('a');
    cout << "\ns1.empty() : " << (s1.empty() ? "yes" : "no");
    for (string::size_type i = 0; i < s2.length(); i++)  // character access by subscript
        cout << s2[i];
    return 0;
}
File Handling
A file is a collection of related data stored on an auxiliary storage device. In disk I/O, input
refers to reading from a file and output refers to writing contents to a file. The header
<fstream> should be included for disk I/O. C++ provides the following classes for high-level file
handling.
ifstream   Provides input operations. Contains functions such as open( ), get( ), read( ), tellg( ),
           seekg( ), getline( ) inherited from classes fstreambase and istream
ofstream   Provides output operations. Contains functions such as open( ), put( ), write( ),
           tellp( ), seekp( ) inherited from classes fstreambase and ostream
fstream    Provides input / output operations and inherits functions from iostream and
           fstreambase. Contains open( ) with default input mode.
File Opening
A file should be opened prior to any I/O operations. It can be opened either using the stream's
constructor or, preferably, by the open( ) method.
filestreamclass fsobj;
fsobj.open("filename", mode);
The mode parameter need not be specified if the object is of class ifstream or ofstream. It
states the purpose for which the file is opened and is a constant defined in the ios class. The
mode can combine two or more constants using the bitwise OR operator (|).
ios::in          Open for reading (default for ifstream)
ios::out         Open for writing (default for ofstream)
ios::ate         Start reading or writing at end of file
ios::app         Start writing at end of file
ios::trunc       Truncate file to zero length if it exists
ios::nocreate    Error when opening if file does not already exist (pre-standard <fstream.h> flag)
ios::noreplace   Error when opening for output if file already exists, unless ate or app is set
                 (pre-standard <fstream.h> flag)
ios::binary      Open file in binary (not text) mode
Detecting end-of-file
When a file is read little by little, eventually an end-of-file (EOF) condition will be encountered.
EOF is a condition signalled to the program when there is no more data to read. Detecting EOF is
necessary to stop further reading, typically as the condition of a while loop. The file object
evaluates to false (0) once a read fails at end-of-file, causing the loop to terminate.
while( fsobj)
{
. . .
}
[Figure: stream class hierarchy. ios is the base of istream and ostream, and iostream derives from both; ifstream, ofstream and fstream derive from istream, ostream and iostream respectively, each also inheriting from fstreambase]
Error Handling
Errors may occur during file handling due to the following:
Attempt to open a non-existent file
Attempt to create a file with a name that already exists
Reading past end-of-file
Insufficient disk space
Attempt to perform an operation for which the file was not opened
The following ios error status functions can be used to check the file stream status for a specific error.
eof( )    returns true if end-of-file is encountered
fail( )   returns true if the input or output operation has failed
bad( )    returns true if an invalid operation is attempted or an unrecoverable error has occurred
good( )   returns true if no error has occurred
Closing files
When the file object goes out of scope, for example at program termination, its destructor is
called and closes the file. However, it is recommended to close files explicitly using close( ).
A closed file object can be reopened in another mode.
fsobj.close( );
Command-line arguments
The arguments supplied to the main( ) function from the command prompt are known as command-line
arguments. The prototype of the main function to facilitate command-line arguments is
int main(int argc, char* argv[])
{
. . .
}
The system stores the command-line arguments as strings in memory, and creates an array of
pointers to these strings, argv[ ]. The first entry (argv[0]) is always the name of the program,
and the arguments are delimited by spaces. The value of argc is the total number of arguments,
including the program name, and is computed implicitly as shown in the example
>cp friend.cpp a.cpp
argc = 3
argv[0] = cp
argv[1] = friend.cpp
argv[2] = a.cpp
Character I/O
The get( ) and put( ) functions, members of istream and ostream respectively, handle a single
character at a time. The get( ) function reads a single character from the associated stream,
whereas the put( ) function writes a single character onto the stream. Both functions are
generally executed within a loop to perform the I/O operation sequentially.
fsobj.get(charvar);
fsobj.put(charvar);
The following program imitates the DOS command copy through the get( ) and put( ) functions, with
arguments supplied from the command line.
// File Copy using Character I/O
#include <fstream>
#include <iostream>
#include <cstring>      // strcmp
using namespace std;
int main(int argc, char *argv[])        // Command line arguments
{
    ifstream Src;
    ofstream Dest;
    char ch;
    try                                 // Monitor the code segment for exceptions
    {
        if (argc != 3) throw argc;      // Check for proper arguments
        Src.open(argv[1]);
        if (Src.fail()) throw argv[1];  // Source not available
        Dest.open(argv[2]);
        if (Dest.fail()) throw argv[2]; // Destination cannot be created
        while (Src.get(ch))             // Read a character; stop at end-of-file
            Dest.put(ch);               // Write the same onto destination
        cout << "\nFile Copied Successfully!!!\n";
        Src.close();                    // Files closed & buffers flushed
        Dest.close();
    }
    catch (int c)                       // Exception handlers overloaded
    {
        cerr << "\nInvalid Arguments for File copy!!!";
        cerr << "\nUsage: cp <srcfile> <destfile>\n";
    }
    catch (char *name)
    {
        if (strcmp(name, argv[1]) == 0)
            cerr << "\nSource File Not Accessible!!!\n";
        else
        {
            cerr << "\nDestination File Cannot Be Created!!!\n";
            Src.close();
        }
    }
    catch (...)                         // General handler
    {
        cerr << "Unhandled exception!!!";
    }
    return 0;
}
File Pointers
Each file object has two associated file pointers, namely the get (input) pointer and the put
(output) pointer. These pointers point to the file locations where read and write operations are
done. After each file I/O operation, the appropriate pointer is automatically advanced. The
default position of the pointers on opening a file depends on the mode:
ios::in     get pointer is moved to the beginning of the file
ios::out    put pointer is moved to the beginning of the file
ios::app    put pointer is moved to the end of the file
To facilitate random access, the file pointers can be moved to the desired location using the
functions listed below:
seekg(abspos)
seekg(offset, refpos)
    Example: fobj.seekg(10, ios::cur)
    Moves the get pointer to the specified position. The positional value is in bytes and may
    be absolute or relative. If relative, refpos is a constant that may be ios::beg, ios::cur, or
    ios::end, referring to the start, the current position, and the end of the file.
seekp(abspos)
seekp(offset, refpos)
    Example: fobj.seekp(100)
    Like seekg( ), seekp( ) moves the put pointer to the desired location. In the given example,
    the put pointer is moved to byte offset 100 from the start of the file.
tellg( )
    Returns the current position of the get pointer.
tellp( )
    Returns the current position of the put pointer.
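As a minimal sketch of how these functions combine, the helpers below use seekg( ) with tellg( ) to measure a file and seekp( ) to overwrite one byte in place. The names FileSize and PatchByte are hypothetical and chosen only for this illustration.

```cpp
#include <fstream>
#include <string>

// Returns the size of a file in bytes, or -1 if it cannot be opened.
// (FileSize is a hypothetical helper, not part of the notes.)
long FileSize(const std::string& path)
{
    std::ifstream in(path, std::ios::in | std::ios::binary);
    if (!in)
        return -1;
    in.seekg(0, std::ios::end);           // move the get pointer to end of file
    std::streamoff size = in.tellg();     // its position equals the byte count
    return static_cast<long>(size);
}

// Overwrites the byte at an absolute offset using the put pointer.
// (PatchByte is a hypothetical helper; the file must already exist.)
bool PatchByte(const std::string& path, long offset, char value)
{
    std::fstream f(path, std::ios::in | std::ios::out | std::ios::binary);
    if (!f)
        return false;
    f.seekp(offset, std::ios::beg);       // absolute position from the start
    f.put(value);
    return static_cast<bool>(f);
}
```

Opening with both ios::in and ios::out keeps the existing contents, which is what makes an in-place update possible.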
Object I/O
The functions read( ) and write( ) treat an object as a single unit and perform object I/O in
binary form, i.e. the computer's internal representation of data. Hence no conversion of data is
required during file I/O. Both read( ) and write( ) take two arguments: the address of the object
typecast to a char pointer, and the size of the object.
fobj.read( (char *) &obj, sizeof(obj) );
fobj.write( (char *) &obj, sizeof(obj) );
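A small round trip may make the two calls concrete. This is only a sketch: Item, SaveItem and LoadItem are hypothetical names, and the record is assumed to be a plain fixed-size structure without pointers or virtual functions.

```cpp
#include <fstream>

// Hypothetical fixed-size record used only for this illustration.
struct Item
{
    int   code;
    char  name[20];
    float rate;
};

// Writes the raw bytes of one record to a binary file.
bool SaveItem(const char* path, const Item& it)
{
    std::ofstream out(path, std::ios::out | std::ios::binary);
    out.write(reinterpret_cast<const char*>(&it), sizeof(it));
    return static_cast<bool>(out);
}

// Reads the raw bytes of one record back into memory.
bool LoadItem(const char* path, Item& it)
{
    std::ifstream in(path, std::ios::in | std::ios::binary);
    in.read(reinterpret_cast<char*>(&it), sizeof(it));
    return static_cast<bool>(in);
}
```

Because the bytes are copied as they are, a file written this way is generally readable only by a program built with the same record layout.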
The following program presents a menu that facilitates adding, updating, removing and listing
products in the inventory file.
// Object file I/O
#include <fstream>
#include <iomanip>
#include <iostream>
using namespace std;

class product
{
    int code;
    char name[20];
    float rate;
public:
    void getdata()
    {
        cout << "\nEnter product code, name, price : ";
        cin >> code >> setw(20) >> name >> rate;    // setw guards the name buffer
    }
    void putdata()
    {
        cout << code << "\t" << name << "\t" << setiosflags(ios::fixed)
             << setprecision(2) << rate << "\n";
    }
    int prodcode() { return code; }
};

int main()
{
    fstream fobj, tmp;
    product P1, P2;
    int ch, pid, flg, x, c;
    while (1)
    {
        cout << "\n1. Append \n2. Update \n3. Delete \n4. Display \n5. Quit";
        cout << "\nEnter your choice : ";
        if (!(cin >> ch))                 // stop on end of input or invalid input
            return 0;
        switch (ch)
        {
        case 1:
            fobj.open("invent.dat", ios::out | ios::app | ios::binary);
            P1.getdata();
            fobj.write((char *) &P1, sizeof(P1));
            fobj.close();
            break;
        case 2:
            fobj.open("invent.dat",
                      ios::in | ios::out | ios::ate | ios::binary);
            flg = c = 0;
            cout << "\nEnter product code : ";
            cin >> pid;
            fobj.clear();
            fobj.seekg(0);
            while (fobj.read((char *) &P1, sizeof(P1)))
            {
                x = P1.prodcode();
                c++;                      // c counts the records read so far
                if (pid == x)
                {
                    cout << "\nExisting details are\n";
                    P1.putdata();
                    flg = 1;
                    cout << "\nEnter new details\n";
                    P2.getdata();         // Random access: go back to record c
                    fobj.seekp((c - 1) * sizeof(P1), ios::beg);
                    fobj.write((char *) &P2, sizeof(P2));
                    cout << "\nProduct Updated";
                    break;
                }
            }
            fobj.close();
            if (flg == 0)
                cout << "\n The product does not exist";
            break;
        case 3:
            fobj.open("invent.dat", ios::in | ios::binary);
            tmp.open("Temp.dat", ios::out | ios::binary);
            flg = 0;
            cout << "\nEnter product code : ";
            cin >> pid;
            while (fobj.read((char *) &P1, sizeof(P1)))
            {
                x = P1.prodcode();
                if (pid == x)
                {
                    cout << "\nProduct details are\n";
                    P1.putdata();
                    flg = 1;
                    cout << "\nProduct Deleted";
                }
                else                      // keep every other record
                    tmp.write((char *) &P1, sizeof(P1));
            }
            fobj.close();
            tmp.close();
            if (flg == 0)
                cout << "\n The product does not exist";
            else
            {                             // copy the surviving records back
                fobj.open("invent.dat", ios::out | ios::binary);
                tmp.open("Temp.dat", ios::in | ios::binary);
                while (tmp.read((char *) &P1, sizeof(P1)))
                {
                    fobj.write((char *) &P1, sizeof(P1));
                }
                fobj.close();
                tmp.close();
            }
            break;
        case 4:
            cout << "\nCode\tName\tRate\n";
            fobj.open("invent.dat", ios::in | ios::binary);
            while (fobj.read((char *) &P1, sizeof(P1)))
                P1.putdata();
            fobj.close();
            break;
        default:
            return 0;
        }
    }
}
Algorithm Analysis
Algorithm
An algorithm is a clearly specified set of simple instructions to be followed to solve a problem. It
can also be defined as a sequence of instructions that transforms the given input into the desired
output. Once an algorithm is given, it must also be determined how much of a resource, such as
time or space, the algorithm will require.
The essential properties of an algorithm are:
Finiteness
Unambiguous
Well-defined sequence
Feasibility
Input
Output
The qualities of a good algorithm are:
Simple but powerful and general solutions
Could be easily understood
Clearly defined solutions
Economical in use of computer time, storage and other resources
Well documented
Not dependent on any specific hardware
Can be used as sub procedures for other programs
General Rules
RULE 1-FOR LOOPS:
The running time of a for loop is at most the running time of the statements inside the for loop
(including tests) times the number of iterations.
The following code is of O(N).
unsigned int sum( int n )
{
    unsigned int i, partial_sum;
    partial_sum = 0;
    for( i=1; i<=n; i++ )
        partial_sum += i*i*i;
    return( partial_sum );
}
RULE 2-NESTED FOR LOOPS:
The total running time of a statement inside a group of nested for loops is the running time of the
statement multiplied by the product of the sizes of all for loops.
As an example, the following program fragment is O(N^2):
for( i=0; i<n; i++ )
    for( j=0; j<n; j++ )
        k++;
RULE 3-CONSECUTIVE STATEMENTS:
The running times of consecutive statements just add, which means that the maximum is the
one that counts.
As an example, the following program fragment, which has O(N) work followed by O(N^2)
work, is also O(N^2):
for( i=0; i<n; i++ )
    a[i] = 0;
for( i=0; i<n; i++ )
    for( j=0; j<n; j++ )
        a[i] += a[j] + i + j;
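To see why the larger term dominates, the sketch below counts one step per loop-body execution for the same shape of code: a single loop followed by a doubly nested loop. CountSteps is a hypothetical name used only here.

```cpp
// Counts loop-body executions for an O(N) loop followed by an O(N^2) nested loop.
// (CountSteps is a hypothetical helper for illustration.)
long CountSteps(int n)
{
    long steps = 0;
    for (int i = 0; i < n; i++)
        steps++;                      // first loop contributes N steps
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            steps++;                  // nested loops contribute N * N steps
    return steps;
}
```

The total is N + N^2, so for N = 100 the nested loops account for 10000 of the 10100 steps, and the linear part becomes negligible as N grows.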
RULE 4-IF/ELSE:
if( cond )
    S1
else
    S2
The running time of an if/else statement is never more than the running time of the test plus the
larger of the running times of S1 and S2.
If the recursion is really just a thinly veiled for loop, the analysis is usually trivial. For instance,
the following function is really just a simple loop and is obviously O(N):
unsigned int factorial( unsigned int n )
{
    if( n <= 1 )
        return 1;
    else
        return( n * factorial( n-1 ) );
}
When recursion is properly used, it is difficult to convert the recursion into a simple loop
structure.
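Since the factorial recursion is a loop in disguise, it can be rewritten directly as one. The version below is a sketch of that rewrite; the name FactorialLoop is hypothetical.

```cpp
// Iterative equivalent of the recursive factorial: one multiplication per
// value of i, hence O(N). (FactorialLoop is a hypothetical name.)
unsigned long FactorialLoop(unsigned int n)
{
    unsigned long result = 1;
    for (unsigned int i = 2; i <= n; i++)
        result *= i;
    return result;
}
```

Both versions perform the same number of multiplications; the loop simply avoids the chain of pending function calls.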
Logarithms in the Running Time
An algorithm is O(log N) if it takes constant O(1) time to cut the problem size by a fraction
(which is usually 1/2).
Binary search
Given an integer X and integers $A_0, A_1, A_2, \ldots, A_{N-1}$, which are presorted and already in
memory, find $i$ such that $A_i = X$, or return $i = -1$ if X is not found.
The obvious solution consists of scanning through the list from left to right and runs in linear
time. However, this algorithm does not take advantage of the fact that the list is sorted and is thus
not likely to be best. The best strategy is to check if X is the middle element. If so, the answer is
at hand. If X is smaller than the middle element, apply the same strategy to the sorted subarray to
the left of the middle element; otherwise look to the right half. Thus running time is O(log N).
int BinarySearch( const ElementType A[ ], ElementType X, int N )
{
    int Low, Mid, High;
    Low = 0; High = N - 1;
    while( Low <= High )
    {
        Mid = ( Low + High ) / 2;
        if( A[ Mid ] < X )
            Low = Mid + 1;
        else
        if( A[ Mid ] > X )
            High = Mid - 1;
        else
            return Mid;    /* Found */
    }
    return NotFound;       /* NotFound is defined as -1 */
}
Euclid's Algorithm
A second example is Euclid's algorithm for computing the greatest common divisor.
unsigned int Gcd( unsigned int M, unsigned int N )
{
    unsigned int Rem;
    while( N > 0 )
    {
        Rem = M % N;
        M = N;
        N = Rem;
    }
    return M;
}
The greatest common divisor (gcd) of two integers is the largest integer that divides both. The
algorithm computes Gcd(M, N), assuming M >= N. (If N > M, the first iteration of the loop swaps
them.) The algorithm works by continually computing remainders until 0 is reached. The last
nonzero remainder is the answer. The number of iterations is at most 2 log N = O(log N).
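The bound can be checked on a concrete pair by instrumenting the loop with a counter. The function below is a sketch for that purpose only; GcdCount and its iterations parameter are hypothetical additions.

```cpp
// Euclid's algorithm with an iteration counter added for illustration.
// (GcdCount is a hypothetical name; the loop body is unchanged.)
unsigned int GcdCount(unsigned int m, unsigned int n, int& iterations)
{
    iterations = 0;
    while (n > 0)
    {
        unsigned int rem = m % n;
        m = n;
        n = rem;
        iterations++;
    }
    return m;
}
```

For M = 1989 and N = 1590 the successive remainders are 399, 393, 6, 3, 0, so the loop runs five times and returns 3, well below 2 log N (about 21 for N = 1590, taking logarithms to base 2).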
Exponentiation
It deals with raising an integer to a power (which is also an integer). Numbers that result from
exponentiation are generally quite large.
long int Pow( long int X, unsigned int N )
{
    if( N == 0 )
        return 1;
    if( N == 1 )
        return X;
    if( IsEven( N ) )
        return Pow( X * X, N / 2 );
    else
        return Pow( X * X, N / 2 ) * X;
}
If N is even, we have $X^N = X^{N/2} \cdot X^{N/2}$, and if N is odd, $X^N = X^{(N-1)/2} \cdot X^{(N-1)/2} \cdot X$.
The obvious algorithm to compute $X^N$ uses N-1 multiplications. The recursive algorithm does
better. The number of multiplications required is at most 2 log N, because at most two
multiplications (when N is odd) are needed to halve the problem, so the running time is
O(log N).
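A counter makes the saving visible. The sketch below follows the same recursion but tallies every multiplication; PowCount and the mults parameter are hypothetical names, and the even test is written inline instead of calling IsEven.

```cpp
// Recursive exponentiation with a multiplication counter (illustrative only).
// The caller sets mults to 0 before the first call.
long PowCount(long x, unsigned int n, int& mults)
{
    if (n == 0)
        return 1;
    if (n == 1)
        return x;
    mults++;                                  // the x * x below
    long half = PowCount(x * x, n / 2, mults);
    if (n % 2 == 0)
        return half;
    mults++;                                  // the extra "* x" for odd n
    return half * x;
}
```

Computing 2^10 this way takes 4 multiplications instead of the 9 used by the obvious loop.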
Best, Worst and Average-case Complexity
Best Case: It is the function defined by the minimum number of steps taken on any instance of
size n.
Worst Case: It is the function defined by the maximum number of steps taken on any instance of
size n.
Average Case: It is the function defined by the average number of steps taken over all instances
of size n.
[Figure: number of steps plotted against problem size, with curves for the worst case, average
case and best case.]
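Sequential search is a convenient illustration of the three cases. The sketch below returns the number of comparisons made, which is the step count being measured; SearchComparisons is a hypothetical name.

```cpp
// Sequential search that reports how many element comparisons it performed.
// (SearchComparisons is a hypothetical helper for illustration.)
int SearchComparisons(const int a[], int n, int x)
{
    int comparisons = 0;
    for (int i = 0; i < n; i++)
    {
        comparisons++;
        if (a[i] == x)
            break;                    // found: stop early
    }
    return comparisons;
}
```

On an array of n elements the best case is 1 comparison (the key is first), the worst case is n (the key is last or absent), and the average over all successful searches is (n+1)/2.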
CS2068Unit 3: List, Stack & Queue ADT
List, Stack & Queue ADT
Abstract Data Type (ADT)
An ADT is defined as a mathematical model that consists of a set of data objects together with
the functions that operate on these objects.
The basic difference between an ADT and a primitive data type is that primitive data types
allow us to look at the representation, whereas an ADT hides the representation.
Examples: lists, sets, stacks, queues, graphs. For the set ADT, the operations are union,
intersection, complement, etc.
Benefits:
1. Code is easier to understand
2. Implementation can be changed without the need to change program that uses it
3. ADT is a user-defined data type in which data and a basic set of operations on the
data are bundled together under one name
List Data Type
A list is a sequence of zero or more elements of a given type. It is of the form
$A_1, A_2, A_3, \ldots, A_{i-1}, A_i, A_{i+1}, \ldots, A_n$
In this case the size of the list is n. A list of size 0 is called a null list.
Operations
1. Insertion
2. Deletion
3. Locate
4. Retrieve
5. IsEmpty
Implementation Types
1. Array
2. Linked list
3. Cursor
Array Implementation
In arrays the elements are stored adjacently. Arrays are generally not used to implement lists
for the following reasons:
1. The maximum size of the list must be known in advance (even if the array is dynamically
allocated), yet lists can be of unknown size.
2. Insertion and deletion are expensive. For example, inserting at position 0 requires first
pushing the entire array down one spot to make room, whereas deleting the first
element requires shifting all the elements in the list up one. So the worst case of these
operations is O(n).
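The cost of insertion can be made concrete by counting the elements that must move. The sketch below assumes the array has room for one more element; InsertAt is a hypothetical name.

```cpp
#include <cassert>

// Inserts x at index pos of a[0..n-1] and returns how many elements were
// shifted. Assumes the array has capacity for at least n + 1 elements.
// (InsertAt is a hypothetical helper for illustration.)
int InsertAt(int a[], int n, int pos, int x)
{
    assert(pos >= 0 && pos <= n);
    int moved = 0;
    for (int i = n; i > pos; i--)
    {
        a[i] = a[i - 1];              // push each element down one spot
        moved++;
    }
    a[pos] = x;
    return moved;
}
```

Inserting at position 0 moves all n elements, while inserting at the end moves none, which is why the worst case is O(n).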
Linked List Implementation
The linked list consists of a series of structures called nodes that are not necessarily stored
adjacently in memory. Each node contains two fields: Element, which stores data, and Next,
which contains the address of the next node.
[Figures: an example linked list and its layout in memory.]
Data Structure
typedef struct Node *PtrToNode;
typedef PtrToNode List;
typedef PtrToNode Position;

struct Node
{
    ElementType Element;
    Position Next;
};
/* Insert after legal position P. Header implementation assumed. */
void Insert( ElementType X, List L, Position P )
{
    Position TmpCell;
    TmpCell = malloc( sizeof( struct Node ) );
    if( TmpCell == NULL )
        FatalError( "Out of space!!!" );
    TmpCell->Element = X;
    TmpCell->Next = P->Next;
    P->Next = TmpCell;
}
/* Uses a header. If X is not found, the Next field of the returned position is NULL */
Position FindPrevious( ElementType X, List L )
{
    Position P;
    P = L;
    while( ( P->Next != NULL ) && ( P->Next->Element != X ) )
        P = P->Next;
    return P;
}
/* Return true if P is the last node in the list */
int IsLast( Position P, List L )
{
    return( P->Next == NULL );
}
/* Delete first occurrence of X from a list */
void Delete( ElementType X, List L )
{
    Position P, TmpCell;
    P = FindPrevious( X, L );
    if( !IsLast( P, L ) )        /* X was found */
    {
        TmpCell = P->Next;
        P->Next = TmpCell->Next;
        free( TmpCell );
    }
}
/* Return position of X in L; NULL if not found */
Position Find( ElementType X, List L )
{
    Position P;
    P = L->Next;
    while( ( P != NULL ) && ( P->Element != X ) )
        P = P->Next;
    return P;
}
/* Return true if L is empty */
int IsEmpty( List L )
{
    return( L->Next == NULL );
}
/* Empty the given list; create the header if it does not exist yet */
List MakeEmpty( List L )
{
    if( L != NULL )
        DeleteList( L );         /* frees the nodes, keeps the header */
    else
    {
        L = malloc( sizeof( struct Node ) );
        if( L == NULL )
            FatalError( "Out of memory!" );
        L->Next = NULL;
    }
    return L;
}
/* Header position */
Position Header( List L )
{
    return L;
}
/* Free nodes and release memory */
void DeleteList( List L )
{
    Position P, Tmp;
    P = L->Next;
    L->Next = NULL;
    while( P != NULL )
    {
        Tmp = P->Next;
        free( P );
        P = Tmp;
    }
}
/* First node of the list */
Position First( List L )
{
    return L->Next;
}
/* Move to next node */
Position Advance( Position P )
{
    return P->Next;
}
/* Retrieve node data */
ElementType Retrieve( Position P )
{
    return P->Element;
}
The linked list avoids the linear cost of insertion and deletion incurred by arrays. All operations
that do not traverse the list take O(1) time, since only a fixed number of instructions are
performed. For the Find and FindPrevious routines (and Delete, which calls FindPrevious), the
entire list may have to be traversed, so the running time is O(n).
Doubly Linked List
A doubly linked list is a collection of nodes like a singly linked list, but each node has an extra
field containing the address of the previous node. Therefore the list can be traversed forward as
well as backward.
[Figure: node structure of a doubly linked list.]
The Previous field of the first node and the Next field of the last node are NULL in a doubly
linked list.
The additional field adds to the space requirement and doubles the cost of insertions, because
the additional pointer also has to be taken care of. On the other hand, it simplifies deletion,
because the location of the previous node is available in the node itself.
All operations for single linked list are also applicable to doubly linked list.
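The sketch below illustrates the extra pointer maintenance on insertion and the constant-time deletion. It assumes a header node, so every data node has a non-null Previous; DNode, InsertAfter and Remove are hypothetical names.

```cpp
// Doubly linked node for integer data (hypothetical layout for illustration).
struct DNode
{
    int    Element;
    DNode* Previous;
    DNode* Next;
};

// Inserts a new node holding x after node p and returns it.
// Two pointers of the neighbours have to be fixed, not one.
DNode* InsertAfter(DNode* p, int x)
{
    DNode* n = new DNode{x, p, p->Next};
    if (p->Next != nullptr)
        p->Next->Previous = n;
    p->Next = n;
    return n;
}

// Unlinks and frees node p (not the header) without searching for its
// predecessor, because p->Previous already holds that address.
void Remove(DNode* p)
{
    p->Previous->Next = p->Next;
    if (p->Next != nullptr)
        p->Next->Previous = p->Previous;
    delete p;
}
```

In the singly linked version, Delete first has to call FindPrevious; here the predecessor is reached in one step.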
Circular Linked List
In a circular linked list, the last node points back to the first node. A circular linked list can be
implemented as either a singly or a doubly linked circular list. The advantages are:
1. It allows the list to be traversed easily as any node can be the starting point.
2. Quick access to the first and last nodes
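The first advantage can be sketched as a traversal that starts at an arbitrary node and stops when it comes back around. CNode and CountNodes are hypothetical names, and the list is assumed to be singly linked with no header node.

```cpp
// Node of a circular singly linked list (hypothetical layout for illustration).
struct CNode
{
    int    Element;
    CNode* Next;
};

// Counts the nodes by walking once around the circle from any starting node.
int CountNodes(const CNode* start)
{
    if (start == nullptr)
        return 0;
    int count = 0;
    const CNode* p = start;
    do
    {
        count++;
        p = p->Next;
    } while (p != start);             // back at the start: full circle done
    return count;
}
```

The loop ends on reaching the starting node again instead of testing for NULL, since a circular list has no NULL link.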
Application of Linked List
1. Polynomial ADT
2. Radix sort
3. Multilist
Polynomial ADT
A single-variable polynomial with non-negative exponents can be expressed using a list. It is
of the form
$P(x) = a_1 x^n + a_2 x^{n-1} + \cdots + a_n x + k$
Each term of the polynomial consists of two parts, namely the coefficient and the exponent.
The two operations on polynomials are addition and multiplication. Polynomials can be
implemented using arrays or linked lists.
Array Implementation
The polynomial data structure is represented as
typedef struct
{
    int CoeffArray[ MaxDegree+1 ];
    unsigned int HighPower;
} *Polynomial;
/* Initialize Polynomial */
void ZeroPolynomial( Polynomial Poly )
{
    int i;
    for( i = 0; i <= MaxDegree; i++ )
        Poly->CoeffArray[ i ] = 0;
    Poly->HighPower = 0;
}
Polynomial Addition is done by adding the coefficients of corresponding exponents
void AddPolynomial( const Polynomial Poly1, const Polynomial Poly2,
                    Polynomial PolySum )
{
    int i;
    ZeroPolynomial( PolySum );
    PolySum->HighPower = Max( Poly1->HighPower,
                              Poly2->HighPower );
    for( i = PolySum->HighPower; i >= 0; i-- )
        PolySum->CoeffArray[ i ] = Poly1->CoeffArray[ i ]
                                 + Poly2->CoeffArray[ i ];
}
Polynomial multiplication is done by multiplying each term of one polynomial with each term
of the other: the coefficients are multiplied and the exponents are added.
void MultPolynomial( Polynomial Poly1, Polynomial Poly2,
                     Polynomial PolyProd )
{
    unsigned int i, j;
    ZeroPolynomial( PolyProd );
    PolyProd->HighPower = Poly1->HighPower
                        + Poly2->HighPower;
    if( PolyProd->HighPower > MaxDegree )
        Error( "Exceeded array size" );
    else
        for( i=0; i<=Poly1->HighPower; i++ )
            for( j=0; j<=Poly2->HighPower; j++ )
                PolyProd->CoeffArray[ i+j ] +=
                    Poly1->CoeffArray[ i ] * Poly2->CoeffArray[ j ];
}
Polynomial ADT: Linked List
If the polynomial is dense (most terms are present), then arrays suffice. However, if most
terms are missing, the running time is spent mostly on processing non-existent parts of the
polynomials, which is unacceptable. Linked lists provide a viable alternative. A polynomial
node in a linked list consists of three parts, namely Coefficient, Exponent and Next.
The polynomial ADT structure is given below:
typedef struct Node *PtrToNode;
struct Node
{
    int Coefficient;
    int Exponent;
    PtrToNode Next;
};
typedef PtrToNode Polynomial;
In linked lists, the polynomial nodes are stored in decreasing order of their exponents. Two
sparse polynomials are given below; each is represented by a linked list with one node per
nonzero term.
$P_1(X) = 10X^{1000} + 5X^{14} + 1$ and $P_2(X) = 3X^{1990} - 2X^{1492} + 11X + 5$
Polynomial CreatePoly( Polynomial Poly, Polynomial New );

/* Polynomial linked list addition; the sum is returned as a new list */
Polynomial AddPolynomial( Polynomial Poly1, Polynomial Poly2 )
{
    Polynomial Poly3 = NULL, NewNode;
    while( Poly1 != NULL || Poly2 != NULL )
    {
        NewNode = malloc( sizeof( struct Node ) );
        if( NewNode == NULL )
            FatalError( "Out of space!!!" );
        NewNode->Next = NULL;
        if( Poly2 == NULL ||
            ( Poly1 != NULL && Poly1->Exponent > Poly2->Exponent ) )
        {   /* Term present only in Poly1 */
            NewNode->Coefficient = Poly1->Coefficient;
            NewNode->Exponent = Poly1->Exponent;
            Poly1 = Poly1->Next;
        }
        else if( Poly1 == NULL || Poly2->Exponent > Poly1->Exponent )
        {   /* Term present only in Poly2 */
            NewNode->Coefficient = Poly2->Coefficient;
            NewNode->Exponent = Poly2->Exponent;
            Poly2 = Poly2->Next;
        }
        else
        {   /* Same exponent in both: add the coefficients */
            NewNode->Coefficient = Poly1->Coefficient +
                                   Poly2->Coefficient;
            NewNode->Exponent = Poly1->Exponent;
            Poly1 = Poly1->Next;
            Poly2 = Poly2->Next;
        }
        Poly3 = CreatePoly( Poly3, NewNode );
    }
    return Poly3;
}
/* Appending a node to the end of a polynomial */
Polynomial CreatePoly( Polynomial Poly, Polynomial New )
{
    Polynomial Ptr;
    if( Poly == NULL )
        return New;          /* The first term becomes the list */
    Ptr = Poly;
    while( Ptr->Next != NULL )
        Ptr = Ptr->Next;
    Ptr->Next = New;
    return Poly;
}
Radix Sort
Radix sort (also known as bucket sort or card sort) was used to sort old-style punch cards.
Radix sort is a generalization of bucket sort.
1. It is performed using buckets 0 to 9.
2. The trick is to use several passes of bucket sort, starting with the least significant
digit.
3. In the first pass, all elements are sorted according to the least significant digit.
4. In the second pass, all numbers are arranged according to the next least significant
digit, and so on.
5. The passes are repeated until the most significant digit has been processed.
The number of passes in a radix sort depends upon the number of digits in the largest number.
The running time is O(P(N+B)), where P is the number of passes, N is the number of elements
to sort and B is the number of buckets. The algorithm would fail only if two numbers came out
of the same bucket in the wrong order, which cannot happen as long as each pass preserves the
order produced by the previous pass.
As an example consider 10 numbers in the range 0 to 999 randomly arranged as 64, 8, 216,
512, 27, 729, 0, 1, 343, 125.
[Figures: bucket contents after Pass 1, Pass 2 and Pass 3.]
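The passes can be sketched with one vector per bucket. This is an illustrative version for non-negative integers in base 10, not the punch-card procedure itself; RadixSort and its passes parameter are hypothetical names.

```cpp
#include <vector>

// Least-significant-digit radix sort with 10 buckets.
// 'passes' is the number of decimal digits of the largest value.
// (RadixSort is a hypothetical helper for illustration.)
void RadixSort(std::vector<int>& a, int passes)
{
    const int B = 10;                 // number of buckets
    int divisor = 1;                  // selects the digit for this pass
    for (int p = 0; p < passes; p++, divisor *= B)
    {
        std::vector<std::vector<int>> buckets(B);
        for (int x : a)
            buckets[(x / divisor) % B].push_back(x);  // keeps arrival order
        a.clear();
        for (const auto& bucket : buckets)            // collect buckets 0..9
            for (int x : bucket)
                a.push_back(x);
    }
}
```

Applied to the ten numbers above, the first pass yields 0, 1, 512, 343, 64, 125, 216, 27, 8, 729, and the list is fully sorted after the third pass. Appending to the back of each bucket is what preserves the order from the previous pass.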
Multilists
Multilist is a combination of two or more lists into one. Consider the example, in which a
university with 40,000 students and 2,500 courses, needs to generate a report that lists
registration for each class and the classes that each student has registered for.
If the above is implemented using a two-dimensional array, it would result in 100 million
entries (40,000 x 2,500), of which roughly 0.1% hold meaningful data.
This could be solved by using two singly linked circular lists with header node: one for each
class containing students and another for each student containing classes the student has
registered for. Using a circular list saves space, but at the expense of time.
For instance, to list all students in class C3, start at C3 and traverse the list, to find S1, S3, S4
and S5 as members of the class. Similarly, the classes for which a student has registered can
be determined.
Cursor Implementation
The cursor implementation of linked lists is used in languages that do not support pointers,
such as BASIC and FORTRAN. The cursor implementation simulates a linked list as follows:
1. Data is stored in a global array of structures known as the CursorSpace array. The array
index is used to specify the cell address.
2. A list of cells that are not part of any list, called the freelist, is maintained. The freelist
uses cell 0 as a header.
/* Cursor implementation of linked lists */
typedef int PtrToNode;
typedef PtrToNode List;
typedef PtrToNode Position;

struct Node
{
    ElementType Element;
    Position Next;
};
struct Node CursorSpace[ SpaceSize ];
A value of 0 for Next is equivalent of a NULL pointer. To perform a malloc, the first element
after the header is removed from the freelist. The free operation is implemented by placing
the cell in front of the freelist.
An initialized CursorSpace array and an example of cursor implementation is shown below.
In the given example, if L=5, then L represents the list a, b, e. If M=3, it represents the list c,
d, f.
The following code implements malloc and free operation of linked list using cursors.
/* Cursor Alloc - If space is not available, then P is set to 0. */
static Position CursorAlloc( void )
{
    Position P;
    P = CursorSpace[ 0 ].Next;
    CursorSpace[ 0 ].Next = CursorSpace[ P ].Next;
    return P;
}
/* Cursor Free */
static void CursorFree( Position P )
{
    CursorSpace[ P ].Next = CursorSpace[ 0 ].Next;
    CursorSpace[ 0 ].Next = P;
}
/* Initialize CursorSpace array */
void InitializeCursorSpace( void )
{
    int i;
    for( i = 0; i < SpaceSize; i++ )
        CursorSpace[ i ].Next = i + 1;
    CursorSpace[ SpaceSize - 1 ].Next = 0;
}
/* Whether a list is empty */
int IsEmpty( List L )
{
    return CursorSpace[ L ].Next == 0;
}
/* Whether current position is last in the list */
int IsLast( Position P, List L )
{
    return CursorSpace[ P ].Next == 0;
}
/* Insert after position P */
void Insert( ElementType X, List L, Position P )
{
    Position TmpCell;
    TmpCell = CursorAlloc( );
    if( TmpCell == 0 )
        FatalError( "Out of space!!!" );
    CursorSpace[ TmpCell ].Element = X;
    CursorSpace[ TmpCell ].Next = CursorSpace[ P ].Next;
    CursorSpace[ P ].Next = TmpCell;
}
/* Header position */
Position Header( List L )
{
    return L;
}
/* Delete from a list */
void Delete( ElementType X, List L )
{
    Position P, TmpCell;
    P = FindPrevious( X, L );
    if( !IsLast( P, L ) )
    {
        TmpCell = CursorSpace[ P ].Next;
        CursorSpace[ P ].Next = CursorSpace[ TmpCell ].Next;
        CursorFree( TmpCell );
    }
}
The freelist represents an interesting data structure in its own right. The cell that is removed
from the freelist is the one that was most recently placed there. The last cell placed on the
freelist is the first cell taken off (similar to stack).
/* If X is not found, the Next field of the returned position is zero */
Position FindPrevious( ElementType X, List L )
{
    Position P;
    P = L;
    while( CursorSpace[ P ].Next &&
           CursorSpace[ CursorSpace[ P ].Next ].Element != X )
        P = CursorSpace[ P ].Next;
    return P;
}
/* Next element */
Position Advance( Position P )
{
    return CursorSpace[ P ].Next;
}
/* Clean up the list; allocate the header if it does not exist yet */
List MakeEmpty( List L )
{
    if( L != 0 )
        DeleteList( L );         /* frees the cells, keeps the header */
    else
    {
        L = CursorAlloc( );
        if( L == 0 )
            FatalError( "Out of memory!" );
        CursorSpace[ L ].Next = 0;
    }
    return L;
}
/* Retrieve contents */
ElementType Retrieve( Position P )
{
    return CursorSpace[ P ].Element;
}
/* Delete entire list */
void DeleteList( List L )
{
    Position P, Tmp;
    P = CursorSpace[ L ].Next;
    CursorSpace[ L ].Next = 0;
    while( P != 0 )
    {
        Tmp = CursorSpace[ P ].Next;
        CursorFree( P );
        P = Tmp;
    }
}
/* Position of the first element */
Position First( List L )
{
    return CursorSpace[ L ].Next;
}
Stack ADT
A stack is a list with the restriction that insertions (push) and deletions (pop) can be performed
only at one end of the list, called the top. A stack is also known as a LIFO (last in, first out) list.
The fundamental operations are Push and Pop. The most recently inserted element can be
examined using the Top routine. A Pop or Top on an empty stack is an ADT error, whereas
insufficient memory for Push is an implementation error.
[Figure: the stack ADT and an example.]
Stacks are extensively used in modern computers. The basic design for interrupt handling and
procedure calls is implemented using stacks. Examples of stacks in day-to-day life are optical
discs stacked on a spindle, plates stacked on a dining table, etc.
The data structure for a stack implemented using a singly linked list is
typedef struct Node *PtrToNode;
typedef PtrToNode Stack;

struct Node
{
    ElementType Element;
    PtrToNode Next;
};
Creation of an empty stack is done by creating a header node and the MakeEmpty routine sets
the Next pointer to NULL.
Stack CreateStack( void )
{
    Stack S;
    S = malloc( sizeof( struct Node ) );
    if( S == NULL )
        FatalError( "Out of space!!!" );
    S->Next = NULL;
    MakeEmpty( S );
    return S;
}
/* Routine to empty stack */
void MakeEmpty( Stack S )
{
    if( S == NULL )
        Error( "Must use CreateStack first" );
    else
        while( !IsEmpty( S ) )
            Pop( S );
}
/* Routine to check whether stack is empty */
int IsEmpty( Stack S )
{
    return S->Next == NULL;
}
/* Routine to free up the stack memory */
void DisposeStack( Stack S )
{
    MakeEmpty( S );
    free( S );
}
Push is implemented as an insertion at the front of the linked list, the front serving as the top
of the stack. Stack overflow is the situation in which new data is to be inserted but there is no
available space.
void Push( ElementType X, Stack S )
{
    PtrToNode TmpCell;
    TmpCell = malloc( sizeof( struct Node ) );
    if( TmpCell == NULL )
        FatalError( "Out of space!!!" );
    else
    {
        TmpCell->Element = X;
        TmpCell->Next = S->Next;
        S->Next = TmpCell;
    }
}
The Top is implemented by retrieving the element in the first node of the list.
ElementType Top( Stack S )
{
    if( !IsEmpty( S ) )
        return S->Next->Element;
    Error( "Empty stack" );
    return 0;
}
Pop is implemented by deleting the first node of the list. Attempting to delete from an
empty stack results in stack underflow.
void Pop( Stack S )
{
    PtrToNode FirstCell;
    if( IsEmpty( S ) )
        Error( "Empty stack" );
    else
    {
        FirstCell = S->Next;
        S->Next = S->Next->Next;
        free( FirstCell );
    }
}
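With a linked list, Push, Pop, Top and IsEmpty take constant time, because none of them
depends on the size of the stack; MakeEmpty and DisposeStack take time proportional to the
number of elements. The drawback is that the calls to malloc and free are expensive.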
Stack ADT: Arrays
The array implementation of stacks is more popular than the linked list implementation. The
only requirement is that the maximum stack size be known in advance. Associated with each
stack is TopOfStack, which is -1 for an empty stack. A stack is defined as a pointer to a
structure as follows:
typedef struct StackRecord *Stack;
#define EmptyTOS ( -1 )
#define MinStackSize ( 5 )
struct StackRecord
{
    int Capacity;
    int TopOfStack;
    ElementType *Array;
};
Once the maximum size is known, the stack array is dynamically allocated. All stack
operations are performed at a fast constant time. On some machines Push and Pop can be
written in one machine instruction, operating on a register with auto increment and decrement
addressing.
/* Stack creation; requires the maximum size as argument */
Stack CreateStack( int MaxElements )
{
    Stack S;
    if( MaxElements < MinStackSize )
        Error( "Stack size is too small" );
    S = malloc( sizeof( struct StackRecord ) );
    if( S == NULL )
        FatalError( "Out of space!!!" );
    S->Array = malloc( sizeof( ElementType ) * MaxElements );
    if( S->Array == NULL )
        FatalError( "Out of space!!!" );
    S->Capacity = MaxElements;
    MakeEmpty( S );
    return S;
}
In the linked list implementation, stack creation does not take any arguments, whereas the
array implementation requires the maximum size. Therefore the calling program has to be
aware of which stack implementation is in use.
/* Empty stack creation */
void MakeEmpty( Stack S )
{
    S->TopOfStack = EmptyTOS;
}
/* Free the stack array and structure */
void DisposeStack( Stack S )
{
    if( S != NULL )
    {
        free( S->Array );
        free( S );
    }
}
To push contents onto the stack, the TopOfStack is incremented and then the value is stored
at the current top.
void Push( ElementType X, Stack S )
{
    if( IsFull( S ) )
        Error( "Full stack" );
    else
        S->Array[ ++S->TopOfStack ] = X;
}
/* Return true if the stack is full */
int IsFull( Stack S )
{
    return S->TopOfStack == S->Capacity - 1;
}
To Pop, TopOfStack is decremented; the element at the top is read separately with the Top
routine. The Pop and Top routines can also be combined into a single routine that returns the
top element and removes it.
void Pop( Stack S )
{
    if( IsEmpty( S ) )
        Error( "Empty stack" );
    else
        S->TopOfStack--;
}
/* Return top of the stack */
ElementType Top( Stack S )
{
    if( !IsEmpty( S ) )
        return S->Array[ S->TopOfStack ];
    Error( "Empty stack" );
    return 0;
}
Some applications of stack are balancing symbols, Postfix expressions, Function calls,
Towers of Hanoi, etc.
Balancing Symbols
A common mistake made by programmers is a missing or incorrectly closed symbol such as a
parenthesis. This can cause the compiler to report many lines of errors. A program that
checks whether the symbols are balanced is therefore useful. For example, the sequence
[ ( ) ] is legal, but [ ( ] ) is incorrect. For the symbols in an arithmetic expression to be
balanced:
1. There must be an equal number of opening and closing symbols
2. Each opening symbol must be matched by the corresponding closing symbol, in the right order
A stack can be used to find whether the expression contains balanced symbols as follows:
1. Make an empty stack
2. Read characters until end of file
3. If the character is an opening symbol, push it onto the stack
4. If the character is a closing symbol and the stack is empty, report an error
5. Otherwise (closing symbol, stack not empty) pop the stack
6. If the symbol popped is not the corresponding opening symbol, report an error
7. At end of file, if the stack is not empty, report an error
Consider the correct expression: {p + (q [m + n] ) }#
Push ( { ) Push ( ( ) Push ( [ ) Pop ( ] ) Pop ( ) ) Pop ( } )
At the end of the expression the stack is empty, therefore the symbols are balanced.
Consider the following incorrect expression: ( (x + y) #
Push ( ( ) Push ( ( ) Pop ( ) )
At the end of the expression the stack is not empty, therefore the symbols are unbalanced.
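The balancing steps above can be sketched in C. This is a minimal sketch under stated assumptions: the input is a string rather than a file, `#` or the end of the string terminates it, and the names `IsBalanced`, `OpenerFor` and `MAX_DEPTH` are hypothetical.

```c
#include <stddef.h>

#define MAX_DEPTH 256   /* assumed limit on nesting depth for this sketch */

/* Returns the opening symbol that matches a closing symbol,
   or 0 if c is not a closing symbol. */
static char OpenerFor(char c)
{
    switch (c) {
    case ')': return '(';
    case ']': return '[';
    case '}': return '{';
    default:  return 0;
    }
}

/* Returns 1 if every (, [ and { in Text is closed in the right order,
   0 otherwise. Scanning stops at '#' or at the end of the string. */
int IsBalanced(const char *Text)
{
    char Stack[MAX_DEPTH];
    int Top = -1;
    size_t i;

    for (i = 0; Text[i] != '\0' && Text[i] != '#'; i++) {
        char c = Text[i];
        if (c == '(' || c == '[' || c == '{') {
            if (Top == MAX_DEPTH - 1)
                return 0;                 /* nesting too deep for this sketch */
            Stack[++Top] = c;             /* opening symbol: push */
        } else if (OpenerFor(c) != 0) {
            if (Top < 0)
                return 0;                 /* closing symbol, empty stack */
            if (Stack[Top--] != OpenerFor(c))
                return 0;                 /* popped symbol does not match */
        }
    }
    return Top == -1;                     /* leftover openers: unbalanced */
}
```

With the two expressions used as examples in these notes, the first is accepted and the second is rejected because one opening parenthesis remains on the stack.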
Evaluating Expressions
There are three ways of representing an algebraic expression:
1. Infix: the general notation, with the operator embedded between its operands. (A+B) * (C-D)
2. Postfix: also known as reverse Polish notation; the operator trails its operands. AB+CD-*
3. Prefix: also known as Polish notation; the operator precedes its operands. *+AB-CD
To evaluate an arithmetic expression, the infix expression is first converted into postfix and
then evaluated using a stack. The advantage of evaluating a postfix expression is that
operator precedence does not come into the picture.
Infix to Postfix Conversion
1. Read the expression left-to-right, until the delimiter #.
2. If an operand is read, it is placed onto the output.
3. If an operator is read, first pop operators from the stack onto the output for as long as
the operator on top of the stack has higher or equal priority than the input operator
(stopping at a left parenthesis or an empty stack). Then push the input operator onto the stack.
4. If a left parenthesis is read, push it onto the stack.
5. If a right parenthesis is read, pop operators from the stack onto the output until a left
parenthesis is encountered. Both parentheses are discarded and never placed on the output.
6. If # is read, pop all remaining operators from the stack onto the output.
The algorithm is of the order O(N).
Convert the infix expression a + b * c + (d * e + f) * g to postfix. The result is
a b c * + d e * f + g * +
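The conversion rules can be sketched in C as follows. This sketch assumes single-character operands, only the operators + - * /, and an optional `#` terminator; the names `InfixToPostfix`, `Priority` and `MAX_OPS` are hypothetical.

```c
#include <ctype.h>

#define MAX_OPS 128   /* assumed operator stack limit for this sketch */

/* Priority of an operator; '(' gets 0 so that it is never popped by an operator */
static int Priority(char Op)
{
    switch (Op) {
    case '*': case '/': return 2;
    case '+': case '-': return 1;
    default:            return 0;
    }
}

/* Converts Infix to postfix in Out (Out needs room for strlen(Infix) + 1 chars).
   Returns 1 on success, 0 on a malformed expression. */
int InfixToPostfix(const char *Infix, char *Out)
{
    char Stack[MAX_OPS];
    int Top = -1, n = 0, i;

    for (i = 0; Infix[i] != '\0' && Infix[i] != '#'; i++) {
        char c = Infix[i];
        if (isspace((unsigned char)c))
            continue;
        if (isalnum((unsigned char)c)) {
            Out[n++] = c;                       /* operand: straight to output */
        } else if (c == '(') {
            if (Top == MAX_OPS - 1) return 0;
            Stack[++Top] = c;
        } else if (c == ')') {
            while (Top >= 0 && Stack[Top] != '(')
                Out[n++] = Stack[Top--];
            if (Top < 0) return 0;              /* no matching '(' */
            Top--;                              /* discard the '(' */
        } else if (Priority(c) > 0) {
            /* pop operators of higher or equal priority, then push c */
            while (Top >= 0 && Priority(Stack[Top]) >= Priority(c))
                Out[n++] = Stack[Top--];
            if (Top == MAX_OPS - 1) return 0;
            Stack[++Top] = c;
        } else {
            return 0;                           /* unknown character */
        }
    }
    while (Top >= 0) {                          /* end of input: empty the stack */
        if (Stack[Top] == '(') return 0;        /* unmatched '(' */
        Out[n++] = Stack[Top--];
    }
    Out[n] = '\0';
    return 1;
}
```

Giving '(' the lowest priority is a design choice that lets one loop handle both "stop at a left parenthesis" and "stop at a lower priority operator".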
Evaluating Postfix expression
The postfix expression is read from left to right until the delimiter #.
1. If an operand is read, push its value onto the stack
2. If an operator is read, pop two values from the stack, apply the operator to them (the
value popped first is the right operand) and push the result onto the stack
3. If # is read, return the top of the stack as the result.
The algorithm is of the order O(N).
Evaluate the postfix expression AB*CDE/-+# assuming the values 1, 2, 3, 4 and 2 for A, B, C, D
and E respectively. The result is 1*2 + (3 - 4/2) = 3.
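The evaluation procedure can be sketched in C as below. The sketch assumes single-digit operands and integer arithmetic, so the example is written with the values substituted for the letters; the names `EvalPostfix` and `MAX_VALUES` are hypothetical.

```c
#include <ctype.h>

#define MAX_VALUES 128   /* assumed value stack limit for this sketch */

/* Evaluates a postfix expression over single-digit operands.
   Stores the value in *Result; returns 1 on success, 0 if malformed. */
int EvalPostfix(const char *Postfix, int *Result)
{
    int Stack[MAX_VALUES];
    int Top = -1, i;

    for (i = 0; Postfix[i] != '\0' && Postfix[i] != '#'; i++) {
        char c = Postfix[i];
        if (isspace((unsigned char)c))
            continue;
        if (isdigit((unsigned char)c)) {
            if (Top == MAX_VALUES - 1) return 0;
            Stack[++Top] = c - '0';            /* operand: push its value */
        } else {
            int Right, Left;
            if (Top < 1) return 0;             /* operator needs two values */
            Right = Stack[Top--];              /* popped first: right operand */
            Left  = Stack[Top--];
            switch (c) {
            case '+': Stack[++Top] = Left + Right; break;
            case '-': Stack[++Top] = Left - Right; break;
            case '*': Stack[++Top] = Left * Right; break;
            case '/':
                if (Right == 0) return 0;      /* division by zero */
                Stack[++Top] = Left / Right;
                break;
            default:
                return 0;                      /* unknown symbol */
            }
        }
    }
    if (Top != 0) return 0;                    /* exactly one value must remain */
    *Result = Stack[0];
    return 1;
}
```

For the expression in these notes, AB*CDE/-+# with A..E = 1, 2, 3, 4, 2 becomes the string "12*342/-+#".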
Function Calls
When there is a function call, the system needs to save
1. The local variables (register values) of the calling routine, since the called function
would otherwise overwrite them.
2. The current location (return address) of the calling routine, so that control can return
there after the function has executed.
The information saved is called an activation record or stack frame, and it is placed on a
stack. The same set of saves is done during recursion. The current environment is
represented by the top of the stack.
On many systems the stack grows from the high end of memory downwards. On most systems
there is no check for stack overflow, so running out of stack space (for example through
runaway recursion) usually makes the program crash.
Queue ADT
Queues are also lists, in which insertion is done at one end whereas deletion is performed at
the other end. The basic operations on a queue are
1. Enqueue: inserts an element at the end of the list, called the rear
2. Dequeue: deletes an element at the start of the list, called the front
A queue is modeled as a FIFO (First In First Out) structure, as shown below:
Like stacks, queues can be implemented using arrays or linked lists, both giving fast O(1)
running times for every operation. The array-based queue data structure consists of an
array, the positions Front and Rear, and the maximum and current queue sizes.
#define MinQueueSize ( 5 )
struct QueueRecord
{
    int Capacity;   /* Max queue size */
    int Front;
    int Rear;
    int Size;       /* No. of elements in queue */
    ElementType *Array;
};
The CreateQueue routine creates a queue of the specified size through dynamic memory allocation.
Queue CreateQueue( int MaxElements )
{
    Queue Q;

    if( MaxElements < MinQueueSize )
        Error( "Queue size is too small" );
    Q = malloc( sizeof( struct QueueRecord ) );
    if( Q == NULL )
        FatalError( "Out of space!!!" );
    Q->Array = malloc( sizeof( ElementType ) * MaxElements );
    if( Q->Array == NULL )
        FatalError( "Out of space!!!" );
    Q->Capacity = MaxElements;
    MakeEmpty( Q );
    return Q;
}
/* Routine to make an empty queue */
void MakeEmpty( Queue Q )
{
    Q->Size = 0;
    Q->Front = 1;
    Q->Rear = 0;
}
/* Routine to check for empty queue */
int IsEmpty( Queue Q )
{
    return Q->Size == 0;
}
/* Routine to check for full queue */
int IsFull( Queue Q )
{
    return Q->Size == Q->Capacity;
}
/* Releases the queue memory */
void DisposeQueue( Queue Q )
{
    if( Q != NULL )
    {
        free( Q->Array );
        free( Q );
    }
}
/* Successor of an array position, with wrap around */
static int Succ( int Value, Queue Q )
{
    if( ++Value == Q->Capacity )
        Value = 0;
    return Value;
}
To enqueue an element X, increment Size, advance Rear (with wrap around) and set Array[Rear] = X.
void Enqueue( ElementType X, Queue Q )
{
    if( IsFull( Q ) )
        Error( "Full queue" );
    else
    {
        Q->Size++;
        Q->Rear = Succ( Q->Rear, Q );
        Q->Array[ Q->Rear ] = X;
    }
}
To dequeue, return Array[Front], decrement Size and advance Front (with wrap around).
ElementType FrontAndDequeue( Queue Q )
{
    ElementType X = 0;

    if( IsEmpty( Q ) )
        Error( "Empty queue" );
    else
    {
        Q->Size--;
        X = Q->Array[ Q->Front ];
        Q->Front = Succ( Q->Front, Q );
    }
    return X;
}
There is one potential problem with a plain array. Assuming Capacity to be 10, Rear would
reach the last array position after 10 enqueues, so the queue would appear full even if
dequeues had freed cells at the start of the array. A simple solution is that whenever Front
or Rear gets to the end of the array, it is wrapped around to the beginning, which is what
the Succ routine does. Such a queue is known as a circular queue.
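The wrap-around behaviour can be seen in a small self-contained sketch. The fixed-size `IntQueue` type, the capacity of 4 and the return codes are assumptions made for illustration; the logic follows the Succ, Enqueue and FrontAndDequeue routines of these notes.

```c
#define CAPACITY 4                 /* assumed small capacity to force wrap around */

typedef struct {
    int Front, Rear, Size;
    int Array[CAPACITY];
} IntQueue;

void MakeEmpty(IntQueue *Q) { Q->Size = 0; Q->Front = 1; Q->Rear = 0; }

/* Next array position, wrapping from the last cell back to cell 0 */
static int Succ(int Value) { return (Value + 1 == CAPACITY) ? 0 : Value + 1; }

/* Returns 1 on success, 0 if the queue is full */
int Enqueue(int X, IntQueue *Q)
{
    if (Q->Size == CAPACITY)
        return 0;
    Q->Size++;
    Q->Rear = Succ(Q->Rear);
    Q->Array[Q->Rear] = X;
    return 1;
}

/* Copies the front element to *Out and removes it; returns 0 if empty */
int Dequeue(IntQueue *Q, int *Out)
{
    if (Q->Size == 0)
        return 0;
    Q->Size--;
    *Out = Q->Array[Q->Front];
    Q->Front = Succ(Q->Front);
    return 1;
}
```

After four enqueues the queue is full; two dequeues free cells 1 and 2, and the next two enqueues reuse those cells because Rear wraps from cell 0 onwards, while FIFO order is preserved.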
Queue-Linked List Implementation
The data structure for a queue implemented as a singly linked list uses two pointers, Front
and Rear, both initially set to NULL.
struct Node
{
    ElementType Element;
    PtrToNode Next;
};
struct Node *Front = NULL, *Rear = NULL;
/* Insert an element into queue at the rear end */
void Enqueue( ElementType X )
{
    PtrToNode TmpCell;

    TmpCell = malloc( sizeof( struct Node ) );
    if( TmpCell == NULL )
        FatalError( "Out of space!!!" );
    else
    {
        TmpCell->Element = X;
        TmpCell->Next = NULL;
        if( Front == NULL )
            Front = TmpCell;        /* first element of an empty queue */
        else
            Rear->Next = TmpCell;
        Rear = TmpCell;
    }
}
/* Remove the element at front of the queue */
void Dequeue( )
{
    PtrToNode FirstCell;

    if( Front == NULL )
        Error( "Empty queue" );
    else
    {
        FirstCell = Front;
        Front = Front->Next;
        if( Front == NULL )
            Rear = NULL;            /* queue is now empty */
        free( FirstCell );
    }
}
Some applications of queues are:
1. When jobs are submitted, they are arranged in the order of arrival
2. Lines at counters are queues, because service is FCFS (first come, first served)
3. Access to files on a file server is also FCFS
4. Calls to a call centre are placed on a queue when all operators are busy
5. In universities, where resources are limited, students must sign a waiting list if all
terminals are occupied
A branch of mathematics known as queueing theory deals with computing, probabilistically,
quantities such as waiting time and service time.
CS2068Unit 3 Notes
Priority Queues
Queues generally follow FIFO or FCFS order. In some environments strict FIFO order is not
appropriate. For instance, when jobs are submitted to the CPU, the scheduler may let the job
with the shortest time requirement execute ahead of longer ones, irrespective of its position
in the queue. This type of special queue is known as a priority queue.
A priority queue is a data structure that has the operations Insert and DeleteMin, as shown in
the model below.
The Insert operation is the equivalent of Enqueue, and DeleteMin is the priority queue
equivalent of the queue's Dequeue operation. Priority queues have many applications besides
operating systems, such as external sorting and greedy algorithms.
Priority queues can be implemented using a sorted linked list, but insertion is then
expensive, O(N). Another way is to use a binary search tree, which gives O(log N) running time
but supports a set of operations that is not required. Therefore a data structure is needed
that supports both operations in O(log N) without using pointers.
Binary Heaps
A binary heap (or simply heap) is a binary tree that is completely filled, with the possible
exception of the lowest level, which is filled from left to right, as shown in the figure.
The heap data structure consists of an array and the maximum and current heap sizes.
struct HeapStruct
{
    int Capacity;
    int Size;
    ElementType *Elements;
};
A binary heap has two properties: the structure property and the heap order property.
Structure property: A complete binary tree of height $h$ has between $2^h$ and $2^{h+1} - 1$ nodes.
Because a complete binary tree is so regular, it can be represented in an array without
pointers, although the maximum heap size must be known in advance. For any element in array
position $i$, the left child is at position $2i$, the right child at $2i + 1$ and the parent
at $\lfloor i/2 \rfloor$.
Heap order property: In a heap, any node should be smaller than or equal to all of its
descendants. Equivalently, for every node X other than the root, the key in the parent of X
is less than or equal to the key in X. By the heap order property, the minimum element is
always at the root. Therefore the additional operation FindMin takes constant time.
Insert (Enqueue)
1. To insert an element X into the heap, create a hole in the next available location at the
bottom level.
2. If X can be placed into the hole without violating the heap order, it is placed there.
3. Otherwise the hole's parent is slid into the hole, bubbling the hole up toward the root.
4. Steps 2 and 3 are repeated until X can be placed in the hole.
The following is the process of inserting 14. Initially a hole is created to the right of 32
(the next available location). Placing 14 in the hole would violate the heap order property,
so 31 is slid down and the hole is bubbled up. The strategy is continued until the correct
location for 14 is found.
This general strategy is known as percolate up: the new element is percolated up until its
correct location is found. If the element to be inserted is the new minimum, it is pushed
all the way to the top, so the time taken is O(log N) in the worst case.
/* H->Elements[ 0 ] is a sentinel, smaller than any key in the heap */
void Insert( ElementType X, PriorityQueue H )
{
    int i;

    if( IsFull( H ) )
    {
        Error( "Priority queue is full" );
        return;
    }
    /* Percolate up: slide parents down until X fits */
    for( i = ++H->Size; H->Elements[ i / 2 ] > X; i /= 2 )
        H->Elements[ i ] = H->Elements[ i / 2 ];
    H->Elements[ i ] = X;
}
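The percolate-up insertion can be tried out with a self-contained sketch. The `IntHeap` type, the names `HeapInit` and `HeapInsert`, and the use of INT_MIN as sentinel are assumptions made here; the ten starting keys in the usage note are also an assumption, chosen only to be consistent with the values 32, 31 and 14 mentioned in the text (the original figure is not available).

```c
#include <limits.h>

#define HEAP_CAPACITY 15          /* assumed capacity for this sketch */

typedef struct {
    int Size;
    int Elements[HEAP_CAPACITY + 1];   /* slot 0 holds the sentinel */
} IntHeap;

/* Empty heap with a sentinel that stops every percolate-up loop */
void HeapInit(IntHeap *H)
{
    H->Size = 0;
    H->Elements[0] = INT_MIN;
}

/* Inserts X by percolating a hole up from the next free position.
   Returns 1 on success, 0 if the heap is full. */
int HeapInsert(int X, IntHeap *H)
{
    int i;

    if (H->Size == HEAP_CAPACITY)
        return 0;
    for (i = ++H->Size; H->Elements[i / 2] > X; i /= 2)
        H->Elements[i] = H->Elements[i / 2];   /* slide the parent down */
    H->Elements[i] = X;
    return 1;
}
```

Starting from the heap 13, 21, 16, 24, 31, 19, 68, 65, 26, 32 (array positions 1 to 10), inserting 14 creates the hole at position 11, slides 31 and then 21 down, and places 14 at position 2.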
DeleteMin
1. When the minimum (root) is removed, a hole is created at the root.
2. Since the heap now has one element fewer, the last element X must be moved somewhere in
the heap.
3. If X can be placed in the hole without violating the heap order, it is placed there.
Otherwise the smaller of the hole's children is slid into the hole, pushing the hole one
level down.
4. The above step is repeated until X can be placed in the hole.
In the initial heap, 13 (root) is the minimum. After it is removed, 31 needs to be placed in
the heap. The value 31 cannot be placed in the hole as that would violate the heap order.
Therefore the smaller child, 14, is moved to the top and the hole slides down one level. In
the next step, 19 is moved up and the hole slides down. Then 26 is moved up and the hole
moves down to the bottom level. Finally 31 is placed in the hole.
This strategy is known as percolate down. A frequent implementation error arises when the
heap has an even number of elements, because one node then has only one child; the code
must not assume that every node has two children. The worst-case running time is O(log N).
/* Delete Min operation */
ElementType DeleteMin( PriorityQueue H )
{
    int i, Child;
    ElementType MinElement, LastElement;

    if( IsEmpty( H ) )
    {
        Error( "Priority queue is empty" );
        return H->Elements[ 0 ];
    }
    MinElement = H->Elements[ 1 ];
    LastElement = H->Elements[ H->Size-- ];
    for( i = 1; i * 2 <= H->Size; i = Child )
    {
        /* Find smaller child */
        Child = i * 2;
        if( Child != H->Size && H->Elements[ Child + 1 ]
                                < H->Elements[ Child ] )
            Child++;
        /* Percolate one level */
        if( LastElement > H->Elements[ Child ] )
            H->Elements[ i ] = H->Elements[ Child ];
        else
            break;
    }
    H->Elements[ i ] = LastElement;
    return MinElement;
}
Other Operations
DecreaseKey: The DecreaseKey(P, $\Delta$, H) operation lowers the value of the key at position
P by a positive amount $\Delta$. Since this might violate the heap order, it must be fixed by
a percolate up. This operation could be useful to system administrators so that they can make
certain programs run with highest priority.
IncreaseKey: The IncreaseKey(P, $\Delta$, H) operation increases the value of the key at
position P by a positive amount $\Delta$. The heap order is restored with a percolate down.
Many schedulers automatically drop the priority of a process that is consuming excess CPU time.
Delete: The Delete(P, H) operation removes the node at position P from the heap. This is
done by first performing DecreaseKey(P, $\infty$, H) and then DeleteMin(H). When a process is
terminated abruptly by the user, it must be removed from the priority queue.
BuildHeap: The BuildHeap(H) operation takes as input N keys and places them into an empty
heap. To do this, the N keys are placed into the tree in any order, maintaining the structure
property. Then PercolateDown(i) is applied to each node i, from i = N/2 down to 1, to create
a heap-ordered tree. The running time is O(N).
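A minimal sketch of BuildHeap follows. The `IntHeap` type, the capacity and the return code are assumptions made for illustration; the percolate-down loop has the same shape as the one in DeleteMin.

```c
#define HEAP_CAPACITY 31          /* assumed capacity for this sketch */

typedef struct {
    int Size;
    int Elements[HEAP_CAPACITY + 1];   /* 1-based; slot 0 is unused here */
} IntHeap;

/* Moves the key at position i down until heap order holds below it */
static void PercolateDown(IntHeap *H, int i)
{
    int Child;
    int Tmp = H->Elements[i];

    for (; i * 2 <= H->Size; i = Child) {
        Child = i * 2;
        /* pick the smaller child; the last node may have only one child */
        if (Child != H->Size && H->Elements[Child + 1] < H->Elements[Child])
            Child++;
        if (H->Elements[Child] < Tmp)
            H->Elements[i] = H->Elements[Child];
        else
            break;
    }
    H->Elements[i] = Tmp;
}

/* Copies N keys in arbitrary order, then fixes heap order bottom-up.
   Returns 1 on success, 0 if N does not fit. */
int BuildHeap(IntHeap *H, const int *Keys, int N)
{
    int i;

    if (N < 0 || N > HEAP_CAPACITY)
        return 0;
    for (i = 0; i < N; i++)
        H->Elements[i + 1] = Keys[i];
    H->Size = N;
    for (i = N / 2; i > 0; i--)      /* leaves need no work */
        PercolateDown(H, i);
    return 1;
}
```

Only the nodes at positions N/2 down to 1 have children, which is why the loop starts at N/2.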
Binary Heap Application
1. The Selection Problem: Given a list of N elements and an integer k, the selection
problem is to find the $k$-th largest element. If sorting is performed, either in total or
in part, the result is $O(N^2)$ for $k = N/2$. Two algorithms with a running time of
$O(N \log N)$ are discussed. Assume first that the $k$-th smallest element is to be found.
Algorithm A
1. Read N elements into an array.
2. Apply the BuildHeap algorithm to the array.
3. Perform k DeleteMin operations.
4. The last element extracted from the heap is the solution.
Algorithm B
1. A set S of the k largest elements seen so far is maintained.
2. When a new element is read, it is compared with the $k$-th largest element, denoted $S_k$.
$S_k$ is the smallest element in S.
3. If the new element is larger, then it replaces $S_k$ in S.
4. At the end of the input, the smallest element in S is found and returned.
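Algorithm A can be sketched directly on a 1-based array. The function names and the convention that A[0] is unused are assumptions made for this sketch; the caller's array is modified.

```c
/* Restores heap order below position i in the 1-based min-heap A[1..N] */
static void PercolateDown(int *A, int N, int i)
{
    int Child;
    int Tmp = A[i];

    for (; i * 2 <= N; i = Child) {
        Child = i * 2;
        if (Child != N && A[Child + 1] < A[Child])
            Child++;                       /* smaller of the two children */
        if (A[Child] < Tmp)
            A[i] = A[Child];
        else
            break;
    }
    A[i] = Tmp;
}

/* Returns the k-th smallest of A[1..N], for 1 <= k <= N.
   A[0] is unused and the contents of A are destroyed. */
int KthSmallest(int *A, int N, int k)
{
    int i;
    int Min = A[1];

    for (i = N / 2; i > 0; i--)            /* step 2: BuildHeap, O(N) */
        PercolateDown(A, N, i);
    for (i = 0; i < k; i++) {              /* step 3: k DeleteMins, O(k log N) */
        Min = A[1];
        A[1] = A[N--];                     /* move the last element to the root */
        if (N > 0)
            PercolateDown(A, N, 1);
    }
    return Min;                            /* step 4: last element extracted */
}
```

The total cost is O(N + k log N), which is O(N log N) when k = N/2.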
2. Event Simulation: In a banking environment, queue performance depends on estimating the
number of tellers required. This can be computed by applying a queueing model. However, as
the number of tellers k gets larger, the analysis becomes complex, so computers are used to
simulate the bank. A simulation processes two kinds of events, namely arrivals and
departures. Probability functions are used to generate a stream of arrival times and service
times for each customer, sorted by arrival time.
One way of simulating is to start a clock at zero and advance it one tick at a time, checking
to see if there is an event. If so, the event is processed and statistics are computed. When
there are no more customers left and all tellers are free, the simulation ends. The drawback
is that the running time depends on the number of ticks rather than the number of events.
This problem is avoided by advancing the clock to the next event time at each stage. Since
all the times at which events will happen are available, the next event is easily determined.
If the event is an arrival, teller availability is checked. If no teller is free, the
customer is placed on the waiting line; otherwise service is given and a departure is scheduled.
The waiting line for customers is implemented as a queue. The set of departures waiting to
happen is organized in a priority queue. If there are C customers and k tellers, then the
running time of the simulation is O(C log(k+1)), where k+1 is the size of the heap.
d-Heaps
In a d-heap, all nodes have d children. Thus a binary heap is a 2-heap. The running time of
Insert is $O(\log_d N)$. For larger d, the running time of DeleteMin is $O(d \log_d N)$. If d
is a constant, then both running times are $O(\log N)$.
Although an array can still be used, the multiplications and divisions needed to find
children and parents are now by d. Unless d is a power of 2, this increases the running
time, since shift operations can no longer be used in the implementation.
d-heaps are advantageous when the priority queue is too large to fit in main memory. There is
also evidence that 4-heaps may outperform binary heaps in practice. The disadvantages of heaps
are that a Find operation cannot be performed efficiently and that Merge is very expensive.
Three data structures that support the Merge operation efficiently, with a running time of
O(log N), are leftist heaps, skew heaps and binomial queues.
Leftist Heaps
Like a binary heap, a leftist heap has both a structural property and an ordering property;
the difference is that leftist heaps are not perfectly balanced and may be very unbalanced.
Leftist Heap Property: The null path length Npl(X) of any node X is defined as the length of
the shortest path from X to a node without two children. Thus the Npl of a node with zero or
one child is 0, and Npl(NULL) = -1. The Npl of any node is 1 more than the minimum of the Npl
values of its children.
The leftist heap property is that for every node X in the heap, the Npl of the left child is
at least as large as that of the right child. This property biases the tree to grow deep
towards the left, so the tree is usually unbalanced with a long path of left nodes; hence the
name leftist. A leftist tree with $r$ nodes on the right path must have at least $2^r - 1$ nodes.
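The definition of the null path length translates almost word for word into a recursive function. The sketch below is illustrative only: the `TreeNode` type and the names `Npl` and `IsLeftist` are assumptions, and Npl is recomputed on every call, whereas a practical leftist heap would store it in each node.

```c
#include <stddef.h>

typedef struct TreeNode {
    int Element;
    struct TreeNode *Left, *Right;
} TreeNode;

/* Null path length: Npl(NULL) = -1, otherwise 1 + the smaller child Npl */
int Npl(const TreeNode *T)
{
    int L, R;

    if (T == NULL)
        return -1;
    L = Npl(T->Left);
    R = Npl(T->Right);
    return 1 + (L < R ? L : R);
}

/* Returns 1 if every node satisfies Npl(left child) >= Npl(right child) */
int IsLeftist(const TreeNode *T)
{
    if (T == NULL)
        return 1;
    return Npl(T->Left) >= Npl(T->Right)
        && IsLeftist(T->Left)
        && IsLeftist(T->Right);
}
```

A node with only a left child satisfies the property (0 >= -1), while a node with only a right child violates it (-1 < 0).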
Skew Heaps
Skew heaps are binary trees with heap order, but there is no structural constraint. Unlike
leftist heaps, the null path length is not maintained. The right path of a skew heap can be
arbitrarily long, so the worst-case running time of all operations is O(N); the amortized
cost per operation, however, is O(log N). The fundamental operation on skew heaps is merging.
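The notes do not give code for skew heap merging, so the following is a sketch of the usual recursive formulation rather than the author's routine. The `SkewNode` type and all function names are assumptions. The root with the smaller key is kept, the other heap is merged into its right subtree, and the children are then swapped unconditionally.

```c
#include <stddef.h>

typedef struct SkewNode {
    int Element;
    struct SkewNode *Left, *Right;
} SkewNode;

/* Merges two skew heaps and returns the root of the result */
SkewNode *SkewMerge(SkewNode *H1, SkewNode *H2)
{
    SkewNode *Tmp;

    if (H1 == NULL) return H2;
    if (H2 == NULL) return H1;
    if (H2->Element < H1->Element) {     /* make H1 the smaller root */
        Tmp = H1; H1 = H2; H2 = Tmp;
    }
    Tmp = SkewMerge(H1->Right, H2);      /* merge into the right subtree */
    H1->Right = H1->Left;                /* then swap children, always */
    H1->Left = Tmp;
    return H1;
}

/* Helper for checking: 1 if no child is smaller than its parent */
int IsHeapOrdered(const SkewNode *T)
{
    if (T == NULL) return 1;
    if (T->Left  != NULL && T->Left->Element  < T->Element) return 0;
    if (T->Right != NULL && T->Right->Element < T->Element) return 0;
    return IsHeapOrdered(T->Left) && IsHeapOrdered(T->Right);
}

/* Helper for checking: number of nodes in the tree */
int CountNodes(const SkewNode *T)
{
    return T == NULL ? 0 : 1 + CountNodes(T->Left) + CountNodes(T->Right);
}
```

Insertion is a merge with a one-node heap, and DeleteMin is a merge of the root's two subtrees, so every operation reduces to SkewMerge.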
Binomial Queues
A binomial queue is not a single heap-ordered tree but a collection of heap-ordered trees,
known as a forest. Each of the heap-ordered trees is of a constrained form known as a
binomial tree. There is at most one binomial tree of every height. A binomial tree of height
0 is a one-node tree; a binomial tree $B_k$ of height $k$ is formed by attaching a binomial
tree $B_{k-1}$ to the root of another binomial tree $B_{k-1}$. The binomial trees $B_0$,
$B_1$, $B_2$, $B_3$ and $B_4$ are shown below.
Binomial trees of height $k$ have exactly $2^k$ nodes, and the number of nodes at depth $d$
is the binomial coefficient $\binom{k}{d}$. A priority queue $H_1$ of size 6 is represented
by the forest $B_1$, $B_2$, which corresponds to the binary representation 0110 of 6.
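The correspondence with binary numbers can be made concrete with a few lines of C. The function name `TreesPresent` and its parameters are hypothetical; it lists the heights k of the binomial trees $B_k$ that make up a binomial queue of N elements.

```c
/* Writes the heights of the binomial trees present in a queue of N elements
   into Heights[] (at most MaxHeights entries) and returns how many there are.
   Bit k of N is set exactly when B_k is in the forest. */
int TreesPresent(unsigned N, int Heights[], int MaxHeights)
{
    int k, Count = 0;

    for (k = 0; N != 0 && Count < MaxHeights; k++, N >>= 1)
        if (N & 1u)
            Heights[Count++] = k;
    return Count;
}
```

For N = 6 (binary 110) the result is heights 1 and 2, that is, the forest $B_1$, $B_2$ described above; for N = 7 it is heights 0, 1 and 2.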
Each node in a binomial tree contains the data, a pointer to its first child and a pointer to
its next sibling. The children are arranged in decreasing rank.
struct BinNode
{
    ElementType Element;
    Position LeftChild;
    Position NextSibling;
};
struct Collection
{
    int CurrentSize;
    BinTree TheTrees[ MaxTrees ];
};
The following figure shows a binomial queue and its representation.
Merge Operation
The minimum element can be found by scanning the roots of all the trees. Since there are at
most $\log N$ different trees, the minimum can be found in $O(\log N)$ time. Consider two
binomial queues, $H_1$ and $H_2$, with 6 and 7 elements, that are to be merged.
The merge is performed by essentially adding the two queues together. Let $H_3$ be the new
binomial queue. Since $H_1$ has no binomial tree of height 0 and $H_2$ does, the binomial
tree of height 0 in $H_2$ becomes part of $H_3$. Next, the binomial trees of height 1 are
added. Since both $H_1$ and $H_2$ have binomial trees of height 1, they are merged by making
the larger root a subtree of the smaller, creating a binomial tree of height 2 as shown
below. Thus, $H_3$ will not have a binomial tree of height 1.
There are now three binomial trees of height 2, namely the trees of $H_1$ and $H_2$ plus the
tree formed in the previous step. One binomial tree of height 2 is kept in $H_3$ and the
other two are merged, creating a binomial tree of height 3. Since $H_1$ and $H_2$ have no
trees of height 3, this tree becomes part of $H_3$. The resulting binomial queue is shown below.
Since merging two binomial trees takes constant time and there are $O(\log N)$ binomial
trees, the merge takes $O(\log N)$ time in the worst case. The Merge routine combines $H_1$
and $H_2$, placing the result in $H_1$ and making $H_2$ empty. The trees combined at each
step are always of the same rank.
BinQueue Merge( BinQueue H1, BinQueue H2 )
{
    BinTree T1, T2, Carry = NULL;
    int i, j;

    if( H1->CurrentSize + H2->CurrentSize > Capacity )
        Error( "Merge would exceed capacity" );
    H1->CurrentSize += H2->CurrentSize;
    for( i = 0, j = 1; j <= H1->CurrentSize; i++, j *= 2 )
    {
        T1 = H1->TheTrees[ i ]; T2 = H2->TheTrees[ i ];
        /* Bit 0: tree in H1, bit 1: tree in H2, bit 2: carry from rank i - 1 */
        switch( !!T1 + 2 * !!T2 + 4 * !!Carry )
        {
            case 0: /* No trees */
            case 1: /* Only H1 */
                break;
            case 2: /* Only H2 */
                H1->TheTrees[ i ] = T2;
                H2->TheTrees[ i ] = NULL;
                break;
            case 4: /* Only Carry */
                H1->TheTrees[ i ] = Carry;
                Carry = NULL;
                break;
            case 3: /* H1 and H2 */
                Carry = CombineTrees( T1, T2 );
                H1->TheTrees[ i ] = H2->TheTrees[ i ] = NULL;
                break;
            case 5: /* H1 and Carry */
                Carry = CombineTrees( T1, Carry );
                H1->TheTrees[ i ] = NULL;
                break;
            case 6: /* H2 and Carry */
                Carry = CombineTrees( T2, Carry );
                H2->TheTrees[ i ] = NULL;
                break;
            case 7: /* All three */
                H1->TheTrees[ i ] = Carry;
                Carry = CombineTrees( T1, T2 );
                H2->TheTrees[ i ] = NULL;
                break;
        }
    }
    return H1;
}
/* Combine two trees of equal rank; the larger root becomes a child of the smaller */
BinTree CombineTrees( BinTree T1, BinTree T2 )
{
    if( T1->Element > T2->Element )
        return CombineTrees( T2, T1 );
    T2->NextSibling = T1->LeftChild;
    T1->LeftChild = T2;
    return T1;
}
Insertion is just a special case of merging, since a one-node tree is created and a merge is
performed. The worst-case time of this operation is O(log N). As an example, consider the
binomial queues that are formed by inserting 1 through 7 in order.
Bi nQueue I nser t ( El ement Type I t em, Bi nQueue H )
{
Bi nTr ee NewNode;
Bi nQueue OneI t em;
NewNode = mal l oc( si zeof ( st r uct Bi nNode ) ) ;
i f ( NewNode == NULL )
Fat al Er r or ( " Out of space! ! ! " ) ;
NewNode- >Lef t Chi l d = NewNode- >Next Si bl i ng = NULL;
NewNode- >El ement = I t em;
OneI t em= I ni t i al i ze( ) ;
OneI t em- >Cur r ent Si ze = 1;
OneI t em- >TheTr ees[ 0 ] = NewNode;
r et ur n Mer ge( H, OneI t em) ;
}
DeleteMin operation
A DeleteMin is performed by first finding the binomial tree with the smallest root. Let this
tree be $B_k$, and let the original priority queue be $H$. The binomial tree $B_k$ is removed
from the forest of trees in $H$, forming the new binomial queue $H'$. The root of $B_k$ is
also removed, creating binomial trees $B_0, B_1, \ldots, B_{k-1}$, which collectively form
the priority queue $H''$. The operation is finished by merging $H'$ and $H''$.
In the example, the minimum root is 12, so DeleteMin produces the two priority queues $H'$
and $H''$. The final binomial queue is the result of merging $H'$ and $H''$, as shown.
ElementType DeleteMin( BinQueue H )
{
    int i, j;
    int MinTree;   /* The tree with the minimum item */
    BinQueue DeletedQueue;
    Position DeletedTree, OldRoot;
    ElementType MinItem;

    if( IsEmpty( H ) )
    {
        Error( "Empty binomial queue" );
        return -Infinity;
    }
    MinItem = Infinity;
    for( i = 0; i < MaxTrees; i++ )
    {
        if( H->TheTrees[ i ] && H->TheTrees[ i ]->Element < MinItem )
        {
            /* Update minimum */
            MinItem = H->TheTrees[ i ]->Element;
            MinTree = i;
        }
    }
    DeletedTree = H->TheTrees[ MinTree ];
    OldRoot = DeletedTree;
    DeletedTree = DeletedTree->LeftChild;
    free( OldRoot );
    DeletedQueue = Initialize( );
    DeletedQueue->CurrentSize = ( 1 << MinTree ) - 1;
    for( j = MinTree - 1; j >= 0; j-- )
    {
        DeletedQueue->TheTrees[ j ] = DeletedTree;
        DeletedTree = DeletedTree->NextSibling;
        DeletedQueue->TheTrees[ j ]->NextSibling = NULL;
    }
    H->TheTrees[ MinTree ] = NULL;
    H->CurrentSize -= DeletedQueue->CurrentSize + 1;
    Merge( H, DeletedQueue );
    return MinItem;
}
CS2068Unit 3 Notes : Hashing
Hashing
The hash table ADT supports only a subset of the operations allowed by binary search trees.
The implementation of hash tables is frequently called hashing. Hashing is a technique used
for performing insertion, deletion and find in constant average time.
The hash table data structure is an array of size TableSize containing the keys. Typically, a
key is a string with an associated value (for instance, salary information). Each key is
mapped into some number in the range 0 to TableSize - 1 and placed in the appropriate cell.
This mapping is called a hash function, which ideally should be simple to compute and should
ensure that any two distinct keys get different cells.
Since there is a finite number of cells and a virtually inexhaustible supply of keys, this
ideal cannot be met, and the hash function should instead distribute the keys evenly among
the cells. The remaining problems are choosing a function, deciding what to do when two keys
hash to the same value (a collision), and deciding on the table size.
If the input keys are integers, then simply returning Key mod TableSize is generally a
reasonable strategy, unless the keys have some undesirable properties. It is therefore a good
idea to ensure that the table size is prime. When the input keys are random integers, this
function is not only simple to compute but also distributes the keys evenly.
The two simple methods for resolving collisions are:
1. Separate Chaining (open hashing): colliding keys are stored outside the hash table, in lists.
2. Open Addressing (closed hashing): a collision results in storing one of the records at
another slot in the table.
Separate Chaining (Open Hashing)
In separate chaining, a list of all elements that hash to the same value is maintained. Each
list has a header. Assume that the keys are the first 10 perfect squares and the hashing
function is Hash(X) = X mod 10.
For a Find operation, the hash function is used to determine which list to traverse. The list
is then traversed, returning the position where the item is found. To perform an Insert, the
appropriate list is traversed and the element is inserted either at the front of the list or
at the end of the list, whichever is easiest.
The operations are similar to those on linked lists. The call Find(Key, H) returns a pointer
to the cell containing Key. If an item is to be inserted, its existence is checked first. If
it is not found, it is inserted at the front of the list.
The load factor $\lambda$ of a hash table is defined as the ratio of the number of elements in
the hash table to TableSize. In an unsuccessful search, the number of links to traverse is
$\lambda$ on average. A successful search requires about $1 + (\lambda/2)$ links to be
traversed. The general rule for separate chaining is to keep $\lambda \approx 1$.
Position Find( ElementType Key, HashTable H )
{
    Position P;
    List L;

    L = H->TheLists[ Hash( Key, H->TableSize ) ];
    P = L->Next;                /* skip the list header */
    while( P != NULL && P->Element != Key )
        P = P->Next;
    return P;
}
void Insert( ElementType Key, HashTable H )
{
    Position Pos, NewCell;
    List L;

    Pos = Find( Key, H );
    if( Pos == NULL )   /* Key is not found */
    {
        NewCell = malloc( sizeof( struct ListNode ) );
        if( NewCell == NULL )
            FatalError( "Out of space!!!" );
        else
        {
            /* Insert at the front of the list */
            L = H->TheLists[ Hash( Key, H->TableSize ) ];
            NewCell->Next = L->Next;
            NewCell->Element = Key;
            L->Next = NewCell;
        }
    }
}
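Separate chaining can be tried out with the following self-contained sketch. It differs from the routines above in ways that are assumptions of the sketch: integer keys, a fixed table size of 10, lists without header nodes, and return codes instead of FatalError. The names `ChainTable`, `TableInit`, `ChainLength` and `TableFree` are hypothetical.

```c
#include <stdlib.h>

#define TABLE_SIZE 10             /* matches Hash(X) = X mod 10 in the text */

typedef struct ListNode {
    int Element;
    struct ListNode *Next;
} ListNode;

typedef struct {
    ListNode *Lists[TABLE_SIZE];  /* headerless chains; NULL when empty */
} ChainTable;

static int Hash(int Key) { return Key % TABLE_SIZE; }  /* assumes Key >= 0 */

void TableInit(ChainTable *H)
{
    int i;
    for (i = 0; i < TABLE_SIZE; i++)
        H->Lists[i] = NULL;
}

/* Returns the cell holding Key, or NULL if Key is absent */
ListNode *Find(int Key, const ChainTable *H)
{
    ListNode *P = H->Lists[Hash(Key)];
    while (P != NULL && P->Element != Key)
        P = P->Next;
    return P;
}

/* Inserts Key at the front of its chain.
   Returns 1 if inserted, 0 if already present or out of memory. */
int Insert(int Key, ChainTable *H)
{
    ListNode *NewCell;

    if (Find(Key, H) != NULL)
        return 0;
    NewCell = malloc(sizeof *NewCell);
    if (NewCell == NULL)
        return 0;
    NewCell->Element = Key;
    NewCell->Next = H->Lists[Hash(Key)];
    H->Lists[Hash(Key)] = NewCell;
    return 1;
}

/* Number of cells in the chain of a given slot */
int ChainLength(const ChainTable *H, int Slot)
{
    int n = 0;
    const ListNode *P;
    for (P = H->Lists[Slot]; P != NULL; P = P->Next)
        n++;
    return n;
}

/* Frees every cell and leaves the table empty */
void TableFree(ChainTable *H)
{
    int i;
    for (i = 0; i < TABLE_SIZE; i++) {
        while (H->Lists[i] != NULL) {
            ListNode *Next = H->Lists[i]->Next;
            free(H->Lists[i]);
            H->Lists[i] = Next;
        }
    }
}
```

Taking the first ten perfect squares to be 0, 1, 4, ..., 81 (an assumption about where the sequence starts), slots 1, 4, 6 and 9 each receive two keys, slots 0 and 5 one key each, and the remaining slots stay empty.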
Open Addressing (Closed Hashing)
Open hashing has the disadvantage of requiring pointers, which tends to slow the algorithm. In
closed hashing, also known as open addressing, if a collision occurs, alternate cells are
tried until an empty cell is found. Cells $h_0(X), h_1(X), h_2(X), \ldots$ are tried in
succession, where $h_i(X) = (\mathrm{Hash}(X) + F(i)) \bmod \mathrm{TableSize}$, with
$F(0) = 0$. The function $F$ is the collision resolution strategy. Because all data goes
inside the table, a bigger table is needed for closed hashing than for open hashing.
Generally, the load factor should be below $\lambda = 0.5$ for closed hashing.
The three common collision resolution strategies are:
1. Linear Probing
2. Quadratic Probing
3. Double Hashing
Linear Probing
In linear probing, F is a linear function of i, typically F(i) = i. This amounts to trying
cells sequentially (with wraparound) in search of an empty cell. Consider inserting the keys
{89, 18, 49, 58, 69} into a closed table using the same hash function, Hash(X) = X mod 10.
The first collision occurs when 49 is inserted; it is put in the next available spot, cell
0, which is open. 58 collides with 18, 89, and then 49 before an empty cell is found three
away. The collision for 69 is handled in a similar manner.
As long as the table is big enough, a free cell can always be found, but the time to do so can
get quite large. Worse, even if the table is relatively empty, blocks of occupied cells start
forming. This effect, known as primary clustering, means that any key that hashes into the
cluster will require several attempts to resolve the collision, and then it will add to the
cluster. The expected number of probes using linear probing is roughly
$\frac{1}{2}\left(1 + \frac{1}{(1-\lambda)^2}\right)$ for insertions and unsuccessful searches
and $\frac{1}{2}\left(1 + \frac{1}{1-\lambda}\right)$ for successful searches.
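The linear probing example can be reproduced with a short sketch. The plain `int` array used as the table, the `EMPTY` marker (which assumes keys are non-negative) and the function names are assumptions made for illustration.

```c
#define TABLE_SIZE 10
#define EMPTY (-1)                /* assumes all keys are non-negative */

void TableInit(int Cells[TABLE_SIZE])
{
    int i;
    for (i = 0; i < TABLE_SIZE; i++)
        Cells[i] = EMPTY;
}

/* Inserts Key with F(i) = i and returns the cell used,
   or -1 if every cell has been probed without success. */
int LinearInsert(int Key, int Cells[TABLE_SIZE])
{
    int i, Pos;

    for (i = 0; i < TABLE_SIZE; i++) {
        Pos = (Key % TABLE_SIZE + i) % TABLE_SIZE;   /* h_i(X), with wraparound */
        if (Cells[Pos] == EMPTY || Cells[Pos] == Key) {
            Cells[Pos] = Key;
            return Pos;
        }
    }
    return -1;                                       /* table is full */
}
```

Inserting 89, 18, 49, 58 and 69 in that order places them in cells 9, 8, 0, 1 and 2, which shows the cluster forming around the end and start of the table.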
Quadratic Probing
Quadratic probing is a collision resolution method that eliminates the primary clustering
problem of linear probing. In quadratic probing the collision function is quadratic. The
popular choice is $F(i) = i^2$. Consider the same input used in the linear probing example.
When 49 collides with 89, the next position attempted is one cell away. This cell is empty, so
49 is placed there. Next, 58 collides at position 8. Then the cell one away is tried, but
another collision occurs. A vacant cell is found at the next cell tried, which is $2^2 = 4$
away. 58 is thus placed in cell 2. The same thing happens for 69.
For linear probing it is a bad idea to let the hash table get nearly full, because performance
degrades. For quadratic probing, the situation is even more drastic: there is no guarantee of
finding an empty cell once the table gets more than half full, or even before the table gets
half full if the table size is not prime. This is because at most half of the table can be
used as alternate locations to resolve collisions. If quadratic probing is used, and the table
size is prime, then a new element can always be inserted if the table is at least half empty.
Position Find( ElementType Key, HashTable H )
{
    Position CurrentPos;
    int CollisionNum;

    CollisionNum = 0;
    CurrentPos = Hash( Key, H->TableSize );
    while( H->TheCells[ CurrentPos ].Info != Empty &&
           H->TheCells[ CurrentPos ].Element != Key )
    {
        /* Step from F(i-1) to F(i): the difference is 2i - 1 */
        CurrentPos += 2 * ++CollisionNum - 1;
        if( CurrentPos >= H->TableSize )
            CurrentPos -= H->TableSize;
    }
    return CurrentPos;
}
void Insert( ElementType Key, HashTable H )
{
    Position Pos;

    Pos = Find( Key, H );
    if( H->TheCells[ Pos ].Info != Legitimate )
    {
        H->TheCells[ Pos ].Info = Legitimate;
        H->TheCells[ Pos ].Element = Key;
    }
}
Although quadratic probing eliminates primary clustering, elements that hash to the same
position will probe the same alternate cells. This is known as secondary clustering.
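The quadratic probing walk-through can be replayed with a small stand-alone sketch. It is not the notes' HashTable type: the fixed table size of 10, the hash function X mod 10, the marker value -1 for an empty cell, and the names qp_init and qp_insert are all assumptions made for illustration.

```c
#include <assert.h>

#define QP_TABLE_SIZE 10      /* size used in the worked example (not prime) */
#define QP_EMPTY (-1)         /* marker for an unused cell (assumption) */

/* Mark every cell as empty. */
void qp_init(int table[QP_TABLE_SIZE]) {
    for (int i = 0; i < QP_TABLE_SIZE; i++)
        table[i] = QP_EMPTY;
}

/* Insert key with F(i) = i*i. Returns the cell used, or -1 if no free
   cell is found within QP_TABLE_SIZE probes. Keys must be non-negative. */
int qp_insert(int table[QP_TABLE_SIZE], int key) {
    assert(key >= 0);
    int pos = key % QP_TABLE_SIZE;
    for (int i = 1; i <= QP_TABLE_SIZE; i++) {
        if (table[pos] == QP_EMPTY || table[pos] == key) {
            table[pos] = key;
            return pos;
        }
        /* i*i - (i-1)*(i-1) = 2i - 1, so no multiplication of i by itself */
        pos = (pos + 2 * i - 1) % QP_TABLE_SIZE;
    }
    return -1;
}
```

Inserting 89, 18, 49, 58 and 69 in that order places 49 in cell 0, 58 in cell 2 and 69 in cell 3, as in the walk-through. Keys 49 and 69 both start at cell 9 and follow the same probe sequence, which is the secondary clustering effect.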
Double Hashing
Double hashing applies a second hash function to X and probes at distances
$hash_2(X)$, $2\,hash_2(X)$, and so on; that is, $F(i) = i \cdot hash_2(X)$. The function $hash_2$
must never evaluate to zero. It is also important to make sure all cells can be probed.
A function such as $hash_2(X) = R - (X \bmod R)$, with R a prime smaller than TableSize,
works well. If TableSize is not prime, it is possible to run out of alternate locations prematurely.
Consider the same input again, with R = 7. The first collision occurs when 49 is inserted.
$hash_2(49) = 7 - 0 = 7$, so 49 is inserted in position 6. $hash_2(58) = 7 - 2 = 5$, so 58 is
inserted at location 3. Finally, 69 collides and is inserted at a distance
$hash_2(69) = 7 - 6 = 1$ away. If we tried to insert 60 in position 0, we would have a collision.
Since $hash_2(60) = 7 - 4 = 3$, we would then try positions 3, 6, 9, and then 2 until an empty
spot is found.
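The double hashing example can be checked with the following minimal sketch. The table size 10, R = 7 and the hash function X mod 10 are taken from the worked example; the names dh_hash2, dh_probe and dh_insert and the -1 empty marker are assumptions for illustration only.

```c
#include <assert.h>

#define DH_TABLE_SIZE 10
#define DH_R 7                /* prime smaller than the table size */
#define DH_EMPTY (-1)         /* marker for an unused cell (assumption) */

/* Mark every cell as empty. */
void dh_init(int table[DH_TABLE_SIZE]) {
    for (int i = 0; i < DH_TABLE_SIZE; i++)
        table[i] = DH_EMPTY;
}

/* Second hash: R - (X mod R). The result lies in 1..R, so it is never zero. */
int dh_hash2(int key) {
    return DH_R - (key % DH_R);
}

/* Cell examined on the i-th probe: (hash(X) + i * hash2(X)) mod TableSize. */
int dh_probe(int key, int i) {
    return (key % DH_TABLE_SIZE + i * dh_hash2(key)) % DH_TABLE_SIZE;
}

/* Insert key; returns the cell used, or -1 if DH_TABLE_SIZE probes all fail. */
int dh_insert(int table[DH_TABLE_SIZE], int key) {
    assert(key >= 0);
    for (int i = 0; i < DH_TABLE_SIZE; i++) {
        int pos = dh_probe(key, i);
        if (table[pos] == DH_EMPTY || table[pos] == key) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;
}
```

With the input 89, 18, 49, 58, 69 the keys land in cells 9, 8, 6, 3 and 0, and a later insert of 60 probes cells 0, 3, 6, 9 and settles in cell 2.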
Rehashing
If the table gets too full, the running time for the operations will start taking too long and
inserts might fail for closed hashing with quadratic resolution. This can happen if there are
too many deletions intermixed with insertions.
A solution, then, is to build another table that is about twice as big (with associated new hash
function) and scan down the entire original hash table, computing the new hash value for
each element and inserting it in the new table.
As an example, suppose the elements 13, 15, 24, 6 and 23 are inserted into a closed hash table
of size 7. The hash function is h(X) = X mod 7. Suppose linear probing is used to resolve
collisions. The resulting hash table would be over 70% full (5 of the 7 cells are occupied).
Because the table is so full, a new table is created. The size of this table is 17, because this is
the first prime greater than twice the old table size. The new hash function is then
h(X) = X mod 17. The old table is scanned, and elements 6, 15, 23, 24, and 13 are inserted
into the new table.
[Figure: the old table (size 7) and the new table (size 17)]
This entire operation is called rehashing. It is obviously a very expensive operation: the
running time is O(N), since there are N elements to rehash and the table size is roughly 2N.
HashTable Rehash( HashTable H )
{
    int i, OldSize;
    Cell *OldCells;
    OldCells = H->TheCells;
    OldSize = H->TableSize;
    /* Get a new, empty table */
    H = InitializeTable( 2 * OldSize );
    /* Scan through the old table, reinserting into the new one */
    for( i = 0; i < OldSize; i++ )
        if( OldCells[ i ].Info == Legitimate )
            Insert( OldCells[ i ].Element, H );
    free( OldCells );
    return H;
}
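The rehashing example (table of size 7 grown to size 17, linear probing) can be reproduced with a short self-contained sketch. The LpTable type, the -1 empty marker and the function names are assumptions for illustration; unlike the listing above, the new size is passed in by the caller rather than chosen by an initialization routine.

```c
#include <assert.h>
#include <stdlib.h>

#define LP_EMPTY (-1)         /* marker for an unused cell (assumption) */

typedef struct {
    int size;                 /* number of cells */
    int *cells;               /* keys, or LP_EMPTY */
} LpTable;

/* Allocate a table with every cell empty. */
LpTable lp_create(int size) {
    LpTable t;
    t.size = size;
    t.cells = malloc(sizeof(int) * (size_t)size);
    assert(t.cells != NULL);
    for (int i = 0; i < size; i++)
        t.cells[i] = LP_EMPTY;
    return t;
}

/* Linear probing insert with h(X) = X mod size. Assumes the table is not
   full and that keys are non-negative. Returns the cell used. */
int lp_insert(LpTable *t, int key) {
    int pos = key % t->size;
    while (t->cells[pos] != LP_EMPTY && t->cells[pos] != key)
        pos = (pos + 1) % t->size;
    t->cells[pos] = key;
    return pos;
}

/* Build a table of new_size, reinsert every key by scanning the old cells
   from index 0 upward, then release the old array. */
LpTable lp_rehash(LpTable old, int new_size) {
    LpTable fresh = lp_create(new_size);
    for (int i = 0; i < old.size; i++)
        if (old.cells[i] != LP_EMPTY)
            lp_insert(&fresh, old.cells[i]);
    free(old.cells);
    return fresh;
}
```

Inserting 13, 15, 24, 6, 23 into a table of size 7 and then calling lp_rehash with size 17 scans the old cells in the order 6, 15, 23, 24, 13 and leaves them in cells 6, 15, 7, 8 and 13 of the new table.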
Extendible Hashing
If the amount of data is too large to fit in main memory, the main consideration is the number
of disk accesses required to retrieve data.
Assume that there are N records to store; the value of N changes over time. Furthermore, at
most M records fit in one disk block (say M =4).
If either open hashing or closed hashing is used, the major problem is that collisions could
cause several blocks to be examined during a find, even for a well-distributed hash table.
Furthermore, when the table gets too full, an extremely expensive rehashing step must be
performed.
Assume data consists of several six-bit integers. The extendible hashing is shown as below:
The root of the "tree" contains four pointers determined by the leading two bits of the data.
Each leaf has up to M = 4 elements. It happens that in each leaf the first two bits are identical;
this is indicated by the number in parentheses. To be more formal, D will represent the number of
bits used by the root, which is known as the directory. The number of entries in the directory
is thus $2^D$. $d_L$ is the number of leading bits that all the elements of some leaf L have in
common. $d_L$ will depend on the particular leaf, and $d_L \le D$.
Now insert the key 100100. This would go into the third leaf, but as the third leaf is already
full, there is no room. Therefore split the leaf into two leaves, which are now determined by
the first three bits. This requires increasing D to 3, so the directory grows to $2^3 = 8$ entries.
This very simple strategy provides quick access times for insert and find operations on large
databases.
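How a key selects its directory entry can be sketched as below. The six-bit key width comes from the example above; the function names and the use of unsigned integers to hold the keys are assumptions of this sketch, and leaf splitting itself is not shown.

```c
#include <assert.h>

#define EH_KEY_BITS 6u        /* the example uses six-bit keys */

/* Number of directory entries when the directory uses D leading bits. */
unsigned eh_directory_size(unsigned d_bits) {
    return 1u << d_bits;
}

/* Directory slot of a key: the value formed by its D leading bits. */
unsigned eh_directory_index(unsigned key, unsigned d_bits) {
    assert(d_bits <= EH_KEY_BITS);
    return key >> (EH_KEY_BITS - d_bits);
}
```

For the key 100100 (decimal 36), D = 2 gives slot 2 (bits 10, the third entry counting from zero), and after the directory grows to D = 3 the same key maps to slot 4 (bits 100).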
CS2068: Unit 4 Trees
Definition
A tree is a collection of one or more nodes such that:
There is a specially designated node r called the root.
The remaining nodes are partitioned into $n \ge 0$ subtrees $T_1, T_2, \ldots, T_n$, each of whose
roots is connected by a directed edge to r.
Terminologies
The root of each subtree is said to be a child of r, and r is the parent of each subtree
root.
Every node except the root has exactly one parent.
The degree of a node is the number of subtrees (children) of the node.
Nodes with the same parent are siblings.
A node with degree 0 is a leaf or terminal node.
A path from node $n_1$ to $n_k$ is defined as a sequence of nodes $n_1, n_2, \ldots, n_k$ such that
$n_i$ is the parent of $n_{i+1}$ for $1 \le i < k$. In a tree there is exactly one path from the root to
each node.
The length of a path is the number of edges on the path.
Root is at level 0. The level of any node is one plus the level of the parent.
The ancestors of a node are all the nodes along the path from the root to the node. If
there is a path from $n_1$ to $n_2$, then $n_1$ is an ancestor of $n_2$ and $n_2$ is a descendant of $n_1$.
The height of $n_i$ is the length of the longest path from $n_i$ to a leaf. Thus all leaves are at height 0.
The height of a tree is equal to the height of the root.
Tree Implementation
One way to implement a tree is to have in each node, besides its data, a pointer to each child
of the node. Since the number of children per node can vary greatly and is not known in
advance, it might be infeasible to make the children direct links in the data structure. The
solution is to keep the children of each node in a linked list of tree nodes: each node stores a
pointer to its first child and a pointer to its next sibling.
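The linked-list-of-children representation can be sketched in C as follows. The type name GenNode, the int payload and the two helper functions are assumptions for illustration; they also show how the degree and height definitions above translate into code.

```c
#include <stddef.h>

typedef struct GenNode {
    int element;                   /* payload; int is an assumption */
    struct GenNode *first_child;   /* head of this node's list of children */
    struct GenNode *next_sibling;  /* next node in the parent's child list */
} GenNode;

/* Degree of a node: the length of its child list. */
int gen_degree(const GenNode *n) {
    int count = 0;
    for (const GenNode *c = n->first_child; c != NULL; c = c->next_sibling)
        count++;
    return count;
}

/* Height of a node: a leaf has height 0, otherwise one more than the
   tallest child. */
int gen_height(const GenNode *n) {
    int tallest = -1;
    for (const GenNode *c = n->first_child; c != NULL; c = c->next_sibling) {
        int h = gen_height(c);
        if (h > tallest)
            tallest = h;
    }
    return tallest + 1;
}
```

Each node needs only two pointers no matter how many children it has, which is the point of this representation.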
Binary Tree
A binary tree is a tree in which no node can have more than two children.
Recursive definition: A binary tree is a finite set of nodes that is either empty or consists of a
root and two disjoint binary trees called the left subtree $T_L$ and the right subtree $T_R$.
A proper binary tree is a binary tree in which every node has zero or two children. A
perfect binary tree is a proper binary tree in which all leaves are at the same depth. A binary
tree is complete if all leaves are at depth n or n-1 for some n. Any tree can be transformed into
a binary tree by the left child-right sibling representation. One of the principal uses of binary
trees is in the area of compiler design.
A binary tree is defined recursively: a root, a left subtree, and a right subtree
In the following two bounds, levels and depth are counted from 1 at the root.
The maximum number of nodes on level i of a binary tree is $2^{i-1}$, $i \ge 1$.
The maximum number of nodes in a binary tree of depth k is $2^k - 1$, $k \ge 1$.
Binary Tree Implementation
Because a binary tree has at most two children, it could be represented directly using
pointers. The declaration of binary tree node is similar in structure to that for doubly linked
lists, consisting of an element, a reference to the left child and a reference to the right child.
struct TreeNode
{
    ElementType Element;
    Tree Left;
    Tree Right;
};
Binary Expression Tree
The above figure is an example of an expression tree. The leaves of an expression tree are
operands and the other nodes contain operators. It is also possible for a node to have only one
child, in the case of a unary operator. The expression can be evaluated by traversing the
tree. A traversal of a binary tree visits each node in the tree exactly once and is
naturally recursive. The three modes of traversal are:
Preorder traversal
First visit the root of the tree
Visit recursively all nodes in left subtree
Recursively visit all nodes in right subtree
Inorder traversal
Visit recursively all nodes in left subtree of the tree
Visit the root
Recursively visit all nodes in right subtree
Postorder traversal
Visit recursively all nodes in left subtree of the tree
Visit all nodes in right subtree of the tree
Visit the root of the tree
Inorder Traversal: A parenthesized infix expression can be obtained by recursively
producing a parenthesized left expression, then printing out the operator at the root, and
finally recursively producing a parenthesized right expression. This is known as an inorder
traversal. The left subtree evaluates to a + (b * c) and the right subtree evaluates to
((d * e) + f) * g. The entire tree therefore represents (a + (b * c)) + (((d * e) + f) * g).
Postorder Traversal: An alternate traversal strategy is to recursively print out the left
subtree, the right subtree, and then the operator. Applying this strategy to the tree above, the
output is a b c * + d e * f + g * +, which is a postfix expression. This traversal strategy is
generally known as a postorder traversal.
Preorder Traversal: A third traversal strategy is to print out the operator first and then
recursively print out the left and right subtrees. The resulting expression, + + a * b c * + * d e f g,
is the less useful prefix notation and the traversal strategy is a preorder traversal.
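The three traversals can be sketched as short recursive routines. The ExprNode type, the single-character symbols and the function names are assumptions for illustration. The infix routine parenthesizes every operator node, so it also wraps the whole expression in one extra outer pair.

```c
#include <stddef.h>

typedef struct ExprNode {
    char symbol;                    /* operand letter or operator character */
    struct ExprNode *left, *right;  /* both NULL for an operand (leaf) */
} ExprNode;

/* Each helper appends symbols at out and returns the next free position. */
char *expr_emit_preorder(const ExprNode *t, char *out) {
    if (t == NULL) return out;
    *out++ = t->symbol;                       /* root first */
    out = expr_emit_preorder(t->left, out);
    return expr_emit_preorder(t->right, out);
}

char *expr_emit_postorder(const ExprNode *t, char *out) {
    if (t == NULL) return out;
    out = expr_emit_postorder(t->left, out);
    out = expr_emit_postorder(t->right, out);
    *out++ = t->symbol;                       /* root last */
    return out;
}

char *expr_emit_inorder(const ExprNode *t, char *out) {
    if (t == NULL) return out;
    if (t->left == NULL && t->right == NULL) {
        *out++ = t->symbol;                   /* operand: no parentheses */
        return out;
    }
    *out++ = '(';
    out = expr_emit_inorder(t->left, out);
    *out++ = t->symbol;                       /* root between the subtrees */
    out = expr_emit_inorder(t->right, out);
    *out++ = ')';
    return out;
}

/* Wrappers that write a NUL-terminated string; the caller supplies a
   buffer large enough for the whole expression. */
void expr_prefix(const ExprNode *t, char *buf)  { *expr_emit_preorder(t, buf) = '\0'; }
void expr_postfix(const ExprNode *t, char *buf) { *expr_emit_postorder(t, buf) = '\0'; }
void expr_infix(const ExprNode *t, char *buf)   { *expr_emit_inorder(t, buf) = '\0'; }
```

Applied to the tree for (a + (b * c)) + (((d * e) + f) * g), the three wrappers produce the prefix, postfix and fully parenthesized infix forms quoted above, without the blanks.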
Construction of an Expression Tree
Let the expression be in postfix form. Read the expression one symbol at a time. If the
symbol is an operand, create a one-node tree and push a pointer to it onto a stack. If the
symbol is an operator, pop pointers to two trees $T_1$ and $T_2$ from the stack ($T_1$ is popped first)
and form a new tree whose root is the operator and whose left and right children point to $T_2$
and $T_1$ respectively. A pointer to this new tree is then pushed onto the stack.
Consider the input expression a b + c d e + * * and assume that the stack grows from left to
right. The first two symbols are operands, so create one-node trees and push pointers to them
onto a stack.
Next, a '+' is read, so two pointers to trees are popped, a new tree is formed and a pointer to it
is pushed onto the stack.
Next, c, d, and e are read, and for each a one-node tree is created and a pointer to the
corresponding tree is pushed onto the stack.
Continuing, a '*' is read, so we pop two tree pointers and form a new tree with a '*' as root.
Finally, the last symbol is read, two trees are merged, and a pointer to the final tree is left on
the stack.
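The stack-based construction described in this section can be sketched as below. It assumes single-character operands (letters or digits), binary operators only, well-formed input, and a fixed stack capacity; the names PfNode, pf_build and PF_MAX_STACK are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

#define PF_MAX_STACK 64       /* capacity of the pointer stack (assumption) */

typedef struct PfNode {
    char symbol;
    struct PfNode *left, *right;
} PfNode;

/* Allocate a node with the given symbol and children. */
PfNode *pf_new_node(char symbol, PfNode *left, PfNode *right) {
    PfNode *n = malloc(sizeof *n);
    assert(n != NULL);
    n->symbol = symbol;
    n->left = left;
    n->right = right;
    return n;
}

/* Build the expression tree for a postfix string; blanks are skipped. */
PfNode *pf_build(const char *postfix) {
    PfNode *stack[PF_MAX_STACK];
    int top = 0;
    for (const char *p = postfix; *p != '\0'; p++) {
        if (isspace((unsigned char)*p))
            continue;
        if (isalnum((unsigned char)*p)) {
            assert(top < PF_MAX_STACK);
            stack[top++] = pf_new_node(*p, NULL, NULL);   /* operand */
        } else {
            assert(top >= 2);
            PfNode *t1 = stack[--top];   /* popped first: right child */
            PfNode *t2 = stack[--top];   /* popped second: left child */
            stack[top++] = pf_new_node(*p, t2, t1);
        }
    }
    assert(top == 1);                    /* exactly one tree remains */
    return stack[0];
}

/* Release every node of the tree. */
void pf_free(PfNode *t) {
    if (t == NULL) return;
    pf_free(t->left);
    pf_free(t->right);
    free(t);
}
```

For the input a b + c d e + * * the root is the final '*', its left child is the '+' over a and b, and its right child is the '*' over c and the '+' over d and e.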
Binary Search Tree
An important application of binary trees is their use in searching. Assume that each node in
the tree is assigned a key value. The property that makes a binary tree into a binary search
tree is that for every node, X, in the tree, the values of all the keys in the left subtree are
smaller than the key value in X, and the values of all the keys in the right subtree are larger
than the key value in X. The average depth of a binary search tree is O(log N), so its
operations take O(log N) time on average.
The binary search tree data structure is given below.
struct TreeNode {
    ElementType Element;
    SearchTree Left;
    SearchTree Right;
};
The Find operation returns a pointer to the node in tree T that has key X, or NULL if there is
no such node.
Position Find( ElementType X, SearchTree T )
{
    if( T == NULL )
        return NULL;
    if( X < T->Element )
        return Find( X, T->Left );
    else if( X > T->Element )
        return Find( X, T->Right );
    else
        return T;
}
To insert X into tree T, proceed down the tree as with a Find. If X is found, do nothing.
Otherwise, insert X at the last spot on the path traversed.
SearchTree Insert( ElementType X, SearchTree T )
{
    if( T == NULL )
    {
        T = malloc( sizeof( struct TreeNode ) );
        if( T == NULL )
            FatalError( "Out of space!!!" );
        else
        {
            T->Element = X;
            T->Left = T->Right = NULL;
        }
    }
    else if( X < T->Element )
        T->Left = Insert( X, T->Left );
    else if( X > T->Element )
        T->Right = Insert( X, T->Right );
    return T;
}
Deletion is the hardest operation. Once the node to be deleted is found, the following
cases need to be considered:
1. Leaf node: Search for the parent of the node and set its link to the leaf to NULL.
2. Node with one child: Search for the parent node and make the parent's link point to the
child of the node to be deleted.
3. Node with two children: Replace the key of this node with the smallest key of the right
subtree or the largest key of the left subtree, and recursively delete that node.
SearchTree Delete( ElementType X, SearchTree T )
{
    Position TmpCell;
    if( T == NULL )
        Error( "Element not found" );
    else
    if( X < T->Element )  /* Go left */
        T->Left = Delete( X, T->Left );
    else
    if( X > T->Element )  /* Go right */
        T->Right = Delete( X, T->Right );
    else  /* Found element to be deleted */
    if( T->Left && T->Right )  /* Two children */
    {
        /* Replace with the smallest key in the right subtree */
        TmpCell = FindMin( T->Right );
        T->Element = TmpCell->Element;
        T->Right = Delete( T->Element, T->Right );
    }
    else  /* One or zero children */
    {
        TmpCell = T;
        if( T->Left == NULL )  /* Also handles 0 children */
            T = T->Right;
        else if( T->Right == NULL )
            T = T->Left;
        free( TmpCell );
    }
    return T;
}
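The Delete routine relies on a FindMin helper that is not listed in these notes. The following self-contained sketch shows one possible version together with compact insert and delete routines so that the three deletion cases can be exercised. It uses int keys and its own BstNode type, and it silently ignores a missing key instead of reporting an error; all names are assumptions.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct BstNode {
    int key;
    struct BstNode *left, *right;
} BstNode;

/* Insert x; duplicates are ignored. Returns the (possibly new) root. */
BstNode *bst_insert(BstNode *t, int x) {
    if (t == NULL) {
        t = malloc(sizeof *t);
        assert(t != NULL);
        t->key = x;
        t->left = t->right = NULL;
    } else if (x < t->key) {
        t->left = bst_insert(t->left, x);
    } else if (x > t->key) {
        t->right = bst_insert(t->right, x);
    }
    return t;
}

/* Smallest key: follow left links as far as possible. */
BstNode *bst_find_min(BstNode *t) {
    if (t != NULL)
        while (t->left != NULL)
            t = t->left;
    return t;
}

/* Delete x if present. Returns the (possibly new) root. */
BstNode *bst_delete(BstNode *t, int x) {
    if (t == NULL)
        return NULL;                          /* not found: nothing to do */
    if (x < t->key) {
        t->left = bst_delete(t->left, x);
    } else if (x > t->key) {
        t->right = bst_delete(t->right, x);
    } else if (t->left != NULL && t->right != NULL) {
        BstNode *m = bst_find_min(t->right);  /* two children */
        t->key = m->key;
        t->right = bst_delete(t->right, t->key);
    } else {
        BstNode *old = t;                     /* zero or one child */
        t = (t->left != NULL) ? t->left : t->right;
        free(old);
    }
    return t;
}

/* Store the keys in sorted order in out starting at index n; returns the
   new count. */
int bst_inorder(const BstNode *t, int *out, int n) {
    if (t == NULL) return n;
    n = bst_inorder(t->left, out, n);
    out[n++] = t->key;
    return bst_inorder(t->right, out, n);
}
```

After inserting 6, 2, 8, 1, 4, 3, deleting 2 (two children) replaces it with 3, the smallest key of its right subtree; deleting 8 removes a leaf; deleting 6 then has a single child, so 3 becomes the root.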
Adelson-Velskii and Landis (AVL) Tree
An AVL tree is a binary search tree with a balance condition. An AVL tree is identical to a
binary search tree, except that for every node in the tree, the heights of the left and right
subtrees can differ by at most 1.
To check whether a tree is AVL or not, the balance factor is determined for each node of
the tree. The height of an empty subtree is taken as -1, so the balance factor of a leaf node is 0.
Balance Factor = Height of Left subtree - Height of Right subtree
The BF should be either 0, 1 or -1 for every node in an AVL tree.
The tree on the left is an AVL tree, but the tree on the right is not.
All the tree operations can be performed in O(log N) time, except possibly insertion. Insertion
is potentially difficult because inserting a node could violate the AVL tree property. If so, then
the property has to be restored with a simple modification to the tree, known as a rotation.
In the figure below, an insertion into subtree X makes node k2 violate the AVL balance property,
because its left subtree is two levels deeper than its right subtree. To rebalance, X is moved
up one level and Z one level down, i.e., the nodes are rearranged into an equivalent tree.
Thus k1 becomes the new root. According to the binary search tree property, k2 > k1 in the
original tree, so k2 becomes the right child of k1 in the new tree. X and Z remain the left child
of k1 and the right child of k2, respectively. Subtree Y is placed as the left child of k2 in the
new tree to satisfy the ordering requirements.
The conversion of one of the above trees to the other is known as a rotation. A rotation
involves only a few pointer changes, and changes the structure of the tree while preserving
the search tree property.
In the figure below, an insertion into subtree Z makes node k1 violate the AVL balance property.
This is fixed by a rotation.
The new height of the entire subtree is the same as the height of the original subtree before the
insertion that caused X to grow. Therefore the heights on the path to the root need not be
updated and no further rotation is required.
Single Rotation
Insert keys 1 through 7 in sequential order in an empty AVL tree. The first problem occurs
when inserting key 3, because the AVL property is violated at the root. A single rotation
between the root and its right child is performed to fix the problem. The tree is shown in the
following figure, before and after the rotation.
Next, key 4 is inserted, which causes no problems, but the insertion of 5 creates a violation at
node 3, which is fixed by a single rotation.
Next, inserting 6 causes a balance problem at the root, since its left subtree is of height 0 and
its right subtree would be of height 2. Therefore a single rotation at the root between 2 and 4 is
performed.
Next, key 7 is inserted, which causes an imbalance at node 5 and therefore another single rotation.
Double Rotation
The single rotation does not fix the problem if insertion is done either onto right subtree of
the left child or onto left subtree of the right child. In such cases double rotation is used.
Continuing the previous example, insert keys 8 through 15 in reverse order. Inserting 15 is easy,
since it does not destroy the balance property, but inserting 14 causes a height imbalance at
node 7. The double rotation is a right-left double rotation and involves 7, 15, and 14. Here, $k_3$
is the node with key 7, $k_1$ is the node with key 15, and $k_2$ is the node with key 14, with
subtrees A, B, C, and D all empty.
Next insert 13, which requires a double rotation. Here it is again a right-left double rotation
that will involve 6, 14, and 7. In this case, $k_3$ is the node with key 6, $k_1$ is the node with key
14, and $k_2$ is the node with key 7. Subtree A is the tree rooted at the node with key 5, subtree
B is the empty subtree that was originally the left child of the node with key 7, subtree C is
the tree rooted at the node with key 13, and finally, subtree D is the tree rooted at the node
with key 15.
If 12 is now inserted, there is an imbalance at the root. Since 12 is not between 4 and 7, a
single rotation will restore the tree.
Insertion of 11 will require a single rotation.
To insert 10, a single rotation needs to be performed, and the same is true for insertion of 9. 8
is inserted without a rotation, creating the balanced tree as follows
Finally, 8.5 is inserted to show the symmetric case of the double rotation. This causes the
node containing 9 to become unbalanced. Since 8.5 is between 9 and 8 (9's child on the path
to 8.5), a double rotation needs to be performed, yielding the following tree.
/* Data Structure for AVL node */
struct AvlNode {
    ElementType Element;
    AvlTree Left;
    AvlTree Right;
    int Height;
};
/* Function to compute Height of an AVL node; an empty tree has height -1 */
static int Height( Position P ) {
    if( P == NULL )
        return -1;
    else
        return P->Height;
}
/* Maximum */
static int Max( int Lhs, int Rhs ) {
    return Lhs > Rhs ? Lhs : Rhs;
}
/* Insertion into an AVL tree */
AvlTree Insert( ElementType X, AvlTree T )
{
    if( T == NULL )
    {
        /* Create and return a one-node tree */
        T = malloc( sizeof( struct AvlNode ) );
        if( T == NULL )
            FatalError( "Out of space!!!" );
        else
        {
            T->Element = X; T->Height = 0;
            T->Left = T->Right = NULL;
        }
    }
    else
    if( X < T->Element )
    {
        T->Left = Insert( X, T->Left );
        if( Height( T->Left ) - Height( T->Right ) == 2 )
            if( X < T->Left->Element )
                T = SingleRotateWithLeft( T );
            else
                T = DoubleRotateWithLeft( T );
    }
    else if( X > T->Element ) {
        T->Right = Insert( X, T->Right );
        if( Height( T->Right ) - Height( T->Left ) == 2 )
            if( X > T->Right->Element )
                T = SingleRotateWithRight( T );
            else
                T = DoubleRotateWithRight( T );
    }
    /* Else X is in the tree already; do nothing */
    T->Height = Max( Height( T->Left ), Height( T->Right ) ) + 1;
    return T;
}
/* Single Rotation */
static Position SingleRotateWithLeft( Position K2 ) {
    Position K1;
    K1 = K2->Left;
    K2->Left = K1->Right;
    K1->Right = K2;
    K2->Height = Max( Height( K2->Left ), Height( K2->Right ) ) + 1;
    K1->Height = Max( Height( K1->Left ), K2->Height ) + 1;
    return K1; /* New root */
}
static Position SingleRotateWithRight( Position K1 ) {
    Position K2;
    K2 = K1->Right;
    K1->Right = K2->Left;
    K2->Left = K1;
    K1->Height = Max( Height( K1->Left ), Height( K1->Right ) ) + 1;
    K2->Height = Max( Height( K2->Right ), K1->Height ) + 1;
    return K2; /* New root */
}
/* Double Rotation */
static Position DoubleRotateWithLeft( Position K3 )
{
    /* Rotate between K1 and K2 */
    K3->Left = SingleRotateWithRight( K3->Left );
    /* Rotate between K3 and K2 */
    return SingleRotateWithLeft( K3 );
}
static Position DoubleRotateWithRight( Position K1 ) {
    /* Rotate between K3 and K2 */
    K1->Right = SingleRotateWithLeft( K1->Right );
    /* Rotate between K1 and K2 */
    return SingleRotateWithRight( K1 );
}
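To replay the insertion sequence of the worked example, the routines above can be condensed into one self-contained sketch. It uses int keys and its own type and function names (AvlDemoNode, avl_insert and so on), which are assumptions; the double rotations are written inline as two single rotations, which is equivalent to the DoubleRotate functions.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct AvlDemoNode {
    int key, height;
    struct AvlDemoNode *left, *right;
} AvlDemoNode;

/* Height of a possibly empty tree; an empty tree has height -1. */
int avl_height(const AvlDemoNode *t) { return t ? t->height : -1; }

/* Recompute a node's height from its children. */
void avl_fix_height(AvlDemoNode *t) {
    int l = avl_height(t->left), r = avl_height(t->right);
    t->height = (l > r ? l : r) + 1;
}

/* Single rotation with the left child; returns the new subtree root. */
AvlDemoNode *avl_rotate_with_left(AvlDemoNode *k2) {
    AvlDemoNode *k1 = k2->left;
    k2->left = k1->right;
    k1->right = k2;
    avl_fix_height(k2);
    avl_fix_height(k1);
    return k1;
}

/* Single rotation with the right child; returns the new subtree root. */
AvlDemoNode *avl_rotate_with_right(AvlDemoNode *k1) {
    AvlDemoNode *k2 = k1->right;
    k1->right = k2->left;
    k2->left = k1;
    avl_fix_height(k1);
    avl_fix_height(k2);
    return k2;
}

/* Insert x and rebalance on the way back up; duplicates are ignored. */
AvlDemoNode *avl_insert(AvlDemoNode *t, int x) {
    if (t == NULL) {
        t = malloc(sizeof *t);
        assert(t != NULL);
        t->key = x;
        t->height = 0;
        t->left = t->right = NULL;
        return t;
    }
    if (x < t->key) {
        t->left = avl_insert(t->left, x);
        if (avl_height(t->left) - avl_height(t->right) == 2) {
            if (x > t->left->key)              /* left-right: double rotation */
                t->left = avl_rotate_with_right(t->left);
            t = avl_rotate_with_left(t);
        }
    } else if (x > t->key) {
        t->right = avl_insert(t->right, x);
        if (avl_height(t->right) - avl_height(t->left) == 2) {
            if (x < t->right->key)             /* right-left: double rotation */
                t->right = avl_rotate_with_left(t->right);
            t = avl_rotate_with_right(t);
        }
    }
    avl_fix_height(t);
    return t;
}
```

Inserting 1 through 7 in order yields a perfectly balanced tree with root 4 and height 2. Continuing with 15, 14, ..., 8 moves the root to 7, with 12 as its right child.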
Splay Trees
Splay trees are a variation of the binary search tree (BST). The basic idea is not to spend too
much time on balancing. A splay tree guarantees that any M consecutive tree operations take at
most O(M log N) time, though an individual operation can take more than O(log N).
If the worst-case time for any sequence of M operations is O(M f(N)), then the amortized
running time is O(f(N)) per operation. Splay trees have an amortized running time of O(log N)
per operation; they do not guarantee O(log N) for each individual operation.
The basic idea of the splay tree is that after a node is accessed, it is pushed to the root by a
series of AVL tree rotations. Whenever an object is accessed, it becomes the new root. This
should be done without adversely affecting other nodes.
This method is likely to have practical utility, because in many applications when a node is
accessed, it is likely to be accessed again in the near future. Splay trees also do not require the
maintenance of height or balance information, thus saving space and simplifying the code to
some extent
The splaying strategy is to rotate bottom-up along the access path, but to be selective about
how rotations are performed. Let X be a (nonroot) node on the access path. If the parent of X
is the root of the tree, then rotate X and the root. This is the last rotation along the access path.
Otherwise, X has both a parent (P) and a grandparent (G), and there are two cases, plus
symmetries, to consider. The first case is the zig-zag case. Here X is a right child and P is a
left child (or vice versa). If so, perform a double rotation, exactly like an AVL double
rotation. Otherwise, it is a zig-zig case: X and P are either both left children or both right
children. In that case, we transform the tree on the left to the tree on the right.
Consider the result of splaying at the node with key 1, shown below. Although the access of
the node with key 1 takes N-1 units, a subsequent access of the node with key 2 takes
only about N/2 units instead of N-2 units.
Splaying not only moves the accessed node to the root, but also has the effect of roughly
halving the depth of most nodes on the access path. When access paths are long, thus leading
to a longer-than-normal search time, the rotations tend to be good for future operations. When
accesses are cheap, the rotations can be bad.
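The zig, zig-zag and zig-zig steps can be sketched with parent pointers, one of the two bookkeeping options these notes mention for splay trees. This is a minimal illustration rather than a full splay tree: the SplayNode type and function names are assumptions, and only the splay step itself is shown.

```c
#include <stddef.h>

typedef struct SplayNode {
    int key;
    struct SplayNode *left, *right, *parent;
} SplayNode;

/* Single rotation that lifts x above its parent, preserving key order. */
void splay_rotate_up(SplayNode *x) {
    SplayNode *p = x->parent;
    SplayNode *g = p->parent;
    if (p->left == x) {
        p->left = x->right;
        if (x->right != NULL) x->right->parent = p;
        x->right = p;
    } else {
        p->right = x->left;
        if (x->left != NULL) x->left->parent = p;
        x->left = p;
    }
    p->parent = x;
    x->parent = g;
    if (g != NULL) {
        if (g->left == p) g->left = x;
        else g->right = x;
    }
}

/* Move x to the root using zig, zig-zag and zig-zig steps. */
SplayNode *splay_to_root(SplayNode *x) {
    while (x->parent != NULL) {
        SplayNode *p = x->parent;
        SplayNode *g = p->parent;
        if (g != NULL) {
            if ((g->left == p) == (p->left == x))
                splay_rotate_up(p);   /* zig-zig: rotate the parent first */
            else
                splay_rotate_up(x);   /* zig-zag: rotate x twice */
        }
        splay_rotate_up(x);           /* final rotation of the step (or zig) */
    }
    return x;
}

/* Depth of a node: the number of edges up to the root. */
int splay_depth(const SplayNode *x) {
    int d = 0;
    while (x->parent != NULL) {
        x = x->parent;
        d++;
    }
    return d;
}
```

On a left-leaning chain with keys 7 down to 1, splaying at key 1 makes 1 the root with 6 as its right child, and the node with key 2 ends up at depth 3 instead of depth 5, in line with the "about N/2" observation above.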
Deletion
When a node is accessed for deletion, it becomes the new root. Deleting the root leaves
two subtrees, $T_L$ and $T_R$. The new root is determined as follows:
1. Find the largest element in $T_L$; this element is rotated to the root of $T_L$, so the new
root of $T_L$ has no right child.
2. $T_R$ is made the right child of the new root of $T_L$.
Because the rotations for splay trees are performed in pairs from the bottom up, a recursive
implementation does not work. Thus, splay trees are coded non-recursively and work in two
passes. The first pass goes down the tree and the second goes back up, performing rotations.
This requires that the path be saved. This can be done by using a stack or by adding an extra
field to the node record that will point to the parent.
The analysis of splay trees is difficult, because it must take into account the ever-changing
structure of the tree. On the other hand, splay trees are much simpler to program than AVL
trees, since there are fewer cases to consider and no balance information to maintain. Finally,
there are several variations of splay trees that can perform even better.
B-Trees
There is a popular search tree that is not binary. This tree is known as a B-tree or a balanced
M-ary tree. A B-tree of order M is a tree that has the following properties:
1. The root is either a leaf or has between 2 and M children.
2. All non-leaf nodes (except the root) have between $\lceil M/2 \rceil$ and M children.
3. All leaves are at the same depth and hold between $\lceil L/2 \rceil$ and L data items, for some L.
4. All data is stored at the leaves.
5. The non-leaf nodes store up to M-1 keys to guide the searching. Key i represents the
smallest key in subtree i+1.
Contained in each interior node are pointers $P_1, P_2, \ldots, P_m$ to the children, and values
$k_1, k_2, \ldots, k_{m-1}$, representing the smallest key found in the subtrees $P_2, P_3, \ldots, P_m$
respectively. Of course, some of these pointers might be NULL, and the corresponding $k_i$ would
then be undefined. For every node, all the keys in subtree $P_1$ are smaller than the keys in
subtree $P_2$, and so on. The leaves contain all the actual data, which is either the keys
themselves or pointers to records containing the keys.
A B-tree of order 4 is more popularly known as a 2-3-4 tree, and a B-tree of order 3 is known
as a 2-3 tree.
Insertion
The initial 2-3 B-tree is shown below. Interior nodes are in ellipses, which contain the two
pieces of data for each node. A dash line as a second piece of information in an interior node
indicates that the node has only two children. Leaves are drawn in boxes, which contain the
keys. The keys in the leaves are ordered.
To perform a Find, start at the root and branch in one of the directions, depending on the
relation of the key sought for to the values stored at the node. To perform an Insert on a
previously unseen key, X, follow the path as though performing a Find. When it gets to a leaf
node, the correct place to put X is found.
To insert a node with key 18, just add it to a leaf without causing any violations of the 2-3
tree properties.
Now inserting 1 into the tree is not possible as the node where it belongs is already full.
Placing the new key into this node would give it a fourth element which is not allowed. This
can be solved by making two nodes of two keys each and adjusting the information in the
parent as shown below.
Now attempting to insert 19 into the current tree as in the previous step results in two nodes
of two keys each.
This tree has an internal node with four children, but only three per node is allowed. The
solution is to split this node into two nodes with two children.
Now inserting element with key 28, a leaf with four children is created, which is split into
two leaves of two children.
This creates an internal node with four children, which is then split into two nodes with two
children each. In this case the node being split is the root, so we finish by creating a new root.
This is how a 2-3 tree gains height.
When a key is inserted, the only changes to internal nodes occur on the access path. These
changes can be made in time proportional to the length of this path.
With general B-trees of order M, when a key is inserted, the only difficulty arises when the
node that is to accept the key already has M keys. This key gives the node M+1 keys, which
we can split into two nodes with $\lceil (M+1)/2 \rceil$ and $\lfloor (M+1)/2 \rfloor$ keys respectively. As this gives
the parent an extra node, it is necessary to check whether this node can be accepted by the
parent and split the parent if it already has M children. Repeat this until a parent is found with
fewer than M children. If the root is split, then create a new root with two children.
Deletion
Deletion can be performed by finding the key to be deleted and removing it.
1. If this key was one of only two keys in a node, then its removal leaves only one key.
This can be fixed by combining this node with a sibling.
2. If the sibling has three keys, then one is borrowed from it so that both nodes have two keys.
3. If the sibling has only two keys, then combine the two nodes into a single node with
three keys. The parent of this node now loses a child, so we might have to percolate
this strategy all the way to the top. If the root loses its second child, then the root is
also deleted and the tree becomes one level shallower. As nodes are combined,
remember to update the information kept at the internal nodes.
The depth of a B-tree is at most $\lceil \log_{\lceil M/2 \rceil} N \rceil$. At each node on the path, perform O(log M)
work to determine which branch to take (using a binary search), but an Insert or Delete
requires O(M) work to fix up all the information at the node. The worst-case running time for
each of the Insert and Delete operations is thus $O(M \log_M N) = O((M / \log M) \log N)$, but a
Find takes only O(log N).
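The O(log M) branching step can be sketched as a binary search over the keys of one interior node. The sketch assumes the keys are held in a sorted int array with zero-based indices, so keys[i] is the smallest key in child i+1, matching property 5 with the indices shifted by one; the function name is hypothetical.

```c
/* keys[0..n_keys-1] are sorted, and keys[i] is the smallest key stored in
   child i+1. Returns the zero-based index of the child whose subtree may
   contain x, which equals the number of keys that are <= x. */
int btree_child_index(const int *keys, int n_keys, int x) {
    int lo = 0, hi = n_keys;          /* the answer lies in [lo, hi] */
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (keys[mid] <= x)
            lo = mid + 1;             /* x belongs to the right of keys[mid] */
        else
            hi = mid;                 /* x belongs to the left of keys[mid] */
    }
    return lo;
}
```

For a node holding the illustrative keys 22, 41 and 58, a search for 10 follows child 0, a search for 22 or 40 follows child 1, and a search for 99 follows child 3.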
As M gets larger, the insertion and deletion times increase. If main memory speed is a
concern, higher order B-trees, such as 5-9 trees, are not an advantage.
The real use of B-trees lies in database systems, where the tree is kept on a physical disk
instead of main memory. Accessing a disk is typically several orders of magnitude slower
than any main memory operation. If a B-tree of order M is used, then the number of disk
accesses is $O(\log_M N)$. Although each disk access carries the overhead of O(log M) to
determine the direction to branch, the time to perform this computation is typically much
smaller than the time to read a block of memory and can thus be considered inconsequential.
CS2068 Unit 4: Graphs
Terminologies
A graph G = (V, E) consists of a set of vertices, V, and a set of edges, E.
Each edge is a pair (v, w), where $v, w \in V$. Edges are sometimes referred to as arcs.
If the pair is ordered, then the graph is directed. Directed graphs are sometimes
referred to as digraphs.
A symmetric digraph is one in which, for every edge (v, w), the edge (w, v) also exists.
Vertex w is adjacent to v if and only if $(v, w) \in E$.
In an undirected graph with edge (v, w), and hence (w, v), w is adjacent to v and v is
adjacent to w.
Sometimes an edge has a third component, known as either a weight or a cost.
A path in a graph is a sequence of vertices $w_1, w_2, w_3, \ldots, w_N$ such that
$(w_i, w_{i+1}) \in E$ for $1 \le i < N$.
The length of such a path is the number of edges on the path, which is equal to N - 1.
If the graph contains an edge (v, v) from a vertex to itself, then the path v, v is
sometimes referred to as a loop.
A simple path is a path such that all vertices are distinct, except that the first and last
could be the same.
A cycle in a directed graph is a path of length at least 1 such that w1 =wn
A directed graph is acyclic if it has no cycles. Its abbreviated as DAG.
An undirected graph is connected if there is a path from every vertex to every
other vertex. A directed graph with this property is called strongly connected.
If a directed graph is not strongly connected, but the underlying graph is connected,
then the graph is said to be weakly connected.
A complete graph is a graph in which there is an edge between every pair of vertices.
The degree of a vertex is the number of edges incident to that vertex
For directed graph,
the in-degree of a vertex v is the number of edges that have v as the head
the out-degree of a vertex v is the number of edges that have v as the tail
if di is the degree of a vertex i in a graph G with n vertices and e edges, the
number of edges is
V={a,b,c,d,e}
E={(a,b),(a,c),(a,d),(b,e),(c,d),(c,e),(d,e)}
2 / ) (
1
0
n
i
d e
Graph Representation
The two popular ways of representing graphs are:
1. Adjacency matrix representation
2. Adjacency list representation
Adjacency Matrix
One simple way to represent a graph is to use a two-dimensional array, known as an
adjacency matrix representation. For each edge (u, v), set a[u][v] = 1; otherwise the entry in
the array is 0.
If the edge has a weight associated with it, then a[u][v] can be set equal to the weight, with
either a very large or a very small weight used as a sentinel to indicate nonexistent edges.
Although this representation has the merit of extreme simplicity, its space requirement is
Θ(|V|^2), which can be prohibitive if the graph does not have very many edges. An adjacency
matrix is an appropriate representation if the graph is dense.
Adjacency List
If the graph is sparse, a better solution is an adjacency list representation. For each vertex,
keep a list of all adjacent vertices. The space requirement is O(|E| + |V|). If the edges have
weights, then this additional information is also stored in the cells.
Adjacency lists are the standard way to represent graphs. Undirected graphs can be similarly
represented; each edge (u, v) appears in two lists, so the space usage essentially doubles. A
common requirement in graph algorithms is to find all vertices adjacent to some given vertex
v, and this can be done, in time proportional to the number of such vertices found, by a
simple scan down the appropriate adjacency list.
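The adjacency list representation described above can be sketched in C as follows. This is only a minimal illustration: the type and function names (Node, Graph, graph_create, graph_add_edge, graph_out_degree, graph_free) are hypothetical and do not come from these notes, and vertices are assumed to be numbered 0 to numVertex - 1.

```c
#include <stdlib.h>

/* One cell of an adjacency list: a neighbour and the weight of the edge to it. */
typedef struct Node {
    int vertex;
    int weight;
    struct Node *next;
} Node;

typedef struct {
    int numVertex;
    Node **adj;   /* adj[v] heads the list of vertices adjacent to v */
} Graph;

/* Create a graph with numVertex vertices and no edges; exits on allocation failure. */
Graph *graph_create(int numVertex)
{
    Graph *g = malloc(sizeof *g);
    if (g == NULL)
        exit(EXIT_FAILURE);
    g->numVertex = numVertex;
    g->adj = calloc((size_t)numVertex, sizeof *g->adj);
    if (g->adj == NULL)
        exit(EXIT_FAILURE);
    return g;
}

/* Add the directed edge (u, v). For an undirected edge, call it a second
   time with u and v swapped, so the edge appears in two lists. */
void graph_add_edge(Graph *g, int u, int v, int weight)
{
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        exit(EXIT_FAILURE);
    n->vertex = v;
    n->weight = weight;
    n->next = g->adj[u];   /* push onto the front of u's list */
    g->adj[u] = n;
}

/* Count the vertices adjacent to v with one scan down its list; the time
   is proportional to the number of vertices found. */
int graph_out_degree(const Graph *g, int v)
{
    int count = 0;
    for (const Node *n = g->adj[v]; n != NULL; n = n->next)
        count++;
    return count;
}

/* Release every list cell, then the array of heads, then the graph itself. */
void graph_free(Graph *g)
{
    for (int v = 0; v < g->numVertex; v++) {
        Node *n = g->adj[v];
        while (n != NULL) {
            Node *next = n->next;
            free(n);
            n = next;
        }
    }
    free(g->adj);
    free(g);
}
```

The total storage is one array slot per vertex plus one cell per (directed) edge, which is where the O(|E| + |V|) space bound comes from.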
Topological Sort
A topological sort is an ordering of vertices in a directed acyclic graph, such that if there is a
path from $v_i$ to $v_j$, then $v_j$ appears after $v_i$ in the ordering.
Simple Algorithm: First find any vertex with no incoming edges, i.e., with indegree 0. Print
this vertex, and remove it, along with its edges, from the graph. Apply the same strategy to
the rest of the graph. The running time of the algorithm is O(|V|^2), because each of the |V|
searches for a vertex of indegree 0 scans the whole array of vertices.
The drawback is that, for a sparse graph, only a few vertices have their indegrees updated in
each iteration, yet every iteration scans all the vertices. An improved algorithm performs the
topological sort of a DAG using a queue:
1. First, the Indegree is computed for every vertex.
2. Then all vertices of Indegree 0 are placed on an initially empty queue.
3. While the queue is not empty, a vertex v is removed, and all vertices adjacent to v have
their Indegrees decremented.
4. A vertex is put on the queue as soon as its Indegree falls to 0.
5. The topological ordering then is the order in which the vertices Dequeue.
The time to perform this algorithm is O(|E| + |V|) if adjacency lists are used.
           Indegree Before Dequeue #
Vertex     1     2     3     4     5        6     7
v1         0     0     0     0     0        0     0
v2         1     0     0     0     0        0     0
v3         2     1     1     1     0        0     0
v4         3     2     1     0     0        0     0
v5         1     1     0     0     0        0     0
v6         3     3     3     3     2        1     0
v7         2     2     2     1     0        0     0
Enqueue    v1    v2    v5    v4    v3, v7   -     v6
Dequeue    v1    v2    v5    v4    v3       v7    v6
void Topsort( Graph G )
{
    Queue Q;
    unsigned int Counter = 0;
    Vertex V, W;

    Q = CreateQueue( NumVertex );
    MakeEmpty( Q );
    for each vertex V
        if( Indegree[ V ] == 0 )
            Enqueue( V, Q );
    while( !IsEmpty( Q ) )
    {
        V = Dequeue( Q );
        TopNum[ V ] = ++Counter;   /* assign next number */
        for each W adjacent to V
            if( --Indegree[ W ] == 0 )
                Enqueue( W, Q );
    }
    if( Counter != NumVertex )
        Error( "Graph has a cycle" );
    DisposeQueue( Q );   /* free the memory */
}
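The queue-based procedure can also be written as a compilable sketch. Several things here are assumptions rather than part of the notes: the graph is stored as an adjacency matrix for brevity (so this version takes O(|V|^2) time instead of O(|E| + |V|)), the names topsort and example_adj are hypothetical, v1 to v7 are stored as indices 0 to 6, and the edge set was chosen so that its indegrees match the table above, since the original figure is not reproduced.

```c
#define NUM_VERTEX 7

/* example_adj[v][w] != 0 means there is an edge v -> w (assumed edge set). */
static const int example_adj[NUM_VERTEX][NUM_VERTEX] = {
    /* v1 */ {0, 1, 1, 1, 0, 0, 0},
    /* v2 */ {0, 0, 0, 1, 1, 0, 0},
    /* v3 */ {0, 0, 0, 0, 0, 1, 0},
    /* v4 */ {0, 0, 1, 0, 0, 1, 1},
    /* v5 */ {0, 0, 0, 1, 0, 0, 1},
    /* v6 */ {0, 0, 0, 0, 0, 0, 0},
    /* v7 */ {0, 0, 0, 0, 0, 1, 0},
};

/* Writes a topological order of the first n vertices into order[] and
   returns how many vertices were placed. A result smaller than n means
   the graph has a cycle. Requires n <= NUM_VERTEX. */
int topsort(int n, const int adj[][NUM_VERTEX], int order[])
{
    int indegree[NUM_VERTEX] = {0};
    int queue[NUM_VERTEX];           /* each vertex is enqueued at most once */
    int front = 0, rear = 0, counter = 0;

    /* Step 1: compute the indegree of every vertex. */
    for (int v = 0; v < n; v++)
        for (int w = 0; w < n; w++)
            if (adj[v][w])
                indegree[w]++;

    /* Step 2: enqueue all vertices of indegree 0. */
    for (int v = 0; v < n; v++)
        if (indegree[v] == 0)
            queue[rear++] = v;

    /* Steps 3 to 5: dequeue, record, and decrement the neighbours. */
    while (front < rear) {
        int v = queue[front++];
        order[counter++] = v;
        for (int w = 0; w < n; w++)
            if (adj[v][w] && --indegree[w] == 0)
                queue[rear++] = w;
    }
    return counter;
}
```

Calling topsort(NUM_VERTEX, example_adj, order) on this assumed graph places the vertices in the order v1, v2, v5, v4, v3, v7, v6, which agrees with the Dequeue row of the table.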
SINGLE-SOURCE SHORTEST-PATH PROBLEM
Given as input a weighted graph, G = (V, E), and a distinguished vertex s, find the shortest
weighted path from s to every other vertex in G.
Unweighted Shortest Path
Consider an unweighted graph, G. Using some vertex, s, which is an input
parameter, find the shortest path from s to all other vertices. This is a special case of the
weighted shortest-path problem, since all edges are assigned a weight of 1.
1. Choose s to be v3. The shortest path from s to v3 is then a path of length 0. Mark this
information in the graph.
2. Next, look for all vertices that are adjacent to s. v1 and v6 are one edge from s,
i.e., at distance 1.
3. Now find the vertices whose shortest path from s is exactly 2, by examining all the vertices
adjacent to v1 and v6 whose shortest paths are not already known. Thus the shortest
path to v2 and v4 is 2.
4. Finally, by examining the vertices adjacent to v2 and v4, it is found that v5 and v7 have a
shortest path of three edges. All vertices have now been processed, and this is the final
result of the algorithm.
This strategy for searching a graph is known as breadth-first search. It operates by processing
vertices in layers: the vertices closest to the start are evaluated first, and the most distant
vertices are evaluated last.
For each vertex, three pieces of information are maintained. First, its distance from s is kept
in the entry $d_v$. Initially all vertices are unreachable except for s, whose path length is 0.
The entry $p_v$ is the bookkeeping variable, which allows the actual paths to be printed. The entry
Known is set to 1 after a vertex is processed. Initially, all entries are unknown, including the
start vertex. When a vertex is Known, no cheaper path will ever be found, and so
processing for that vertex is essentially complete.
The basic algorithm mimics the diagrams by declaring as known the vertices at distance d = 0,
then d = 1, then d = 2, and so on, and setting all the adjacent vertices w that still have
$d_w = \infty$ to a distance $d_w = d + 1$. By tracing back through the $p_v$ variable, the actual
path can be printed. The running time of the algorithm is O(|V|^2), because each pass scans
every vertex and there can be up to |V| passes.
The simple algorithm can be refined by using just one queue. At the start of the pass, the
queue contains only vertices of distance CurrDist. When adjacent vertices of distance
CurrDist + 1 are added, they are enqueued at the rear and will not be processed until after all
the vertices of distance CurrDist have been processed. After the last vertex at distance CurrDist
is dequeued and processed, the queue contains only vertices of distance CurrDist + 1, so this
process perpetuates. The running time is O(|E| + |V|), as long as adjacency lists are used.
       Initial State    v3 Dequeued      v1 Dequeued      v6 Dequeued
v      Known  dv  pv    Known  dv  pv    Known  dv  pv    Known  dv  pv
v1     0      ∞   0     0      1   v3    1      1   v3    1      1   v3
v2     0      ∞   0     0      ∞   0     0      2   v1    0      2   v1
v3     0      0   0     1      0   0     1      0   0     1      0   0
v4     0      ∞   0     0      ∞   0     0      2   v1    0      2   v1
v5     0      ∞   0     0      ∞   0     0      ∞   0     0      ∞   0
v6     0      ∞   0     0      1   v3    0      1   v3    1      1   v3
v7     0      ∞   0     0      ∞   0     0      ∞   0     0      ∞   0
Q:     v3               v1, v6           v6, v2, v4       v2, v4

       v2 Dequeued      v4 Dequeued      v5 Dequeued      v7 Dequeued
v      Known  dv  pv    Known  dv  pv    Known  dv  pv    Known  dv  pv
v1     1      1   v3    1      1   v3    1      1   v3    1      1   v3
v2     1      2   v1    1      2   v1    1      2   v1    1      2   v1
v3     1      0   0     1      0   0     1      0   0     1      0   0
v4     0      2   v1    1      2   v1    1      2   v1    1      2   v1
v5     0      3   v2    0      3   v2    1      3   v2    1      3   v2
v6     1      1   v3    1      1   v3    1      1   v3    1      1   v3
v7     0      ∞   0     0      3   v4    0      3   v4    1      3   v4
Q:     v4, v5           v5, v7           v7               empty
void Unweighted( Table T )   /* assumes T is initialized */
{
    Queue Q;
    Vertex V, W;

    Q = CreateQueue( NumVertex );
    MakeEmpty( Q );
    Enqueue( S, Q );   /* S is the start vertex, determined elsewhere */
    while( !IsEmpty( Q ) )
    {
        V = Dequeue( Q );
        T[ V ].Known = True;   /* not really needed any more */
        for each W adjacent to V
            if( T[ W ].Dist == Infinity )
            {
                T[ W ].Dist = T[ V ].Dist + 1;
                T[ W ].Path = V;
                Enqueue( W, Q );
            }
    }
    DisposeQueue( Q );   /* free the memory */
}
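A compilable sketch of the same breadth-first computation is given below. The assumptions: the graph is an adjacency matrix (so the loop over neighbours costs O(|V|) per vertex), the names unweighted, example_adj, INFINITY_DIST and NOT_A_VERTEX are hypothetical, v1 to v7 are stored as indices 0 to 6, and the edge set was chosen to reproduce the trace table above because the figure itself is not available.

```c
#include <limits.h>

#define NUM_VERTEX 7
#define INFINITY_DIST INT_MAX   /* marks a vertex that has not been reached */
#define NOT_A_VERTEX (-1)       /* marks "no predecessor" */

/* example_adj[v][w] != 0 means there is an edge v -> w (assumed edge set). */
static const int example_adj[NUM_VERTEX][NUM_VERTEX] = {
    /* v1 */ {0, 1, 0, 1, 0, 0, 0},
    /* v2 */ {0, 0, 0, 1, 1, 0, 0},
    /* v3 */ {1, 0, 0, 0, 0, 1, 0},
    /* v4 */ {0, 0, 1, 0, 1, 1, 1},
    /* v5 */ {0, 0, 0, 0, 0, 0, 1},
    /* v6 */ {0, 0, 0, 0, 0, 0, 0},
    /* v7 */ {0, 0, 0, 0, 0, 1, 0},
};

/* Fills dist[] with the number of edges on a shortest path from s and
   path[] with the predecessor on that path. Requires n <= NUM_VERTEX. */
void unweighted(int n, const int adj[][NUM_VERTEX], int s,
                int dist[], int path[])
{
    int queue[NUM_VERTEX];   /* each vertex is enqueued at most once */
    int front = 0, rear = 0;

    for (int v = 0; v < n; v++) {
        dist[v] = INFINITY_DIST;
        path[v] = NOT_A_VERTEX;
    }
    dist[s] = 0;
    queue[rear++] = s;

    while (front < rear) {
        int v = queue[front++];
        for (int w = 0; w < n; w++)
            if (adj[v][w] && dist[w] == INFINITY_DIST) {
                dist[w] = dist[v] + 1;   /* first visit is the shortest */
                path[w] = v;
                queue[rear++] = w;
            }
    }
}
```

With s = 2 (that is, v3) the sketch yields distances 1, 2, 0, 2, 3, 1, 3 for v1 to v7, matching the final column of the table. The Known field is omitted because a vertex whose distance is no longer infinite is never enqueued again.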
Dijkstra's Algorithm
The general method to solve the single-source shortest-path problem is known as Dijkstra's
algorithm. This solution, published by Dijkstra in 1959, is a prime example of a greedy
algorithm. Greedy algorithms generally solve a problem in stages by doing what appears to be
the best thing at each stage.
In this method, a weighted graph is used. As in the unweighted method, each vertex is marked as
either known or unknown. A tentative distance $d_v$ is kept for each vertex. This distance turns
out to be the shortest path length from s to v using only known vertices as intermediates.
Also recorded is $p_v$, which is the last vertex to cause a change to $d_v$. At each stage,
Dijkstra's algorithm selects a vertex v, which has the smallest $d_v$ among all the unknown
vertices, and declares that the shortest path from s to v is known. The remainder of a stage
consists of updating the values of $d_w$ for the vertices w adjacent to v.
Assume the start node, s, is v1. The first vertex selected is v1, with path length 0. This vertex
is marked known. With v1 known, the adjacent vertices are v2 and v4. Both these vertices get
their entries adjusted. Next, v4 is selected and marked known. Vertices v3, v5, v6, and v7 are
adjacent, and it turns out that all require adjusting.
Next, v2 is selected. v4 is adjacent but already known, so no work is performed on it. v5 is
adjacent but not adjusted, because the cost of going through v2 is 2 + 10 = 12 and a path of
length 3 is already known. The next vertex selected is v5 at cost 3. v7 is the only adjacent
vertex, but it is not adjusted, because 3 + 6 > 5.
Then v3 is selected, and the distance for v6 is adjusted down to 3 + 5 = 8. Next v7 is selected;
v6 gets updated down to 5 + 1 = 6. Finally, v6 is selected and requires no change.
If the vertex with the smallest $d_v$ is found by scanning all the vertices, each selection takes
O(|V|) time, so O(|V|^2) time is spent finding the minimum over the course of the algorithm. Each
update takes constant time, and there is at most one update per edge, for a total of O(|E|).
Thus, the total running time is O(|E| + |V|^2) = O(|V|^2).
If the graph is dense, this algorithm is simple and essentially optimal, since it runs in time
linear in the number of edges.
void Dijkstra( Table T )
{
    Vertex V, W;

    for( ; ; )
    {
        V = smallest unknown distance vertex;
        if( V == NotAVertex )
            break;
        T[ V ].Known = True;
        for each W adjacent to V
            if( !T[ W ].Known )
                if( T[ V ].Dist + Cvw < T[ W ].Dist )
                {   /* update W */
                    Decrease( T[ W ].Dist to T[ V ].Dist + Cvw );
                    T[ W ].Path = V;
                }
    }
}
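The pseudocode can be turned into a compilable O(|V|^2) sketch by scanning an array for the smallest unknown distance. The names dijkstra, example_cost, NO_EDGE, INF and NOT_A_VERTEX are hypothetical. The cost matrix is an assumption: only some of its weights (for example 10 on v2 to v5, 6 on v5 to v7, 5 on v3 to v6, 1 on v7 to v6) can be read off the walk-through above, and the remaining ones were chosen to be consistent with it. A cost of 0 is taken to mean "no edge", and v1 to v7 are stored as indices 0 to 6.

```c
#include <limits.h>

#define NUM_VERTEX 7
#define NO_EDGE 0
#define INF INT_MAX
#define NOT_A_VERTEX (-1)

/* example_cost[v][w] is the cost of edge v -> w, or NO_EDGE (assumed data). */
static const int example_cost[NUM_VERTEX][NUM_VERTEX] = {
    /* v1 */ {0, 2, 0, 1, 0,  0, 0},
    /* v2 */ {0, 0, 0, 3, 10, 0, 0},
    /* v3 */ {4, 0, 0, 0, 0,  5, 0},
    /* v4 */ {0, 0, 2, 0, 2,  8, 4},
    /* v5 */ {0, 0, 0, 0, 0,  0, 6},
    /* v6 */ {0, 0, 0, 0, 0,  0, 0},
    /* v7 */ {0, 0, 0, 0, 0,  1, 0},
};

/* Computes shortest weighted distances from s into dist[] and the
   predecessor of each vertex into path[]. Edge costs must be
   non-negative. Requires n <= NUM_VERTEX. */
void dijkstra(int n, const int cost[][NUM_VERTEX], int s,
              int dist[], int path[])
{
    int known[NUM_VERTEX] = {0};

    for (int v = 0; v < n; v++) {
        dist[v] = INF;
        path[v] = NOT_A_VERTEX;
    }
    dist[s] = 0;

    for (;;) {
        /* Select the reachable unknown vertex with the smallest distance. */
        int v = NOT_A_VERTEX;
        for (int u = 0; u < n; u++)
            if (!known[u] && dist[u] != INF &&
                (v == NOT_A_VERTEX || dist[u] < dist[v]))
                v = u;
        if (v == NOT_A_VERTEX)
            break;                      /* nothing reachable is left */
        known[v] = 1;

        /* Update the tentative distance of every unknown neighbour. */
        for (int w = 0; w < n; w++)
            if (cost[v][w] != NO_EDGE && !known[w] &&
                dist[v] + cost[v][w] < dist[w]) {
                dist[w] = dist[v] + cost[v][w];
                path[w] = v;
            }
    }
}
```

Starting from v1 on this assumed graph, the distances come out as 0, 2, 3, 1, 3, 6, 5 for v1 to v7, which agrees with the values quoted in the walk-through (v5 at cost 3, v7 at cost 5, v6 finally at 6). Ties are broken by lowest index here, so v3 is selected before v5; the final distances are unaffected.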
Graphs with Negative Edge Costs
If the graph has negative edge costs, then Dijkstra's algorithm does not work. The problem is
that once a vertex u is declared known, it is possible that from some other, unknown vertex v
there is a path back to u that is very negative. In such a case, taking a path from s to v back
to u is better than going from s to u without using v.
A combination of the weighted and unweighted algorithms will solve the problem, but at the cost
of a drastic increase in running time. The concept of known vertices is dropped, since the
algorithm must be able to change its mind. Begin by placing s on a queue. Then, at each stage,
dequeue a vertex v. Find all vertices w adjacent to v such that $d_w > d_v + c_{v,w}$. Update
$d_w$ and $p_w$, and place w on the queue if it is not already there. A bit can be set for each
vertex to indicate presence in the queue. Repeat the process until the queue is empty.
The algorithm works if there are no negative-cost cycles. Each vertex can dequeue at most |V|
times, so the running time is O(|E| · |V|) if adjacency lists are used. If negative-cost cycles
are present, then the algorithm as written will loop indefinitely. Termination can be
guaranteed by stopping the algorithm after any vertex has dequeued |V| + 1 times.
void WeightedNegative( Table T )   /* assumes T is initialized */
{
    Queue Q;
    Vertex V, W;

    Q = CreateQueue( NumVertex );
    MakeEmpty( Q );
    Enqueue( S, Q );   /* S is the start vertex, determined elsewhere */
    while( !IsEmpty( Q ) )
    {
        V = Dequeue( Q );
        for each W adjacent to V
            if( T[ V ].Dist + Cvw < T[ W ].Dist )
            {   /* update W */
                T[ W ].Dist = T[ V ].Dist + Cvw;
                T[ W ].Path = V;
                if( W is not already in Q )
                    Enqueue( W, Q );
            }
    }
    DisposeQueue( Q );   /* free the memory */
}
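The following compilable sketch adds the two details the pseudocode leaves open: the in-queue bit and the |V| + 1 dequeue limit. Everything concrete in it is an assumption made for illustration: the four-vertex graph, the names weighted_negative, example_cost, INF and NOT_A_VERTEX, and the use of INF in the matrix to mark a missing edge.

```c
#include <limits.h>

#define NUM_VERTEX 4
#define INF INT_MAX          /* unreached vertex, and also "no edge" */
#define NOT_A_VERTEX (-1)

/* Hypothetical graph: 0->1 (4), 0->2 (5), 2->1 (-3), 1->3 (2). */
static const int example_cost[NUM_VERTEX][NUM_VERTEX] = {
    {INF, 4,   5,   INF},
    {INF, INF, INF, 2},
    {INF, -3,  INF, INF},
    {INF, INF, INF, INF},
};

/* Returns 1 and fills dist[] and path[] on success; returns 0 if some
   vertex is dequeued more than n times, which signals a negative-cost
   cycle reachable from s. Requires n <= NUM_VERTEX. */
int weighted_negative(int n, const int cost[][NUM_VERTEX], int s,
                      int dist[], int path[])
{
    int queue[NUM_VERTEX];            /* circular; holds each vertex once */
    int in_queue[NUM_VERTEX] = {0};   /* the "presence in the queue" bit */
    int dequeues[NUM_VERTEX] = {0};
    int front = 0, size = 0;

    for (int v = 0; v < n; v++) {
        dist[v] = INF;
        path[v] = NOT_A_VERTEX;
    }
    dist[s] = 0;
    queue[(front + size) % n] = s;
    size++;
    in_queue[s] = 1;

    while (size > 0) {
        int v = queue[front];
        front = (front + 1) % n;
        size--;
        in_queue[v] = 0;
        if (++dequeues[v] > n)
            return 0;                 /* dequeued |V| + 1 times: give up */

        for (int w = 0; w < n; w++)
            if (cost[v][w] != INF && dist[v] + cost[v][w] < dist[w]) {
                dist[w] = dist[v] + cost[v][w];
                path[w] = v;
                if (!in_queue[w]) {
                    queue[(front + size) % n] = w;
                    size++;
                    in_queue[w] = 1;
                }
            }
    }
    return 1;
}
```

On the hypothetical graph, vertex 1 is first reached at cost 4 and later improved to 5 + (-3) = 2 through vertex 2, so it is enqueued a second time; this is exactly the "change of mind" that the known flag in Dijkstra's algorithm would forbid.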
Acyclic Graph
If the graph is known to be acyclic, Dijkstra's algorithm can be improved by changing the
order in which vertices are declared known. The new rule is to select vertices in topological
order. The algorithm can be done in one pass, since the selections and updates can take place
as the topological sort is being performed. The running time is O(|E| + |V|), since the selection
takes constant time.
This selection rule works because when a vertex v is selected, its distance, $d_v$, can no longer
be lowered, since by the topological ordering rule it has no incoming edges emanating from
unknown nodes.
A more important use of acyclic graphs is critical path analysis. Each node in the graph
represents an activity that must be performed, along with the time it takes to complete the
activity. This graph is known as an activity-node graph. The edges represent precedence
relationships: an edge (v, w) means that activity v must be completed before activity w may
begin. Any activities that do not depend on each other can be performed in parallel.
This type of graph could be used to model construction projects. Several important
questions arise. First, what is the earliest completion time for the project? Another
important question is which activities can be delayed, and by how long, without affecting the
minimum completion time.
Convert the activity-node graph to an event-node graph. Each event corresponds to the
completion of an activity and all its dependent activities. Dummy edges and nodes may need
to be inserted to avoid introducing false dependencies.
[Figure: event-node graph corresponding to the activity-node graph]
To find the earliest completion time of the project, find the length of the longest path from the
first event to the last event. It is easy to adapt the shortest-path algorithm to compute the
earliest completion time for all nodes in the graph. If $EC_i$ is the earliest completion time for
node i, then the applicable rules are
$EC_1 = 0$
$EC_w = \max_{(v,w) \in E} (EC_v + c_{v,w})$
The latest completion time, $LC_i$, is the latest time at which event i can finish without
affecting the final completion time. The rules are
$LC_n = EC_n$
$LC_v = \min_{(v,w) \in E} (LC_w - c_{v,w})$
The earliest completion times are computed for vertices by their topological order, and the
latest completion times are computed by reverse topological order.
The slack time for each edge in the event-node graph represents the amount of time that the
completion of the corresponding activity can be delayed without delaying the overall
completion.
$Slack_{(v,w)} = LC_w - EC_v - c_{v,w}$
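The three rules can be checked on a small example. The sketch below is only an illustration under stated assumptions: the four-event graph and its activity durations are invented, the names Edge, critical_path and example_edges are hypothetical, events are assumed to be numbered in topological order with event n - 1 as the final event, and the edge array is assumed to be sorted by tail vertex so that one forward and one backward sweep suffice.

```c
#define NUM_EVENTS 4
#define NUM_EDGES 4

typedef struct {
    int v, w;    /* edge from event v to event w */
    int cost;    /* duration of the corresponding activity */
} Edge;

/* Hypothetical event-node graph, sorted by tail vertex v. */
static const Edge example_edges[NUM_EDGES] = {
    {0, 1, 3}, {0, 2, 2}, {1, 3, 2}, {2, 3, 1},
};

/* Computes earliest completion times ec[], latest completion times lc[]
   and the slack of every edge. n is the number of events, m the number
   of edges. */
void critical_path(int n, int m, const Edge e[],
                   int ec[], int lc[], int slack[])
{
    /* EC_1 = 0, EC_w = max(EC_v + c_vw): sweep in topological order. */
    for (int i = 0; i < n; i++)
        ec[i] = 0;
    for (int k = 0; k < m; k++)
        if (ec[e[k].v] + e[k].cost > ec[e[k].w])
            ec[e[k].w] = ec[e[k].v] + e[k].cost;

    /* LC_n = EC_n, LC_v = min(LC_w - c_vw): sweep in reverse order. */
    for (int i = 0; i < n; i++)
        lc[i] = ec[n - 1];
    for (int k = m - 1; k >= 0; k--)
        if (lc[e[k].w] - e[k].cost < lc[e[k].v])
            lc[e[k].v] = lc[e[k].w] - e[k].cost;

    /* Slack(v, w) = LC_w - EC_v - c_vw. */
    for (int k = 0; k < m; k++)
        slack[k] = lc[e[k].w] - ec[e[k].v] - e[k].cost;
}
```

For this invented graph the project finishes at time 5, the edges (0, 1) and (1, 3) have zero slack, and the edges (0, 2) and (2, 3) each have a slack of 2.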
Some activities have zero slack. These are critical activities, which must finish on schedule.
There is at least one path consisting entirely of zero-slack edges; such a path is a critical path.
Maximum Flow
A directed graph G = (V, E) with edge capacities $c_{v,w}$ is given. These capacities could
represent the amount of traffic that could flow on a street between two intersections. There
are two distinguished vertices: s, called the source, and t, called the sink. Through any edge
(v, w), at most $c_{v,w}$ units of "flow" may pass. At any vertex v that is neither s nor t, the
total flow coming in must equal the total flow going out. The maximum flow problem is to
determine the maximum amount of flow that can pass from s to t.
The problem is solved in stages. Given the graph G, construct a flow graph $G_f$. $G_f$ tells the
flow that has been attained at any stage in the algorithm. Initially all edges in $G_f$ have no
flow, and when the algorithm terminates, $G_f$ contains a maximum flow. A second graph, $G_r$,
called the residual graph, is also constructed. $G_r$ tells, for each edge, how much more flow can
be added. This is calculated by subtracting the current flow from the capacity for each edge. An
edge in $G_r$ is known as a residual edge.
At each stage, find a path in $G_r$ from s to t. This path is known as an augmenting path.
The minimum residual capacity among the edges on this path is the amount of flow that can be
added to every edge on the path. This is done by adjusting $G_f$ and recomputing $G_r$. When
there is no path from s to t in $G_r$, the algorithm terminates. The algorithm is
nondeterministic in that any path from s to t may be chosen; obviously, some choices are better
than others.
In the initial configuration, $G_f$ has no flow on any edge and $G_r$ is identical to G.
There are many paths from s to t in the residual graph. Let the path s, b, d, t be selected. Then 2
units of flow can be sent through every edge on this path. Once an edge is saturated, it is
removed from the residual graph.
Next, select the path s, a, c, t, which also allows two units of flow. The required
adjustments are then made to $G_f$ and $G_r$.
The only path left to select is s, a, d, t, which allows one unit of flow.
The algorithm terminates at this point, because t is unreachable from s. The resulting flow of
5 happens to be the maximum.
Suppose instead that the path s, a, d, t is chosen first. This path allows 3 units of flow and
thus seems to be a good choice. The result of this choice, however, is that there is no longer
any path from s to t in the residual graph, and thus the algorithm fails to find an optimal
solution.
In order to make the algorithm work, for every edge (v, w) with flow $f_{v,w}$ in the flow graph,
add an edge (w, v) of capacity $f_{v,w}$ to the residual graph. Thus, the algorithm is able to undo
its decisions by sending flow back in the opposite direction. Starting again from the initial
graph, select the augmenting path s, a, d, t.
In the residual graph, there are now edges in both directions between a and d. Either one more
unit of flow can be pushed from a to d, or up to three units can be pushed back, i.e., flow can
be undone. Now the algorithm finds the augmenting path s, b, d, a, c, t, of flow 2. By pushing
two units of flow from d to a, the algorithm takes 2 units of flow away from the edge (a, d).
There is then no augmenting path left, so the algorithm terminates. If the capacities are
all integers and the maximum flow is f, then, since each augmenting path increases the flow
value by at least 1, f stages suffice, and the total running time is O(f · |E|), since an
augmenting path can be found in O(|E|) time by an unweighted shortest-path algorithm.
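The augmenting-path method, including the reverse residual edges, can be sketched as below. The assumptions should be noted: the capacities in example_cap are not stated in these notes and were chosen to be consistent with the walk-through (2 units along s, b, d, t; 2 along s, a, c, t; 3 available along s, a, d, t; maximum flow 5); the names max_flow and example_cap are hypothetical; the vertices s, a, b, c, d, t are stored as indices 0 to 5; and each augmenting path is found by a breadth-first search over an adjacency matrix, which costs O(|V|^2) per search rather than O(|E|).

```c
#define NUM_VERTEX 6   /* indices 0..5 stand for s, a, b, c, d, t */

/* example_cap[v][w] is the capacity of edge v -> w, 0 if absent (assumed). */
static const int example_cap[NUM_VERTEX][NUM_VERTEX] = {
    /* s */ {0, 3, 2, 0, 0, 0},
    /* a */ {0, 0, 1, 3, 4, 0},
    /* b */ {0, 0, 0, 0, 2, 0},
    /* c */ {0, 0, 0, 0, 0, 2},
    /* d */ {0, 0, 0, 0, 0, 3},
    /* t */ {0, 0, 0, 0, 0, 0},
};

/* Returns the value of a maximum flow from s to t. Requires
   n <= NUM_VERTEX and non-negative integer capacities. */
int max_flow(int n, const int cap[][NUM_VERTEX], int s, int t)
{
    int residual[NUM_VERTEX][NUM_VERTEX];
    int parent[NUM_VERTEX], queue[NUM_VERTEX];
    int total = 0;

    /* Initially the residual graph is identical to G. */
    for (int v = 0; v < n; v++)
        for (int w = 0; w < n; w++)
            residual[v][w] = cap[v][w];

    for (;;) {
        /* Breadth-first search for an augmenting path in the residual graph. */
        int front = 0, rear = 0;
        for (int v = 0; v < n; v++)
            parent[v] = -1;
        parent[s] = s;
        queue[rear++] = s;
        while (front < rear && parent[t] == -1) {
            int v = queue[front++];
            for (int w = 0; w < n; w++)
                if (parent[w] == -1 && residual[v][w] > 0) {
                    parent[w] = v;
                    queue[rear++] = w;
                }
        }
        if (parent[t] == -1)
            break;                       /* t is unreachable: done */

        /* The minimum residual capacity on the path is the flow to add. */
        int add = residual[parent[t]][t];
        for (int w = t; w != s; w = parent[w])
            if (residual[parent[w]][w] < add)
                add = residual[parent[w]][w];

        /* Use up forward capacity and create reverse capacity, so that a
           later path can undo this decision. */
        for (int w = t; w != s; w = parent[w]) {
            residual[parent[w]][w] -= add;
            residual[w][parent[w]] += add;
        }
        total += add;
    }
    return total;
}
```

Because reverse edges are added on every augmentation, the result does not depend on which path happens to be found first; on the assumed capacities the function returns 5.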
Minimum Spanning Tree
A minimum spanning tree (MST) of an undirected graph G is a tree formed from graph edges
that connects all the vertices of G at lowest total cost. A minimum spanning tree exists if and
only if G is connected. The number of edges in the MST is |V| - 1. The MST is a tree because
it is acyclic, it is spanning because it covers every vertex, and it is minimum for the obvious
reason.
For instance, wiring a house with a minimum of cable requires solving an MST problem.
There are two basic algorithms to solve this problem; both are greedy.
1. Prim's Algorithm
2. Kruskal's Algorithm
Prim's Algorithm
One way to compute a minimum spanning tree is to grow the tree in successive stages. In each
stage, one node is picked as the root, and an edge, and thus an associated vertex, is added
to the tree. At any point in the algorithm, there is a set of vertices that have already been
included in the tree; the rest of the vertices have not. The algorithm then finds, at each stage,
a new vertex to add to the tree by choosing the edge (u, v) such that the cost of (u, v) is the
smallest among all edges where u is in the tree and v is not.
Initially, v1 is in the tree as a root with no edges. Each step adds one edge and one vertex to
the tree. Prim's algorithm is essentially identical to Dijkstra's algorithm for shortest paths.
For each vertex, the values $d_v$ and $p_v$ are maintained, along with an indication of whether
it is known or unknown. $d_v$ is the weight of the shortest arc connecting v to a known vertex,
and $p_v$ is the last vertex to cause a change in $d_v$. The rest of the algorithm is exactly the
same, with the exception of the definition of $d_v$ and the update rule. For this problem, the
update rule is even simpler than before: after a vertex v is selected, for each unknown w
adjacent to v, $d_w = \min(d_w, c_{w,v})$.
Initially v1 is selected, and v2, v3, v4 are updated. The next vertex selected is v4. Every vertex
is adjacent to v4. v1 is not examined, because it is known. v2 is unchanged, because it has
$d_v = 2$ and the edge cost from v4 to v2 is 3; all the rest are updated.
The next vertex chosen is v2, arbitrarily breaking a tie. This does not affect any distances.
Then v3 is chosen, which affects the distance in v6. Next, the selection of v7 forces v6 and v5
to be adjusted. v6 and then v5 are selected, completing the algorithm.
The edges in the spanning tree can be read from the final table: (v2, v1), (v3, v4), (v4, v1),
(v5, v7), (v6, v7), (v7, v4). The total cost is 16. The running time is O(|V|^2) without heaps,
which is optimal for dense graphs, and O(|E| log |V|) using binary heaps, which is good for sparse
graphs.
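An O(|V|^2) version of Prim's algorithm, without heaps, can be sketched as follows. The names prim, example_cost, NO_EDGE, INF and NOT_A_VERTEX are hypothetical, and v1 to v7 are stored as indices 0 to 6. Most edge weights in example_cost are the ones that appear in the notes for this example graph; the three heaviest edges, (v4, v5) = 7, (v4, v6) = 8 and (v2, v5) = 10, are not given in the notes and are assumptions.

```c
#include <limits.h>

#define NUM_VERTEX 7
#define NO_EDGE 0
#define INF INT_MAX
#define NOT_A_VERTEX (-1)

/* Symmetric cost matrix of an undirected graph; NO_EDGE marks absence. */
static const int example_cost[NUM_VERTEX][NUM_VERTEX] = {
    /* v1 */ {0, 2,  4, 1, 0,  0, 0},
    /* v2 */ {2, 0,  0, 3, 10, 0, 0},
    /* v3 */ {4, 0,  0, 2, 0,  5, 0},
    /* v4 */ {1, 3,  2, 0, 7,  8, 4},
    /* v5 */ {0, 10, 0, 7, 0,  0, 6},
    /* v6 */ {0, 0,  5, 8, 0,  0, 1},
    /* v7 */ {0, 0,  0, 4, 6,  1, 0},
};

/* Grows a minimum spanning tree from root. parent[v] receives the tree
   neighbour through which v was attached (NOT_A_VERTEX for the root).
   Returns the total cost of the tree. Requires n <= NUM_VERTEX. */
int prim(int n, const int cost[][NUM_VERTEX], int root, int parent[])
{
    int known[NUM_VERTEX] = {0};
    int dist[NUM_VERTEX];     /* cheapest arc from v to a known vertex */
    int total = 0;

    for (int v = 0; v < n; v++) {
        dist[v] = INF;
        parent[v] = NOT_A_VERTEX;
    }
    dist[root] = 0;

    for (;;) {
        /* Select the unknown vertex with the cheapest connecting arc. */
        int v = NOT_A_VERTEX;
        for (int u = 0; u < n; u++)
            if (!known[u] && dist[u] != INF &&
                (v == NOT_A_VERTEX || dist[u] < dist[v]))
                v = u;
        if (v == NOT_A_VERTEX)
            break;
        known[v] = 1;
        total += dist[v];

        /* Update rule: d_w = min(d_w, c_{w,v}) for each unknown w. */
        for (int w = 0; w < n; w++)
            if (cost[v][w] != NO_EDGE && !known[w] && cost[v][w] < dist[w]) {
                dist[w] = cost[v][w];
                parent[w] = v;
            }
    }
    return total;
}
```

Run from v1, the sketch attaches v2 to v1, v3 to v4, v4 to v1, v5 to v7, v6 to v7 and v7 to v4, for a total cost of 16, which agrees with the edge list read from the final table. The only difference from the Dijkstra update is that the edge cost alone, not the distance from the root plus the edge cost, is compared.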
Kruskal's Algorithm
The Kruskal strategy is to select the edges continually in order of smallest weight and accept
an edge if it does not cause a cycle. Formally, Kruskal's algorithm maintains a forest, that is,
a collection of trees. Initially, there are |V| single-node trees. Adding an edge merges two trees
into one. The algorithm terminates when enough edges (|V| - 1 of them) have been accepted. At
that point there is only one tree, and this is the minimum spanning tree. It turns out to be
simple to decide whether an edge (u, v) should be accepted or rejected.
The appropriate data structure is the union/find algorithm. The invariant used is that at any
point, two vertices belong to the same set if and only if they are connected in the current
spanning forest. Thus, each vertex is initially in its own set. If u and v are in the same set,
the edge is rejected, because they are already connected and adding (u, v) would form a cycle.
Otherwise, the edge is accepted, and a union is performed on the two sets containing u and v.
Edge        Weight   Action
(v1, v4)    1        Accepted
(v6, v7)    1        Accepted
(v1, v2)    2        Accepted
(v3, v4)    2        Accepted
(v2, v4)    3        Rejected
(v1, v3)    4        Rejected
(v4, v7)    4        Accepted
(v3, v6)    5        Rejected
(v5, v7)    6        Accepted
The edges could be sorted to facilitate the selection, but building a heap in linear time is a
better idea, because typically only a fraction of the edges needs to be examined before the
algorithm terminates. Because an edge consists of three pieces of data, it is efficient to
implement the priority queue as an array of pointers to edges, rather than as an array of edges.
Then, to rearrange the heap, only pointers, not large records, need to be moved. The worst-case
running time of this algorithm is O(|E| log |E|).
void Kruskal( Graph G )
{
    int EdgesAccepted;
    DisjSet S;
    PriorityQueue H;
    Vertex U, V;
    SetType Uset, Vset;
    Edge E;

    Initialize( S );
    ReadGraphIntoHeapArray( G, H );
    BuildHeap( H );
    EdgesAccepted = 0;
    while( EdgesAccepted < NumVertex - 1 )
    {
        E = DeleteMin( H );   /* E = ( U, V ) */
        Uset = Find( U, S );
        Vset = Find( V, S );
        if( Uset != Vset )
        {
            EdgesAccepted++;   /* accept the edge */
            SetUnion( S, Uset, Vset );
        }
    }
}
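A compilable sketch of Kruskal's algorithm on the nine edges of the table above is given here. Two simplifications are assumptions of this sketch and not the method of the notes: the edges are sorted with the standard qsort instead of being kept in a heap, and the disjoint sets use a plain parent array with path halving. The names Edge, kruskal, find_set, by_weight and example_edges are hypothetical, and v1 to v7 are stored as indices 0 to 6.

```c
#include <stdlib.h>

#define NUM_VERTEX 7
#define NUM_EDGES 9

typedef struct {
    int u, v, weight;
} Edge;

/* The nine edges listed in the table above. */
static const Edge example_edges[NUM_EDGES] = {
    {0, 3, 1}, {5, 6, 1}, {0, 1, 2}, {2, 3, 2}, {1, 3, 3},
    {0, 2, 4}, {3, 6, 4}, {2, 5, 5}, {4, 6, 6},
};

static int set_parent[NUM_VERTEX];   /* disjoint-set forest */

/* Returns the representative of the set containing x (path halving). */
static int find_set(int x)
{
    while (set_parent[x] != x) {
        set_parent[x] = set_parent[set_parent[x]];
        x = set_parent[x];
    }
    return x;
}

/* qsort comparator: ascending edge weight. */
static int by_weight(const void *a, const void *b)
{
    const Edge *ea = a, *eb = b;
    return (ea->weight > eb->weight) - (ea->weight < eb->weight);
}

/* Stores the accepted edges in accepted[] (room for n - 1 edges) and
   returns their total weight. Requires n <= NUM_VERTEX, m <= NUM_EDGES. */
int kruskal(int n, int m, const Edge edges[], Edge accepted[])
{
    Edge sorted[NUM_EDGES];
    int count = 0, total = 0;

    for (int i = 0; i < m; i++)
        sorted[i] = edges[i];
    qsort(sorted, (size_t)m, sizeof sorted[0], by_weight);

    for (int v = 0; v < n; v++)
        set_parent[v] = v;           /* each vertex starts in its own set */

    for (int i = 0; i < m && count < n - 1; i++) {
        int uset = find_set(sorted[i].u);
        int vset = find_set(sorted[i].v);
        if (uset != vset) {          /* different trees: no cycle forms */
            set_parent[uset] = vset; /* union of the two sets */
            accepted[count++] = sorted[i];
            total += sorted[i].weight;
        }
    }
    return total;
}
```

On the table's edges the accepted weights are 1, 1, 2, 2, 4 and 6, for a total of 16, the same cost that Prim's algorithm reports for this graph. Sorting costs O(|E| log |E|) up front, whereas the heap-based version in the notes only pays for the edges it actually examines.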
Depth First Search
Depth-first search is a generalization of preorder traversal. Starting at some vertex v, process
v and then recursively traverse all vertices adjacent to v. If this process is performed on an
arbitrary graph, then care should be taken to avoid cycles. To do this, when a vertex v is
visited, it is marked as visited, and depth-first search is called recursively on all adjacent
vertices that are not already marked.
void Dfs( Vertex V )
{
    Visited[ V ] = True;
    for each W adjacent to V
        if( !Visited[ W ] )
            Dfs( W );
}
The boolean array Visited[ ] is initialized to False. Because the procedure is called recursively
only on nodes that have not been visited, it does not loop indefinitely. If the graph is
undirected and not connected, or directed and not strongly connected, this strategy might fail to
visit some nodes. In such a case, search for an unmarked node, apply a depth-first traversal
there, and continue this process until there are no unmarked nodes. Because this strategy
guarantees that each edge is encountered only once, the total time to perform the traversal is
O(|E| + |V|), as long as adjacency lists are used.
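The restart strategy just described can be sketched in C as a routine that counts connected components. The five-vertex graph and the names dfs, count_components and example_adj are hypothetical, and an adjacency matrix is used for brevity, so this sketch runs in O(|V|^2) rather than O(|E| + |V|).

```c
#define NUM_VERTEX 5

/* Hypothetical undirected graph with edges 0-1, 1-2 and 3-4. */
static const int example_adj[NUM_VERTEX][NUM_VERTEX] = {
    {0, 1, 0, 0, 0},
    {1, 0, 1, 0, 0},
    {0, 1, 0, 0, 0},
    {0, 0, 0, 0, 1},
    {0, 0, 0, 1, 0},
};

/* Marks v and then recursively visits every unmarked neighbour. */
static void dfs(int n, const int adj[][NUM_VERTEX], int v, int visited[])
{
    visited[v] = 1;
    for (int w = 0; w < n; w++)
        if (adj[v][w] && !visited[w])
            dfs(n, adj, w, visited);
}

/* Restarts the traversal from every vertex that is still unmarked; each
   restart discovers exactly one connected component. Requires
   n <= NUM_VERTEX. */
int count_components(int n, const int adj[][NUM_VERTEX])
{
    int visited[NUM_VERTEX] = {0};
    int components = 0;

    for (int v = 0; v < n; v++)
        if (!visited[v]) {
            components++;
            dfs(n, adj, v, visited);
        }
    return components;
}
```

On the hypothetical graph the first call marks vertices 0, 1 and 2, the loop then restarts at vertex 3, and the function returns 2.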
Undirected Graph
An undirected graph is connected if and only if a depth-first search starting from any node
visits every node. If it is not connected, Dfs is applied to each connected component in turn,
restarting from an unmarked node each time. Consider the undirected graph as shown (its edges,
as implied by the walk-through below, are A-B, A-D, A-E, B-C, B-D, C-D, and C-E).
The key steps in DFS graph traversal are:
1. If an edge leads from the current node to an unvisited node, walk across the edge, i.e.,
explore.
2. If no edge leads from the current node to an unvisited node, return to the
previous node, i.e., backtrack.
Let the start vertex be A and let the exploration be done in alphabetical order.
Now mark A as visited and call Dfs(B) recursively. Dfs(B) marks B as visited and calls Dfs(C)
recursively. Dfs(C) marks C as visited and calls Dfs(D) recursively. Dfs(D) sees both A and
B, but both these are marked, so no recursive calls are made. Dfs(D) also sees that C is
adjacent but marked, so no recursive call is made there, and Dfs(D) returns back to Dfs(C).
Dfs(C) sees B adjacent, ignores it, finds a previously unseen vertex E adjacent, and thus calls
Dfs(E). Dfs(E) marks E, ignores A and C, and returns to Dfs(C). Dfs(C) returns to Dfs(B).
Dfs(B) ignores both A and D and returns. Dfs(A) ignores both D and E and returns.
The root of the tree is A, the first vertex visited. If, when (v, w) is processed, w is unmarked,
or, when (w, v) is processed, v is unmarked, this is indicated with a tree edge. If, when (v, w)
is processed, w is already marked, or, when (w, v) is processed, v is already marked, then a
dashed line, known as a back edge, is drawn to indicate that this "edge" is not really part of
the tree. The depth-first search of the graph is as shown.
Bi-Connectivity
A connected undirected graph is biconnected if there are no vertices whose removal
disconnects the rest of the graph. If a mass transit system is biconnected, users always have
an alternate route should some terminal be disrupted.
If a graph is not biconnected, the vertices whose removal would disconnect the graph are
known as articulation points. These nodes are critical in many applications. The graph shown
is not biconnected: C and D are articulation points. The removal of C would disconnect G,
and the removal of D would disconnect E and F, from the rest of the graph.
Depth-first search provides a linear-time algorithm to find all articulation points in a
connected graph. First, starting at any vertex, perform a depth-first search and number the
nodes as they are visited. For each vertex v, the preorder number is num(v). Then, for every
vertex v in the depth-first search spanning tree, compute the lowest-numbered vertex, low(v),
that is reachable from v by taking zero or more tree edges and then possibly one back edge.
The lowest-numbered vertex reachable by A, B, and C is vertex 1 (A), because they can all
take tree edges to D and then one back edge back to A. We can efficiently compute low by
performing a postorder traversal of the depth-first spanning tree. By the definition of low,
low(v) is the minimum of
1. num(v)
2. the lowest num(w) among all back edges (v, w)
3. the lowest low(w) among all tree edges (v, w)
The first condition is the option of taking no edges, the second way is to choose no tree edges
and a back edge, and the third way is to choose some tree edges and possibly a back edge.
This third method is succinctly described with a recursive call. Since low for all the children
of v needs to be evaluated before evaluating low(v), this is a postorder traversal. For any edge
(v, w), whether it is a tree edge or a back edge could be determined by checking num(v) and
num(w). Doing all the computation takes O(|E| + |V|) time.
The root is an articulation point if and only if it has more than one child, because if it has two
children, removing the root disconnects nodes in different subtrees, and if it has only one
child, removing the root merely disconnects the root. Any other vertex v is an articulation
point if and only if v has some child w such that low(w) ≥ num(v).
void FindArt( Vertex V )
{
    Vertex W;

    Visited[ V ] = True;
    Low[ V ] = Num[ V ] = Counter++;   /* Rule 1 */
    for each W adjacent to V
    {
        if( !Visited[ W ] )   /* tree edge */
        {
            Parent[ W ] = V;
            FindArt( W );
            /* This test is not valid for the root, which must be checked */
            /* separately: it is an articulation point only if it has */
            /* more than one child. Vertex is assumed to be an integer type. */
            if( Low[ W ] >= Num[ V ] )
                printf( "%d is an articulation point\n", V );
            Low[ V ] = Min( Low[ V ], Low[ W ] );   /* Rule 3 */
        }
        else
            if( Parent[ V ] != W )   /* back edge */
                Low[ V ] = Min( Low[ V ], Num[ W ] );   /* Rule 2 */
    }
}
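A compilable sketch that also handles the root test is shown below. The assumptions are as follows: the edge set in example_adj is not given in these notes and was chosen to match the description of the example (C and D are the articulation points, removing C cuts off G, removing D cuts off E and F); the vertices A to G are stored as indices 0 to 6; the names find_art, articulation_points and example_adj are hypothetical; results are recorded in an array instead of being printed; and the graph is assumed to be connected.

```c
#define NUM_VERTEX 7   /* indices 0..6 stand for A..G */

/* Assumed edges: A-B, A-D, B-C, C-D, C-G, D-E, D-F, E-F. */
static const int example_adj[NUM_VERTEX][NUM_VERTEX] = {
    /* A */ {0, 1, 0, 1, 0, 0, 0},
    /* B */ {1, 0, 1, 0, 0, 0, 0},
    /* C */ {0, 1, 0, 1, 0, 0, 1},
    /* D */ {1, 0, 1, 0, 1, 1, 0},
    /* E */ {0, 0, 0, 1, 0, 1, 0},
    /* F */ {0, 0, 0, 1, 1, 0, 0},
    /* G */ {0, 0, 1, 0, 0, 0, 0},
};

static int num[NUM_VERTEX], low[NUM_VERTEX], parent[NUM_VERTEX];
static int counter;

static void find_art(int n, const int adj[][NUM_VERTEX], int v, int is_art[])
{
    int children = 0;

    low[v] = num[v] = ++counter;               /* Rule 1; num 0 = unvisited */
    for (int w = 0; w < n; w++) {
        if (!adj[v][w])
            continue;
        if (num[w] == 0) {                     /* tree edge */
            parent[w] = v;
            children++;
            find_art(n, adj, w, is_art);
            if (parent[v] != -1 && low[w] >= num[v])
                is_art[v] = 1;                 /* non-root test */
            if (low[w] < low[v])
                low[v] = low[w];               /* Rule 3 */
        } else if (parent[v] != w) {           /* back edge */
            if (num[w] < low[v])
                low[v] = num[w];               /* Rule 2 */
        }
    }
    if (parent[v] == -1 && children > 1)
        is_art[v] = 1;                         /* root test */
}

/* Sets is_art[v] to 1 for every articulation point of a connected
   undirected graph, searching from root. Requires n <= NUM_VERTEX. */
void articulation_points(int n, const int adj[][NUM_VERTEX], int root,
                         int is_art[])
{
    counter = 0;
    for (int v = 0; v < n; v++) {
        num[v] = 0;
        parent[v] = -1;
        is_art[v] = 0;
    }
    find_art(n, adj, root, is_art);
}
```

Starting the search at A on the assumed graph flags exactly C and D. The root A has a single child in the depth-first tree, so it is correctly left unflagged even though low(B) ≥ num(A) holds.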
Euler Circuits
One of the classic problems in graph theory is to find a cycle that visits every edge exactly
once. This problem was solved in 1736 by Euler and is commonly referred to as the Euler
path, Euler tour, or Euler circuit problem.
An Euler circuit, which must end on its starting vertex, is possible only if the graph is
connected and each vertex has an even degree (number of edges). This is because, on the
Euler circuit, a vertex is entered and then left. If any vertex v has odd degree, then eventually
the traversal reaches the point where only one edge into v is unvisited, and taking it leaves
the traversal stranded at v. If exactly two vertices have odd degree, an Euler tour, which must
visit every edge but need not return to its starting vertex, is still possible if it starts at one
of the odd-degree vertices and finishes at the other. If more than two vertices have odd degree,
then an Euler tour is not possible.
Conversely, any connected graph, all of whose vertices have even degree, must have an Euler
circuit. Furthermore, a circuit can be found in linear time. The basic algorithm is to perform a
depth-first search. The main problem is that the search might visit only a portion of the graph
and return to the starting point prematurely. The fix is to find the first vertex on this path
that has an untraversed edge and perform another depth-first search from it. This gives another
circuit, which can be spliced into the original. The process continues until all edges have been
traversed.
To make splicing simple, the path should be maintained as a linked list. To avoid repetitious
scanning of adjacency lists, for each adjacency list, a pointer to the last edge scanned is
maintained. When a path is spliced in, the search for a new vertex from which to perform the
next Dfs must begin at the start of the splice point. With the appropriate data structures, the
running time of the algorithm is O(|E| + |V|).
Consider the following graph. Suppose that, starting at vertex 5, the circuit 5, 4, 10, 5 is
traversed. The search is then stuck, with most of the graph still untraversed.
Then continue from vertex 4, which still has unexplored edges. A depth-first search results in
the path 4, 1, 3, 7, 4, 11, 10, 7, 9, 3, 4. Splicing this path into the previous path of 5, 4, 10, 5
gives the new path 5, 4, 1, 3, 7, 4, 11, 10, 7, 9, 3, 4, 10, 5.
The next vertex on this path with an untraversed edge is 3. A possible circuit from it is
3, 2, 8, 9, 6, 3, and splicing it in gives 5, 4, 1, 3, 2, 8, 9, 6, 3, 7, 4, 11, 10, 7, 9, 3, 4, 10, 5.
The next vertex with an untraversed edge is 9, and the algorithm finds the circuit 9, 12, 10, 9.
When this is added to the current path, a circuit of 5, 4, 1, 3, 2, 8, 9, 12, 10, 9, 6, 3, 7, 4, 11,
10, 7, 9, 3, 4, 10, 5 is obtained. As all the edges are traversed, the algorithm terminates with
an Euler circuit.
Directed Graph
As with undirected graphs, directed graphs can be traversed in linear time, using depth-first
search. If the graph is not strongly connected, a depth-first search starting at some node might
not visit all nodes. In this case we repeatedly perform depth-first searches, starting at some
unmarked node, until all vertices have been visited.
Arbitrarily start the depth-first search at vertex B. This visits vertices B, C, A, D, E, and F.
Now restart at some unvisited vertex. Arbitrarily, start at H, which visits I and J. Finally, start
at G, which is the last vertex that needs to be visited.
The dashed arrows in the depth-first spanning forest are edges (v, w) for which w was already
marked at the time of consideration. First, there are back edges, such as (A, B) and (I, H).
There are also forward edges, such as (C, D) and (C, E), that lead from a tree node to a
descendant. Finally, there are cross edges, such as (F, C) and (G, F), which connect two tree
nodes that are not directly related.
One use of depth-first search is to test whether or not a directed graph is acyclic. The rule is
that a directed graph is acyclic if and only if it has no back edges.
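The back-edge rule can be sketched as follows. This is an illustrative example rather than code from the notes: the adjacency-matrix representation, the bound MAXV, and the function names are assumptions. A vertex is GRAY while it is on the current depth-first path, so an edge leading to a GRAY vertex is a back edge. The outer loop restarts the search at every unmarked vertex, as described above for graphs that are not strongly connected.

```c
#include <assert.h>

#define MAXV 16   /* assumed upper bound on the number of vertices */

enum { WHITE, GRAY, BLACK };   /* unvisited, on the current path, finished */

/* Returns 1 if the depth-first search from v meets a back edge. */
static int has_back_edge(int v, int n, int adj[][MAXV], int color[])
{
    int w;

    color[v] = GRAY;
    for (w = 0; w < n; w++) {
        if (!adj[v][w])
            continue;
        if (color[w] == GRAY)                       /* w is an ancestor of v */
            return 1;
        if (color[w] == WHITE && has_back_edge(w, n, adj, color))
            return 1;
    }
    color[v] = BLACK;
    return 0;
}

/* Returns 1 if the directed graph has no cycle, 0 otherwise.
   adj[v][w] != 0 means there is an edge (v, w). */
int is_acyclic(int n, int adj[][MAXV])
{
    int color[MAXV];
    int v;

    assert(n <= MAXV);
    for (v = 0; v < n; v++)
        color[v] = WHITE;
    for (v = 0; v < n; v++)        /* restart at every unmarked vertex */
        if (color[v] == WHITE && has_back_edge(v, n, adj, color))
            return 0;
    return 1;
}
```

Forward and cross edges lead to BLACK vertices and are ignored, so only back edges make the test fail.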
Strong Components
By performing two depth-first searches, we can test whether a directed graph is
strongly connected, and if it is not, we can produce the subsets of vertices that are strongly
connected to themselves.
First, a depth-first search is performed on the directed graph G as shown above. The vertices
of G are numbered by a postorder traversal of the depth-first spanning forest, and then all
edges in G are reversed, forming G_r.
The algorithm is completed by performing a depth-first search on G_r, always starting a new
depth-first search at the highest-numbered vertex. Begin the depth-first search of G_r at vertex
G, which is numbered 10. This leads nowhere, so the next search is started at H. This call
visits I and J. The next call starts at B and visits A, C, and F. The next calls after this are
Dfs(D) and finally Dfs(E).
Each of the trees in this depth-first spanning forest forms a strongly connected component.
Thus, the strongly connected components are {G}, {H, I, J}, {B, A, C, F}, {D}, and {E}.
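The two-pass procedure can be sketched in C as below. The adjacency-matrix representation, the bound MAXV, and all function names are assumptions made for illustration. Instead of building G_r explicitly, the second pass reads the matrix transposed, which has the same effect as reversing every edge.

```c
#include <assert.h>

#define MAXV 16   /* assumed upper bound on the number of vertices */

/* First pass: depth-first search on G, recording vertices in postorder. */
static void dfs_post(int v, int n, int g[][MAXV],
                     int visited[], int order[], int *count)
{
    int w;

    visited[v] = 1;
    for (w = 0; w < n; w++)
        if (g[v][w] && !visited[w])
            dfs_post(w, n, g, visited, order, count);
    order[(*count)++] = v;
}

/* Second pass: depth-first search on G_r.
   Edge (v, w) of G_r is edge (w, v) of G, hence the test on g[w][v]. */
static void dfs_label(int v, int n, int g[][MAXV], int comp[], int label)
{
    int w;

    comp[v] = label;
    for (w = 0; w < n; w++)
        if (g[w][v] && comp[w] < 0)
            dfs_label(w, n, g, comp, label);
}

/* Stores the component number of each vertex in comp[] and
   returns the number of strongly connected components. */
int strong_components(int n, int g[][MAXV], int comp[])
{
    int visited[MAXV] = {0}, order[MAXV];
    int count = 0, ncomp = 0, v, i;

    assert(n <= MAXV);
    for (v = 0; v < n; v++)
        if (!visited[v])
            dfs_post(v, n, g, visited, order, &count);
    for (v = 0; v < n; v++)
        comp[v] = -1;
    for (i = n - 1; i >= 0; i--)       /* highest postorder number first */
        if (comp[order[i]] < 0)
            dfs_label(order[i], n, g, comp, ncomp++);
    return ncomp;
}
```

Each call to dfs_label from the last loop corresponds to one tree of the depth-first spanning forest of G_r, and so to one strongly connected component.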
NP Completeness
Introduction
The solutions seen so far to a wide variety of graph theory problems have polynomial
running times and, with the exception of the network flow problem, run in linear or nearly
linear time. For some problems, however, certain variations seem harder than the original.
For instance, the Euler circuit problem (which finds a path that touches every edge exactly
once) is solvable in linear time, whereas the Hamiltonian cycle problem (which asks for a
simple cycle that contains every vertex) has no known linear algorithm.
Not only are no linear algorithms known for these variations, but there are no known
algorithms that are guaranteed to run in polynomial time. There are a host of important
problems that are roughly equivalent in complexity called the NP-complete problems. The
exact complexity of these NP-complete problems is yet to be determined and remains the
foremost open problem in theoretical computer science.
Easy vs Hard
Many problems can be solved in linear time. Running times better than linear, such as
O(log N), either assume some preprocessing or occur on arithmetic examples. The running
time is measured as a function of the amount of input, so in general, better than linear running
time is not possible.
At the other end of the spectrum lie some truly hard problems. These problems are so hard
that they are impossible. Just as real numbers are not sufficient to express a solution to x^2 <
0, computers cannot solve every problem that happens to come along. These
"impossible" problems are called undecidable problems.
One particular undecidable problem is the halting problem. Is it possible for the C compiler
to have an extra feature that not only detects syntax errors but also infinite loops? Intuitively,
this problem is undecidable because such a program might have a hard time checking itself.
Assume an infinite loop-checking program called LOOP is written. LOOP takes as input a
program P and runs P on itself. It prints out YES if P loops. If P terminates, a natural thing to
do would be to print out NO. Instead of doing that, LOOP will go into an infinite loop. What
happens when LOOP is given itself as input? Either LOOP halts, or it does not halt. The
problem is that both these possibilities lead to contradictions: if LOOP halts on itself, then by
its own definition it should have gone into an infinite loop, and if it does not halt, then it
should have printed YES and halted.
The Class NP
NP stands for nondeterministic polynomial-time. A deterministic machine, at each point in
time, is executing an instruction. Depending on the instruction, it then goes to some next
instruction. A nondeterministic machine has a choice of next steps. It is free to choose any
that it wishes, and if one of these steps leads to a solution, it will always choose the correct
one. A nondeterministic machine thus has the power of extremely good (optimal) guessing
but nobody could possibly build a nondeterministic computer.
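A problem is in NP if, when it is phrased as a yes/no question, any "yes" answer can be
verified in polynomial time. The class NP therefore includes all problems that have
polynomial-time solutions. It is expected to contain problems without polynomial-time
solutions as well, but none has been proven to be so, because proving exponential lower
bounds is an extremely difficult task. Not all decidable problems are in NP. Consider the
problem of determining whether a graph does not have a Hamiltonian cycle. No short proof
that a graph has no Hamiltonian cycle is known; it seems necessary to enumerate all the
cycles and check them one by one. Thus the Non-Hamiltonian cycle problem is not known to
be in NP.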
NP Complete Problem
Among all the problems known to be in NP, there is a subset, known as the NP-complete
problems, which contains the hardest. An NP-complete problem has the property that any
problem in NP can be polynomially reduced to it.
A problem P_1 can be reduced to P_2 as follows: Provide a mapping so that any instance of P_1
can be transformed to an instance of P_2. Solve P_2, and then map the answer back to the
original. As an example, numbers are entered into a pocket calculator in decimal. The
decimal numbers are converted to binary, and all calculations are performed in binary. Then
the final answer is converted back to decimal for display. For P_1 to be polynomially reducible
to P_2, all the work associated with the transformations must be performed in polynomial time.
The reason that NP-complete problems are the hardest NP problems is that a problem that is
NP-complete can essentially be used as a subroutine for any problem in NP. Thus, if any NP-
complete problem has a polynomial-time solution, then every problem in NP must have a
polynomial-time solution.
Consider the Travelling Salesman Problem (TSP): Given a complete graph G = (V, E), with
edge costs, and an integer K, is there a simple cycle that visits all vertices with total cost <= K?
To show that TSP is NP-complete, polynomially reduce the Hamiltonian cycle problem to it.
Given a graph G, construct a new complete graph G'. G' has the same vertices as G. For G',
each edge (v, w) has a weight of 1 if (v, w) is an edge of G, and 2 otherwise. Choose K = |V|.
It is easy to verify that G has a Hamiltonian cycle if and only if G' has a Travelling
Salesman tour of total weight |V|.
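The transformation itself is simple enough to write down. The sketch below assumes an adjacency-matrix representation and a bound MAXV; the function name is hypothetical. It only builds the TSP instance and does not solve it. The double loop makes it visible that the mapping takes O(|V|^2) time, which is polynomial.

```c
#include <assert.h>

#define MAXV 16   /* assumed upper bound on the number of vertices */

/* Builds the edge costs of the complete graph G' from the adjacency
   matrix g of G: cost 1 for an edge of G, cost 2 for any other pair.
   Returns the bound K = |V| of the resulting TSP instance. */
int hamiltonian_to_tsp(int n, int g[][MAXV], int cost[][MAXV])
{
    int v, w;

    assert(n <= MAXV);
    for (v = 0; v < n; v++)
        for (w = 0; w < n; w++)
            cost[v][w] = (v == w) ? 0 : (g[v][w] ? 1 : 2);
    return n;
}
```

A tour of total weight |V| must use |V| edges of weight 1, that is, only edges of G, which is exactly a Hamiltonian cycle of G.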
The first problem that was proven to be NP-complete was the satisfiability problem. Some of
the well-known NP-complete problems are bin packing, knapsack, graph coloring, and
clique. The list is extensive and includes problems from operating systems (scheduling and
security), database systems, operations research, logic, and especially graph theory.
CS2068Unit 5: Sorting
One of the fundamental problems of computer science is ordering a list of items. There is a
plethora of solutions to this problem, known as sorting algorithms. Some sorting algorithms
are simple and intuitive, such as the bubble sort. Others, such as the quick sort, are more
complicated but produce much faster results. Sorting algorithms are divided into two
categories: internal and external sorts.
Insertion Sort
One of the simplest sorting algorithms is the insertion sort. Insertion sort consists of N - 1
passes. For pass P = 1 through N - 1, insertion sort ensures that the elements in positions 0
through P are in sorted order. Insertion sort makes use of the fact that the elements in
positions 0 through P - 1 are already known to be in sorted order. In pass P, we move the
element in position P left until its correct place is found among the first P + 1 elements.
Consider the input sequence of 34, 8, 64, 51, 32, 31 for insertion. The figure on the right
shows insertion sort for the elements 3, 1, 4, 1, 5, 9, 2
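typedef int ElementType;   /* element type is not defined in these notes; int is assumed */

void InsertionSort( ElementType A[ ], int N )
{
    int j, P;
    ElementType Tmp;

    for( P = 1; P < N; P++ )
    {
        Tmp = A[ P ];                          /* element to be inserted */
        for( j = P; j > 0 && A[ j - 1 ] > Tmp; j-- )
            A[ j ] = A[ j - 1 ];               /* shift larger element right */
        A[ j ] = Tmp;
    }
}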
Data movement is done without the explicit use of swaps. The element in position P is saved
in Tmp, and all larger elements (prior to position P) are moved one spot to the right. Then
Tmp is placed in the correct spot.
Because of the nested loops, each of which can take N iterations, insertion sort is O(N^2). If
the input is presorted, the running time is O(N), because the test in the inner for loop always
fails immediately.
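The difference between the two cases can be observed by counting how often the inner loop moves an element. The following instrumented variant is a hypothetical helper written for this purpose; it works on plain ints.

```c
#include <assert.h>

/* Insertion sort on ints that returns the number of element moves
   made by the inner loop. */
long insertion_sort_moves(int a[], int n)
{
    long moves = 0;
    int p, j, tmp;

    assert(n >= 0);
    for (p = 1; p < n; p++) {
        tmp = a[p];
        for (j = p; j > 0 && a[j - 1] > tmp; j--) {
            a[j] = a[j - 1];
            moves++;                /* one shift to the right */
        }
        a[j] = tmp;
    }
    return moves;
}
```

On presorted input the count is 0, while on input in reverse order every pair of elements is out of order and the count is N(N - 1)/2.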
Pros: Relatively simple and easy to implement.
Cons: Inefficient for large lists.
Shell Sort
Shellsort, named after its inventor, Donald Shell, was one of the first algorithms to break the
quadratic time barrier. It works by comparing elements that are distant; the distance between
comparisons decreases as the algorithm runs until the last phase, in which adjacent elements
are compared. For this reason, shellsort is referred to as diminishing increment sort.
Shellsort uses a sequence, h_1, h_2, . . . , h_t, called the increment sequence. Any increment
sequence will do as long as h_1 = 1, but obviously some choices are better than others. After a
phase, using some increment h_k, for every i, A[i] <= A[i + h_k]; all elements spaced h_k apart
are sorted. The file is then said to be h_k-sorted. An important property of shellsort is that an
h_k-sorted file that is then h_(k-1)-sorted remains h_k-sorted.
The general strategy to h_k-sort is for each position, i, in h_k + 1, h_k + 2, . . ., n, place the
element in the correct spot among i, i - h_k, i - 2h_k, etc. A popular choice for increment
sequence is to use the sequence suggested by Shell: h_t = floor(N/2), and h_k = floor(h_(k+1)/2).
Shellsort makes multiple passes through a list and sorts a number of equally sized sets using
the insertion sort. It improves on the efficiency of insertion sort by quickly shifting values to
their destination. With Shell's increments, the worst-case running time of shellsort is O(N^2).
As an example consider the input sequence 18, 32, 12, 5, 38, 33, 16 and 2. In this case N=8,
therefore the shell increment is floor(N/2) =4.
The 4-sort is done in the following phases:
Consider initially 18 and 38. Since they are in order, no change
18 32 12 5 38 33 16 2
18 32 12 5 38 33 16 2
Next consider 32 and 33. Since they are in order, no change
18 32 12 5 38 33 16 2
18 32 12 5 38 33 16 2
Next consider 12 and 16. Since they are in order, no change
18 32 12 5 38 33 16 2
18 32 12 5 38 33 16 2
Next consider 5 and 2. Since they are not in order, they are swapped
18 32 12 5 38 33 16 2
18 32 12 2 38 33 16 5
The next is a 2-sort and is done in the following phases:
Consider the elements in odd position 18, 12, 38 and 16. The elements are sorted in
order as follows
18 32 12 2 38 33 16 5
12 32 16 2 18 33 38 5
Next consider the elements in even positions: 32, 2, 33 and 5. The elements are
sorted in order as follows
12 32 16 2 18 33 38 5
12 2 16 5 18 32 38 33
The last increment of shellsort, a 1-sort, is basically an insertion sort.
12 2 16 5 18 32 38 33
2 5 12 16 18 32 33 38
void Shellsort( ElementType A[ ], int N )
{
    int i, j, Increment;
    ElementType Tmp;

    for( Increment = N / 2; Increment > 0; Increment /= 2 )
        for( i = Increment; i < N; i++ )
        {
            Tmp = A[ i ];
            for( j = i; j >= Increment; j -= Increment )
                if( Tmp < A[ j - Increment ] )
                    A[ j ] = A[ j - Increment ];
                else
                    break;
            A[ j ] = Tmp;
        }
}
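To follow the worked example one phase at a time, a single phase of shellsort can be separated out. The helper below, gap_sort, is a hypothetical name and works on plain ints; it is an insertion sort over elements that are gap positions apart, which is what one iteration of the outer loop of shellsort performs.

```c
#include <assert.h>

/* One phase of shellsort: an insertion sort over elements that are
   `gap` positions apart. After the call the array is gap-sorted. */
void gap_sort(int a[], int n, int gap)
{
    int i, j, tmp;

    assert(gap > 0);
    for (i = gap; i < n; i++) {
        tmp = a[i];
        for (j = i; j >= gap && a[j - gap] > tmp; j -= gap)
            a[j] = a[j - gap];
        a[j] = tmp;
    }
}
```

Calling gap_sort on the sequence 18, 32, 12, 5, 38, 33, 16, 2 with gaps 4, 2 and 1 in turn gives 18 32 12 2 38 33 16 5, then 12 2 16 5 18 32 38 33, and finally the sorted sequence 2 5 12 16 18 32 33 38.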
Pros: Faster than the insertion sort on all but very small lists, with a simple implementation.
Cons: Not nearly as efficient as the merge, heap, and quick sorts on large lists.
Its relatively simple algorithm makes it a good choice for sorting lists of fewer than about 5000
items unless speed is critical. It is also an excellent choice for repetitive sorting of smaller lists.
Heap Sort
The heap sort is the slowest of the fast sorting algorithms, but unlike algorithms such as the
merge and quick sorts it does not require massive recursion or multiple arrays to work. This
makes it the most attractive option for very large data sets of millions of items.
The heap sort begins by building a heap out of the data set, and then removing the largest
item and placing it at the end of the sorted array. After removing the largest item, it
reconstructs the heap and removes the largest remaining item and places it in the next open
position from the end of the sorted array. This is repeated until there are no items left in the
heap and the sorted array is full. The implementation uses the same array to store both the
heap and the sorted array. Whenever an item is removed from the heap, it frees up a space at
the end of the array in which the removed item can be placed.
Given an array of 6 elements: 15, 19, 10, 7, 17, 16, sort it in ascending order using heap sort.
Building the Heap Tree
The array represented as a tree, complete but not ordered
Start with the last node that has a child, which is the node at position Size/2 counting from 1,
i.e. 10. It has one greater child, 16, and has to be percolated down.
Next comes 19. Its children are smaller, so no percolation is needed. The last node to be
processed is 15. Its left child, 19, is the greater of its children, so 15 is percolated down to the
left. In its new position 15 still has a greater child, 17, so it has to be moved down further and
is swapped with 17.
Now the tree is ordered, and the binary heap is built.
Sorting - performing deleteMax operations
DeleteMax the top element 19 and store 19 in a temporary place. A hole is created at the top.
Swap 19 with the last element of the heap. As 10 will be adjusted in the heap, its cell will no
longer be a part of the heap. Instead it becomes a cell from the sorted array
Percolate down the hole. Percolate once more, as 10 is less than 15 and so cannot be inserted
in the previous hole. Now 10 can be inserted in the hole.
Next DeleteMax the top element 17. Store 17 in a temporary place. A hole is created at the
top. Swap 17 with the last element of the heap. As 10 will be adjusted in the heap, its cell will
no longer be a part of the heap. Instead it becomes a cell from the sorted array
The element 10 is less than the children of the hole, and we percolate the hole down. Insert 10
in the hole.
Next DeleteMax 16. Store 16 in a temporary place. A hole is created at the top. Swap 16 with
the last element of the heap. As 7 will be adjusted in the heap, its cell will no longer be a part
of the heap. Instead it becomes a cell from the sorted array
Percolate the hole down as 7 cannot be inserted there - it is less than the children of the hole.
Now Insert 7 in the hole.
Next DeleteMax the top element 15. Store 15 in a temporary location. A hole is created. Swap
15 with the last element of the heap. As 10 will be adjusted in the heap, its cell will no longer
be a part of the heap. Instead it becomes a position from the sorted array.
Now store 10 in the hole
DeleteMax the top element 10. Remove 10 from the heap and store it into a temporary
location. Swap 10 with the last element of the heap. As 7 will be adjusted in the heap, its cell
will no longer be a part of the heap. Instead it becomes a cell from the sorted array.
Store 7 in the hole as it is the only remaining element in the heap. 7 is the last element from
the heap, so now the array is sorted
/* LeftChild and Swap are used below but not defined in these notes;
   the definitions here are the usual ones for a 0-based array */
#define LeftChild( i )  ( 2 * ( i ) + 1 )

void Swap( ElementType *Lhs, ElementType *Rhs )
{
    ElementType Tmp = *Lhs;
    *Lhs = *Rhs;
    *Rhs = Tmp;
}

/* Percolate A[ i ] down in a max-heap that occupies A[ 0 ] .. A[ N - 1 ] */
void PercDown( ElementType A[ ], int i, int N )
{
    int Child;
    ElementType Tmp;

    for( Tmp = A[ i ]; LeftChild( i ) < N; i = Child )
    {
        Child = LeftChild( i );
        if( Child != N - 1 && A[ Child + 1 ] > A[ Child ] )
            Child++;                      /* pick the larger child */
        if( Tmp < A[ Child ] )
            A[ i ] = A[ Child ];
        else
            break;
    }
    A[ i ] = Tmp;
}

void Heapsort( ElementType A[ ], int N )
{
    int i;

    for( i = N / 2; i >= 0; i-- )         /* BuildHeap */
        PercDown( A, i, N );
    for( i = N - 1; i > 0; i-- )
    {
        Swap( &A[ 0 ], &A[ i ] );         /* DeleteMax */
        PercDown( A, 0, i );
    }
}
Pros: In-place and non-recursive, making it a good choice for extremely large data sets.
Cons: Slower than the merge and quick sorts.
Merge Sort
Mergesort runs in O(N log N) worst-case running time, and the number of comparisons used
is nearly optimal. The fundamental operation in this algorithm is merging two sorted lists.
Because the lists are sorted, this can be done in one pass through the input, if the output is put
in a third list.
The basic merging algorithm takes two input arrays A and B, an output array C, and three
counters, Aptr, Bptr, and Cptr, which are initially set to the beginning of their respective
arrays. The smaller of A[Aptr] and B[Bptr] is copied to the next entry in C, and the
appropriate counters are advanced. When either input list is exhausted, the remainder of the
other list is copied to C.
For instance, let array A contain 1, 13, 24, 26, and B contain 2, 15, 27, 38, then the algorithm
proceeds as follows:
First, a comparison is done between 1 and 2. 1 is added to C, and then 13 and 2 are compared.
2 is added to C, and then 13 and 15 are compared.
13 is added to C, and then 24 and 15 are compared. This proceeds until 26 and 27 are
compared.
26 is added to C, and the A array is exhausted.
The remainder of the B array is then copied to C.
The time to merge two sorted lists is clearly linear, because at most N-1 comparisons are
made, where N is the total number of elements.
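The basic three-array merge described above can be sketched directly. This is an illustrative version on plain ints with a hypothetical name, merge_lists; it also counts comparisons so that the bound of N - 1 can be checked.

```c
#include <assert.h>

/* Merges sorted arrays a (na elements) and b (nb elements) into c,
   which must have room for na + nb elements.
   Returns the number of comparisons made. */
int merge_lists(const int a[], int na, const int b[], int nb, int c[])
{
    int aptr = 0, bptr = 0, cptr = 0, comparisons = 0;

    assert(na >= 0 && nb >= 0);
    while (aptr < na && bptr < nb) {
        comparisons++;
        if (a[aptr] <= b[bptr])
            c[cptr++] = a[aptr++];      /* smaller element comes from a */
        else
            c[cptr++] = b[bptr++];      /* smaller element comes from b */
    }
    while (aptr < na)                   /* copy the rest of a, if any */
        c[cptr++] = a[aptr++];
    while (bptr < nb)                   /* copy the rest of b, if any */
        c[cptr++] = b[bptr++];
    return comparisons;
}
```

For A = 1, 13, 24, 26 and B = 2, 15, 27, 38 the function makes 6 comparisons, ending with 26 against 27, and the remaining 27 and 38 are copied without further comparisons.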
/* Lpos = start of left half, Rpos = start of right half */
void Merge( ElementType A[ ], ElementType TmpArray[ ],
            int Lpos, int Rpos, int RightEnd )
{
    int i, LeftEnd, NumElements, TmpPos;

    LeftEnd = Rpos - 1;
    TmpPos = Lpos;
    NumElements = RightEnd - Lpos + 1;

    while( Lpos <= LeftEnd && Rpos <= RightEnd )
        if( A[ Lpos ] <= A[ Rpos ] )
            TmpArray[ TmpPos++ ] = A[ Lpos++ ];
        else
            TmpArray[ TmpPos++ ] = A[ Rpos++ ];

    while( Lpos <= LeftEnd )    /* Copy rest of first half */
        TmpArray[ TmpPos++ ] = A[ Lpos++ ];
    while( Rpos <= RightEnd )   /* Copy rest of second half */
        TmpArray[ TmpPos++ ] = A[ Rpos++ ];

    /* Copy TmpArray back into A */
    for( i = 0; i < NumElements; i++, RightEnd-- )
        A[ RightEnd ] = TmpArray[ RightEnd ];
}

/* TmpArray must be supplied by the caller and be as large as A */
void MSort( ElementType A[ ], ElementType TmpArray[ ],
            int Left, int Right )
{
    int Center;

    if( Left < Right )
    {
        Center = ( Left + Right ) / 2;
        MSort( A, TmpArray, Left, Center );
        MSort( A, TmpArray, Center + 1, Right );
        Merge( A, TmpArray, Left, Center + 1, Right );
    }
}
This algorithm is a classic divide-and-conquer strategy. The problem is divided into smaller
problems and solved recursively. The conquering phase patches together the answers.
Merge-sort on an input sequence S with N elements consists of three steps:
  Divide: partition S into two sequences S_1 and S_2 of about N/2 elements each
  Recur: recursively sort S_1 and S_2
  Conquer: merge S_1 and S_2 into a unique sorted sequence
Like heap-sort
  It uses a comparator
  It has O(N log N) running time
Unlike heap-sort
  It does not use an auxiliary priority queue
  It accesses data in a sequential manner
Another variant of merge sort is to have the process done in the input array itself as shown:
Quick Sort
Quicksort is the fastest known sorting algorithm in practice. Its average running time is O(N
log N), although its worst-case running time is O(N^2). It is very fast, mainly due to a very tight
and highly optimized inner loop. Like mergesort, quicksort is a divide-and-conquer recursive
algorithm. The basic algorithm to sort an array S consists of the following four easy steps:
1. If the number of elements in S is 0 or 1, then return.
2. Pick any element v in S. This is called the pivot.
3. Partition S - {v} (the remaining elements in S) into two disjoint groups:
S_1 = {x in S - {v} | x <= v}, and S_2 = {x in S - {v} | x >= v}
4. Return {quicksort(S_1) followed by v followed by quicksort(S_2)}.
The action of quicksort on a set of numbers is shown.
Pros: Extremely fast on average.
Cons: More complex than the simple sorts, heavily recursive, and O(N^2) in the worst case.
Picking the Pivot
Although the algorithm works no matter which element is chosen as pivot, some choices are
obviously better than others.
A Wrong Way: The popular, uninformed choice is to use the first element as the pivot. This
is acceptable if the input is random, but if the input is presorted or in reverse order, then the
pivot provides a poor partition, because virtually all the elements go into S_1 or S_2. An
alternative is choosing the larger of the first two distinct keys as pivot, but this has the same
bad properties as merely choosing the first key.
A Safe Maneuver: A safe course is merely to choose the pivot randomly. This strategy is
generally perfectly safe, unless the random number generator has a flaw, since it is very
unlikely that a random pivot would consistently provide a poor partition.
Median-of-Three Partitioning: The median of a group of N numbers is the ceil(N/2)-th largest
number. The best choice of pivot would be the median of the file. Unfortunately, this is hard
to calculate and would slow down quicksort considerably. A good estimate can be obtained
by picking three elements randomly and using the median of these three as pivot. In practice
the randomness does not help much, so the common course is to use the median of the left,
right, and center elements as pivot.
Partitioning Strategy
The first step is to get the pivot element out of the way by swapping it with the last element. i
starts at the first element and j starts at the next-to-last element.
While i is to the left of j, we move i right, skipping over elements that are smaller than the
pivot. Move j left, skipping over elements that are larger than the pivot. When i and j have
stopped, i is pointing at a large element and j is pointing at a small element. If i is to the left
of j, those elements are swapped. The effect is to push a large element to the right and a small
element to the left.
Next swap the elements pointed to by i and j and repeat the process until i and j cross.
At this stage, i and j have crossed, so no swap is performed. The final part of the partitioning
is to swap the pivot element with the element pointed to by i.
When the pivot is swapped with i in the last step, every element in a position P < i must be
small. This is because either position P contained a small element to start with, or the large
element originally in position P was replaced during a swap. A similar argument shows that
elements in positions P > i must be large. Care should be taken when handling keys that are
equal to the pivot.
For very small arrays (N <= 20), quicksort does not perform as well as insertion sort. A
common solution is not to use quicksort recursively for small arrays, but to switch to a sort
that is efficient for small arrays, such as insertion sort.
#define Cutoff ( 3 )   /* assumed threshold; the value is not given in these notes */

/* Order A[ Left ], A[ Center ] and A[ Right ], then hide the median as pivot */
ElementType Median3( ElementType A[ ], int Left, int Right )
{
    int Center = ( Left + Right ) / 2;

    if( A[ Left ] > A[ Center ] )
        Swap( &A[ Left ], &A[ Center ] );
    if( A[ Left ] > A[ Right ] )
        Swap( &A[ Left ], &A[ Right ] );
    if( A[ Center ] > A[ Right ] )
        Swap( &A[ Center ], &A[ Right ] );

    Swap( &A[ Center ], &A[ Right - 1 ] );   /* Hide pivot */
    return A[ Right - 1 ];                   /* Return pivot */
}

void Qsort( ElementType A[ ], int Left, int Right )
{
    int i, j;
    ElementType Pivot;

    if( Left + Cutoff <= Right )
    {
        Pivot = Median3( A, Left, Right );
        i = Left; j = Right - 1;
        for( ; ; )
        {
            while( A[ ++i ] < Pivot ) { }
            while( A[ --j ] > Pivot ) { }
            if( i < j )
                Swap( &A[ i ], &A[ j ] );
            else
                break;
        }
        Swap( &A[ i ], &A[ Right - 1 ] );    /* Restore pivot */
        Qsort( A, Left, i - 1 );
        Qsort( A, i + 1, Right );
    }
    else   /* Do an insertion sort on the subarray */
        InsertionSort( A + Left, Right - Left + 1 );
}
Indirect Sort
So far the elements being sorted have been simple values such as integers, but when the
elements are large structures, copying the items can be very expensive. This can be solved by
doing indirect sorting.
For instance, consider payroll records, with each record consisting of a name, address, phone
number, salary, and tax. To sort this information by one particular field, such as name, the
fundamental operation is the swap, but swapping two structures can be a very expensive
operation, because the structures are potentially large.
The solution is to have an additional array of pointers where each element of the pointer array
points to a structure in the original array. When sorting, compare keys in the original array
but swap the elements in the pointer array. This saves a lot of copying at the expense of
indirect references to the elements of the original array, without tremendous loss of efficiency.
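A minimal sketch of indirect sorting follows. The Record layout and the function names are hypothetical, and the standard library routine qsort is used in place of the sorting routines of these notes, simply to keep the example short. Only the pointers are rearranged; the records stay where they are.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical payroll record; the large fields make copying costly. */
typedef struct {
    char name[64];
    char address[256];
    double salary;
} Record;

/* Comparison callback for qsort: each argument points to an element
   of the pointer array, i.e. to a pointer to a Record. */
static int by_name(const void *x, const void *y)
{
    const Record *const *a = x;
    const Record *const *b = y;

    return strcmp((*a)->name, (*b)->name);
}

/* Fills ptrs[0..n-1] with pointers into recs, ordered by name.
   The records themselves are never moved. */
void indirect_sort(const Record recs[], const Record *ptrs[], int n)
{
    int i;

    assert(n >= 0);
    for (i = 0; i < n; i++)
        ptrs[i] = &recs[i];
    qsort(ptrs, (size_t)n, sizeof ptrs[0], by_name);
}
```

After the call, the records can be visited in sorted order through ptrs[0], ptrs[1], and so on, while recs keeps its original order.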
For example the above data is sorted by the field Dept as follows:
Now, to access the records in sorted order, use the order provided by the index column. In
this case, (3, 2, 4, 1).
Bucket Sort
Bucket sort is possibly the simplest distribution sorting algorithm. The essential requirement
is that the size of the universe from which the elements to be sorted are drawn is a small,
fixed constant, say M.
The input A_1, A_2, . . . , A_n must consist of only non-negative integers smaller than M, i.e.,
integers in the interval [0, M-1]. Keep an array called Count, of size M, which is initialized to
all 0s. Thus Count has M cells or buckets or counters, which are initially empty. When A_i is
read, increment Count[A_i] by 1, i.e., the counter for a value v keeps track of the number of
occurrences of v. After all the input is read, scan the Count array to produce the sorted list:
first place the required number of zeroes in the array, then the required number of ones,
followed by the twos, and so on, up to M-1. This algorithm takes O(M + N) time.
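The counting and output phases can be sketched as below. The function name bucket_sort is hypothetical, and the sorted values are written back into the input array instead of being printed; every element is assumed to lie in [0, M-1].

```c
#include <assert.h>
#include <stdlib.h>

/* Sorts a[0..n-1] in place, assuming every element lies in [0, m-1]. */
void bucket_sort(int a[], int n, int m)
{
    int *count = calloc((size_t)m, sizeof *count);   /* all counters start at 0 */
    int i, v, k = 0;

    assert(count != NULL);
    for (i = 0; i < n; i++) {           /* counting phase: one pass over the input */
        assert(a[i] >= 0 && a[i] < m);
        count[a[i]]++;
    }
    for (v = 0; v < m; v++)             /* output phase: scan the counters */
        while (count[v]-- > 0)
            a[k++] = v;
    free(count);
}
```

The first loop takes O(N) time and the second takes O(M + N) time in total, which matches the bound stated above.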
In the figure below, the universal set is assumed to be {0, 1, . . ., 9}. Therefore, ten counters
are required: one to keep track of the number of zeroes, one to keep track of the number of
ones, and so on. A single pass through the data suffices to count all of the elements. Once the
counts have been determined, the sorted sequence is easily obtained. E.g., the sorted sequence
contains no zeroes, two ones, one two, and so on.
This algorithm uses a more powerful operation than simple comparisons. By incrementing
the appropriate bucket, the algorithm essentially performs anM-way comparison in unit time.
This is similar to the strategy used in extendible hashing.
Although bucket sort seems like much too trivial an algorithm to be useful, it turns out that
there are many cases where the input is only small integers, so that using a method like
quicksort is really overkill.
External Sorting
All the algorithms examined so far require that the input fit into main memory. There are,
however, applications where the input is much too large to fit into memory. This section will
discuss external sorting algorithms, which are designed to handle very large inputs.
Most of the internal sorting algorithms take advantage of the fact that memory is directly
addressable. For instance, shellsort compares elements A[i] and A[i - h_k] in one time unit. If
the input is on a tape, then these operations lose their efficiency, since elements on a tape can
only be accessed sequentially. Even if the data is on a disk, there is still a practical loss of
efficiency because of the delay required to spin the disk and move the disk head. The time it
takes to sort the input is insignificant compared to the time to read the input.
Assume that there are at least three tape drives to perform the sorting. Two drives are
required to do an efficient sort; the third drive simplifies matters.
Simple Algorithm
The basic external sorting algorithm uses the merge routine from mergesort. Suppose there
are four tapes, T_a1, T_a2, T_b1, T_b2, which are two input and two output tapes. Depending on
the point in the algorithm, the a and b tapes are either input tapes or output tapes. Assume that
data is initially on T_a1 and the internal memory can hold (and sort) M records at a time. A
natural first step is to read M records at a time from the input tape, sort the records internally,
and then write the sorted records alternately to T_b1 and T_b2. Each set of sorted records is
termed a run. Thereafter rewind all the tapes.
Consider the following input.
If M =3, then after the runs are constructed, the tapes will contain data as follows
Now T_b1 and T_b2 contain a group of runs. Take the first run from each tape and merge them,
writing the result, which is a run twice as long, onto T_a1. Then take the next run from each
tape, merge these, and write the result to T_a2. Continue this process, alternating between T_a1
and T_a2, until either T_b1 or T_b2 is empty. At this point either both are empty or there is one
run left. In the latter case, copy this run to the appropriate tape.
Rewind all four tapes, and repeat the same steps, this time using the a tapes as input and the b
tapes as output. This will give runs of length 4M. Continue the process until one run of length
N is obtained.
This algorithm will require ceil(log_2(N/M)) passes, plus the initial run-constructing pass.
Multiway Merge
If there are extra tapes, then the number of passes required to sort the input can be reduced.
This is done by extending the basic (two-way) merge to a k-way merge.
Merging two runs is done by winding each input tape to the beginning of each run. Then the
smaller element is found, placed on an output tape, and the appropriate input tape is
advanced. If there are k input tapes, this strategy works the same way, the only difference
being that it is slightly more complicated to find the smallest of the k elements.
The smallest of these elements can be found by using a priority queue. To obtain the next
element to write on the output tape, perform a DeleteMin operation. The appropriate input
tape is advanced, and if the run on the input tape is not yet completed, insert the new element
into the priority queue.
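The use of a priority queue in a k-way merge can be sketched as follows. In this illustration the runs are arrays in memory rather than tapes, and the Item type, the bound MAXK, and the function names are assumptions. The DeleteMin and the following insert are combined: the root of the min-heap is overwritten with the next element of the same run and then percolated down.

```c
#include <assert.h>

#define MAXK 8   /* assumed upper bound on the number of input runs */

/* Heap entry: the current front element of one run. */
typedef struct { int value; int run; } Item;

/* Restores min-heap order below position i. */
static void percolate_down(Item heap[], int i, int size)
{
    Item tmp = heap[i];
    int child;

    for (; 2 * i + 1 < size; i = child) {
        child = 2 * i + 1;
        if (child + 1 < size && heap[child + 1].value < heap[child].value)
            child++;                        /* pick the smaller child */
        if (heap[child].value < tmp.value)
            heap[i] = heap[child];
        else
            break;
    }
    heap[i] = tmp;
}

/* Merges k sorted runs (runs[r] has len[r] elements) into out.
   Returns the number of elements written. */
int kway_merge(const int *runs[], const int len[], int k, int out[])
{
    Item heap[MAXK];
    int pos[MAXK] = {0};
    int size = 0, n = 0, r, i;

    assert(k <= MAXK);
    for (r = 0; r < k; r++)                 /* load the first element of each run */
        if (len[r] > 0) {
            heap[size].value = runs[r][0];
            heap[size].run = r;
            size++;
            pos[r] = 1;
        }
    for (i = size / 2 - 1; i >= 0; i--)     /* build the min-heap */
        percolate_down(heap, i, size);

    while (size > 0) {
        r = heap[0].run;                    /* smallest front element */
        out[n++] = heap[0].value;
        if (pos[r] < len[r])                /* run not finished: take its next element */
            heap[0].value = runs[r][pos[r]++];
        else                                /* run finished: shrink the heap */
            heap[0] = heap[--size];
        percolate_down(heap, 0, size);
    }
    return n;
}
```

Each output element costs O(log k) time, so finding the smallest of the k candidates is only slightly more expensive than in the two-way case.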
Continuing with the same example, the input is distributed onto the three tapes as shown.
Two more passes are required to complete the sort using three-way merging.
After the initial run construction phase, the number of passes required using k-way merging is
ceil(log_k(N/M)), because the runs get k times as large in each pass.
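The pass count can be computed without floating-point logarithms by simulating how the number of runs shrinks. The function below is a hypothetical helper: it starts from ceil(N/M) initial runs and divides by k, rounding up, until a single run remains.

```c
#include <assert.h>

/* Number of k-way merge passes needed after run construction, for
   n records and a memory that holds m records at a time.
   Equals ceil(log_k(n/m)), computed with integer arithmetic only. */
int merge_passes(long n, long m, int k)
{
    long runs;
    int passes = 0;

    assert(n > 0 && m > 0 && k >= 2);
    runs = (n + m - 1) / m;             /* initial number of runs */
    while (runs > 1) {
        runs = (runs + k - 1) / k;      /* each pass merges k runs into one */
        passes++;
    }
    return passes;
}
```

For instance, with an assumed input of 13 records and M = 3 there are 5 initial runs, so two-way merging needs 3 passes and three-way merging needs 2.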
Polyphase Merge
The k-way merging strategy requires the use of 2k tapes. This could be prohibitive for some
applications. It is possible to get by with only k + 1 tapes. As an example, two-way merging is
performed using only three tapes.
Suppose there are three tapes, $T_1$, $T_2$, and $T_3$, and an input file on $T_1$ that will produce 34 runs.
One option is to put 17 runs on each of $T_2$ and $T_3$. Next merge this result onto $T_1$, obtaining
one tape with 17 runs. The problem is that since all the runs are now on one tape, some of
these runs must be put on $T_2$ to perform another merge. The logical way to do this is to copy
the first eight runs from $T_1$ onto $T_2$ and then perform the merge. This has the effect of adding
an extra half pass for every pass.
An alternative method is to split the original 34 runs unevenly. Put 21 runs on $T_2$ and 13
runs on $T_3$. Merge 13 runs onto $T_1$, at which point $T_3$ is empty. Rewind $T_1$ and $T_3$,
and merge $T_1$, with 13 runs, and $T_2$, which has 8 runs, onto $T_3$. This merges 8 runs until $T_2$
is empty, leaving 5 runs on $T_1$ and 8 runs on $T_3$. Now merge $T_1$ and $T_3$, and
so on. The table below shows the number of runs on each tape after each pass.
Tape   Run const.   T3+T2   T1+T2   T1+T3   T2+T3   T1+T2   T1+T3   T2+T3
T1     0            13      5       0       3       1       0       1
T2     21           8       0       5       2       0       1       0
T3     13           0       8       3       0       2       1       0
If the number of runs is a Fibonacci number $F_n$, then the best way to distribute them is to split
them into two Fibonacci numbers $F_{n-1}$ and $F_{n-2}$. Otherwise, it is necessary to pad the tape with
dummy runs in order to get the number of runs up to a Fibonacci number.
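The run counts in a three-tape polyphase merge can be checked with a small simulation. The sketch below tracks only how many runs sit on each tape, not the data; the function name `PolyphasePasses` and the convention of returning -1 when the merge cannot proceed are assumptions made for this example.

```c
/* Simulate three-tape polyphase merging on run counts only.
   Runs[t] is the number of runs on tape t. Returns the number of merge
   passes needed to reach a single run, or -1 if the merge gets stuck
   (no empty output tape, or only one tape still holds runs). */
int PolyphasePasses( int Runs[3] )
{
    int passes = 0;

    while( Runs[0] + Runs[1] + Runs[2] > 1 ) {
        int out = 0;                          /* the empty tape receives the merged runs */
        while( out < 3 && Runs[out] != 0 )
            out++;
        if( out == 3 ) return -1;

        int a = ( out + 1 ) % 3, b = ( out + 2 ) % 3;
        int m = Runs[a] < Runs[b] ? Runs[a] : Runs[b];   /* merge until one input tape is empty */
        if( m == 0 ) return -1;               /* all runs on one tape: some must be copied first */

        Runs[a] -= m;
        Runs[b] -= m;
        Runs[out] = m;
        passes++;
    }
    return passes;
}
```

Starting from 21 and 13 runs (two consecutive Fibonacci numbers) the simulation finishes in seven merge passes, whereas an even 17/17 split gets stuck after the first pass because all runs end up on one tape.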
Replacement Selection
The runs themselves can be constructed by a technique commonly referred to as
replacement selection. Initially, M records are read into memory and placed in a priority
queue. Perform a DeleteMin, writing the smallest record to the output tape. Then read the
next record from the input tape. If it is larger than the record just written, add it to the priority
queue. Otherwise, it cannot go into the current run. Since the priority queue is smaller by one
element, store this new element in the dead space of the priority queue until the run is
completed and use the element for the next run.
Storing an element in the dead space is similar to what is done in heapsort. Continue doing
this until the size of the priority queue is zero, at which point the run is over. Start a new run
by building a new priority queue, using all the elements in the dead space. The run
construction with M = 3 is shown below. Dead elements are indicated by an asterisk.
In this example, replacement selection produces only three runs, compared with the five runs
obtained by sorting. Because of this, a three-way merge finishes in one pass instead of two. If
the input is randomly distributed, replacement selection can be shown to produce runs of
average length 2M. Since external sorts take so long, every pass saved can make a significant
difference in the running time.
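A minimal sketch of replacement selection in C is given below, assuming the input is an in-memory array instead of a tape. The buffer `Buf` of M integers holds the live heap at the front and the dead elements directly behind it, as in heapsort. The names `SiftDown`, `BuildHeap`, `ReplacementSelection`, and the output convention (`Out` receives the runs back to back, `RunLen` their lengths) are assumptions made for this example.

```c
/* Restore min-heap order below position i in H[0..n-1]. */
static void SiftDown( int H[], int n, int i )
{
    for( ;; ) {
        int child = 2 * i + 1;
        if( child >= n ) break;
        if( child + 1 < n && H[child + 1] < H[child] ) child++;
        if( H[i] <= H[child] ) break;
        int tmp = H[i]; H[i] = H[child]; H[child] = tmp;
        i = child;
    }
}

static void BuildHeap( int H[], int n )
{
    for( int i = n / 2 - 1; i >= 0; i-- )
        SiftDown( H, n, i );
}

/* Split In[0..n-1] into sorted runs using a buffer of M records.
   Returns the number of runs produced. */
int ReplacementSelection( const int In[], int n, int M,
                          int Buf[], int Out[], int RunLen[] )
{
    int next = 0, live = 0, dead = 0, w = 0, runs = 0;

    while( live < M && next < n )            /* read the first M records */
        Buf[live++] = In[next++];
    BuildHeap( Buf, live );

    while( live > 0 ) {
        int len = 0;
        while( live > 0 ) {
            int min = Buf[0];
            Out[w++] = min;                  /* DeleteMin: write the smallest record */
            len++;
            if( next < n && In[next] >= min ) {
                Buf[0] = In[next++];         /* next record fits in the current run */
            } else {
                Buf[0] = Buf[live - 1];      /* the heap shrinks by one */
                if( next < n ) {
                    Buf[live - 1] = In[next++];   /* store the record in the dead space */
                    dead++;
                } else {
                    Buf[live - 1] = Buf[live - 1 + dead];   /* input exhausted: keep dead block contiguous */
                }
                live--;
            }
            SiftDown( Buf, live, 0 );
        }
        RunLen[runs++] = len;                /* the heap is empty: the run is over */
        live = dead;                         /* dead elements start the next run */
        dead = 0;
        BuildHeap( Buf, live );
    }
    return runs;
}
```

On the illustrative 13-element input 81, 94, 11, 96, 12, 35, 17, 99, 28, 58, 41, 75, 15 with M = 3, this sketch produces three runs of lengths 4, 8, and 1.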
CS2068 Unit 5: Disjoint Set
Equivalence Relation
A relation R is defined on a set S if for every pair of elements (a, b), $a, b \in S$, a R b is either
true or false. If a R b is true, then we say that a is related to b. An equivalence relation is a
relation R that satisfies three properties:
1. (Reflexive) a R a, for all $a \in S$.
2. (Symmetric) a R b if and only if b R a.
3. (Transitive) a R b and b R c implies that a R c.
Electrical connectivity, where all connections are by metal wires, is an equivalence relation.
Dynamic Equivalence Relation
The input is initially a collection of N sets, each with one element. This initial representation
is that all relations (except reflexive relations) are false. Each set has a different element, so
that $S_i \cap S_j = \emptyset$; i.e., the sets are disjoint.
There are two permissible operations. The first is Find, which returns the name of the set
containing a given element. The second operation adds relations. To add the relation a ~ b,
first check whether a and b are already related. This is done by performing Finds on both a and b
and checking whether they are in the same equivalence class. If they are not, then apply
Union. This operation merges the two equivalence classes containing a and b into a new
equivalence class. The result of Union is to create a new set $S_k = S_i \cup S_j$, destroying the originals
and preserving the disjointness of all sets.
The algorithm to do this is frequently known as the disjoint set Union/Find algorithm for this
reason. This algorithm is dynamic because, during the course of the algorithm, the sets can
change via the Union operation. The solution to the Union/Find problem considered here makes Unions easy
but Finds hard. Even so, the running time for any sequence of at most M Finds and up to N - 1 Unions
will be only a little more than O(M + N).
Basic Data Structure
The Find operation need not return any specific name; it is only required that Find on two elements returns the
same answer if and only if they are in the same set. A tree can be used to represent each set,
since each element in a tree has the same root. Thus, the root can be used to name the set.
Initially, each set contains one element. Since only the name of the parent is required, the tree
is stored implicitly in an array: each entry P[i] in the array represents the parent of element i.
If i is a root, then P[i] = 0.
To perform a Union of two sets, merge the two trees by making the root of one tree point to
the root of the other. The figures show the forest after each of Union(5, 6), Union(7, 8), and
Union(5, 7). The new root after Union(X, Y) is X.
A Find(X) on element X is performed by returning the root of the tree containing X. The time
to perform this operation is proportional to the depth of the node representing X, so the
worst-case running time of a Find is O(N). The running time is usually computed for a sequence of M
intermixed instructions. In this case, M consecutive operations could take O(MN) time in the
worst case.
/* Assumes Root1 and Root2 are roots; Root1 becomes the new root */
void SetUnion( DisjSet S, SetType Root1, SetType Root2 )
{
    S[ Root2 ] = Root1;
}
/* Follow parent links until a root (entry <= 0) is reached */
SetType Find( ElementType X, DisjSet S )
{
    if( S[ X ] <= 0 )
        return X;
    else
        return Find( S[ X ], S );
}
Smart Union Algorithm
A simple improvement to Union is always to make the smaller tree a subtree of the larger,
breaking ties by any method. This approach is known as union-by-size. The three unions in the preceding
example were all ties, and so can be considered to have been performed by size. If the next operation
were Union(4, 5), then the forest would be as shown.
If Unions are done by size, the depth of any node is never more than log N. A node's depth increases
only when its tree is merged into one that is at least as large, so the size of its tree at least doubles
each time. This implies that
the running time for a Find operation is O(log N), and a sequence of M operations takes O(M
log N).
An alternative implementation, which also guarantees that all the trees will have depth at
most O(log N), is union-by-height. The height of each tree is tracked instead of its size,
and Union is performed by making the shallower tree a subtree of the deeper tree. This is an
easy algorithm, since the height of a tree increases only when two equally deep trees are
joined. Thus, union-by-height is a trivial modification of union-by-size.
/* Union-by-height */
/* A root stores the negated height of its tree (0 for a single node) */
void SetUnion( DisjSet S, SetType Root1, SetType Root2 )
{
    if( S[ Root2 ] < S[ Root1 ] )        /* Root2 is deeper set */
        S[ Root1 ] = Root2;              /* Make Root2 new root */
    else
    {
        if( S[ Root1 ] == S[ Root2 ] )   /* Same height, */
            S[ Root1 ]--;                /* so update */
        S[ Root2 ] = Root1;
    }
}
The following table shows the table representation for union-by-size and union-by-height for
the above tree.
Path Compression
The worst case of the Union/Find algorithm discussed so far is O(M log N), and it can occur
fairly easily. No further improvement is likely from the Union side: any method of performing the Unions will yield
the same worst-case trees, since it must break ties arbitrarily. Therefore, the only way to
speed the algorithm up, without reworking the data structure entirely, is to do something
clever on the Find operation. This is known as path compression.
Path compression is performed during a Find operation and is independent of the strategy
used to perform Unions. The effect of path compression is that every node on the path from X
to the root has its parent changed to the root.
The effect of path compression after Find (15) on the generic worst tree is shown below.
The effect of path compression is that with an extra two pointer moves, nodes 13 and 14 are
now one position closer to the root and nodes 15 and 16 are now two positions closer. Thus,
fast future accesses on these nodes will pay for the extra work to do the path compression.
SetType Find( ElementType X, DisjSet S )
{
    if( S[ X ] <= 0 )
        return X;
    else
        return S[ X ] = Find( S[ X ], S );
}
As the code shows, path compression is a trivial change to the basic Find algorithm. The only
change to the Find routine is that S[X] is made equal to the value returned by Find; thus after
the root of the set is found recursively, X is made to point directly to it. This occurs
recursively to every node on the path to the root, so this implements path compression.
When Unions are done arbitrarily, path compression is a good idea, because there is an
abundance of deep nodes and these are brought near the root by path compression. It has been
proven that when path compression is done in this case, a sequence of M operations requires at most O(M
log N) time.
Path compression is perfectly compatible with union-by-size, and thus both routines can be
implemented at the same time. Path compression is not entirely compatible with union-by-height,
because path compression can change the heights of the trees. The stored heights then become
estimates (sometimes called ranks).
Application
Let there be a network of computers and a list of bidirectional connections; each of these
connections allows a file transfer from one computer to another. Is it possible to send a file
from any computer on the network to any other? An extra restriction is that the problem must
be solved on-line. Thus, the list of connections is presented one at a time, and the algorithm
must be prepared to give an answer at any point.
An algorithm to solve this problem can initially put every computer in its own set. The
invariant is that two computers can transfer files if and only if they are in the same set. Thus
the ability to transfer files forms an equivalence relation. Now read connections one at a time,
say (u, v), then test to see whether u and v are in the same set and do nothing if they are. If
they are in different sets, then merge their sets. At the end of the algorithm, the graph is
connected if and only if there is exactly one set.
If there are M connections and N computers, the space requirement is O(N). Using union-by-size
and path compression, the worst-case running time is $O(M \alpha(M, N))$, where $\alpha$ is the very slowly
growing inverse of Ackermann's function, since there are 2M Finds and at most N - 1 Unions. Thus the
running time is linear for all practical purposes.
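The on-line connectivity algorithm can be sketched in C by combining union-by-size with path compression. This is an illustration under stated assumptions rather than the notes' own routine: elements are numbered 1 to n, a root stores the negated size of its tree (so roots are recognised by a negative entry), and the helper names `Initialize` and `Connect` are hypothetical.

```c
/* Every element starts in its own set; a root holds -(size of its tree). */
void Initialize( int S[], int n )
{
    for( int i = 1; i <= n; i++ )
        S[i] = -1;
}

/* Find with path compression: every node on the path is linked to the root. */
int Find( int X, int S[] )
{
    if( S[X] < 0 )
        return X;
    return S[X] = Find( S[X], S );
}

/* Union-by-size: the smaller tree becomes a subtree of the larger one.
   Root1 and Root2 must be distinct roots; ties keep Root1 as the root. */
void SetUnion( int S[], int Root1, int Root2 )
{
    if( S[Root2] < S[Root1] ) {      /* Root2 has the larger tree */
        S[Root2] += S[Root1];
        S[Root1] = Root2;
    } else {
        S[Root1] += S[Root2];
        S[Root2] = Root1;
    }
}

/* Process one connection (u, v) and return the number of sets that remain.
   The network is connected exactly when this number reaches 1. */
int Connect( int S[], int u, int v, int numSets )
{
    int ru = Find( u, S ), rv = Find( v, S );
    if( ru != rv ) {
        SetUnion( S, ru, rv );
        numSets--;
    }
    return numSets;
}
```

With eight computers and the connections (5, 6), (7, 8), (5, 7), five sets remain; computers 5 and 8 can then exchange files, while 1 and 5 cannot.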
CS2068 Unit 5: Algorithm Design Techniques
Greedy Algorithm
Greedy algorithms work in phases. In each phase, a decision is made that appears to be good,
without regard for future consequences, i.e., some local optimum is chosen. When the
algorithm terminates, we hope that the local optimum is equal to the global optimum. If this is
the case, then the algorithm is correct; otherwise, the algorithm has produced a suboptimal
solution.
If the absolute best answer is not required, then simple greedy algorithms are sometimes used
to generate approximate answers, rather than using the more complicated algorithms
generally required to generate an exact answer. Dijkstra's, Prim's, and Kruskal's algorithms
are all greedy algorithms. There are several real-life examples of greedy algorithms. Traffic
problems provide an example where making locally optimal choices does not always work.
A Simple Scheduling Problem
We are given jobs $j_1, j_2, \ldots, j_N$, all with known running times $t_1, t_2, \ldots, t_N$, respectively. The
question is: What is the best way to schedule these jobs in order to minimize the average
completion time? Assume non-preemptive scheduling, i.e., once a job is started, it must run
to completion.
Single Processor
Consider the example, which has four jobs and associated running times as shown below:
Job     Time
$j_1$   15
$j_2$   8
$j_3$   3
$j_4$   10
Let the jobs in the schedule be $j_{i_1}, j_{i_2}, \ldots, j_{i_N}$. The first job finishes in time $t_{i_1}$. The second job
finishes after $t_{i_1} + t_{i_2}$, and the third job finishes after $t_{i_1} + t_{i_2} + t_{i_3}$. The total cost of the
schedule C is
$C = \sum_{k=1}^{N} (N - k + 1)\, t_{i_k}$
One possible schedule (First Come First Serve) is shown below with an average completion
time of 25 units ([15 + 23 + 26 + 36] / 4).
A better schedule (Shortest Job First) has a mean completion time of 17.75 units ([3 + 11 + 21 + 36] / 4) and is optimal.
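The two schedules can be compared with a few lines of C. This is a small sketch; the function names `TotalCompletion` and `ShortestJobFirst` and the helper `CmpAscending` are assumptions made for the example.

```c
#include <stdlib.h>

static int CmpAscending( const void *a, const void *b )
{
    int x = *(const int *) a, y = *(const int *) b;
    return ( x > y ) - ( x < y );
}

/* Sum of the completion times when the jobs run in the given order. */
long TotalCompletion( const int T[], int n )
{
    long finish = 0, total = 0;
    for( int i = 0; i < n; i++ ) {
        finish += T[i];          /* time at which job i completes */
        total += finish;
    }
    return total;
}

/* Greedy schedule: sort the running times in place, shortest first,
   and return the total completion time of that order. */
long ShortestJobFirst( int T[], int n )
{
    qsort( T, n, sizeof T[0], CmpAscending );
    return TotalCompletion( T, n );
}
```

For the running times 15, 8, 3, 10 the first-come-first-serve total is 100 (mean 25), and the shortest-job-first total is 71 (mean 17.75).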
Multiple Processor Case
The above problem can be extended to the case of several processors. Let there be jobs
$j_1, j_2, \ldots, j_N$, with associated running times $t_1, t_2, \ldots, t_N$, respectively, and a number of
processors, say P. Assume without loss of generality that the jobs are ordered, shortest
running time first.
Consider an example with P = 3 processors and 9 jobs with their running times as follows:
Job     Time
$j_1$   3
$j_2$   5
$j_3$   6
$j_4$   10
$j_5$   11
$j_6$   14
$j_7$   15
$j_8$   18
$j_9$   20
The following figure shows an optimal arrangement to minimize mean completion time. Jobs
$j_1$, $j_4$, and $j_7$ are run on Processor 1. Processor 2 handles $j_2$, $j_5$, and $j_8$, and Processor 3 runs the
remaining jobs. The total time to completion is 165, with a mean of 165/9 = 18.33.
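The arrangement just described amounts to dealing the sorted jobs out to the processors in rotation. A minimal C sketch follows; the name `MultiProcessorTotal` and the bound `MAX_PROCS` are assumptions made for the example, and the input is assumed to be sorted shortest first.

```c
#define MAX_PROCS 16   /* assumed upper bound on the number of processors */

/* Jobs T[0..n-1] are sorted shortest first; job i runs on processor i % P.
   Returns the sum of all completion times, or -1 if P is out of range. */
long MultiProcessorTotal( const int T[], int n, int P )
{
    long finish[MAX_PROCS] = { 0 };   /* time at which each processor becomes free */
    long total = 0;

    if( P < 1 || P > MAX_PROCS )
        return -1;
    for( int i = 0; i < n; i++ ) {
        finish[i % P] += T[i];        /* job i completes at this time */
        total += finish[i % P];
    }
    return total;
}
```

For the nine jobs above and P = 3 the function returns 165.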
An alternative optimal solution is shown below
Minimizing the Final Completion Time
If the same user owns all these jobs, then minimizing the final completion time is the
preferable method of scheduling. The ordering shown below has a final completion time of 34,
better than the 40 and 38 of the previous two orderings, although it does not have minimum mean
completion time. Minimizing the final completion time is apparently much harder than minimizing the mean
completion time: the problem is NP-complete.
Huffman Codes
Another application of greedy algorithms is file compression. If the size of the
character set is C, then $\lceil \log C \rceil$ bits are needed in a standard encoding.
Consider a file that contains only the characters a, e, i, s, t, plus blank spaces and newlines.
Suppose the file has ten a's, fifteen e's, twelve i's, three s's, four t's,
thirteen blanks, and one newline. As shown in the table, the file requires 174 bits to represent,
since there are 58 characters and each character requires three bits.
Character   Code   Frequency   Total Bits
a           000    10          30
e           001    15          45
i           010    12          36
s           011    3           9
t           100    4           12
space       101    13          39
newline     110    1           3
Total                          174
The binary code that represents the alphabet can be represented by the binary tree as shown:
The tree shown above has data only at the leaves. The representation of each character can be
found by starting at the root and recording the path, using a 0 to indicate the left branch and a
1 to indicate the right branch. For instance, s is reached by going left, then right, and finally
right. This is encoded as 011. This data structure is sometimes referred to as a trie. If
character $c_i$ is at depth $d_i$ and occurs $f_i$ times, then the cost of the code is $\sum_i d_i f_i$.
The above tree can be modified to have a new cost of 173 as shown below, by placing the newline
symbol one level higher, at its parent. The new tree is a full tree: every node either is a leaf or has
two children. An optimal code always has this property, but the new tree is still far from optimal.
If the characters are placed only at the leaves, any sequence of bits can always be decoded
unambiguously. For instance, suppose the encoded string is 0100111100010110001000111. 0
is not a character code, 01 is not a character code, but 010 represents i, so the first character is
i. Then 011 follows, giving an s. Then 11 follows, which is a newline. The remainder of the
code is a, space, t, i, e, and newline.
Therefore, the basic problem is to find the full binary tree of minimum total cost, where all
characters are contained in the leaves. The optimal tree for the above character set is shown
below; this code uses only 146 bits.
The algorithm to construct a coding tree was given by Huffman in 1952 and is known as
Huffman's algorithm; the resulting code is a Huffman code.
Huffman's Algorithm
The algorithm is described as follows: Assume that the number of characters is C. A forest of
trees is maintained. The weight of a tree is equal to the sum of the frequencies of its leaves. C - 1
times, select the two trees, $T_1$ and $T_2$, of smallest weight, breaking ties arbitrarily, and
form a new tree with subtrees $T_1$ and $T_2$.
At the beginning of the algorithm, there are C single-node trees, one for each character. At the
end of the algorithm there is one tree, and this is the optimal Huffman coding tree.
The following figure shows the initial forest, with the weight of each tree shown in
numbers above the root.
The two trees of lowest weight (s and newline) are merged together, creating the forest as shown below.
The new root is named T1, and its left child is chosen arbitrarily. The total weight of the new tree is
just the sum of the weights of the old trees.
Now there are six trees, and we again select the two trees of smallest weight. These happen
to be T1 and t, which are then merged into a new tree with root T2 and weight 8 as shown.
The third step merges T2 and a, creating T3, with weight 10 + 8 = 18.
After the third merge is completed, the two trees of lowest weight are the single-node trees
representing i and the blank space. These trees are merged into the new tree with root T4.
The fifth step is to merge the trees with roots e and T3, since these trees have the two smallest
weights.
Finally, the optimal tree, with root T6, is obtained by merging the two remaining trees.
If the trees are maintained in a priority queue, ordered by weight, then the running time is
O(C log C), since there will be one BuildHeap, 2C - 2 DeleteMins, and C - 2 Inserts on the
priority queue.
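The merging process can be sketched in C to compute the total cost of the optimal code without building the tree explicitly. Each merge pushes every leaf below the new root one level deeper, so the cost is the sum of the weights of all merged trees. For brevity this sketch finds the two smallest weights by a linear scan instead of a priority queue, which makes it O(C^2) rather than O(C log C); the function name `HuffmanCost` is an assumption made for the example.

```c
/* Returns the total number of bits used by an optimal prefix code for the
   given character frequencies. W[0..n-1] is overwritten during the merges. */
long HuffmanCost( long W[], int n )
{
    long cost = 0;

    while( n > 1 ) {
        int a = 0, b = 1;                 /* a: smallest weight, b: second smallest */
        if( W[b] < W[a] ) { a = 1; b = 0; }
        for( int i = 2; i < n; i++ ) {
            if( W[i] < W[a] ) { b = a; a = i; }
            else if( W[i] < W[b] ) b = i;
        }

        long merged = W[a] + W[b];        /* weight of the new tree */
        cost += merged;                   /* all leaves below it gain one bit */

        W[a] = merged;                    /* the new tree replaces tree a */
        W[b] = W[n - 1];                  /* the last tree fills the slot of tree b */
        n--;
    }
    return cost;
}
```

For the frequencies 10, 15, 12, 3, 4, 13, 1 the merged weights are 4, 8, 18, 25, 33, and 58, which sum to 146 bits.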
Approximate Bin Packing
We are given N items of sizes $s_1, s_2, \ldots, s_N$. All sizes satisfy $0 < s_i \le 1$. The problem is to pack these
items in the fewest number of bins, given that each bin has unit capacity. There are two
versions of the bin packing problem.
1. On-line: each item must be placed in a bin before the next item can be processed.
2. Off-line: nothing is done until all the input has been read. Items are placed in the bins
thereafter.
On-line Algorithms
An on-line algorithm does not always give an optimal solution. There are three simple
algorithms that guarantee that the number of bins used is no more than twice optimal.
1. Next Fit
2. First Fit
3. Best Fit
As an example, assume an item list with sizes 0.2, 0.5, 0.4, 0.7, 0.1, 0.3, and 0.8.
Next Fit
The simplest algorithm is next fit. When processing any item, check to see whether it fits in
the same bin as the last item. If it does, it is placed there; otherwise, a new bin is created. This
algorithm is simple to implement and runs in linear time. The packing using next-fit strategy
is shown below:
Not only is next fit simple to program, its worst-case behavior is also easy to analyze.
First Fit
Although next fit has a reasonable performance guarantee, it performs poorly in practice,
because it creates new bins when it does not need to. In the sample run, it could have placed
the item of size 0.3 in either $B_1$ or $B_2$, rather than create a new bin.
The first fit strategy is to scan the bins in order and place the new item in the first bin that is
large enough to hold it. Thus, a new bin is created only when the results of previous
placements have left no other alternative.
A simple method of implementing first fit would process each item by scanning down the list
of bins sequentially. This would take $O(N^2)$ time. It is also possible to implement first fit to run in
$O(N \log N)$ time.
Best Fit
In best fit, instead of placing a new item in the first spot that is found, it is placed in the
tightest spot among all the bins. For the given sample, 0.3 is placed in $B_3$, where it fits
perfectly, instead of $B_2$.
Best fit never uses more than roughly 1.7 times the optimal number of bins, and it tends to perform better than first fit on
random inputs. It can also be implemented to run in O(N log N) time.
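The three on-line strategies can be compared with the following C sketch. To avoid floating-point rounding, item sizes are given as integers in tenths of a bin (so the capacity is 10); this scaling, the bound `MAX_BINS`, and the function names are assumptions made for the example. First fit and best fit use the simple quadratic scan, not an O(N log N) structure, and every item is assumed to fit in an empty bin.

```c
#define MAX_BINS 64   /* assumed upper bound on the number of bins */

/* Next fit: only the most recently opened bin is ever considered. */
int NextFit( const int Size[], int n, int Capacity )
{
    int bins = 0, room = 0;
    for( int i = 0; i < n; i++ ) {
        if( Size[i] > room ) {        /* does not fit: open a new bin */
            bins++;
            room = Capacity;
        }
        room -= Size[i];
    }
    return bins;
}

/* First fit: scan the bins in order and use the first one with enough room. */
int FirstFit( const int Size[], int n, int Capacity )
{
    int room[MAX_BINS];
    int bins = 0;
    for( int i = 0; i < n; i++ ) {
        int b = 0;
        while( b < bins && room[b] < Size[i] )
            b++;
        if( b == bins ) {
            if( bins == MAX_BINS ) return -1;
            room[bins++] = Capacity;
        }
        room[b] -= Size[i];
    }
    return bins;
}

/* Best fit: use the bin that leaves the least room after the item is placed. */
int BestFit( const int Size[], int n, int Capacity )
{
    int room[MAX_BINS];
    int bins = 0;
    for( int i = 0; i < n; i++ ) {
        int best = -1;
        for( int b = 0; b < bins; b++ )
            if( room[b] >= Size[i] && ( best < 0 || room[b] < room[best] ) )
                best = b;
        if( best < 0 ) {
            if( bins == MAX_BINS ) return -1;
            best = bins;
            room[bins++] = Capacity;
        }
        room[best] -= Size[i];
    }
    return bins;
}
```

On the sample list, written as 2, 5, 4, 7, 1, 3, 8 with capacity 10, next fit uses five bins while first fit and best fit each use four.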
Off-line Algorithms
In an off-line algorithm, the entire item list is known before any packing, and therefore an optimal
packing could be found by exhaustive search. The major problem with on-line algorithms is
that it is hard to pack the large items, especially when they occur late in the input. In off-line
algorithms the items are sorted, placing the largest item first. Then first fit or best fit is
applied; the resulting algorithms are known as first fit decreasing and best fit decreasing.
The first fit decreasing packing for the given sample is shown below. The results for best fit
decreasing are almost identical.
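First fit decreasing is first fit applied after sorting the items from largest to smallest. A minimal C sketch follows, again with integer sizes in tenths of a bin; the scaling, the bound `MAX_BINS`, and the names `CmpDescending` and `FirstFitDecreasing` are assumptions made for the example.

```c
#include <stdlib.h>

#define MAX_BINS 64   /* assumed upper bound on the number of bins */

static int CmpDescending( const void *a, const void *b )
{
    int x = *(const int *) a, y = *(const int *) b;
    return ( x < y ) - ( x > y );
}

/* Sorts Size in place, largest first, then packs by first fit.
   Returns the number of bins used, or -1 if MAX_BINS is exceeded. */
int FirstFitDecreasing( int Size[], int n, int Capacity )
{
    int room[MAX_BINS];
    int bins = 0;

    qsort( Size, n, sizeof Size[0], CmpDescending );
    for( int i = 0; i < n; i++ ) {
        int b = 0;
        while( b < bins && room[b] < Size[i] )   /* first bin with enough room */
            b++;
        if( b == bins ) {
            if( bins == MAX_BINS ) return -1;
            room[bins++] = Capacity;
        }
        room[b] -= Size[i];
    }
    return bins;
}
```

For the sample list (2, 5, 4, 7, 1, 3, 8 with capacity 10) the items fill three bins exactly, which is optimal here because the sizes sum to 30.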
Divide and Conquer
Divide and conquer algorithms consist of two parts:
Divide: Smaller problems are solved recursively (except, of course, base cases).
Conquer: The solution to the original problem is then formed from the solutions
to the subproblems.
Routines that contain at least two recursive calls are called divide and conquer algorithms,
while routines that contain only one recursive call are not. The classic examples of divide
and conquer are mergesort and quicksort, which run in O(N log N) time in the worst case and the average case, respectively.
The three problems discussed are:
1. Closest-Points Problem
2. The Selection Problem
3. Arithmetic Problems
Closest Point Problem
The input to the problem is a list P of points in a plane. If $p_1 = (x_1, y_1)$ and $p_2 = (x_2, y_2)$, then
the Euclidean distance between $p_1$ and $p_2$ is $[(x_1 - x_2)^2 + (y_1 - y_2)^2]^{1/2}$. It is required to find the
closest pair of points. If there are N points, then there are N(N - 1)/2 pairs of distances.
Checking all of these exhaustively is expensive, with a running time of $O(N^2)$.
Assume the points have been sorted by x coordinate. At worst, this adds O(N log N) to the final
time bound. Since the proposed algorithm has an O(N log N) bound for the entire algorithm,
the sort is essentially free from a complexity standpoint.
Consider a small sample point set P. Since the points are sorted by x coordinate, draw an
imaginary vertical line that partitions the point set into two halves, $P_L$ and $P_R$. Now either the
closest points are both in $P_L$, or they are both in $P_R$, or one is in $P_L$ and the other is in $P_R$. Call
these distances $d_L$, $d_R$, and $d_C$ as shown.
$d_L$ and $d_R$ can be computed recursively. The problem is to compute $d_C$. Let $\delta = \min(d_L, d_R)$.
$d_C$ needs to be computed only if $d_C$ improves on $\delta$. If $d_C$ is such a distance, then the two points that
define $d_C$ must be within $\delta$ of the dividing line; this area is called a strip. This observation limits the
number of points that need to be considered ($\delta = d_R$ in the given example).
For large point sets that are uniformly distributed, the number of points that are expected to
be in the strip is very small. Thus, a brute force calculation on these points can be performed in O(N) time.
/* Points are all in the strip; Delta holds the current value of delta */
for( i = 0; i < NumPointsInStrip; i++ )
    for( j = i + 1; j < NumPointsInStrip; j++ )
        if( Dist( P[ i ], P[ j ] ) < Delta )
            Delta = Dist( P[ i ], P[ j ] );
In the worst case, all the points could be in the strip, so brute force does not always work in
linear time. This algorithm can be improved as follows: The y coordinates of the two points
that define $d_C$ can differ by at most $\delta$. Otherwise, $d_C > \delta$. Suppose that the points in the strip
are sorted by their y coordinates. Then, if the y coordinates of $p_i$ and $p_j$ differ by more than $\delta$,
we can proceed to $p_{i+1}$.
/* Points are all in the strip and sorted by y coordinate */
for( i = 0; i < NumPointsInStrip; i++ )
    for( j = i + 1; j < NumPointsInStrip; j++ )
        if( P[ j ].y - P[ i ].y > Delta )   /* y coordinates differ by more than delta */
            break;                          /* Go to next P[ i ] */
        else
        if( Dist( P[ i ], P[ j ] ) < Delta )
            Delta = Dist( P[ i ], P[ j ] );
For instance, for point $p_3$, only the two points $p_4$ and $p_5$ lie in the strip within $\delta$ vertical
distance. In the worst case, at most seven points $p_j$ are examined for any point $p_i$, so this step takes O(N) time.
Two lists are maintained. One is the point list sorted by x coordinate, and the other is the
point list sorted by y coordinate. These are known as P and Q, respectively. They can be obtained by a
preprocessing sorting step at cost O(N log N) and thus do not affect the time bound. $P_L$ and
$Q_L$ are the lists passed to the left-half recursive call, and $P_R$ and $Q_R$ are the lists passed to the
right-half recursive call.
Once the dividing line is known, step through Q sequentially, placing each element in $Q_L$ or
$Q_R$, as appropriate. Thus $Q_L$ and $Q_R$ will be automatically sorted by y coordinate. When the
recursive calls return, scan through the Q list and discard all the points whose x coordinates
are not within the strip. Then Q contains only points in the strip, and these points are
guaranteed to be sorted by their y coordinates. This strategy ensures that the entire algorithm
is O(N log N), because only O(N) extra work is performed.
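A compact C sketch of the divide and conquer closest-pair algorithm is shown below. It is a simplified variant, not the full method described above: instead of maintaining the presorted list Q, it sorts the strip by y coordinate inside each call, which gives O(N log^2 N) rather than O(N log N). It also works with squared distances so that no square root is needed. The type `Point` and all function names are assumptions made for the example.

```c
#include <stdlib.h>

typedef struct { double x, y; } Point;

static int CmpX( const void *a, const void *b )
{
    double d = ( (const Point *) a )->x - ( (const Point *) b )->x;
    return ( d > 0 ) - ( d < 0 );
}

static int CmpY( const void *a, const void *b )
{
    double d = ( (const Point *) a )->y - ( (const Point *) b )->y;
    return ( d > 0 ) - ( d < 0 );
}

static double Dist2( Point a, Point b )      /* squared Euclidean distance */
{
    double dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;
}

/* P[0..n-1] is sorted by x and n >= 2; Strip is scratch space for n points. */
static double ClosestRec( const Point P[], int n, Point Strip[] )
{
    double best;

    if( n <= 3 ) {                           /* base case: brute force */
        best = Dist2( P[0], P[1] );
        for( int i = 0; i < n; i++ )
            for( int j = i + 1; j < n; j++ )
                if( Dist2( P[i], P[j] ) < best )
                    best = Dist2( P[i], P[j] );
        return best;
    }

    int mid = n / 2;
    double midX = P[mid].x;                  /* the dividing line */
    double dL = ClosestRec( P, mid, Strip );
    double dR = ClosestRec( P + mid, n - mid, Strip );
    best = dL < dR ? dL : dR;                /* delta squared */

    int m = 0;                               /* collect the points within delta of the line */
    for( int i = 0; i < n; i++ ) {
        double dx = P[i].x - midX;
        if( dx * dx < best )
            Strip[m++] = P[i];
    }
    qsort( Strip, m, sizeof Strip[0], CmpY );

    for( int i = 0; i < m; i++ )
        for( int j = i + 1; j < m; j++ ) {
            double dy = Strip[j].y - Strip[i].y;
            if( dy * dy >= best )
                break;                       /* go to the next point in the strip */
            if( Dist2( Strip[i], Strip[j] ) < best )
                best = Dist2( Strip[i], Strip[j] );
        }
    return best;
}

/* Returns the squared distance of the closest pair, or -1.0 if there are
   fewer than two points or memory cannot be allocated. Sorts P by x. */
double ClosestPairSquared( Point P[], int n )
{
    if( n < 2 ) return -1.0;
    Point *strip = malloc( n * sizeof *strip );
    if( strip == NULL ) return -1.0;
    qsort( P, n, sizeof P[0], CmpX );
    double best = ClosestRec( P, n, strip );
    free( strip );
    return best;
}
```

Taking the square root of the returned value gives the actual distance $\delta$ of the closest pair.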
Selection Problem
The selection problem requires us to find the kth smallest element in a list S of N elements. Of
particular interest is the special case of finding the median. This occurs when $k = \lceil N/2 \rceil$.
The basic algorithm is a simple recursive strategy. Assuming that N is larger than the cutoff point below which elements are simply sorted,
an element v, known as the pivot, is chosen. The remaining elements are placed into two sets,
$S_1$ and $S_2$. $S_1$ contains elements that are guaranteed to be no larger than v, and $S_2$ contains
elements that are no smaller than v. Finally, if $k \le |S_1|$, then the kth smallest element in S can be
found by recursively computing the kth smallest element in $S_1$. If $k = |S_1| + 1$, then the pivot
is the kth smallest element. Otherwise, the kth smallest element in S is the $(k - |S_1| - 1)$st
smallest element in $S_2$. The main difference between this algorithm and quicksort is that there
is only one subproblem to solve instead of two.
For quicksort, a good choice for pivot was to pick three elements and use their median. This
gives some expectation that the pivot is not too bad, but does not provide a guarantee. Instead
of finding the median from a sample of random elements, we will find the median from a
sample of medians.
The basic pivot selection algorithm is as follows:
1. Arrange the N elements into $\lfloor N/5 \rfloor$ groups of 5 elements, ignoring the (at most four)
extra elements.
2. Find the median of each group. This gives a list M of $\lfloor N/5 \rfloor$ medians.
3. Find the median of M. Return this as the pivot, v.
The term median-of-median-of-five partitioning is used to describe the quickselect algorithm
that uses the pivot selection rule given above. This partitioning guarantees that each recursive
subproblem is at most roughly 70 percent as large as the original. Also, the pivot can be
computed quickly enough to guarantee an O(N) running time for the entire selection
algorithm.
Assume that N is divisible by 5, so there are no extra elements, and also that N/5 is odd, so
that the set M contains an odd number of elements. Thus, for convenience, N is of the form
10k + 5. Also assume that all the elements are distinct. The figure shows how the pivot might
be chosen when N = 45.
In the above figure, v represents the element which is selected by the algorithm as pivot.
Since v is the median of nine elements, and assuming that all elements are distinct, there must
be four medians that are larger than v and four that are smaller denoted as L and S
respectively.
Consider a group of five elements with a large median (type L). The median of the group is
smaller than two elements in the group and larger than two elements in the group. Let H
represent the huge elements. These are elements that are known to be larger than a large
median. Similarly, T represents the tiny elements, which are smaller than a small median.
There are 10 elements of type H: two are in each of the groups with an L type median, and
two elements are in the same group as v. Similarly, there are 10 elements of type T.
Elements of type L or H are guaranteed to be larger than v, and elements of type S or T are
guaranteed to be smaller than v. There are thus guaranteed to be 14 large and 14 small
elements in our problem. Therefore, a recursive call could be on at most 45 - 14 - 1 = 30
elements.
Extend this analysis to general N of the form 10k + 5. In this case, there are k elements of
type L and k elements of type S. There are 2k + 2 elements of type H, and also 2k + 2
elements of type T. Thus, there are 3k + 2 elements that are guaranteed to be larger than v
and 3k + 2 elements that are guaranteed to be smaller. Thus, in this case, the recursive call
can contain at most 7k + 2 < 0.7N elements. To find the pivot itself, the selection algorithm is called recursively on the
N/5 medians.
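The pivot rule and the recursive selection can be sketched in C as follows. This is an illustration under stated assumptions: groups of five are sorted by insertion sort, the medians are swapped to the front of the current range, and a three-way partition is used so that duplicate keys are handled (the analysis above assumes distinct elements). The function names are hypothetical, and the array is rearranged in place.

```c
static void Swap( int *a, int *b ) { int t = *a; *a = *b; *b = t; }

/* Sort A[lo..hi] (inclusive) by insertion sort. */
static void InsertionSort( int A[], int lo, int hi )
{
    for( int i = lo + 1; i <= hi; i++ ) {
        int tmp = A[i], j = i;
        while( j > lo && A[j - 1] > tmp ) { A[j] = A[j - 1]; j--; }
        A[j] = tmp;
    }
}

/* Returns the value that would be at index k if A[lo..hi] were sorted.
   Requires lo <= k <= hi. */
static int Select( int A[], int lo, int hi, int k )
{
    while( hi - lo + 1 > 5 ) {
        /* Steps 1 and 2: median of each full group of 5, moved to the front. */
        int numMedians = 0;
        for( int g = lo; g + 4 <= hi; g += 5 ) {
            InsertionSort( A, g, g + 4 );
            Swap( &A[lo + numMedians], &A[g + 2] );
            numMedians++;
        }
        /* Step 3: the pivot is the median of the medians, found recursively. */
        int pivot = Select( A, lo, lo + numMedians - 1, lo + numMedians / 2 );

        /* Three-way partition: [lo, lt) < pivot, [lt, gt] == pivot, (gt, hi] > pivot. */
        int lt = lo, i = lo, gt = hi;
        while( i <= gt ) {
            if( A[i] < pivot )      Swap( &A[lt++], &A[i++] );
            else if( A[i] > pivot ) Swap( &A[i], &A[gt--] );
            else                    i++;
        }
        if( k < lt )      hi = lt - 1;       /* answer lies among the smaller elements */
        else if( k > gt ) lo = gt + 1;       /* answer lies among the larger elements */
        else              return pivot;
    }
    InsertionSort( A, lo, hi );              /* small case: simply sort */
    return A[k];
}

/* kth smallest element of A[0..n-1], with 1 <= k <= n. */
int KthSmallest( int A[], int n, int k )
{
    return Select( A, 0, n - 1, k - 1 );
}
```

For example, the 6th smallest element of an array holding the values 0 through 10 in any order is 5, the median.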
Improvements to Arithmetic Operations
Integer Multiplication
For large numbers, multiplication does not take constant time. The natural multiplication
algorithm takes quadratic time, whereas a divide and conquer algorithm runs in subquadratic
time.
To multiply two N-digit numbers X and Y by hand requires $O(N^2)$ operations, because each
digit in X is multiplied by each digit in Y. If X = 61,438,521 and Y = 94,736,407, XY =
5,820,464,730,934,047.
Break X and Y into two halves, consisting of the most significant and least significant digits,
respectively. Then $X_L = 6{,}143$, $X_R = 8{,}521$, $Y_L = 9{,}473$, and $Y_R = 6{,}407$.
Thus $X = X_L 10^4 + X_R$, $Y = Y_L 10^4 + Y_R$, and
$XY = X_L Y_L 10^8 + (X_L Y_R + X_R Y_L) 10^4 + X_R Y_R$
The above equation consists of four multiplications, $X_L Y_L$, $X_L Y_R$, $X_R Y_L$, and $X_R Y_R$, which are
each half the size of the original problem (N/2 digits). The multiplications by $10^8$ and $10^4$
amount to the placing of zeros. This and the subsequent additions add only O(N) additional
work. If these four multiplications are performed recursively as above, stopping at an
appropriate base case, the running time satisfies T(N) = 4T(N/2) + O(N), which is still $O(N^2)$.
To achieve a subquadratic algorithm, fewer than four recursive calls must be used. The key
observation is that $X_L Y_R + X_R Y_L = (X_L - X_R)(Y_R - Y_L) + X_L Y_L + X_R Y_R$. Thus, instead of using two
multiplications to compute the coefficient of $10^4$, use one multiplication, plus the result of
two multiplications that have already been performed as shown below. With three recursive calls the
running time satisfies T(N) = 3T(N/2) + O(N), giving $T(N) = O(N^{\log_2 3}) = O(N^{1.59})$.
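The three-multiplication scheme can be sketched in C for numbers small enough that the product fits in a 64-bit integer. This only illustrates the recursion: a real implementation would operate on arrays of digits, where the multiplications by powers of 10 are shifts. The names `Pow10` and `Karatsuba` and the restriction to at most nine decimal digits are assumptions made for the example.

```c
#include <stdint.h>

static int64_t Pow10( int n )
{
    int64_t p = 1;
    while( n-- > 0 )
        p *= 10;
    return p;
}

/* Multiplies x and y, each of magnitude below 10^digits (digits <= 9),
   using three recursive multiplications per level. */
int64_t Karatsuba( int64_t x, int64_t y, int digits )
{
    if( digits <= 2 )
        return x * y;                              /* base case */
    if( x < 0 ) return -Karatsuba( -x, y, digits );
    if( y < 0 ) return -Karatsuba( x, -y, digits );

    int half = digits / 2;                         /* digits in the right halves */
    int upper = digits - half;                     /* digits in the left halves */
    int64_t p = Pow10( half );
    int64_t xl = x / p, xr = x % p;
    int64_t yl = y / p, yr = y % p;

    int64_t ll = Karatsuba( xl, yl, upper );       /* XL * YL */
    int64_t rr = Karatsuba( xr, yr, half );        /* XR * YR */
    /* XL*YR + XR*YL = (XL - XR)(YR - YL) + XL*YL + XR*YR */
    int64_t mid = Karatsuba( xl - xr, yr - yl, upper ) + ll + rr;

    return ll * p * p + mid * p + rr;              /* shifts in a digit-array version */
}
```

Calling `Karatsuba( 61438521, 94736407, 8 )` reproduces the product 5,820,464,730,934,047 from the example above.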
Matrix Multiplication
A fundamental numerical problem is the multiplication of two matrices. A simple $O(N^3)$
algorithm computes C = AB, where A, B, and C are N × N matrices. To compute $C_{i,j}$,
compute the dot product of the ith row in A with the jth column in B.
void MatrixMultiply( Matrix A, Matrix B, Matrix C, int N )
{
    int i, j, k;
    for( i = 0; i < N; i++ )    /* Initialization */
        for( j = 0; j < N; j++ )
            C[ i ][ j ] = 0.0;
    for( i = 0; i < N; i++ )
        for( j = 0; j < N; j++ )
            for( k = 0; k < N; k++ )
                C[ i ][ j ] += A[ i ][ k ] * B[ k ][ j ];
}
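The basic idea of Strassen's algorithm is to divide each matrix into four quadrants:

$$\begin{bmatrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \end{bmatrix} \begin{bmatrix} B_{1,1} & B_{1,2} \\ B_{2,1} & B_{2,2} \end{bmatrix} = \begin{bmatrix} C_{1,1} & C_{1,2} \\ C_{2,1} & C_{2,2} \end{bmatrix}$$

The quadrants of C are then computed as follows:

$$C_{1,1} = A_{1,1}B_{1,1} + A_{1,2}B_{2,1}$$
$$C_{1,2} = A_{1,1}B_{1,2} + A_{1,2}B_{2,2}$$
$$C_{2,1} = A_{2,1}B_{1,1} + A_{2,2}B_{2,1}$$
$$C_{2,2} = A_{2,1}B_{1,2} + A_{2,2}B_{2,2}$$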
In an example multiplication of two N × N matrices, eight N/2 by N/2 matrices (the quadrants of A and B) are defined.
Computing the quadrants of C directly requires eight N/2 by N/2 matrix multiplications and four N/2 by N/2 matrix
additions. The additions take $O(N^2)$ time, so the running time satisfies $T(N) = 8T(N/2) + O(N^2)$,
which is still of the order $O(N^3)$.
To reduce the number of subproblems below 8, Strassen used a strategy similar to the integer
multiplication divide and conquer algorithm: by carefully arranging the computations, only
seven recursive calls are needed.
The seven multiplications are:

$$M_1 = (A_{1,2} - A_{2,2})(B_{2,1} + B_{2,2})$$
$$M_2 = (A_{1,1} + A_{2,2})(B_{1,1} + B_{2,2})$$
$$M_3 = (A_{1,1} - A_{2,1})(B_{1,1} + B_{1,2})$$
$$M_4 = (A_{1,1} + A_{1,2})B_{2,2}$$
$$M_5 = A_{1,1}(B_{1,2} - B_{2,2})$$
$$M_6 = A_{2,2}(B_{2,1} - B_{1,1})$$
$$M_7 = (A_{2,1} + A_{2,2})B_{1,1}$$

Once the multiplications are performed, the final answer can be obtained with a few more
additions and subtractions:

$$C_{1,1} = M_1 + M_2 - M_4 + M_6$$
$$C_{1,2} = M_4 + M_5$$
$$C_{2,1} = M_6 + M_7$$
$$C_{2,2} = M_2 - M_3 + M_5 - M_7$$

The running time now satisfies $T(N) = 7T(N/2) + O(N^2)$. The solution of this recurrence is
$T(N) = O(N^{\log_2 7}) = O(N^{2.81})$.
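The seven products and the four combinations can be checked on the smallest case, where every quadrant is a single number. This is only an illustrative sketch: the name `strassen_2x2` and the use of plain `long` entries are assumptions, and a real implementation would recurse on N/2 by N/2 blocks instead of scalars.

```c
/* c = a * b for 2 x 2 matrices, using seven multiplications (M1..M7)
   instead of the eight needed by the direct method. */
void strassen_2x2(long a[2][2], long b[2][2], long c[2][2])
{
    long m1 = (a[0][1] - a[1][1]) * (b[1][0] + b[1][1]);
    long m2 = (a[0][0] + a[1][1]) * (b[0][0] + b[1][1]);
    long m3 = (a[0][0] - a[1][0]) * (b[0][0] + b[0][1]);
    long m4 = (a[0][0] + a[0][1]) * b[1][1];
    long m5 = a[0][0] * (b[0][1] - b[1][1]);
    long m6 = a[1][1] * (b[1][0] - b[0][0]);
    long m7 = (a[1][0] + a[1][1]) * b[0][0];

    c[0][0] = m1 + m2 - m4 + m6;   /* C11 */
    c[0][1] = m4 + m5;             /* C12 */
    c[1][0] = m6 + m7;             /* C21 */
    c[1][1] = m2 - m3 + m5 - m7;   /* C22 */
}
```

For a = [[1, 2], [3, 4]] and b = [[5, 6], [7, 8]] the routine yields [[19, 22], [43, 50]], the same as the row-by-column product.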
Randomized Algorithms
In randomized algorithms, at least once during the algorithm, a random number is used to
make a decision. The running time of the algorithm depends not only on the particular input,
but also on the random numbers that occur.
The worst-case running time of a randomized algorithm is almost always the same as the worst-case
running time of the nonrandomized algorithm. The important difference is that a good
randomized algorithm has no bad inputs, but only bad random numbers.
Consider two variants of quicksort. Variant A uses the first element as pivot, while variant B
uses a randomly chosen element as pivot. In both cases, the worst-case running time is $O(N^2)$,
because it is possible at each step that the largest element is chosen as pivot. Variant A will
run in $O(N^2)$ time every single time it is given an already sorted list. If variant B is presented
with the same input twice, it will have two different running times, depending on what
random numbers occur.
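Variant B can be sketched as follows. This is an illustrative version, not the notes' own routine: the name `random_quicksort`, the helper `swap_ints`, and the simple partitioning scheme are assumptions, and `rand()` from the C library stands in for the random number source.

```c
#include <stdlib.h>

static void swap_ints(int *a, int *b)
{
    int t = *a;
    *a = *b;
    *b = t;
}

/* Variant B: sort a[lo..hi] with a randomly chosen pivot. */
void random_quicksort(int a[], int lo, int hi)
{
    int pivot, i, j;

    if (lo >= hi)
        return;

    /* Pick a random element and move it to the end of the range. */
    swap_ints(&a[lo + rand() % (hi - lo + 1)], &a[hi]);
    pivot = a[hi];

    /* Partition: elements smaller than the pivot go to the front. */
    i = lo;
    for (j = lo; j < hi; j++)
        if (a[j] < pivot) {
            swap_ints(&a[i], &a[j]);
            i++;
        }
    swap_ints(&a[i], &a[hi]);      /* pivot lands in its final position */

    random_quicksort(a, lo, i - 1);
    random_quicksort(a, i + 1, hi);
}
```

An already sorted array is no longer a guaranteed bad case here: the quadratic behaviour now requires an unlucky sequence of random pivots rather than a particular input.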
Random Number Generators
Since randomized algorithms require random numbers, a method to generate them is
required. Suppose a coin is to be flipped, i.e., a 0 or 1 must be generated randomly. One way to
do this is to examine the system clock and use the lowest bit. The problem is that this does
not work well if a sequence of random numbers is needed.
What is really needed is a sequence of random numbers that appear
independent. The standard method to generate random numbers is the linear congruential
generator, which was devised by Lehmer in 1951. Numbers $x_1, x_2, \ldots$ are generated
satisfying

$$x_{i+1} = A x_i \bmod M$$

To start the sequence, some value of $x_0$ must be given. This value is known as the seed. If $x_0$
is chosen such that $1 \le x_0 < M$, and if A and M are correctly chosen, then the sequence appears
random.
As an example, if M = 11, A = 7, and $x_0 = 1$, then the numbers generated are 7, 5, 2, 3, 10, 4,
6, 9, 8, 1, 7, 5, 2, . . . After M − 1 = 10 numbers, the sequence repeats. If M is prime, there are
choices of A that give a full period of M − 1. If M is chosen to be a large, 31-bit prime, the
period should be sufficiently large for most applications.
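The small example is easy to reproduce. The sketch below is only meant to generate the sequence for tiny parameters; the name `lcg_sequence` is an assumption, and the product `a * x` is computed directly, which is safe only while it fits in an `unsigned long`.

```c
/* Fill out[0..count-1] with x_1, x_2, ... where x_{i+1} = a * x_i mod m. */
void lcg_sequence(unsigned long a, unsigned long m, unsigned long seed,
                  unsigned long out[], int count)
{
    unsigned long x = seed;
    int i;

    for (i = 0; i < count; i++) {
        x = (a * x) % m;     /* direct product: small parameters only */
        out[i] = x;
    }
}
```

With a = 7, m = 11 and seed 1 this produces 7, 5, 2, 3, 10, 4, 6, 9, 8, 1 and then repeats. A poor multiplier shortens the cycle: with a = 5 the sequence is 5, 3, 4, 9, 1 and repeats after only five numbers.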
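It is common for machines to provide a random number generator. The following
one works on 32-bit machines. Computing $A x_i$ directly could overflow, so compute the
quotient and remainder of M/A and define these as Q and R, respectively. The routine then
evaluates $A(x_i \bmod Q) - R \lfloor x_i / Q \rfloor$ and adds M if the result is negative.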
/* Assumed constants: the notes do not list them. A = 48271 and */
/* M = 2^31 - 1 are a common choice for this generator.         */
#define A 48271L
#define M 2147483647L
#define Q ( M / A )    /* Quotient of M / A */
#define R ( M % A )    /* Remainder of M / A */

static long Seed = 1;

double Random( void )
{
    long TmpSeed;

    TmpSeed = A * ( Seed % Q ) - R * ( Seed / Q );
    if( TmpSeed >= 0 )
        Seed = TmpSeed;
    else
        Seed = TmpSeed + M;
    return ( double ) Seed / M;
}

void Initialize( unsigned long InitVal )
{
    Seed = InitVal;
}
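Unfortunately, many libraries have generators based on the function
$x_{i+1} = (A x_i + C) \bmod 2^B$, where B is the number of bits in the machine's integer and C is odd.
When A is also odd, such generators produce values that alternate between even and odd,
which is hardly a desirable property.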
Skip Lists
The first use of randomization is a data structure that supports both searching and insertion in
O(log N) expected time. The simplest possible data structure to support searching is the
linked list. The time to perform a search is proportional to the number of nodes that have to
be examined, which is at most N.
The figure below shows a linked list in which every other node has an additional pointer to
the node two ahead of it in the list. Because of this, at most $\lceil N/2 \rceil + 1$ nodes are examined in
the worst case.
Extending the above idea, every fourth node has a pointer to the node four ahead. Only
$\lceil N/4 \rceil + 2$ nodes are examined, as shown below.
The limiting case of this extension is shown above. Every $2^i$th node has a pointer to the node
$2^i$ ahead of it. The total number of pointers has only doubled, but now at most $\log N$ nodes
are examined during a search. The search in this data structure is essentially a binary search.
The problem with this data structure is that it is much too rigid to allow efficient insertion.
The key to making this data structure usable is to relax the structure conditions slightly.
Define a level k node to be a node that has k pointers. As shown above, the ith pointer in any
level k node ($k \ge i$) points to the next node with at least i levels. Drop the restriction that the
ith pointer points to the node $2^i$ ahead, and replace it with the less restrictive condition above.
To insert a new element, allocate a new node for it. At this point, the level of the node
must be decided. In the above figure, approximately $1/2^i$ of the nodes are at level i. Choose the level of
the node randomly, in accordance with this probability distribution. The easiest way to do this
is to flip a coin until a head occurs and use the total number of flips as the node level. The
resulting data structure is known as a skip list.
To perform a find in the skip list, start at the highest pointer at the header. Traverse along this
level until the next node is larger than the element being searched for (or is NULL). When this occurs, go to
the next lower level and continue the strategy. When progress is stopped at level 1, either the search is
positioned just in front of the node being searched for, or that node is not in the list.
To perform an insert, proceed as in a find, and keep track of each point at which the search switches to a
lower level. The new node, whose level is determined randomly, is then spliced into the list.
The following figure shows how the skip list looks before and after an insertion.
Skip list operations have $O(\log N)$ expected cost. Skip lists are similar to hash tables,
in that they require an estimate of the number of elements that will be in the list so that the
number of levels can be determined. If an estimate is not available, assume a large number or
use a technique similar to rehashing. Skip lists are as efficient as many balanced search tree
implementations and are certainly much simpler to implement in many languages.
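The find and insert procedures described above can be sketched compactly. Everything named here is an assumption made for illustration (`SkipNode`, `SkipList`, `skip_init`, `skip_find`, `skip_insert`, and the fixed cap `MAX_LEVEL` on the number of levels); integer keys are used and duplicates are simply inserted again.

```c
#include <stdlib.h>

#define MAX_LEVEL 16                     /* assumed cap on node levels */

typedef struct SkipNode {
    int key;
    int level;                           /* number of pointers in use */
    struct SkipNode *next[MAX_LEVEL];    /* next[i]: next node with more than i levels */
} SkipNode;

typedef struct {
    SkipNode header;                     /* sentinel node; holds no key */
} SkipList;

void skip_init(SkipList *list)
{
    int i;
    list->header.level = MAX_LEVEL;
    for (i = 0; i < MAX_LEVEL; i++)
        list->header.next[i] = NULL;
}

/* Flip a coin until a head occurs; the number of flips is the level. */
static int random_level(void)
{
    int level = 1;
    while (level < MAX_LEVEL && rand() / (RAND_MAX / 2 + 1) == 1)
        level++;
    return level;
}

/* Return 1 if key is in the list, 0 otherwise. */
int skip_find(const SkipList *list, int key)
{
    const SkipNode *node = &list->header;
    int i;

    for (i = MAX_LEVEL - 1; i >= 0; i--)     /* highest level first */
        while (node->next[i] != NULL && node->next[i]->key < key)
            node = node->next[i];
    node = node->next[0];
    return node != NULL && node->key == key;
}

/* Insert key; return 1 on success, 0 if memory is exhausted. */
int skip_insert(SkipList *list, int key)
{
    SkipNode *update[MAX_LEVEL];
    SkipNode *node = &list->header, *fresh;
    int i;

    for (i = MAX_LEVEL - 1; i >= 0; i--) {
        while (node->next[i] != NULL && node->next[i]->key < key)
            node = node->next[i];
        update[i] = node;                    /* where the search dropped a level */
    }

    fresh = malloc(sizeof *fresh);
    if (fresh == NULL)
        return 0;
    fresh->key = key;
    fresh->level = random_level();
    for (i = 0; i < fresh->level; i++) {     /* splice into each of its levels */
        fresh->next[i] = update[i]->next[i];
        update[i]->next[i] = fresh;
    }
    return 1;
}
```

The fixed `MAX_LEVEL` plays the role of the size estimate mentioned above: with 16 levels the structure is comfortable for lists of up to roughly $2^{16}$ elements.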
Primality Testing
Primality testing is the problem of determining whether or not a large number is
prime. The problem is of major theoretical interest: for a long time nobody knew how to test
whether a d-digit number N is prime in time polynomial in d (a deterministic polynomial-time
test, the AKS algorithm, was announced only in 2002).
The key to the following randomized polynomial-time algorithm that tests for primality is a well-known
theorem due to Fermat: if P is prime and $0 < A < P$, then $A^{P-1} \equiv 1 \pmod{P}$.
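Fermat's theorem can be checked numerically with modular exponentiation by repeated squaring. The helper below is a sketch under stated assumptions: the name `power_mod` is hypothetical, and the modulus must be small enough that the square of a value below n fits in an `unsigned long long`.

```c
/* Compute a^e mod n by repeated squaring.
   Assumes n >= 1 and (n - 1) * (n - 1) fits in unsigned long long. */
unsigned long long power_mod(unsigned long long a, unsigned long long e,
                             unsigned long long n)
{
    unsigned long long result = 1 % n;

    a %= n;
    while (e > 0) {
        if (e & 1)                      /* low bit set: multiply this power in */
            result = (result * a) % n;
        a = (a * a) % n;                /* square for the next bit */
        e >>= 1;
    }
    return result;
}
```

For the prime 11, `power_mod(2, 10, 11)` is 1, as the theorem requires, while for the composite 9, `power_mod(2, 8, 9)` is 4, which proves that 9 is not prime. The converse does not hold: 341 = 11 × 31 is composite, yet `power_mod(2, 340, 341)` is 1, so a single value of A can be fooled. Choosing A at random and also checking for non-trivial square roots of 1 is what makes the error probability small.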
If the algorithm declares that the number is not prime, it is certain that the number is not
prime. If the algorithm declares that the number is prime, then, with high probability but not
100 percent certainty, the number is prime. The error probability does not depend on the
particular number that is being tested but instead depends on random choices made by the
algorithm. Thus, this algorithm occasionally makes a mistake, but the error ratio can be made
arbitrarily negligible.
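/* HugeInt is assumed to be an integer type wide enough to hold N * N; */
/* the notes do not define it, so long long is used as a stand-in.      */
#include <stdlib.h>
typedef long long HugeInt;

/* Random integer in [ Low, High ]; limited by the range of rand( ) */
HugeInt RandInt( HugeInt Low, HugeInt High )
{
    return rand( ) % ( High - Low + 1 ) + Low;
}

/* Compute A^i mod N; return 0 if N is found to be composite */
HugeInt Witness( HugeInt A, HugeInt i, HugeInt N )
{
    HugeInt X, Y;

    if( i == 0 )
        return 1;

    X = Witness( A, i / 2, N );
    if( X == 0 )    /* If N is recursively composite, stop */
        return 0;

    Y = ( X * X ) % N;
    /* N is not prime if a non-trivial square root of 1 is found */
    if( Y == 1 && X != 1 && X != N - 1 )
        return 0;

    if( i % 2 != 0 )
        Y = ( A * Y ) % N;
    return Y;
}

/* N is assumed to be at least 5 so that RandInt( 2, N - 2 ) is valid */
int IsPrime( HugeInt N )
{
    return Witness( RandInt( 2, N - 2 ), N - 1, N ) == 1;
}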
CS2068Unit5: Algorithm Design Techniques
Dynamic Programming
Any recursive mathematical formula could be directly translated to a recursive algorithm, but
the underlying reality is that often the compiler will not do justice to the recursive algorithm,
and an inefficient program results. In such cases, rewrite the recursive algorithm as a
non-recursive algorithm that systematically records the answers to the subproblems in a table. A
technique that makes use of this approach is known as dynamic programming.
The natural recursive program to compute the Fibonacci numbers is very
inefficient.
int Fib( int N )
{
    if( N <= 1 )
        return 1;
    else
        return Fib( N - 1 ) + Fib( N - 2 );
}
The above routine has a running time T(N) that satisfies $T(N) \ge T(N-1) + T(N-2)$ and is
therefore exponential. If the compiler's recursion simulation algorithm were able to keep a list of all
precomputed values and not make a recursive call for an already solved subproblem, then the
exponential explosion could be avoided.
The following figure shows the growth of redundant calculations.
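Keeping a list of precomputed values can also be done by hand. The sketch below adds a table to the recursive routine; the names `fib_memo`, `memo` and `MAX_N` are assumptions, and the base case returns 1 for N ≤ 1 to match the convention of `Fib` above.

```c
#define MAX_N 90                     /* results up to this index fit in 64 bits */

static long long memo[MAX_N + 1];    /* 0 means "not computed yet" */

/* Recursive Fibonacci that never solves the same subproblem twice.
   Returns -1 if n is outside the supported range. */
long long fib_memo(int n)
{
    if (n > MAX_N)
        return -1;
    if (n <= 1)
        return 1;                    /* same base case as Fib */
    if (memo[n] == 0)                /* first request: compute and record */
        memo[n] = fib_memo(n - 1) + fib_memo(n - 2);
    return memo[n];
}
```

Each value is now computed once, so the running time drops from exponential to O(N) at the price of O(N) table space.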
On the other hand, since all that is needed to compute $F_N$ is $F_{N-1}$ and $F_{N-2}$, only the two
most recently computed Fibonacci numbers need to be recorded. This yields the O(N) algorithm
int Fibonacci( int N )
{
    int i, Last, NextToLast, Answer;

    if( N <= 1 )
        return 1;

    Last = NextToLast = 1;
    for( i = 2; i <= N; i++ )
    {
        Answer = Last + NextToLast;
        NextToLast = Last;
        Last = Answer;
    }
    return Answer;
}
Ordering Matrix Multiplication
Suppose four matrices are given, A, B, C, and D, of dimensions A = 50 × 10, B = 10 ×
40, C = 40 × 30, and D = 30 × 5. Although matrix multiplication is not commutative, it is
associative, i.e., the matrix product ABCD can be parenthesized, and thus evaluated, in any
order. Multiplying two matrices of dimensions p × q and q × r, respectively, uses pqr scalar
multiplications. What is the best way to perform the three matrix
multiplications required to compute ABCD?
In the case of four matrices, it is simple to solve the problem by exhaustive search, since there
are only five ways to order the multiplications. Each case is evaluated as follows:
1. (A((BC)D)): Evaluating BC requires 10 × 40 × 30 = 12,000 multiplications.
Evaluating (BC)D requires the 12,000 multiplications to compute BC, plus an
additional 10 × 30 × 5 = 1,500 multiplications, for a total of 13,500. Evaluating
(A((BC)D)) requires 13,500 multiplications for (BC)D, plus an additional 50 × 10 × 5
= 2,500 multiplications, for a grand total of 16,000 multiplications.
2. (A(B(CD))): Evaluating CD requires 40 × 30 × 5 = 6,000 multiplications. Evaluating
B(CD) requires 6,000 multiplications to compute CD, plus an additional 10 × 40 × 5
= 2,000 multiplications, for a total of 8,000. Evaluating (A(B(CD))) requires 8,000
multiplications for B(CD), plus an additional 50 × 10 × 5 = 2,500 multiplications, for
a grand total of 10,500 multiplications.
3. ((AB)(CD)): Evaluating CD requires 40 × 30 × 5 = 6,000 multiplications. Evaluating
AB requires 50 × 10 × 40 = 20,000 multiplications. Evaluating ((AB)(CD)) requires
6,000 multiplications for CD, 20,000 multiplications for AB, plus an additional 50 ×
40 × 5 = 10,000 multiplications, for a grand total of 36,000 multiplications.
4. (((AB)C)D): Evaluating AB requires 50 × 10 × 40 = 20,000 multiplications.
Evaluating (AB)C requires the 20,000 multiplications to compute AB, plus an
additional 50 × 40 × 30 = 60,000 multiplications, for a total of 80,000. Evaluating
(((AB)C)D) requires 80,000 multiplications for (AB)C, plus an additional 50 × 30 × 5
= 7,500 multiplications, for a grand total of 87,500 multiplications.
5. ((A(BC))D): Evaluating BC requires 10 × 40 × 30 = 12,000 multiplications.
Evaluating A(BC) requires the 12,000 multiplications to compute BC, plus an
additional 50 × 10 × 30 = 15,000 multiplications, for a total of 27,000. Evaluating
((A(BC))D) requires 27,000 multiplications for A(BC), plus an additional 50 × 30 ×
5 = 7,500 multiplications, for a grand total of 34,500 multiplications.
The calculations show that the best ordering uses roughly one-ninth the number of
multiplications as the worst ordering. Thus, it might be worthwhile to perform a few
calculations to determine the optimal ordering.
Suppose that the matrices are $A_1, A_2, \ldots, A_N$, and the last multiplication performed is
$(A_1 A_2 \cdots A_i)(A_{i+1} A_{i+2} \cdots A_N)$. Let T(N) be the number of possible orderings. Then there are
T(i) ways to compute $(A_1 A_2 \cdots A_i)$ and T(N − i) ways to compute $(A_{i+1} A_{i+2} \cdots A_N)$. Thus, there
are T(i)T(N − i) ways to compute $(A_1 A_2 \cdots A_i)(A_{i+1} A_{i+2} \cdots A_N)$ for each possible i, which gives

$$T(N) = \sum_{i=1}^{N-1} T(i)\,T(N-i)$$

with T(1) = 1. The solution of this recurrence is the
well-known Catalan numbers, which grow exponentially.
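The count of orderings can be tabulated directly from the sum of products T(i)T(N − i) over all split points i, starting from T(1) = 1. This is a small illustrative sketch; the name `count_orderings` and the limit of 31 matrices (chosen so the result fits in a `long long`) are assumptions.

```c
/* Number of ways to fully parenthesize a product of n matrices.
   Returns -1 if n is outside 1..31. */
long long count_orderings(int n)
{
    long long t[32] = {0};
    int m, i;

    if (n < 1 || n > 31)
        return -1;

    t[1] = 1;                              /* a single matrix: one way */
    for (m = 2; m <= n; m++)
        for (i = 1; i < m; i++)            /* position of the last multiplication */
            t[m] += t[i] * t[m - i];
    return t[n];
}
```

For four matrices the result is 5, matching the five orderings enumerated above; for ten matrices it is already 4,862, which is why exhaustive search does not scale.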
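Define $M_{Left,Right}$ to be the number of multiplications required in an optimal ordering of
$A_{Left} \cdots A_{Right}$, and let $c_i$ be the number of columns of $A_i$ (so that $A_i$ has $c_{i-1}$ rows).
Then $M_{Left,Left} = 0$ and, if Left < Right,

$$M_{Left,Right} = \min_{Left \le i < Right} \{ M_{Left,i} + M_{i+1,Right} + c_{Left-1}\, c_i\, c_{Right} \}$$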
The above equation implies that if we have an optimal multiplication arrangement of
$A_{Left} \cdots A_{Right}$, the subproblems $A_{Left} \cdots A_i$ and $A_{i+1} \cdots A_{Right}$ cannot be performed suboptimally.
The formula translates directly to a recursive program, but such a program would be blatantly
inefficient. However, since there are only approximately $N^2/2$ values of $M_{Left,Right}$ that ever
need to be computed, it is clear that a table can be used to store these values. Further, if Right −
Left = k, then the only values $M_{x,y}$ that are needed in the computation of $M_{Left,Right}$ satisfy
y − x < k. This tells the order in which the table should be computed.
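The table can be filled in order of increasing k = Right − Left. The sketch below is illustrative only: the name `min_multiplications`, the array `c` (matrix i has `c[i-1]` rows and `c[i]` columns) and the cap `MAX_MATRICES` are assumptions, and only the optimal cost is computed, not the parenthesization itself.

```c
#include <limits.h>

#define MAX_MATRICES 16                      /* assumed upper bound on n */

/* c[0..n]: matrix i (1-based) has c[i-1] rows and c[i] columns.
   Returns the minimum number of scalar multiplications, or -1 if
   n is outside 1..MAX_MATRICES. */
long min_multiplications(const int c[], int n)
{
    long m[MAX_MATRICES + 1][MAX_MATRICES + 1];
    int left, right, i, k;

    if (n < 1 || n > MAX_MATRICES)
        return -1;

    for (left = 1; left <= n; left++)
        m[left][left] = 0;                   /* a single matrix costs nothing */

    for (k = 1; k < n; k++)                  /* k = right - left */
        for (left = 1; left + k <= n; left++) {
            right = left + k;
            m[left][right] = LONG_MAX;
            for (i = left; i < right; i++) { /* try every last multiplication */
                long cost = m[left][i] + m[i + 1][right]
                          + (long) c[left - 1] * c[i] * c[right];
                if (cost < m[left][right])
                    m[left][right] = cost;
            }
        }
    return m[1][n];
}
```

For the four matrices of the example, `c` is {50, 10, 40, 30, 5} and the result is 10,500, the cost of the ordering (A(B(CD))). The three nested loops give an $O(N^3)$ running time.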
Optimal Binary Search Tree
Given a list of words, $w_1, w_2, \ldots, w_N$, and fixed probabilities $p_1, p_2, \ldots, p_N$ of their
occurrence, the problem is to arrange these words in a binary search tree in a way that
minimizes the expected total access time. In a binary search tree, the number of comparisons
needed to access an element at depth d is d + 1, so if $w_i$ is placed at depth $d_i$, then the
objective is to minimize

$$\sum_{i=1}^{N} p_i (1 + d_i)$$
The following table shows seven words along with their probability of occurrence.
Three possible binary search trees for the sample input are shown. The first tree was
formed using a greedy strategy. The word with the highest probability of being accessed was
placed at the root. The left and right subtrees were then formed recursively. The second tree
is the perfectly balanced search tree. Neither of these trees is optimal, as demonstrated by the
existence of the third tree. The searching costs of these trees are also shown.
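A dynamic programming solution follows from two observations.
Once again, place the sorted words $w_{Left}, w_{Left+1}, \ldots, w_{Right-1}, w_{Right}$ into a binary search tree.
Suppose the optimal binary search tree has $w_i$ as the root, where $Left \le i \le Right$; then the left
subtree must contain $w_{Left}, \ldots, w_{i-1}$, and the right subtree must contain $w_{i+1}, \ldots, w_{Right}$.
Further, both of these subtrees must also be optimal.
The left subtree has a cost of $C_{Left,i-1}$, relative to its root, and the right subtree has a cost of
$C_{i+1,Right}$ relative to its root. Each node in these subtrees is one level deeper from $w_i$ than from
their respective roots.
A formula for the cost $C_{Left,Right}$ is

$$C_{Left,Right} = \min_{Left \le i \le Right} \left\{ p_i + C_{Left,i-1} + C_{i+1,Right} + \sum_{j=Left}^{i-1} p_j + \sum_{j=i+1}^{Right} p_j \right\} = \min_{Left \le i \le Right} \left\{ C_{Left,i-1} + C_{i+1,Right} + \sum_{j=Left}^{Right} p_j \right\}$$

where $C_{Left,Right} = 0$ if Left > Right.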
The iterative output of the algorithm is shown below.
For each subrange of words, the cost and root of the optimal binary search tree are
maintained. The bottommost entry computes the optimal binary search tree for the entire set
of words in the input. The running time of this algorithm is $O(N^3)$.
The precise computation for the optimal binary search tree for a particular subrange, namely
am..if, is shown below. It is obtained by computing the minimum-cost tree obtained by
placing am, and, egg, and if at the root. For instance, when and is placed at the root, the left
subtree contains am..am (of cost 0.18, via previous calculation), the right subtree contains
egg..if (of cost 0.35), and $p_{am} + p_{and} + p_{egg} + p_{if} = 0.68$, for a total cost of 1.21.
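The cost table can be computed with three nested loops over subranges of increasing length. In the sketch below the name `optimal_bst_cost` and the cap `MAX_WORDS` are assumptions, and only the cost is returned, not the chosen roots. The probabilities used in the usage note (0.18, 0.20, 0.05, 0.25 for am, and, egg, if) are illustrative values chosen to be consistent with the totals quoted above; the full probability table is not reproduced in these notes.

```c
#define MAX_WORDS 16                 /* assumed upper bound on n */

/* p[1..n]: access probabilities of the words in sorted order (p[0] unused).
   Returns the expected access cost of an optimal binary search tree,
   or -1.0 if n is outside 0..MAX_WORDS. */
double optimal_bst_cost(const double p[], int n)
{
    /* c[l][r] = optimal cost for words l..r; stays 0 when l > r */
    double c[MAX_WORDS + 2][MAX_WORDS + 2] = {{0}};
    int left, right, i, k;

    if (n < 0 || n > MAX_WORDS)
        return -1.0;

    for (k = 0; k < n; k++)                       /* k = right - left */
        for (left = 1; left + k <= n; left++) {
            double sum = 0.0, best = -1.0;
            right = left + k;
            for (i = left; i <= right; i++)       /* every node moves one level down */
                sum += p[i];
            for (i = left; i <= right; i++) {     /* try each word as the root */
                double cost = c[left][i - 1] + c[i + 1][right] + sum;
                if (best < 0.0 || cost < best)
                    best = cost;
            }
            c[left][right] = best;
        }
    return c[1][n];
}
```

With `p = {0, 0.18, 0.20, 0.05, 0.25}` and n = 4, the result is 1.21, obtained with and at the root, in agreement with the calculation for the subrange am..if.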
All Pairs Shortest Path
An algorithm is required to compute shortest weighted paths between every pair of points in a
directed graph G =(V, E).
Dijkstra's algorithm provides the idea for the dynamic programming algorithm: select the
vertices in sequential order. Define $D_{k,i,j}$ to be the weight of the shortest path from $v_i$ to $v_j$ that
uses only $v_1, v_2, \ldots, v_k$ as intermediates. By this definition, $D_{0,i,j} = c_{i,j}$, where $c_{i,j}$ is $\infty$ if $(v_i, v_j)$
is not an edge in the graph. Also, by definition, $D_{|V|,i,j}$ is the shortest path from $v_i$ to $v_j$ in the
graph.
The shortest path from $v_i$ to $v_j$ that uses only $v_1, v_2, \ldots, v_k$ as intermediates is the shortest
path that either does not use $v_k$ as an intermediate at all, or consists of the merging of the two
paths $v_i \to v_k$ and $v_k \to v_j$, each of which uses only the first k − 1 vertices as intermediates. This
leads to the formula

$$D_{k,i,j} = \min\{ D_{k-1,i,j},\; D_{k-1,i,k} + D_{k-1,k,j} \}$$
/* Compute all-pairs shortest paths. A[ ][ ] holds the edge costs, */
/* D[ ][ ] receives the shortest path lengths, and Path[ ][ ]      */
/* records an intermediate vertex on each shortest path.           */
void AllPairs( TwoDimArray A, TwoDimArray D,
               TwoDimArray Path, int N )
{
    int i, j, k;

    for( i = 0; i < N; i++ )    /* Initialize */
        for( j = 0; j < N; j++ )
        {
            D[ i ][ j ] = A[ i ][ j ];
            Path[ i ][ j ] = NotAVertex;
        }

    for( k = 0; k < N; k++ )
        /* Consider each vertex as an intermediate */
        for( i = 0; i < N; i++ )
            for( j = 0; j < N; j++ )
                if( D[ i ][ k ] + D[ k ][ j ] < D[ i ][ j ] )
                {
                    /* Update shortest path */
                    D[ i ][ j ] = D[ i ][ k ] + D[ k ][ j ];
                    Path[ i ][ j ] = k;
                }
}
Backtracking Algorithms
A backtracking algorithm amounts to a clever implementation of exhaustive search, with
generally unfavorable performance. However, savings over a brute force exhaustive search
can be significant. Performance is relative: an $O(N^2)$ algorithm for sorting is pretty bad, but an
$O(N^5)$ algorithm for the traveling salesman (or any NP-complete) problem would be a landmark result.
A practical example of a backtracking algorithm is the problem of arranging furniture in a
new house. There are many possibilities to try, but typically only a few are actually
considered. Starting with no arrangement, each piece of furniture is placed in some part of the
room. If all the furniture is placed and the owner is happy, then the algorithm terminates.
Although this algorithm is essentially brute force, it does not try all possibilities directly. For
instance, arrangements that consider placing the sofa in the kitchen are never tried. Many
other bad arrangements are discarded early, because an undesirable subset of the arrangement
is detected. The elimination of a large group of possibilities in one step is known as pruning.
Turnpike Reconstruction Problem
Given N points, $p_1, p_2, \ldots, p_N$, located on the x-axis, let $x_i$ be the x coordinate of $p_i$. Let us
further assume that $x_1 = 0$ and the points are given from left to right. These N points
determine N(N − 1)/2 distances $d_1, d_2, \ldots, d_M$ between every pair of points of the form
$|x_i - x_j|$ ($i \ne j$). The turnpike reconstruction problem is to reconstruct a point set from the
distances. This finds applications in physics and molecular biology. No algorithm is known
that is guaranteed to work in polynomial time. The algorithm presented here generally runs in $O(N^2 \log N)$ time.
Let D be the set of distances, and assume that |D| = M = N(N − 1)/2. Consider the example
D = {1, 2, 2, 2, 3, 3, 3, 4, 5, 5, 5, 6, 7, 8, 10}. Since |D| = 15, N = 6.
Start the algorithm by setting $x_1 = 0$. Clearly, $x_6 = 10$, since 10 is the largest element in D.
Therefore, remove 10 from D. The points placed so far are $x_1 = 0$ and $x_6 = 10$, and the
remaining distances are D = {1, 2, 2, 2, 3, 3, 3, 4, 5, 5, 5, 6, 7, 8}.
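The largest remaining distance is 8, which means that either $x_2 = 2$ or $x_5 = 8$. By symmetry, the choice is
unimportant: either both choices lead to a solution (the two solutions being mirror images of
each other) or neither does, so set $x_5 = 8$ without affecting the
solution. Remove the distances $x_6 - x_5 = 2$ and $x_5 - x_1 = 8$ from D, obtaining $x_1 = 0$, $x_5 = 8$,
$x_6 = 10$ and D = {1, 2, 2, 3, 3, 3, 4, 5, 5, 5, 6, 7}.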
Since 7 is the largest value in D, either $x_4 = 7$ or $x_2 = 3$. If $x_4 = 7$, the distances $x_6 - 7 = 3$ and
$x_5 - 7 = 1$ must be present in D, and they are. If $x_2 = 3$, then $3 - x_1 = 3$ and $x_5 - 3 = 5$ must be
present in D, and they are as well. Since both choices are consistent with D, try one and see if it
leads to a solution. If it turns out that it does not, backtrack and try the other.
Trying the first choice, set $x_4 = 7$, which leaves $x_1 = 0$, $x_4 = 7$, $x_5 = 8$, $x_6 = 10$ and
D = {2, 2, 3, 3, 4, 5, 5, 5, 6}.
Now the largest distance is 6, so either $x_3 = 6$ or $x_2 = 4$. If $x_3 = 6$, then $x_4 - x_3 = 1$, which is
impossible, since 1 is no longer in D. If $x_2 = 4$, then $x_2 - x_1 = 4$ and $x_5 - x_2 = 4$. This
is also impossible, since 4 appears only once in D. Thus, this line of reasoning leaves no
solution, so backtrack.
Since $x_4 = 7$ failed to produce a solution, try $x_2 = 3$, which leaves D = {1, 2, 2, 3, 3, 4, 5, 5, 6}.
If this also fails, give up and report no solution.
Once again, a choice is made between $x_4 = 6$ and $x_3 = 4$. $x_3 = 4$ is impossible, because D
has only one occurrence of 4, and two would be implied by this choice. $x_4 = 6$ is possible,
therefore the state is $x_1 = 0$, $x_2 = 3$, $x_4 = 6$, $x_5 = 8$, $x_6 = 10$ and D = {1, 2, 3, 5, 5}.
The only remaining choice is to assign $x_3 = 5$; this works because it leaves D empty, and so a
solution exists.
The following figure shows a decision tree representing the actions taken to arrive at the
solution. The labels are placed in the branches' destination nodes. A node with an asterisk
indicates that the points chosen are inconsistent with the given distances; nodes with two
asterisks have only impossible nodes as children, and thus represent an incorrect path.
The driving routine, Turnpike, is shown below (Figure 10.64). It receives the point array X, the
distance array D, and N. If a solution is discovered, then true will be returned, the answer will
be placed in X, and D will be empty. Otherwise, false will be returned, X will be undefined,
and the distance array D will be untouched. The routine sets $x_1$, $x_{N-1}$, and $x_N$, alters D, and
calls the backtracking algorithm Place to place the other points.
/* Pseudocode: DistSet is an abstract multiset of distances */
int Turnpike( int X[ ], DistSet D, unsigned int N )
{
    X[ 1 ] = 0;
    X[ N ] = DeleteMax( D );
    X[ N - 1 ] = DeleteMax( D );
    if( X[ N ] - X[ N - 1 ] ∈ D )
    {
        Remove( X[ N ] - X[ N - 1 ], D );
        return Place( X, D, N, 2, N - 2 );
    }
    else
        return False;
}
/* Pseudocode: place the points X[ Left ] ... X[ Right ]. The points   */
/* X[ 1 ] ... X[ Left - 1 ] and X[ Right + 1 ] ... X[ N ] are already   */
/* tentatively placed.                                                  */
int Place( int X[ ], DistSet D, unsigned int N,
           int Left, int Right )
{
    int DMax, Found = False;

    if( D is empty )
        return True;
    DMax = FindMax( D );

    /* Check if setting X[ Right ] = DMax is feasible. */
    if( | X[ j ] - DMax | ∈ D
        for all 1 ≤ j < Left and Right < j ≤ N )
    {
        X[ Right ] = DMax;    /* Try X[ Right ] = DMax */
        for( 1 ≤ j < Left, Right < j ≤ N )
            Delete( | X[ j ] - DMax |, D );
        Found = Place( X, D, N, Left, Right - 1 );

        if( !Found )    /* Backtrack */
            for( 1 ≤ j < Left, Right < j ≤ N )
                Insert( | X[ j ] - DMax |, D );    /* Undo deletion */
    }

    /* If first attempt failed, try to see if setting */
    /* X[ Left ] = X[ N ] - DMax is feasible */
    if( !Found && ( | X[ N ] - DMax - X[ j ] | ∈ D
        for all 1 ≤ j < Left and Right < j ≤ N ) )
    {
        X[ Left ] = X[ N ] - DMax;    /* Same logic as before */
        for( 1 ≤ j < Left, Right < j ≤ N )
            Delete( | X[ N ] - DMax - X[ j ] |, D );
        Found = Place( X, D, N, Left + 1, Right );

        if( !Found )    /* Backtrack */
            for( 1 ≤ j < Left, Right < j ≤ N )
                Insert( | X[ N ] - DMax - X[ j ] |, D );    /* Undo */
    }

    return Found;
}
If there is no backtracking, the running time is $O(N^2 \log N)$. If backtracking happens repeatedly,
the performance of the algorithm degrades accordingly.
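The backtracking search can be made concrete for small inputs. The sketch below is not the routine from the notes: it replaces the abstract distance set by an array of counts indexed by distance (so distances must be integers between 1 and `MAXD`), it fixes only $x_1$ and $x_N$ before starting the recursion, and all names (`turnpike`, `place`, `try_point`, `adjust`, `find_max`, `cnt`, `remaining`) are assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>

#define MAXD 64                       /* largest distance this sketch accepts */

static int cnt[MAXD + 1];             /* cnt[d] = remaining copies of distance d */
static int remaining;                 /* number of distances still in the multiset */

static int find_max(void)
{
    int d;
    for (d = MAXD; d > 0; d--)
        if (cnt[d] > 0)
            return d;
    return 0;
}

/* Put back the distances from p to the placed points X[j], j < stop,
   where the placed points are those with j outside [left, right]. */
static void adjust(const int X[], int left, int right, int stop, int p)
{
    int j;
    for (j = 1; j < stop; j++)
        if (j < left || j > right) {
            cnt[abs(X[j] - p)]++;
            remaining++;
        }
}

/* Remove the distances implied by a point at p. If one of them is
   missing, restore the multiset and return 0. */
static int try_point(const int X[], int n, int left, int right, int p)
{
    int j, d;
    for (j = 1; j <= n; j++) {
        if (j >= left && j <= right)
            continue;                         /* not placed yet */
        d = abs(X[j] - p);
        if (d > MAXD || cnt[d] == 0) {
            adjust(X, left, right, j, p);     /* undo the partial removal */
            return 0;
        }
        cnt[d]--;
        remaining--;
    }
    return 1;
}

static int place(int X[], int n, int left, int right)
{
    int dmax;

    if (remaining == 0)
        return 1;
    if (left > right)
        return 0;
    dmax = find_max();

    if (try_point(X, n, left, right, dmax)) {          /* try X[right] = dmax */
        X[right] = dmax;
        if (place(X, n, left, right - 1))
            return 1;
        adjust(X, left, right, n + 1, dmax);           /* backtrack */
    }
    if (try_point(X, n, left, right, X[n] - dmax)) {   /* try X[left] = X[n] - dmax */
        X[left] = X[n] - dmax;
        if (place(X, n, left + 1, right))
            return 1;
        adjust(X, left, right, n + 1, X[n] - dmax);    /* backtrack */
    }
    return 0;
}

/* X[1..n] receives the points; dist[0..m-1] holds the m = n(n-1)/2 distances.
   Returns 1 if a point set was found, 0 otherwise. */
int turnpike(int X[], const int dist[], int m, int n)
{
    int i;

    if (n < 2 || m != n * (n - 1) / 2)
        return 0;
    memset(cnt, 0, sizeof cnt);
    remaining = 0;
    for (i = 0; i < m; i++) {
        if (dist[i] < 1 || dist[i] > MAXD)
            return 0;
        cnt[dist[i]]++;
        remaining++;
    }
    X[1] = 0;
    X[n] = find_max();                /* the largest distance fixes x_N */
    cnt[X[n]]--;
    remaining--;
    return place(X, n, 2, n - 1);
}
```

On the example D = {1, 2, 2, 2, 3, 3, 3, 4, 5, 5, 5, 6, 7, 8, 10} with N = 6, the search first tries $x_5 = 8$ and $x_4 = 7$, backtracks, and ends with the points 0, 3, 5, 6, 8, 10, following the same path as the walk-through above.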
Games
Consider the strategy that a computer might use to play a strategic game, such as checkers or
chess. As an example, the simpler game of tic-tac-toe is considered.
Tic-tac-toe is, of course, a draw if both sides play optimally. By performing a careful
case-by-case analysis, it is not a difficult matter to construct an algorithm that never loses and
always wins when presented the opportunity. This can be done because certain positions are
known traps and can be handled by a lookup table. Other strategies, such as taking the center
square when it is available, make the analysis simpler. If this is done, then by using a table, a
move is chosen based only on the current position. Of course, this strategy requires the
programmer, and not the computer, to do most of the thinking.
Minimax Strategy
The general strategy is to use an evaluation function to quantify the "goodness" of a position.
A position that is a win for the computer might get the value of +1; a draw could get 0; and a
position that the computer has lost would get −1. A position for which this assignment can
be determined by examining the board is known as a terminal position.
If a position is not terminal, the value of the position is determined by recursively assuming
optimal play by both sides. This is known as a minimax strategy, because one player (the
human) is trying to minimize the value of the position, while the other player (the computer)
is trying to maximize it.
A successor position of P is any position $P_s$ that is reachable from P by playing one move. If
the computer is to move when in some position P, it recursively evaluates the value of all the
successor positions. The computer chooses the move with the largest value; this is the value
of P. To evaluate any successor position $P_s$, all of $P_s$'s successors are recursively evaluated,
and the smallest value is chosen. This smallest value represents the most favorable reply for
the human player.
/* Recursive procedure to find the best move for the computer.  */
/* BestMove receives a number from 1 to 9 indicating a square.  */
/* Possible evaluations satisfy CompLoss < Draw < CompWin.      */
void FindCompMove( BoardType Board, int *BestMove, int *Value )
{
    int Dc, i, Response;    /* Dc is don't care */

    if( FullBoard( Board ) )
        *Value = Draw;
    else
    if( ImmediateCompWin( Board, BestMove ) )
        *Value = CompWin;
    else
    {
        *Value = CompLoss;
        for( i = 1; i <= 9; i++ )    /* Try each square */
        {
            if( IsEmpty( Board, i ) )
            {
                Place( Board, i, Comp );
                FindHumanMove( Board, &Dc, &Response );
                Unplace( Board, i );    /* Restore Board */

                if( Response > *Value )    /* Update best */
                {
                    *Value = Response;
                    *BestMove = i;
                }
            }
        }
    }
}
The routine FindHumanMove is identical to FindCompMove, except that it minimizes the
value instead of maximizing it.
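Because the two routines differ only in the direction of the comparison, they can be folded into one function that takes the side to move as a parameter. The following self-contained sketch does this for tic-tac-toe; it is not the notes' routine (it has no immediate-win shortcut and returns only the value, not the best move), and the names `minimax`, `winner`, `board_full` and the constants are assumptions.

```c
enum { EMPTY = 0, COMP = 1, HUMAN = 2 };
enum { COMP_LOSS = -1, DRAW = 0, COMP_WIN = 1 };

/* The eight winning lines of the 3 x 3 board (squares numbered 0..8). */
static const int lines[8][3] = {
    {0, 1, 2}, {3, 4, 5}, {6, 7, 8},
    {0, 3, 6}, {1, 4, 7}, {2, 5, 8},
    {0, 4, 8}, {2, 4, 6}
};

/* Return COMP or HUMAN if that side has three in a row, else EMPTY. */
static int winner(const int b[9])
{
    int i;
    for (i = 0; i < 8; i++) {
        int a = b[lines[i][0]];
        if (a != EMPTY && a == b[lines[i][1]] && a == b[lines[i][2]])
            return a;
    }
    return EMPTY;
}

static int board_full(const int b[9])
{
    int i;
    for (i = 0; i < 9; i++)
        if (b[i] == EMPTY)
            return 0;
    return 1;
}

/* Value of the position b with `side` to move, assuming optimal play. */
int minimax(int b[9], int side)
{
    int w = winner(b);
    int i, v, best;

    if (w == COMP)
        return COMP_WIN;
    if (w == HUMAN)
        return COMP_LOSS;
    if (board_full(b))
        return DRAW;

    /* The computer maximizes, the human minimizes. */
    best = (side == COMP) ? COMP_LOSS : COMP_WIN;
    for (i = 0; i < 9; i++) {
        if (b[i] != EMPTY)
            continue;
        b[i] = side;                                    /* place */
        v = minimax(b, side == COMP ? HUMAN : COMP);
        b[i] = EMPTY;                                   /* unplace */
        if (side == COMP ? v > best : v < best)
            best = v;
    }
    return best;
}
```

Evaluating the empty board with either side to move gives `DRAW`, in line with the observation that tic-tac-toe is a draw under optimal play.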
For more complex games, such as checkers and chess, it is obviously infeasible to search all
the way to the terminal nodes. In such cases, the search is stopped after a certain depth of
recursion is reached. The nodes where the recursion is stopped become terminal nodes. These
terminal nodes are evaluated with a function that estimates the value of the position. For
instance, in a chess program, the evaluation function measures such variables as the relative
amount and strength of pieces and positional factors. The evaluation function is crucial for
success, because the computer's move selection is based on maximizing this function. The
best computer chess programs have sophisticated evaluation functions.
For computer chess, the single most important factor seems to be the number of moves of
look-ahead the program is capable of. This is sometimes known as ply; it is equal to the depth of
the recursion.
The basic method to increase the look-ahead factor in game programs is to come up with
methods that evaluate fewer nodes without losing any information. One method is to use a
table to keep track of all positions that have been evaluated. If the values of the positions are
saved, the second occurrence of a position need not be recomputed; it essentially becomes a
terminal position. The data structure that records this is known as a transposition table; it is
almost always implemented by hashing. In many cases, this can save considerable
computation.
Pruning
The above figure shows the trace of the recursive calls used to evaluate some hypothetical
position in a hypothetical game. This is commonly referred to as a game tree. The value of
the game tree is 44.
The above figure shows the evaluation of the same game tree, with several unevaluated
nodes. Almost half of the terminal nodes have not been checked, i.e., these possibilities are
not evaluated, since there is no need for it. This type of elimination to save computation time
is known as pruning.
Consider node D in the above figure. At this point, the search is still in FindHumanMove and is
contemplating a call to FindCompMove on D. However, FindHumanMove will return at most
40, since it is a min node. On the other hand, its max node parent has already found a
sequence that guarantees 44. Nothing that D does can possibly increase this value. Therefore,
D need not be evaluated. This pruning of the tree is known as α pruning. To implement
α pruning, FindCompMove passes its tentative maximum (α) to FindHumanMove. If the tentative
minimum of FindHumanMove falls below this value, then FindHumanMove returns
immediately.
A similar thing happens at nodes A and C. In this case, the search is in FindCompMove and about to
make a call to FindHumanMove to evaluate C, as shown in the figure. However, the
FindHumanMove at the min level, which has called FindCompMove, has already determined
that it can force a value of at most 44. Since FindCompMove has a tentative maximum of 68,
nothing that C does will affect the result at the min level. Therefore, C need not be evaluated.
This type of pruning is known as β pruning.
When both techniques are combined, it is known as α-β pruning.
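Both cutoffs can be seen in a few lines on a toy game tree. The sketch below is illustrative only: it assumes a complete binary tree whose terminal values are stored in an array, and the names `alphabeta`, `minimax_plain` and the counter `leaves_seen` are hypothetical. The tentative maximum (α) and tentative minimum (β) are passed down, and a node stops examining its children as soon as α ≥ β.

```c
#include <limits.h>

static int leaves_seen;      /* number of terminal nodes actually evaluated */

/* Plain minimax over a complete binary tree; leaf[] holds terminal values. */
int minimax_plain(const int leaf[], int depth, int index, int is_max)
{
    int a, b;
    if (depth == 0)
        return leaf[index];
    a = minimax_plain(leaf, depth - 1, 2 * index, !is_max);
    b = minimax_plain(leaf, depth - 1, 2 * index + 1, !is_max);
    return is_max ? (a > b ? a : b) : (a < b ? a : b);
}

/* Same search with alpha-beta pruning.
   alpha: best value the max player can already guarantee,
   beta:  best value the min player can already guarantee. */
int alphabeta(const int leaf[], int depth, int index, int is_max,
              int alpha, int beta)
{
    int child, v;

    if (depth == 0) {
        leaves_seen++;
        return leaf[index];
    }
    for (child = 0; child < 2; child++) {
        v = alphabeta(leaf, depth - 1, 2 * index + child, !is_max, alpha, beta);
        if (is_max) {
            if (v > alpha)
                alpha = v;           /* raise the tentative maximum */
        } else {
            if (v < beta)
                beta = v;            /* lower the tentative minimum */
        }
        if (alpha >= beta)
            break;                   /* remaining children cannot matter */
    }
    return is_max ? alpha : beta;
}
```

For the terminal values {3, 5, 6, 9, 1, 2, 0, 7} with a max node at the root and depth 3, calling `alphabeta(leaf, 3, 0, 1, INT_MIN, INT_MAX)` returns 5, the same value as `minimax_plain`, while evaluating only five of the eight terminal nodes.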