UNIT 1
Data structures are the programmatic way of storing data so that it can be used efficiently.
Almost every enterprise application uses various types of data structures in one way or
another. An understanding of data structures is needed to grasp the complexity of
enterprise-level applications and the need for appropriate algorithms and data structures.
Why Learn Data Structures and Algorithms?
As applications are getting complex and data rich, there are three common problems that
applications face now-a-days.
Data Search − Consider an inventory of 1 million (10⁶) items in a store. If the application
has to search for an item, it must scan those 1 million (10⁶) items every time, slowing
down the search. As data grows, the search becomes slower.
Processor speed − Processor speed, although very high, becomes a limitation if the data
grows to billions of records.
Multiple requests − When thousands of users search the data simultaneously on a web
server, even a fast server can fail while searching the data.
To solve the above-mentioned problems, data structures come to the rescue. Data can be
organized in a data structure in such a way that not all items need to be searched, and the
required data can be found almost instantly.
Applications of Data Structure and Algorithms
An algorithm is a step-by-step procedure that defines a set of instructions to be executed in a
certain order to get the desired output. Algorithms are generally created independently of the
underlying languages, i.e. an algorithm can be implemented in more than one programming
language.
From the data structure point of view, following are some important categories of algorithms −
Search − Algorithm to search an item in a data structure.
Sort − Algorithm to sort items in a certain order.
Insert − Algorithm to insert an item in a data structure.
Update − Algorithm to update an existing item in a data structure.
Delete − Algorithm to delete an existing item from a data structure.
Computer problems such as the Fibonacci number series, the knapsack problem, the Tower of Hanoi, shortest-path problems, and project scheduling can be solved using data structures.
Example
Let's try to learn algorithm-writing by using an example.
Problem − Design an algorithm to add two numbers and display the result.
Step 1 − START
Step 2 − declare three integers a, b & c
Step 3 − define values of a & b
Step 4 − add values of a & b
Step 5 − store output of step 4 to c
Step 6 − print c
Step 7 − STOP
Algorithms tell the programmers how to code the program. Alternatively, the algorithm can be
written as −
Step 1 − START ADD
Step 2 − get values of a & b
Step 3 − c ← a + b
Step 4 − display c
Step 5 − STOP
In the design and analysis of algorithms, usually the second method is used to describe an
algorithm. It makes it easy for the analyst to analyze the algorithm ignoring all unwanted
definitions, and to observe what operations are being used and how the process flows.
Writing step numbers is optional.
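As a sketch, the second form of the algorithm translates almost line by line into C; the values assigned to a and b below are assumptions chosen for illustration:

#include <stdio.h>

int main(void) {
    int a, b, c;         /* declare three integers a, b & c */
    a = 10;              /* get values of a & b (assumed here) */
    b = 20;
    c = a + b;           /* c <- a + b */
    printf("%d\n", c);   /* display c */
    return 0;            /* STOP */
}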
We design an algorithm to get a solution to a given problem. A problem can be solved in more
than one way.
Hence, many solution algorithms can be derived for a given problem. The next step is to
analyze those proposed solution algorithms and implement the best suitable solution.
Algorithm Analysis
The efficiency of an algorithm can be analyzed at two different stages, before implementation
and after implementation −
A Priori Analysis − This is a theoretical analysis of an algorithm. Efficiency is measured
by assuming that all other factors, for example processor speed, are constant and have
no effect on the implementation.
A Posteriori Analysis − This is an empirical analysis of an algorithm. The selected
algorithm is implemented in a programming language and executed on a target
machine. In this analysis, actual statistics such as running time and space required are
collected.
We shall learn about a priori algorithm analysis. Algorithm analysis deals with the execution or
running time of various operations involved. The running time of an operation can be defined
as the number of computer instructions executed per operation.
Algorithm Complexity
Suppose X is an algorithm and n is the size of the input data; the time and space used by
algorithm X are the two main factors that decide the efficiency of X.
Time Factor − Time is measured by counting the number of key operations, such as
comparisons in a sorting algorithm.
Space Factor − Space is measured by counting the maximum memory space required
by the algorithm.
The complexity of an algorithm f(n) gives the running time and/or the storage space required
by the algorithm in terms of n as the size of input data.
Space Complexity
Space complexity of an algorithm represents the amount of memory space required by the
algorithm in its life cycle. The space required by an algorithm is equal to the sum of the
following two components −
A fixed part − the space required to store certain data and variables that are
independent of the size of the problem. For example, simple variables and constants
used, program size, etc.
A variable part − the space required by variables whose size depends on the size of the
problem. For example, dynamic memory allocation, recursion stack space, etc.
Space complexity S(P) of any algorithm P is S(P) = C + S_P(I), where C is the fixed part and
S_P(I) is the variable part of the algorithm, which depends on instance characteristic I.
Following is a simple example that tries to explain the concept −
Algorithm: SUM(A, B)
Step 1 - START
Step 2 - C ← A + B + 10
Step 3 - Stop
Here we have three variables A, B, and C, and one constant (10). Hence S(P) = 1 + 3. Now,
the actual space depends on the data types of the given variables and constants, and is
multiplied accordingly.
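As an illustration, the SUM algorithm above can be written as the following C function; the three int variables and the constant form the fixed part, and there is no variable part because no storage depends on the input size:

/* Fixed space only: A, B, C and the constant 10 are independent of the
   problem size, so S(P) has no variable part S_P(I) here. */
int sum(int A, int B) {
    int C = A + B + 10;
    return C;
}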
Time Complexity
Time complexity of an algorithm represents the amount of time required by the algorithm to run
to completion. Time requirements can be defined as a numerical function T(n), where T(n) can
be measured as the number of steps, provided each step consumes constant time.
For example, the addition of two n-bit integers takes n steps. Consequently, the total
computational time is T(n) = c ∗ n, where c is the time taken for the addition of two bits. Here,
we observe that T(n) grows linearly as the input size increases.
ASYMPTOTIC ANALYSIS
Following are the commonly used asymptotic notations for calculating the running time
complexity of an algorithm −
Ο Notation
Ω Notation
θ Notation
Big Oh Notation, Ο
The notation Ο(n) is the formal way to express the upper bound of an algorithm's running time.
It measures the worst case time complexity or the longest amount of time an algorithm can
possibly take to complete.
Omega Notation, Ω
The notation Ω(n) is the formal way to express the lower bound of an algorithm's running time.
It measures the best case time complexity or the minimum amount of time an algorithm can
possibly take to complete.
For example, for a function f(n) −
Ω(f(n)) = { g(n) : there exist constants c > 0 and n0 such that g(n) ≥ c.f(n) for
all n > n0 }
Theta Notation, θ
The notation θ(n) is the formal way to express both the lower bound and the upper bound of an
algorithm's running time. It is represented as follows −
θ(f(n)) = { g(n) : g(n) = Ο(f(n)) and g(n) = Ω(f(n)) for
all n > n0 }
Following is a list of some common asymptotic notations −
constant − Ο(1)
logarithmic − Ο(log n)
linear − Ο(n)
n log n − Ο(n log n)
quadratic − Ο(n²)
cubic − Ο(n³)
polynomial − n^Ο(1)
exponential − 2^Ο(n)
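To make these growth rates concrete, here is an illustrative C sketch; the functions are hypothetical examples (not from the text) whose operation counts follow the named classes:

/* O(1): constant - the work does not depend on n */
int first_element(const int arr[]) {
    return arr[0];
}

/* O(n): linear - one pass over the input */
long sum_all(const int arr[], int n) {
    long s = 0;
    for (int i = 0; i < n; i++)
        s += arr[i];
    return s;
}

/* O(n^2): quadratic - nested passes over the input */
int count_equal_pairs(const int arr[], int n) {
    int count = 0;
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (arr[i] == arr[j])
                count++;
    return count;
}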
Asymptotic Notations
Whenever we want to analyze an algorithm, we need to calculate its complexity. But calculating the
complexity of an algorithm does not give the exact amount of resource required. So instead of taking the
exact amount of resource, we represent that complexity in a general form (notation) which captures the basic
nature of the algorithm. We use that general form (notation) for the analysis process.
Note - In asymptotic notation, when we want to represent the complexity of an algorithm, we use only the
most significant terms in the complexity of that algorithm and ignore the least significant terms (here the
complexity can be space complexity or time complexity).
For example, consider the time complexities of the following two algorithms −
Algorithm 1 : 5n² + 2n + 1
Algorithm 2 : 10n² + 8n + 3
Generally, when we analyze an algorithm, we consider the time complexity for larger values of the input size
(i.e. the value of 'n'). In the above two time complexities, for larger values of 'n' the term '2n + 1' in Algorithm 1
is less significant than the term '5n²', and the term '8n + 3' in Algorithm 2 is less significant than the term '10n²'.
Here, for larger values of 'n', the value of the most significant terms (5n² and 10n²) is much larger than the
value of the least significant terms (2n + 1 and 8n + 3). So for larger values of 'n' we ignore the least
significant terms when representing the overall time required by an algorithm. In asymptotic notation, we use
only the most significant terms to represent the time complexity of an algorithm.
Mainly, we use three types of asymptotic notations, as follows −
1. Big - Oh (O)
2. Big - Omega (Ω)
3. Big - Theta (Θ)
Big - Oh Notation (O)
Big - Oh notation is used to define the upper bound of an algorithm in terms of time complexity.
That means Big - Oh notation always indicates the maximum time required by an algorithm for all input
values; in other words, it describes the worst case of an algorithm's time complexity.
Big - Oh Notation can be defined as follows...
Consider function f(n) as the time complexity of an algorithm and g(n) as the most significant term. If
f(n) ≤ C g(n) for all n ≥ n0, for some constants C > 0 and n0 ≥ 1, then we can represent f(n) as O(g(n)).
f(n) = O(g(n))
Consider a graph of f(n) and C g(n), with the input size n on the X-axis and the time required on the
Y-axis. In such a graph, after a particular input value n0, C g(n) is always greater than f(n), which
indicates the algorithm's upper bound.
Example
Consider f(n) = 3n + 2 and g(n) = n. To represent f(n) as O(g(n)), it must satisfy
f(n) ≤ C g(n)
⇒ 3n + 2 ≤ C n
This condition holds for C = 4 and all n ≥ 2. Hence, 3n + 2 = O(n).
Big - Omega Notation (Ω)
Big - Omega notation is used to define the lower bound of an algorithm in terms of time complexity.
That means Big - Omega notation always indicates the minimum time required by an algorithm for all input
values; in other words, it describes the best case of an algorithm's time complexity.
Big - Omega Notation can be defined as follows...
Consider function f(n) as the time complexity of an algorithm and g(n) as the most significant term. If
f(n) ≥ C g(n) for all n ≥ n0, for some constants C > 0 and n0 ≥ 1, then we can represent f(n) as Ω(g(n)).
f(n) = Ω(g(n))
Consider a graph of f(n) and C g(n), with the input size n on the X-axis and the time required on the
Y-axis. In such a graph, after a particular input value n0, C g(n) is always less than f(n), which indicates
the algorithm's lower bound.
Example
Consider f(n) = 3n + 2 and g(n) = n. To represent f(n) as Ω(g(n)), it must satisfy
f(n) ≥ C g(n)
⇒ 3n + 2 ≥ C n
This condition holds for C = 1 and all n ≥ 1. Hence, 3n + 2 = Ω(n).
Big - Theta Notation (Θ)
Big - Theta notation is used to define the tight bound of an algorithm in terms of time complexity.
That means Big - Theta notation indicates that the running time of an algorithm is bounded both above
and below for all input values; in other words, it describes the exact order of growth of an algorithm's
time complexity.
Big - Theta Notation can be defined as follows...
Consider function f(n) as the time complexity of an algorithm and g(n) as the most significant term. If
C1 g(n) ≤ f(n) ≤ C2 g(n) for all n ≥ n0, for some constants C1 > 0, C2 > 0 and n0 ≥ 1, then we can
represent f(n) as Θ(g(n)).
f(n) = Θ(g(n))
Consider a graph of f(n), C1 g(n) and C2 g(n), with the input size n on the X-axis and the time required
on the Y-axis. In such a graph, after a particular input value n0, C1 g(n) is always less than f(n) and
C2 g(n) is always greater than f(n), which indicates the algorithm's tight bound.
Example
Consider f(n) = 3n + 2 and g(n) = n. To represent f(n) as Θ(g(n)), it must satisfy
C1 g(n) ≤ f(n) ≤ C2 g(n)
⇒ C1 n ≤ 3n + 2 ≤ C2 n
This condition holds for C1 = 1, C2 = 4 and all n ≥ 2. Hence, 3n + 2 = Θ(n).
Properties of Asymptotic Notations
1. Reflexivity:
If f(n) is given, then f(n) = O(f(n)).
Example: If f(n) = n³, then f(n) = O(n³).
Similarly,
f(n) = Ω(f(n))
f(n) = Θ(f(n))
2. Symmetry:
f(n) = Θ(g(n)) if and only if g(n) = Θ(f(n))
Example: If f(n) = n² and g(n) = n², then f(n) = Θ(n²) and g(n) = Θ(n²).
Proof: f(n) = Θ(g(n)) means there exist constants c1, c2 > 0 with c1 g(n) ≤ f(n) ≤ c2 g(n) for all n ≥ n0.
Dividing through gives (1/c2) f(n) ≤ g(n) ≤ (1/c1) f(n), so g(n) = Θ(f(n)); the converse follows by the
same argument.
3. Transitivity:
If f(n) = O(g(n)) and g(n) = O(h(n)), then f(n) = O(h(n)).
Proof: By the definition of Big-Oh (O), there exist positive constants c1, n1 such that
f(n) ≤ c1 g(n) for all n ≥ n1
and positive constants c2, n2 such that
g(n) ≤ c2 h(n) for all n ≥ n2.
Then, for all n ≥ max(n1, n2),
⇒ f(n) ≤ c1 c2 h(n)
⇒ f(n) ≤ c h(n), where c = c1 c2
By the definition, f(n) = O(h(n)).
GREEDY ALGORITHMS
An algorithm is designed to achieve an optimum solution for a given problem. In the greedy
algorithm approach, decisions are made from the given solution domain; being greedy, the
choice that seems closest to an optimum solution is chosen.
Greedy algorithms try to find a localized optimum solution, which may eventually lead to a
globally optimized solution. However, in general, greedy algorithms do not provide globally
optimized solutions.
Counting Coins
This problem is to count to a desired value by choosing the fewest possible coins; the greedy
approach forces the algorithm to pick the largest possible coin first. If we are provided coins of
₹ 1, 2, 5 and 10 and we are asked to count ₹ 18, the greedy procedure will be −
1 − Select one ₹ 10 coin, the remaining count is 8
2 − Then select one ₹ 5 coin, the remaining count is 3
3 − Then select one ₹ 2 coin, the remaining count is 1
4 − And finally, the selection of one ₹ 1 coin solves the problem
This seems to work fine: for this count we need to pick only 4 coins. But if we slightly change
the problem, the same approach may not be able to produce the same optimum result.
For a currency system where we have coins of value 1, 7 and 10, counting coins for the value
18 will be absolutely optimum, but for a count like 15 it may use more coins than necessary.
For example, the greedy approach will use 10 + 1 + 1 + 1 + 1 + 1, a total of 6 coins, whereas
the same problem could be solved by using only 3 coins (7 + 7 + 1).
Hence, we may conclude that the greedy approach picks an immediate optimized solution and
may fail where global optimization is a major concern.
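A minimal C sketch of this counting procedure (the function and array names are illustrative): it always takes as many of the largest remaining coin as fit, reproducing both the 4-coin success for 18 with {1, 2, 5, 10} and the 6-coin failure for 15 with {1, 7, 10}.

#include <stdio.h>

/* Greedy coin counting: denominations must be sorted in descending order. */
int greedy_count(int value, const int coins[], int ncoins) {
    int used = 0;
    for (int i = 0; i < ncoins; i++) {
        used  += value / coins[i];   /* take as many of this coin as fit */
        value %= coins[i];           /* amount still left to count */
    }
    return used;
}

int main(void) {
    int std[] = {10, 5, 2, 1};
    int odd[] = {10, 7, 1};
    printf("18 with {1,2,5,10}: %d coins\n", greedy_count(18, std, 4)); /* 4 */
    printf("15 with {1,7,10}:   %d coins\n", greedy_count(15, odd, 3)); /* 6; optimum is 3 */
    return 0;
}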
Examples
Most networking algorithms use the greedy approach, for example Prim's and Kruskal's
minimal spanning tree algorithms and Dijkstra's shortest path algorithm.
DIVIDE AND CONQUER
In the divide and conquer approach, the problem at hand is divided into smaller sub-problems,
each sub-problem is solved independently, and the solutions are then combined to obtain the
solution of the original problem. The following algorithms are based on this approach −
Merge Sort
Quick Sort
Binary Search
Strassen's Matrix Multiplication
Closest pair (points)
There are various ways available to solve any computer problem, but the ones mentioned are
good examples of the divide and conquer approach.
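Of the examples above, binary search is the simplest to sketch: each comparison discards half of the remaining (sorted) range, so the problem is repeatedly divided until it is trivially small. A minimal C version for illustration:

/* Binary search on a sorted array: returns the index of key, or -1 if absent.
   Each comparison halves the remaining range, giving O(log n) time. */
int binary_search(const int arr[], int n, int key) {
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;   /* avoids overflow of (lo + hi) */
        if (arr[mid] == key)
            return mid;
        if (arr[mid] < key)
            lo = mid + 1;               /* conquer the right half */
        else
            hi = mid - 1;               /* conquer the left half */
    }
    return -1;
}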
DYNAMIC PROGRAMMING
The dynamic programming approach is similar to divide and conquer in breaking down the
problem into smaller and yet smaller possible sub-problems. But unlike divide and conquer,
these sub-problems are not solved independently. Rather, the results of these smaller
sub-problems are remembered and used for similar or overlapping sub-problems.
Dynamic programming is used where we have problems that can be divided into similar
sub-problems so that their results can be re-used. Mostly, these algorithms are used for
optimization. Before solving the sub-problem at hand, a dynamic algorithm will try to examine
the results of previously solved sub-problems. The solutions of sub-problems are combined in
order to achieve the best solution.
So we can say that −
The problem should be able to be divided into smaller overlapping sub-problems.
An optimum solution can be achieved by using an optimum solution of smaller sub-
problems.
Dynamic algorithms use Memoization.
Comparison
In contrast to greedy algorithms, where local optimization is addressed, dynamic algorithms
aim at an overall optimization of the problem.
In contrast to divide and conquer algorithms, where solutions are combined to achieve an
overall solution, dynamic algorithms use the output of a smaller sub-problem and then try to
optimize a bigger sub-problem. Dynamic algorithms use Memoization to remember the output
of already solved sub-problems.
Example
Computer problems such as the Fibonacci number series, the knapsack problem, the Tower of
Hanoi, all-pairs shortest path (Floyd-Warshall), and project scheduling can be solved using the
dynamic programming approach.
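The Fibonacci series is the textbook illustration: fib(n) splits into the overlapping sub-problems fib(n-1) and fib(n-2), so memoizing their results turns exponential recursion into linear time. A minimal C sketch; the cache size MAXN is an assumption for illustration:

#define MAXN 64
static long long memo[MAXN];   /* 0 means "not computed yet" */

/* Memoized Fibonacci: each sub-problem is solved once and remembered.
   Assumes 0 <= n < MAXN. */
long long fib(int n) {
    if (n <= 1)
        return n;                       /* base cases: fib(0)=0, fib(1)=1 */
    if (memo[n] != 0)
        return memo[n];                 /* reuse the stored result */
    memo[n] = fib(n - 1) + fib(n - 2);  /* combine overlapping sub-solutions */
    return memo[n];
}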
LINKED LISTS
Linked lists are among the simplest and most common data structures. The principal benefits of a linked list over a conventional array
are:
The list elements can easily be inserted or removed without reallocation or reorganization of the entire structure, because the
data items need not be stored contiguously in memory. In contrast, an array has to be declared in the source code
before compiling and running the program.
Linked lists allow the insertion and removal of nodes at any point in the list. They can do so with a constant number of
operations if the link to the previous node is maintained during the list traversal.
On the other hand, simple linked lists by themselves do not allow random access to the data or any form of efficient indexing. Thus,
many basic operations − such as obtaining the last node of the list, finding a node containing a given data item, or locating the place
where a new node should be inserted − may require sequential scanning of most or all of the list elements.
An XOR-linking technique allows a doubly-linked list to be implemented using a single link field in each node.
Dynamic data structures such as stacks and queues can be implemented using a linked list, as can several other common
abstract data types, including lists and associative arrays.
Many modern operating systems use doubly linked lists to maintain references to active processes, threads, and other
dynamic objects.
A hash table may use linked lists to store the chains of items that hash to the same position in the hash table.
A binary tree can be seen as a type of linked list where the elements are themselves linked lists of the same nature. The
result is that each node may include a reference to the first node of one or two other linked lists, which, together with their
contents, form the subtrees below that node.
Note: Unless explicitly mentioned, a singly linked list will be referred to as a list or linked list throughout this unit.
Linked List
A linked list is a sequence of data structures, which are connected together via links.
A linked list is a sequence of links which contains items. Each link contains a connection to another link. The linked list is
the second most-used data structure after the array. Following are the important terms needed to understand the concept of a
linked list.
Link − Each link of a linked list can store a data item called an element.
Next − Each link of a linked list contains a link to the next link called Next.
LinkedList − A linked list contains the connection link to the first link, called First.
Linked List Representation
A linked list can be visualized as a chain of nodes, where every node points to the next node. The following are the important
points to be considered −
The linked list contains a link element called first.
Each link carries a data field and a link field called next.
Each link is linked with its next link using its next link field.
The last link carries a link to null to mark the end of the list.
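In C, one way to declare such a node is the following minimal sketch, assuming integer elements:

/* One node of a singly linked list: a data element plus a link to the next node. */
struct node {
    int data;            /* the element stored in this link */
    struct node *next;   /* the connection to the next link; NULL at the end */
};

struct node *head = NULL;   /* the list's "first" link; NULL means an empty list */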
Basic Operations
Following are the basic operations supported by a list: insertion, deletion, and reversal.
Insertion Operation
Adding a new node to a linked list is a multi-step activity. First, create a node using the same structure and find
the location where it has to be inserted.
Imagine that we are inserting a node B (NewNode) between A (LeftNode) and C (RightNode). First, point B.next to C −
NewNode.next −> RightNode;
Then point A.next to B −
LeftNode.next −> NewNode;
This will put the new node in the middle of the two.
Similar steps should be taken if the node is being inserted at the beginning of the list. While inserting it at the end, the second last node of the list should point
to the new node and the new node will point to NULL.
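A sketch of insertion at the beginning of the list in C, using the struct node declared earlier (the function name is illustrative):

#include <stdlib.h>

/* Insert a new node at the beginning of the list. */
void insert_first(struct node **head, int value) {
    struct node *n = malloc(sizeof *n);   /* create a node using the same structure */
    if (n == NULL)
        return;                           /* allocation failed; nothing to insert */
    n->data = value;
    n->next = *head;    /* NewNode.next -> old first node */
    *head = n;          /* head now points to the new node */
}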
Deletion Operation
Deletion is also a multi-step process. First, locate the target node to be removed, using a searching
algorithm.
The left (previous) node of the target node now should point to the next node of the target node −
LeftNode.next −> TargetNode.next;
This will remove the link that was pointing to the target node. Now, using the following code, we will remove what the target node is pointing at.
TargetNode.next −> NULL;
If the deleted node is needed later, we can keep it in memory; otherwise, we can simply deallocate its memory and wipe off the target node completely.
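The same steps might look like the following C sketch, which deletes the first node carrying a given value (struct node as above; the function name is illustrative):

#include <stdlib.h>

/* Delete the first node containing the given value, if present. */
void delete_value(struct node **head, int value) {
    struct node *cur = *head, *prev = NULL;
    while (cur != NULL && cur->data != value) {  /* locate the target node */
        prev = cur;
        cur = cur->next;
    }
    if (cur == NULL)
        return;                     /* value not present in the list */
    if (prev == NULL)
        *head = cur->next;          /* target was the first node */
    else
        prev->next = cur->next;     /* LeftNode.next -> TargetNode.next */
    cur->next = NULL;               /* TargetNode.next -> NULL */
    free(cur);                      /* deallocate and wipe off the target node */
}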
Reverse Operation
This operation is a thorough one. We need to make the last node be pointed to by the head node and reverse the whole linked list.
First, we traverse to the end of the list; the last node points to NULL. Now, we shall make it point to its previous node instead.
We have to make sure that the last node is not lost, so we keep a temp node, which looks like the head node, pointing to the last node.
Then, we shall make all the left-side nodes point to their previous nodes, one by one.
Except for the first node, pointed to by the head node, all nodes should point to their predecessor, making it their new successor. The first node will point
to NULL.
Finally, we make the head node point to the new first node by using the temp node.
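The whole procedure is the standard iterative reversal; a C sketch, where prev plays the role of the temp node described above:

/* Reverse the list in place (struct node as above): every node is made to
   point to its predecessor, and head is moved to the old last node. */
void reverse(struct node **head) {
    struct node *prev = NULL;           /* predecessor of cur; NULL for the first node */
    struct node *cur = *head;
    while (cur != NULL) {
        struct node *next = cur->next;  /* save the successor before rewiring */
        cur->next = prev;               /* point this node to its predecessor */
        prev = cur;
        cur = next;
    }
    *head = prev;                       /* prev now points to the new first node */
}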
The following are some of the differences between arrays and linked lists:
The size of a linked list is not fixed; it can expand and shrink during run time.
Insertion and deletion operations are faster and easier in linked lists.
Memory allocation is done during run time (there is no need to allocate any fixed memory).
Data structures like stacks, queues, and trees can be easily implemented using a linked list.
Memory consumption is higher in linked lists compared to arrays, because each node in a linked list contains a pointer, which requires extra
memory.
Elements cannot be accessed at random in linked lists.
Traversing in reverse is not possible in singly linked lists.