Lec 01: Why data structures?
CS213/293 Data Structure and Algorithms 2023
Instructor: Ashutosh Gupta, IITB India
What is data?
Example 1.1
Age of people, height of trees, price of stocks, and number of likes.
Data is big!
We are living in the age of big data!
Exercise 1.1
1. Estimate the number of messages exchanged through status updates on WhatsApp.
2. How much text data was used to train ChatGPT?
We need to work on data
Example 1.2
1. Predict the weather
2. Find a webpage
3. Recognize fingerprint
Exercise 1.2
How much time do we need to find an element in an array?
Problems
Example 1.3
The problem of search consists of the following specifications:
▶ Input specification: an array S of elements and an element e
▶ Output specification: the position of e in S if it exists. If not found, return -1.
Output specifications refer to the variables in the input specifications.
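Because the output specification mentions S and e, it can be read as a check on an input/output pair. A minimal C sketch, assuming a hypothetical helper `satisfies_search_spec`; the specification does not say which position to return when e occurs more than once, so this sketch accepts any position that holds e.

```c
#include <stdbool.h>

/* Hypothetical helper: returns true if `result` is a correct output of the
   search problem for the array S of length n and the element e. */
bool satisfies_search_spec(const int *S, int n, int e, int result) {
    if (result == -1) {
        /* "Not found" is correct only if e really is absent from S. */
        for (int i = 0; i < n; i++) {
            if (S[i] == e) return false;
        }
        return true;
    }
    /* Otherwise result must be a valid position that holds e. */
    return result >= 0 && result < n && S[result] == e;
}
```

For example, with S = {4, 8, 15, 16}, the output 2 is correct for e = 15, and -1 is correct for e = 7 but not for e = 8.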
Algorithms
Exercise 1.3
1. What is an algorithm?
2. How is it different from a program?
Commentary: An algorithm is a step-by-step process that processes a small amount of data in each step and eventually computes the output. The formal definition of an algorithm will be presented to you in CS310. It took the genius of Alan Turing to give the precise definition of an algorithm.
Example: an algorithm for search
Example 1.4
int search(int *S, int n, int e) {
    // n is the length of the array S
    // We are looking for element e in S
    for (int i = 0; i < n; i++) {
        if (S[i] == e) {
            return i;
        }
    }
    return -1; // Not found
}
Exercise 1.4
How much time will it take to run the above algorithm if e is not in S?
Commentary: Answer: We count memory accesses, arithmetic operations (including comparisons), assignments, and jumps; in the formula below, assignments are charged as arithmetic operations. The loop in the program will iterate n times. In each iteration, there will be one memory access S[i], three arithmetic operations i<n, S[i] == e, and i++, and two jumps. At the initialization, there is an assignment i=0. For the loop exit, there will be one more comparison and jump. Time = $nT_{\text{Read}} + (3n + 2)T_{\text{Arith}} + (2n + 1)T_{\text{jump}} + T_{\text{return}}$
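To see where the coefficients come from, here is a minimal C sketch of an instrumented copy of search. The names `OpCount` and `search_counted` are hypothetical, and the counters follow the commentary's cost model (assignments charged as arithmetic, two jumps per iteration) rather than measuring real machine instructions.

```c
/* Operation tallies under the commentary's cost model
   (assignments are charged as arithmetic operations). */
typedef struct {
    long reads, arith, jumps;
} OpCount;

/* Same logic as search(), with a counter next to each modelled operation. */
int search_counted(const int *S, int n, int e, OpCount *c) {
    c->reads = c->arith = c->jumps = 0;
    int i = 0;
    c->arith++;                        /* assignment i = 0 */
    for (;;) {
        c->arith++;                    /* comparison i < n */
        if (!(i < n)) {
            c->jumps++;                /* jump out of the loop */
            break;
        }
        c->reads++;                    /* memory access S[i] */
        c->arith++;                    /* comparison S[i] == e */
        if (S[i] == e) return i;
        i++;
        c->arith++;                    /* increment i++ */
        c->jumps += 2;                 /* the two jumps counted per iteration */
    }
    return -1;
}
```

When e is not in an array of length 5, the counters come out as 5 reads, 3·5 + 2 = 17 arithmetic operations, and 2·5 + 1 = 11 jumps, matching the formula.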
Data needs structure
Storing data as a pile of stuff will not work. We need structure.
Example 1.5
Store files in the order of the year. How do we store data at IIT Bombay Hospital?
Structured data helps us solve problems faster
We can exploit the structure to design efficient algorithms to solve our problems.
Example: search on well-structured data
Example 1.6
Let us consider the problem of search consisting of the following specifications
▶ Input specification: a non-decreasing array S and an element e
▶ Output specification: Position of e in S. If not found, return −1.
A better search
Example 1.7
int BinarySearch(int *S, int n, int e) {
    // S is a sorted array
    int first = 0, last = n;
    int mid = (first + last) / 2;
    while (first < last) {
        if (S[mid] == e) return mid;
        if (S[mid] > e) {
            last = mid;
        } else {
            first = mid + 1;
        }
        mid = (first + last) / 2;
    }
    return -1;
}
Exercise 1.5
Let $n = 2^k - 1$. How much time will it take to run the above algorithm if S[0] > e?
Commentary: Answer: There will be k iterations. In each iteration, the function will follow the same path. In each iteration, there will be
▶ a memory access S[mid] (why only one?),
▶ five arithmetic operations first < last, S[mid] == e, S[mid] > e, first+last, and ../2,
▶ one assignment last = mid (why?),
▶ three jumps because of two ifs and a loop exit.
For the loop exit, there will be one additional comparison and a jump at the loop head. In the initialization section, we have two assignments and two arithmetic operations.
Time = $kT_{\text{Read}} + (6k + 5)T_{\text{Arith}} + (3k + 1)T_{\text{jump}} + T_{\text{return}}$
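The claim that the loop runs exactly k times when $n = 2^k - 1$ and S[0] > e can be checked directly. A minimal C sketch: `binary_search_counted` is a hypothetical variant of BinarySearch with an extra out-parameter that reports how many times the loop body ran; the search logic is unchanged.

```c
/* BinarySearch from Example 1.7, extended with a hypothetical out-parameter
   that reports how many times the loop body ran. */
int binary_search_counted(const int *S, int n, int e, int *iterations) {
    int first = 0, last = n;           /* search range is [first, last) */
    int mid = (first + last) / 2;
    *iterations = 0;
    while (first < last) {
        (*iterations)++;
        if (S[mid] == e) return mid;
        if (S[mid] > e) {
            last = mid;                /* e is left of mid */
        } else {
            first = mid + 1;           /* e is right of mid */
        }
        mid = (first + last) / 2;
    }
    return -1;
}
```

With S[0] > e, every iteration takes the `last = mid` branch, so last goes $2^k - 1, 2^{k-1} - 1, \dots, 1, 0$: for n = 7 the loop runs 3 times and for n = 15 it runs 4 times.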
Topic 1.1
Big-O notation
How much resource does an algorithm need?
Commentary: Sometimes there is a trade-off between time and space. For example, the slower linear search needed only one extra integer (i), but binary search needed three extra integers (first, last, and mid). A difference of two integers is a very minor issue, but it illustrates the trade-off.
Input size
We define the size of the input roughly, usually in terms of the important parameters of the input.
Example 1.8
In the problem of search, we say that the number of elements in the array is the input size.
Commentary: Ideally, the size is the number of bits in the binary representation of the input, but that is too detailed and cumbersome to handle. In the case of search, we assume that elements are drawn from a space of size $2^{32}$ and can be represented using 32 bits. Therefore, the type of the element was int.
Best/Average/Worst case
For a given size of inputs, we may further make the following distinction.
1. Best case: shortest running time over all inputs of the given size.
2. Worst case: longest running time over all inputs of the given size.
3. Average case: average running time over all inputs of the given size.
Exercise 1.6
How can we modify almost any algorithm to have a good best-case running time?
Example: Best/Average/Worst case
Example 1.9
int BinarySearch(int *S, int n, int e) {
    // S is a sorted array
    int first = 0, last = n;
    int mid = (first + last) / 2;
    while (first < last) {
        if (S[mid] == e) return mid;
        if (S[mid] > e) {
            last = mid;
        } else {
            first = mid + 1;
        }
        mid = (first + last) / 2;
    }
    return -1;
}
In BinarySearch, let $n = 2^k - 1$.
1. Best case: e == S[n/2]
   $T_{\text{Read}} + 6T_{\text{Arith}} + T_{\text{return}}$
2. Worst case: $e \notin S$
   we have seen the worst case.
3. Average case: ≈ worst case
   Most often, the loop will iterate k times. (why?)
Commentary: Analyzing the average case is hard. We will mostly focus on worst-case analysis. For some important algorithms, we will do an average-time analysis.
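One way to see why the average is close to the worst case is to search for every element of a sorted array once and tally the iteration counts. A minimal C sketch, assuming distinct sorted values; `iterations_for` and `tally_successful_searches` are hypothetical helpers that reuse the loop of Example 1.9 with a counter added.

```c
/* Number of loop iterations BinarySearch needs to find e in sorted S
   (same loop as Example 1.9, with a counter added). */
static int iterations_for(const int *S, int n, int e) {
    int first = 0, last = n, mid = (first + last) / 2, count = 0;
    while (first < last) {
        count++;
        if (S[mid] == e) break;
        if (S[mid] > e) last = mid; else first = mid + 1;
        mid = (first + last) / 2;
    }
    return count;
}

/* Searches once for every element of a sorted array of distinct values.
   hist[t] receives how many elements are found in exactly t iterations;
   hist must have room for at least n + 1 entries.
   Returns the total number of iterations over all n searches. */
long tally_successful_searches(const int *S, int n, int *hist) {
    long total = 0;
    for (int t = 0; t <= n; t++) hist[t] = 0;
    for (int j = 0; j < n; j++) {
        int t = iterations_for(S, n, S[j]);
        hist[t]++;
        total += t;
    }
    return total;
}
```

For n = 7 (k = 3), one element is found in 1 iteration, two in 2, and four in all 3, so the mean is 17/7 ≈ 2.43 iterations; more than half of the successful searches already need all k iterations.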
Asymptotic behavior
For short inputs, an algorithm may use a shortcut for a better running time.
To avoid such false comparisons, we look at the behavior of the algorithm in the limit, as the input size grows.
Big-O notation: approximate measure
Definition 1.1
Let f and g be functions $\mathbb{N} \to \mathbb{N}$. We say $f(n) \in O(g(n))$ if there are c and $n_0$ such that $\forall n \ge n_0,\ f(n) \le c\,g(n)$.
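The definition asks for a witness pair $(c, n_0)$. A minimal C sketch that tests a proposed witness for the example pair $f(n) = 3n + 10$, $g(n) = n$ (chosen here for illustration, not taken from the exercise); `witness_holds` is a hypothetical helper. A finite check like this can refute a bad witness, but it cannot prove membership in $O(g(n))$; for that we still need the algebra $3n + 10 \le 4n \iff n \ge 10$.

```c
#include <stdbool.h>

/* Example functions chosen for illustration: f(n) = 3n + 10, g(n) = n. */
static long long f(long long n) { return 3 * n + 10; }
static long long g(long long n) { return n; }

/* Checks f(n) <= c * g(n) for every n in [n0, limit]. A finite check can
   refute a proposed witness (c, n0) but cannot prove f in O(g). */
bool witness_holds(long long c, long long n0, long long limit) {
    for (long long n = n0; n <= limit; n++) {
        if (f(n) > c * g(n)) return false;
    }
    return true;
}
```

Here c = 4, n0 = 10 survives the check, n0 = 9 fails (37 > 36), and c = 3 fails for every n0, since 3n + 10 > 3n always.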
Exercise 1.7
Which of the following statements are true?
▶ $5n + 8 \in O(n)$
▶ $5n + 8 \in O(n^2)$
▶ $5n^2 + 8 \in O(n)$
▶ $n^2 + n \in O(n^2)$
▶ $500000000000000000000000n^2 \in O(n^2)$
▶ $50n^2 \log n + 60n^2 \in O(n^2 \log n)$
Example: Big-O of the worst case of BinarySearch
Example 1.10
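A sketch of the calculation, assuming the worst-case count from Exercise 1.5 and restricting attention to $n = 2^k - 1$, so that $k = \log_2(n+1)$; the constants C and D are introduced here only for brevity.

```latex
\begin{align*}
T(n) &= kT_{\text{Read}} + (6k+5)T_{\text{Arith}} + (3k+1)T_{\text{jump}} + T_{\text{return}} \\
     &= C\,k + D, \qquad C = T_{\text{Read}} + 6T_{\text{Arith}} + 3T_{\text{jump}},\quad
        D = 5T_{\text{Arith}} + T_{\text{jump}} + T_{\text{return}}.
\end{align*}
Since $k = \log_2(n+1) \le \log_2(2n) = 1 + \log_2 n \le 2\log_2 n$ and
$1 \le \log_2 n$ for all $n \ge 2$,
\[
T(n) \le 2C\log_2 n + D\log_2 n = (2C + D)\log_2 n \qquad \text{for all } n \ge 2.
\]
Hence $T(n) \in O(\log n)$, with witnesses $c = 2C + D$ and $n_0 = 2$.
```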
Exercise 1.8
Prove that if f ∈ O(g) and g ∈ O(h), then f ∈ O(h).
What does Big-O say?
It expresses the approximate number of operations executed by the program as a function of the input size.
Hierarchy of algorithms
▶ An $O(\log n)$ algorithm is better than an $O(n)$ algorithm
▶ We say $O(\log n) < O(n) < O(n^2) < O(2^n)$
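The hierarchy compares growth in the limit, so a faster-growing function can still be smaller for small n. A minimal C sketch, using a hypothetical helper `exp_beats_square_from`, finds from which point $2^n$ stays above $n^2$ up to a finite bound; this illustrates the role of $n_0$ but does not prove the inequality for all larger n.

```c
/* Returns the smallest n0 >= 1 such that 2^n > n^2 holds for every n with
   n0 <= n <= limit. limit must be at most 62 so that 2^n fits in 64 bits.
   Returns limit + 1 if the inequality already fails at n = limit. */
int exp_beats_square_from(int limit) {
    int n0 = limit + 1;
    for (int n = limit; n >= 1; n--) {
        unsigned long long pow2 = 1ULL << n;
        unsigned long long sq = (unsigned long long)n * n;
        if (pow2 > sq) {
            n0 = n;        /* inequality still holds; extend the run down */
        } else {
            break;         /* first failure from the top ends the run */
        }
    }
    return n0;
}
```

With limit = 60 the result is 5: $2^n \le n^2$ for n = 2, 3, 4, and $2^n > n^2$ from n = 5 on.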
Complexity of a problem
The complexity of a problem is the complexity of the best-known algorithm for the problem.
Exercise 1.9
What is the complexity of the following problems?
▶ sorting an array: $O(n^2)$ ✗
▶ matrix multiplication: $O(n^3)$ ✗ (the best algorithm is still not known)
Exercise 1.10
What is the best-known complexity for the above problems?
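The $O(n^3)$ bound for matrix multiplication comes from the straightforward triple loop; the exercise marks it ✗ because asymptotically faster algorithms are known. A minimal C sketch of that straightforward algorithm, with a hypothetical name `matmul_naive` and a row-major flat-array layout chosen for simplicity, that also counts its scalar multiplications.

```c
/* Straightforward n x n matrix product C = A * B, with all matrices stored
   row-major in flat arrays of length n * n. Returns the number of scalar
   multiplications performed, which is exactly n^3. */
long matmul_naive(int n, const long *A, const long *B, long *C) {
    long mults = 0;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            long sum = 0;
            for (int k = 0; k < n; k++) {
                sum += A[i * n + k] * B[k * n + j];
                mults++;
            }
            C[i * n + j] = sum;
        }
    }
    return mults;
}
```

For 2 x 2 matrices A = [1 2; 3 4] and B = [5 6; 7 8], the result is C = [19 22; 43 50] after 8 multiplications.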
Θ-Notation
There are more variations of the above definition. Please look at the end.
Names of complexity classes
▶ Constant: $O(1)$
▶ Logarithmic: $O(\log n)$
▶ Linear: $O(n)$
▶ Quadratic: $O(n^2)$
▶ Polynomial: $O(n^k)$ for some given k
▶ Exponential: $O(2^n)$
Topic 1.2
Problem
Problem: Compute the exact running time of insertion sort.
Exercise 1.11
The following is the code for insertion sort. Compute the exact worst-case running time of the
code in terms of n and the cost of doing various machine operations.
void insertionSort(int *A, int n) {
    // A is the array to sort and n is its length
    for (int j = 1; j < n; j++) {
        int key = A[j];
        int i = j - 1;
        while (i >= 0) {
            if (A[i] > key) {
                A[i + 1] = A[i];
            } else {
                break;
            }
            i--;
        }
        A[i + 1] = key;
    }
}
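A useful first step for the exercise is to identify the worst-case input and count how often the inner loop body runs. A minimal C sketch, with a hypothetical name `insertion_sort_shifts`, that performs the same sort (the if/break is folded into the loop condition, which is equivalent) and counts the element shifts A[i+1] = A[i].

```c
/* Same insertion sort as Exercise 1.11, with a counter for the element
   shifts A[i+1] = A[i]. Returns the number of shifts performed. */
long insertion_sort_shifts(int *A, int n) {
    long shifts = 0;
    for (int j = 1; j < n; j++) {
        int key = A[j];
        int i = j - 1;
        /* equivalent to the original while loop with if/break */
        while (i >= 0 && A[i] > key) {
            A[i + 1] = A[i];
            shifts++;
            i--;
        }
        A[i + 1] = key;
    }
    return shifts;
}
```

On a reverse-sorted array every comparison causes a shift, so iteration j of the outer loop shifts j elements and the total is $1 + 2 + \dots + (n-1) = n(n-1)/2$; for n = 5 that is 10 shifts. A sorted array needs none.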
Problem: addition and multiplication
Exercise 1.12
What is the time complexity of binary addition and multiplication? How much time does it take to
do unary addition?
Commentary: Solution: Consider two numbers A and B whose binary representations have lengths (numbers of bits) m and n. The time complexity of binary addition is O(m + n): we start from the right end and add bit by bit from right to left, keeping track of the carry. Each bit position requires O(1) computation, since there are only 8 combinations (2 each for bit 1, bit 2, and the carry). Since a number N has about $\log_2 N$ bits, the time complexity is $O(\log A + \log B) = O(\log(AB)) = O(m + n)$. Similarly, we can analyze long multiplication: its time complexity is $O(\log A \times \log B) = O(mn)$. There are algorithms with better time complexity than long multiplication, for example, Karatsuba's algorithm. Unary addition is the concatenation of the inputs. To produce its output, the algorithm must write the concatenated string, which has length A + B; therefore, it takes O(A + B) time.
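The right-to-left addition described above can be sketched in C as follows. This is a minimal sketch, assuming bits are stored least significant bit first in int arrays; `add_binary` is a hypothetical name. The loop visits each bit position once and does constant work per position, which is the O(m + n) bound.

```c
/* Schoolbook binary addition. a (length m) and b (length n) hold bits,
   least significant bit first. Writes max(m, n) + 1 bits to out (the top
   bit may be 0) and returns the number of bits written. */
int add_binary(const int *a, int m, const int *b, int n, int *out) {
    int len = (m > n) ? m : n;
    int carry = 0;
    for (int i = 0; i < len; i++) {
        int x = (i < m) ? a[i] : 0;    /* missing high bits count as 0 */
        int y = (i < n) ? b[i] : 0;
        int s = x + y + carry;         /* one of the 8 (x, y, carry) cases */
        out[i] = s % 2;
        carry = s / 2;
    }
    out[len] = carry;                  /* final carry becomes the top bit */
    return len + 1;
}
```

For example, 11 (binary 1011, stored as {1, 1, 0, 1}) plus 6 (binary 110, stored as {0, 1, 1}) gives {1, 0, 0, 0, 1}, which is binary 10001 = 17.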
Problem: hierarchy of complexity
Exercise 1.13
Given $f(n) = a_0 n^0 + \dots + a_d n^d$ and $g(n) = b_0 n^0 + \dots + b_e n^e$ with $d > e$ and $a_d > 0$ (why?), show that $f(n) \notin O(g(n))$.
Commentary: Solution: Let us begin by assuming the proposition is false, i.e., $f(n) \in O(g(n))$. Then, by definition, there exist constants c and $n_0$ such that $\forall n \ge n_0,\ f(n) \le c\,g(n)$. Hence, we have
$$\forall n \ge n_0,\quad a_0 n^0 + \dots + a_d n^d \le c\,(b_0 n^0 + \dots + b_e n^e)$$
$$\forall n \ge n_0,\quad \sum_{i=0}^{e} (a_i - c b_i) n^i + a_{e+1} n^{e+1} + \dots + a_d n^d \le 0$$
Dividing both sides by $n^d > 0$ preserves the inequality. As $n \to \infty$, every term $n^{i-d}$ with $i < d$ tends to 0, so by the definition of limit
$$\lim_{n \to \infty} \left( \sum_{i=0}^{e} (a_i - c b_i) n^{i-d} + a_{e+1} n^{e+1-d} + \dots + a_d \right) \le 0 \implies a_d \le 0,$$
which contradicts $a_d > 0$.
Order of functions
Exercise 1.14
▶ If f(n) ≤ F(n) and G(n) ≥ g(n) (in order sense), then show that $\frac{f(n)}{G(n)} \le \frac{F(n)}{g(n)}$.
▶ Is f(n) of the same order as f(n)|sin(n)|?
Topic 1.3
Ω notation
Small-o,ω notation
Size of functions
We can define an order over functions (in the order sense) using the above notations:
▶ f (n) ∈ O(g (n)) implies f (n) ≤ g (n)
▶ f (n) ∈ o(g (n)) implies f (n) < g (n)
▶ f (n) ∈ Ω(g (n)) implies f (n) ≥ g (n)
▶ f (n) ∈ ω(g (n)) implies f (n) > g (n)
▶ f (n) ∈ Θ(g (n)) implies f (n) = g (n)
End of Lecture 1