Midterm Lecture

1. Asymptotic complexity analysis reveals fundamental mathematical truths about algorithms that are independent of hardware specifics or execution times. 2. Big O notation describes how fast an algorithm grows as the input size increases. Common time complexities include O(1) for constant time, O(n) for linear time, and O(n²) for quadratic time. 3. Examples show that a simple for loop takes linear O(n) time, while nested for loops take quadratic O(n²) time, as the number of operations increases proportionally with the square of the input size.


Asymptotic Complexity

Asymptotic complexity is the key to comparing algorithms. Comparing absolute times is not
particularly meaningful, because they are specific to particular hardware. Asymptotic complexity
reveals deeper mathematical truths about algorithms that are independent of hardware.

Asymptotic notation

Asymptotic notation is a set of notations that allows us to express the performance of our algorithms in relation to their input. Big O notation is used in Computer Science to describe the performance or complexity of an algorithm. Big O specifically describes the worst-case scenario, and can be used to describe the execution time required or the space used (e.g. in memory or on disk) by an algorithm.

Big O complexity is often visualized with a graph that plots the number of operations against input size, with one curve per order of growth.


As a programmer first and a mathematician second (or maybe third or last), the best way to understand Big O thoroughly is through examples in code. So, below are some common orders of growth along with descriptions and examples where possible.

1. O(1)
void printFirstElementOfArray(int arr[])
{
    printf("First element of array = %d", arr[0]);
}

This function runs in O(1) time (or "constant time") relative to its input. The input array
could be 1 item or 1,000 items, but this function would still just require one step.

2. O(n)
void printAllElementOfArray(int arr[], int size)
{
    for (int i = 0; i < size; i++)
    {
        printf("%d\n", arr[i]);
    }
}

This function runs in O(n) time (or "linear time"), where n is the number of items in the
array. If the array has 10 items, we have to print 10 times. If it has 1000 items, we have to
print 1000 times.

3. O(n²)
void printAllPossibleOrderedPairs(int arr[], int size)
{
    for (int i = 0; i < size; i++)
    {
        for (int j = 0; j < size; j++)
        {
            printf("%d = %d\n", arr[i], arr[j]);
        }
    }
}

Here we're nesting two loops. If our array has n items, our outer loop runs n times and
our inner loop runs n times for each iteration of the outer loop, giving us n² total prints.
Thus this function runs in O(n²) time (or "quadratic time"). If the array has 10 items, we
have to print 100 times. If it has 1000 items, we have to print 1000000 times.

4. O(2ⁿ)
int fibonacci(int num)
{
    if (num <= 1) return num;
    return fibonacci(num - 2) + fibonacci(num - 1);
}

An example of an O(2ⁿ) function is the recursive calculation of Fibonacci numbers. O(2ⁿ)
denotes an algorithm whose growth doubles with each addition to the input data set. The
growth curve of an O(2ⁿ) function is exponential - starting off very shallow, then rising
meteorically.

5. Drop the constants

When you're calculating the big O complexity of something, you just throw out the
constants. Like:
void printAllItemsTwice(int arr[], int size)
{
    for (int i = 0; i < size; i++)
    {
        printf("%d\n", arr[i]);
    }

    for (int i = 0; i < size; i++)
    {
        printf("%d\n", arr[i]);
    }
}

This is O(2n), which we just call O(n).


void printFirstItemThenFirstHalfThenSayHi100Times(int arr[], int size)
{
    printf("First element of array = %d\n", arr[0]);

    for (int i = 0; i < size/2; i++)
    {
        printf("%d\n", arr[i]);
    }

    for (int i = 0; i < 100; i++)
    {
        printf("Hi\n");
    }
}

This is O(1 + n/2 + 100), which we just call O(n).

Why can we get away with this? Remember, for big O notation we're looking at what
happens as n gets arbitrarily large. As n gets really big, adding 100 or dividing by 2 has a
decreasingly significant effect.

6. Drop the less significant terms


void printAllNumbersThenAllPairSums(int arr[], int size)
{
    for (int i = 0; i < size; i++)
    {
        printf("%d\n", arr[i]);
    }

    for (int i = 0; i < size; i++)
    {
        for (int j = 0; j < size; j++)
        {
            printf("%d\n", arr[i] + arr[j]);
        }
    }
}

Here our runtime is O(n + n²), which we just call O(n²).

Similarly:

 O(n³ + 50n² + 10000) is O(n³)

 O((n + 30) * (n + 5)) is O(n²)

Again, we can get away with this because the less significant terms quickly become, well,
less significant as n gets big.

7. With Big-O, we're usually talking about the "worst case"


bool arrayContainsElement(int arr[], int size, int element)
{
    for (int i = 0; i < size; i++)
    {
        if (arr[i] == element) return true;
    }
    return false;
}

Here we might have 100 items in our array, but the first item might be that element; in
this case we would return after just 1 iteration of our loop.

In general we'd say this is O(n) runtime and the "worst case" part would be implied. But
to be more specific we could say this is worst case O(n) and best case O(1) runtime. For
some algorithms we can also make rigorous statements about the "average case"
runtime.

8. Other Examples

Let's take the following C example, which contains a for loop that iterates from i = 0 to
i < 10000 and prints each value of i:
#include <stdio.h>

void print_values(int end)
{
    for (int i = 0; i < end; i++)
    {
        printf("%d\n", i);
    }
}

int main()
{
    print_values(10000);
    return 0;
}

We could put a timer at the beginning and the end of the line of code which calls this
function; this would then give us the running time of our print_values algorithm,
right?
#include <stdio.h>
#include <time.h>

void print_values(int end)
{
    for (int i = 0; i < end; i++)
    {
        printf("%d\n", i);
    }
}

int main()
{
    clock_t t;
    t = clock();

    print_values(10000);

    float diff = ((float)(clock() - t)) / CLOCKS_PER_SEC;

    printf("\n\n diff=%f \n\n", diff);

    return 0;
}

Maybe, but what if you run it again three times, write down your results, and then move
to another machine with a higher spec and run it another three times? I bet that when you
compare the results, you will get different running times!

This is where asymptotic notations are important. They provide us with a mathematical
foundation for representing the running time of our algorithms consistently.

We create this consistency by talking about operations our code has to perform.
Operations such as array lookups, print statements and variable assignments.

If we were to annotate print_values with the number of times each line within the
function is executed for the input 10000, we would have something as follows:
void print_values(int end) //end = 10000
{
    for (int i = 0; i < end; i++) // Execution count: 10000
    {
        printf("%d\n", i); // Execution count: 10000
    }
}

If we were to change the input value of print_values function, our print statement
would be exercised more or less, depending on the value of that input.

If we were to put this into an arithmetic expression, we would get 10000 + 1. Using
intuition, we know that the 10000 varies with the input size; if we call the input
value n, we now have the expression n + 1.

I could now argue that the worst case running time for print_values is O(n + 1): n for the
loop block and 1 for the print statement.

In the grand scheme of things, the constant value 1 is pretty insignificant at the side of
the variable value n. So we simply reduce the above expression to O(n), and there we
have our Big-O running time of print_values.

Our code prints each and every value from 0 to the input, and the loop is the most
significant part of the code, so we are able to say that our code has a running
time of O(n), where n is the variable size of the input. Simples!

An algorithm of running time O(n) is said to be linear, which essentially means the
algorithm's running time will increase linearly with its input (n).

9. Proving Big-O
We can prove, mathematically, that print_values is in-fact O(n), which brings us on to
the formal definition for Big-O:

f(n) = O(g(n)) if there exist positive constants c and k such that f(n) <= c * g(n) for all
n > k.

We can turn this formal definition into an actual definition of our above code, which we
can then in turn prove.

We must first ask: does print_values have a running time of O(n)?

If print_values <= c * n when c = 1, then print_values does have a running
time of O(n) when n > k.

c can be any positive constant, while k is the threshold value of n beyond which the
expression must hold for every subsequent value of n.

As c is just 1, we can simplify our expression to print_values <= n.

N    F(N)    G(N)    TRUE/FALSE
0    0       0       False
1    1       1       True
2    2       2       True
3    3       3       True

We can see that n must be greater than the value 0 of constant k in order to satisfy the
expression print_values <= n.

We can now say when n is 1:

1 <= 1 * 1 for 1 > 0 is true. We know this because 1 multiplied by 1 is 1, and 1 is greater
than our constant k, which was 0.

The above must be true for all values of n greater than k (0), so if n was 10, 10 <= 1 * 10
for 10 > 0 is also true.

What we're basically saying here is that no matter our input size (n), the cost f(n) must be
less than or equal to c * g(n) whenever the size of our input (n) is more than the other
constant value (k).

But where do our constants come from? Well, they are just values; we typically start at 1
and work our way up to seek a constant which makes the expression f(n) <= c * g(n)
for all n > k true. If we cannot find such a combination of constants, then our code does
not have a running time of O(n) and our hypothesis was incorrect.
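
To make the hunt for constants concrete, here is a minimal sketch in C (the names f, g and holds are mine, and f simply returns n to model print_values' operation count) that checks whether f(n) <= c * g(n) holds over a range of n, mirroring the table above:

#include <stdio.h>
#include <stdbool.h>

// f(n): the operation count of print_values; g(n): the candidate bound.
long f(long n) { return n; }
long g(long n) { return n; }

// Returns true if f(n) <= c * g(n) for every n in (k, limit].
bool holds(long c, long k, long limit)
{
    for (long n = k + 1; n <= limit; n++)
    {
        if (f(n) > c * g(n))
            return false;
    }
    return true;
}

int main()
{
    // With c = 1 and k = 0 the inequality holds for every n we test,
    // matching the table above: print_values looks like O(n).
    printf("c=1, k=0: %s\n", holds(1, 0, 1000) ? "true" : "false");
    return 0;
}

Checking a finite range is of course not a proof, but it is a quick way to spot a hypothesis that is clearly wrong, as in the next section.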

10. Disproving Big-O
Let's take a new C function, which contains a for loop that iterates from i = 0 to i <
100 and another nested for loop from j = 0 to j < 100, and prints each value of
i and j:
void print_values_with_repeat(int end) //end = 100
{
    for (int i = 0; i < end; i++)
    {
        for (int j = 0; j < end; j++)
        {
            printf("i = %d and j = %d\n", i, j);
        }
    }
}

If we were to annotate print_values_with_repeat with the number of times each
line within the function is executed for the input 100, we would have something as
follows:
void print_values_with_repeat(int end) //end = 100
{
    for (int i = 0; i < end; i++) // Execution count: 100
    {
        for (int j = 0; j < end; j++) // Execution count: 10000
        {
            printf("i = %d and j = %d\n", i, j); // Execution count: 10000
        }
    }
}

Does print_values_with_repeat have a running time of O(n)?

N    F(N)    G(N)    TRUE/FALSE
0    0       0       False
1    1       1       True
2    4       2       False
3    9       3       False

Suppose our constant c is 1: 1 <= 1 * 1 for 1 > 0 is true - however, our definition says
that c * g(n) must be greater than or equal to f(n) for every value of n greater than k.

So if we take the value 2 for n, f(2) is 4 while c * g(2) is only 2; 4 <= 1 * 2 is false, which
disproves our hypothesis that print_values_with_repeat is O(n). Even if we
change our constant c to 2, this would still prove false eventually (for example, at n = 3 we
would need 9 <= 2 * 3, which is false).

We can actually see that the order of growth in operations
in print_values_with_repeat is actually n², so let's hypothesise now
that print_values_with_repeat is actually O(n²).

Does print_values_with_repeat have a running time of O(n²)?

N    F(N)    G(N)    TRUE/FALSE
0    0       0       False
1    1       1       True
2    4       4       True
3    9       9       True

Suppose our constant c is still 1; taking n = 3, our expression would now be 9 <= 1 * 3² for
3 > 0, and this is true, great! print_values_with_repeat is in fact O(n²).

O(n²) is a quadratic time algorithm, as the running time of the algorithm increases
quadratically with the input.
Amortized complexity analysis is most commonly used with data structures that have state that persists
between operations. The classic example is appending to a dynamic array that doubles its capacity
whenever it fills up (a code sketch of this append operation follows the analysis below).

The worst-case time complexity for appending an element to an array of length n, using this algorithm, is Θ(n),
because an append that triggers a resize must copy all n existing elements.

Amortized time

An amortized time analysis gives a much better understanding of the algorithm.

Consider a sequence of n append operations, where we start with an array of length 1. A
careful analysis shows that the total time of these operations is only Θ(n).

 There will be a total of n constant-time assignment and increment operations.

 The resizing will happen only at operation 1, 2, 4, …, 2ᵏ, for a total of
1 + 2 + 4 + … + 2ᵏ = 2·2ᵏ - 1 constant-time element copy operations. Since 2ᵏ ≤ n, this is
at most 2n - 1.
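
The appending algorithm assumed by this analysis is not shown above; a minimal C sketch of it, using the usual doubling strategy (the type and function names dynarray and dynarray_append are mine), looks like this:

#include <stdlib.h>

typedef struct {
    int *items;
    size_t size;      // number of elements currently stored
    size_t capacity;  // number of allocated slots
} dynarray;

// Append x to the array, doubling the capacity whenever it is full.
// One append is Theta(n) in the worst case (the element copy hidden in
// realloc), but a sequence of n appends costs only Theta(n) in total,
// i.e. amortized O(1) per append. Error handling is omitted for brevity.
void dynarray_append(dynarray *a, int x)
{
    if (a->size == a->capacity)
    {
        size_t new_capacity = a->capacity ? a->capacity * 2 : 1;
        a->items = realloc(a->items, new_capacity * sizeof(int));
        a->capacity = new_capacity;
    }
    a->items[a->size++] = x;
}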

Best, Worst, and Average-Case Complexity

Using the RAM model of computation, we can count how many steps our algorithm will take on
any given input instance by simply executing it on the given input. However, to really understand
how good or bad an algorithm is, we must know how it works over all instances.

To understand the notions of the best, worst, and average-case complexity, one must think about
running an algorithm on all possible instances of data that can be fed to it. For the problem of
sorting, the set of possible input instances consists of all the possible arrangements of all the
possible numbers of keys. We can represent every input instance as a point on a graph, where
the x-axis is the size of the problem (for sorting, the number of items to sort) and the y-axis is the
number of steps taken by the algorithm on this instance. Here we assume, quite reasonably, that
it doesn't matter what the values of the keys are, just how many of them there are and how they
are ordered. It should not take longer to sort 1,000 English names than it does to sort 1,000
French names, for example.

1. The worst-case complexity of the algorithm is the function defined by the maximum
number of steps taken on any instance of size n. It represents the curve passing through
the highest point of each column.
2. The best-case complexity of the algorithm is the function defined by the minimum
number of steps taken on any instance of size n. It represents the curve passing through
the lowest point of each column.
3. Finally, the average-case complexity of the algorithm is the function defined by the
average number of steps taken on any instance of size n.
Analysis of Algorithms | Set 1 (Asymptotic Analysis)

Why performance analysis?

There are many important things that should be taken care of, like user friendliness, modularity,
security, maintainability, etc. Why worry about performance?
The answer to this is simple: we can have all the above things only if we have performance. So
performance is like currency through which we can buy all the above things. Another reason for
studying performance is – speed is fun!
To summarize, performance == scale. Imagine a text editor that can load 1000 pages but can
spell check only 1 page per minute, OR an image editor that takes 1 hour to rotate your image 90
degrees left, OR … you get it. If a software feature cannot cope with the scale of tasks users need
to perform, it is as good as dead.

Given two algorithms for a task, how do we find out which one is better?
One naive way of doing this is to implement both algorithms, run the two programs on
your computer for different inputs, and see which one takes less time. There are many problems
with this approach to the analysis of algorithms.
1) It might be possible that for some inputs the first algorithm performs better than the second, and
for some inputs the second performs better.
2) It might also be possible that for some inputs the first algorithm performs better on one machine
while the second works better on another machine for some other inputs.

Asymptotic Analysis is the big idea that handles the above issues in analyzing algorithms. In
Asymptotic Analysis, we evaluate the performance of an algorithm in terms of input size (we
don't measure the actual running time). We calculate how the time (or space) taken by an
algorithm increases with the input size.
For example, let us consider the search problem (searching for a given item) in a sorted array. One
way to search is Linear Search (order of growth is linear) and the other way is Binary Search (order of
growth is logarithmic). To understand how Asymptotic Analysis solves the above mentioned
problems in analyzing algorithms, let us say we run the Linear Search on a fast computer and
Binary Search on a slow computer. For small values of input array size n, the fast computer may
take less time. But after a certain value of input array size, the Binary Search will definitely start
taking less time compared to the Linear Search, even though the Binary Search is being run on a
slow machine. The reason is that the order of growth of Binary Search with respect to input size is
logarithmic, while the order of growth of Linear Search is linear. So the machine-dependent
constants can always be ignored after certain values of input size.
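
As a rough, purely illustrative calculation (the machine speeds are made-up numbers): suppose the fast machine executes 10^9 operations per second and the slow machine only 10^6, and we search a sorted array of n = 10^9 items. Linear search on the fast machine needs about 10^9 comparisons in the worst case, roughly 1 second, while binary search on the slow machine needs about log2(10^9) ≈ 30 comparisons, roughly 0.00003 seconds. The slower machine still wins, because the order of growth dominates the machine-dependent constants.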

Analysis of Algorithms | Set 2 (Worst, Average and Best Cases)


Asymptotic analysis overcomes the problems of the naive way of analyzing algorithms. Let us take
the example of Linear Search and analyze it using asymptotic analysis.

Three cases to analyze an algorithm:

1) Worst Case
2) Average Case
3) Best Case

Consider the following implementation of Linear Search.

// C++ implementation of the approach
#include <bits/stdc++.h>
using namespace std;

// Linearly search x in arr[].
// If x is present then return the index,
// otherwise return -1
int search(int arr[], int n, int x)
{
    int i;
    for (i = 0; i < n; i++)
    {
        if (arr[i] == x)
            return i;
    }
    return -1;
}

// Driver Code
int main()
{
    int arr[] = {1, 10, 30, 15};
    int x = 30;
    int n = sizeof(arr) / sizeof(arr[0]);
    cout << x << " is present at index "
         << search(arr, n, x);

    getchar();
    return 0;
}
Output:

30 is present at index 2

Worst Case Analysis (Usually Done)

In the worst case analysis, we calculate an upper bound on the running time of an algorithm. We must
know the case that causes the maximum number of operations to be executed. For Linear Search,
the worst case happens when the element to be searched (x in the above code) is not present in
the array. When x is not present, the search() function compares it with all the elements of arr[]
one by one. Therefore, the worst case time complexity of linear search would be Θ(n).

Average Case Analysis (Sometimes done)

In average case analysis, we take all possible inputs and calculate the computing time for each of
them, sum all the calculated values, and divide the sum by the total number of inputs. We must
know (or predict) the distribution of cases. For the linear search problem, let us assume that all cases
are uniformly distributed (including the case of x not being present in the array). So we sum all the
cases and divide the sum by (n+1). Following is the value of the average case time complexity.
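
As a sketch of that calculation (assuming x is equally likely to be at any of the n positions or to be absent): finding x at position i costs i comparisons and a miss costs n, so the average is (1 + 2 + … + n + n) / (n + 1), which grows in proportion to n. The average case time complexity of linear search is therefore Θ(n).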

Best Case Analysis

Best-case performance is used in computer science to describe an algorithm's behavior under optimal
conditions. For Linear Search, the best case occurs when x is present at the first location, so the best
case time complexity would be Θ(1).
A tree represents nodes connected by edges. We will discuss binary trees, and binary search trees
specifically.

A Binary Tree is a special data structure used for data storage purposes. A binary tree has the special
condition that each node can have a maximum of two children. A binary tree has the benefits of
both an ordered array and a linked list, as search is as quick as in a sorted array and insertion or
deletion operations are as fast as in a linked list.

Important Terms

Following are the important terms with respect to tree.

Path − Path refers to the sequence of nodes along the edges of a tree.

Root − The node at the top of the tree is called root. There is only one root per tree and one path
from the root node to any node.

Parent − Any node except the root node has one edge upward to a node called parent.

Child − The node below a given node connected by its edge downward is called its child node.

Leaf − The node which does not have any child node is called the leaf node.

Subtree − Subtree represents the descendants of a node.

Visiting − Visiting refers to checking the value of a node when control is on the node.

Traversing − Traversing means passing through nodes in a specific order.


Levels − Level of a node represents the generation of a node. If the root node is at level 0, then
its next child node is at level 1, its grandchild is at level 2, and so on.

Keys − Key represents a value of a node, based on which a search operation is to be carried out
for a node.

Binary Search Tree Representation

A Binary Search Tree exhibits a special behavior: a node's left child must have a value less than its
parent's value, and the node's right child must have a value greater than its parent's value.

Tree Node
The code to write a tree node would be similar to what is given below. It has a data part and
references to its left and right child nodes.

struct node {
    int data;
    struct node *leftChild;
    struct node *rightChild;
};

In a tree, all nodes share this common construct.

Basic Operations

The basic operations that can be performed on a binary search tree data structure, are the
following −

Insert − Inserts an element in a tree/create a tree.

Search − Searches an element in a tree.

Preorder Traversal − Traverses a tree in a pre-order manner.


Inorder Traversal − Traverses a tree in an in-order manner.

Postorder Traversal − Traverses a tree in a post-order manner.

Insert Operation

The very first insertion creates the tree. Afterwards, whenever an element is to be inserted, first
locate its proper location. Start searching from the root node; if the data is less than the key
value, search for an empty location in the left subtree and insert the data there. Otherwise, search for
an empty location in the right subtree and insert the data there.

Algorithm

If root is NULL
    then create root node
    return

If root exists then
    compare the data with node.data

    while insertion position is not located
        If data is greater than node.data
            goto right subtree
        else
            goto left subtree
    endwhile

    insert data
end If
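
A minimal C sketch of this insert operation, reusing the struct node layout from the Tree Node section (the helper names newNode and insert are mine), might look like this:

#include <stdlib.h>

struct node {
    int data;
    struct node *leftChild;
    struct node *rightChild;
};

// Allocate a new leaf node holding the given data.
struct node *newNode(int data)
{
    struct node *n = malloc(sizeof(struct node));
    n->data = data;
    n->leftChild = n->rightChild = NULL;
    return n;
}

// Insert data into the subtree rooted at root and return the (possibly new) root.
struct node *insert(struct node *root, int data)
{
    if (root == NULL)                  // empty location found: create the node here
        return newNode(data);
    if (data > root->data)             // greater values go to the right subtree
        root->rightChild = insert(root->rightChild, data);
    else                               // smaller (or equal) values go to the left subtree
        root->leftChild = insert(root->leftChild, data);
    return root;
}

The very first call with root == NULL creates the tree, matching the algorithm above; sending equal values to the left is just one common convention.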
Insertion in a Binary Tree in level order

Given a binary tree and a key, insert the key into the binary tree at the first position available in level
order.
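
A sketch of that idea in C, reusing struct node and newNode from the previous sketch (the function name insertLevelOrder and the fixed-size queue are mine, and the sketch assumes a non-empty tree):

// Insert key at the first free child slot found in level order (BFS).
void insertLevelOrder(struct node *root, int key)
{
    struct node *queue[256];   // fixed-size queue, large enough for this sketch
    int front = 0, back = 0;

    queue[back++] = root;
    while (front < back)
    {
        struct node *current = queue[front++];

        if (current->leftChild == NULL)          // first empty left slot wins
        {
            current->leftChild = newNode(key);
            return;
        }
        queue[back++] = current->leftChild;

        if (current->rightChild == NULL)         // otherwise try the right slot
        {
            current->rightChild = newNode(key);
            return;
        }
        queue[back++] = current->rightChild;
    }
}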

Tree Traversals (Inorder, Preorder and Postorder)

Unlike linear data structures (Array, Linked List, Queues, Stacks, etc.), which have only one logical
way to traverse them, trees can be traversed in different ways. Following are the generally used
ways of traversing trees.
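
The traversal examples below refer to a figure that is not reproduced here; reconstructed from the traversal orders given, the example tree is:

        1
       / \
      2   3
     / \
    4   5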

Depth First Traversals:

(a) Inorder (Left, Root, Right): 4 2 5 1 3
(b) Preorder (Root, Left, Right): 1 2 4 5 3
(c) Postorder (Left, Right, Root): 4 5 2 3 1

Breadth First or Level Order Traversal: 1 2 3 4 5


Inorder Traversal:
Algorithm Inorder(tree)
1. Traverse the left subtree, i.e., call Inorder(left-subtree)
2. Visit the root.
3. Traverse the right subtree, i.e., call Inorder(right-subtree)
Uses of Inorder
In the case of binary search trees (BST), Inorder traversal gives the nodes in non-decreasing order. To
get the nodes in non-increasing order, a variation of Inorder traversal where the traversal is
reversed can be used.
Example: Inorder traversal for the example tree above is 4 2 5 1 3.
Preorder Traversal:
Algorithm Preorder(tree)
1. Visit the root.
2. Traverse the left subtree, i.e., call Preorder(left-subtree)
3. Traverse the right subtree, i.e., call Preorder(right-subtree)
Uses of Preorder
Preorder traversal is used to create a copy of the tree. Preorder traversal is also used to get the
prefix expression of an expression tree.
Example: Preorder traversal for the example tree above is 1 2 4 5 3.

Postorder Traversal:
Algorithm Postorder(tree)
1. Traverse the left subtree, i.e., call Postorder(left-subtree)
2. Traverse the right subtree, i.e., call Postorder(right-subtree)
3. Visit the root.
Uses of Postorder
Postorder traversal is used to delete the tree, since a node should only be deleted after both of its
subtrees have been deleted. Postorder traversal is also useful to get the postfix expression of an expression tree.
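
A C sketch of the three depth-first traversals (the function names are mine; struct node is the one from the Tree Node section):

#include <stdio.h>

// Inorder: left subtree, root, right subtree.
void inorder(struct node *root)
{
    if (root == NULL) return;
    inorder(root->leftChild);
    printf("%d ", root->data);
    inorder(root->rightChild);
}

// Preorder: root, left subtree, right subtree.
void preorder(struct node *root)
{
    if (root == NULL) return;
    printf("%d ", root->data);
    preorder(root->leftChild);
    preorder(root->rightChild);
}

// Postorder: left subtree, right subtree, root.
void postorder(struct node *root)
{
    if (root == NULL) return;
    postorder(root->leftChild);
    postorder(root->rightChild);
    printf("%d ", root->data);
}

Called on the example tree above, inorder prints 4 2 5 1 3, preorder prints 1 2 4 5 3, and postorder prints 4 5 2 3 1.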
Binary Search

Given a sorted array arr[] of n elements, write a function to search for a given element x in arr[].
A simple approach is to do a linear search. The time complexity of the above algorithm is O(n). Another
approach to perform the same task is to use Binary Search.

Binary Search: Search a sorted array by repeatedly dividing the search interval in half. Begin with
an interval covering the whole array. If the value of the search key is less than the item in the
middle of the interval, narrow the interval to the lower half. Otherwise narrow it to the upper
half. Repeatedly check until the value is found or the interval is empty.

The idea of binary search is to use the information that the array is sorted to reduce the time
complexity to O(log n).

Basically, we ignore half of the elements after just one comparison.

1. Compare x with the middle element.
2. If x matches the middle element, we return the mid index.
3. Else, if x is greater than the mid element, then x can only lie in the right half subarray after the mid
element. So we recur for the right half.
4. Else (x is smaller), we recur for the left half.
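
A sketch of an iterative binary search in C, following the steps above (the function name binarySearch is mine):

// Returns the index of x in the sorted array arr[0..n-1], or -1 if x is not present.
int binarySearch(int arr[], int n, int x)
{
    int low = 0, high = n - 1;
    while (low <= high)                      // repeat until the interval is empty
    {
        int mid = low + (high - low) / 2;    // middle element of the current interval
        if (arr[mid] == x)
            return mid;                      // x matches the middle element
        else if (x > arr[mid])
            low = mid + 1;                   // x can only lie in the right half
        else
            high = mid - 1;                  // x can only lie in the left half
    }
    return -1;
}

Each comparison halves the remaining interval, which is where the O(log n) bound comes from.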
