
Unit 3

Shortest path algorithms:

Dijkstra's Algorithm:

Given a weighted undirected graph represented as an edge list and a source vertex src, find the shortest path
distances from the source vertex to all other vertices in the graph. The graph contains V vertices, numbered
from 0 to V - 1.
Note: The given graph does not contain any negative edge.
In Dijkstra's Algorithm, the goal is to find the shortest distance from a given source node to all other nodes
in the graph. As the source node is the starting point, its distance is initialized to zero. From there, we
iteratively pick the unprocessed node with the minimum distance from the source; this is where a
min-heap (priority queue) or a set is typically used for efficiency. For each picked node u, we update the
distance to its neighbors v using the formula dist[v] = dist[u] + weight[u][v], but only if this new path
offers a shorter distance than the current known one. This process continues until all nodes have been
processed.
Step-by-Step Implementation
1. Set dist[source] = 0 and all other distances to infinity.

2. Push the source node into the min-heap as a pair <distance, node>, i.e., <0, source>.

3. Pop the top element (the node u with the smallest distance) from the min-heap. For each adjacent neighbor v of u:
o Calculate the candidate distance dist[u] + weight[u][v].
o If this new distance is shorter than the current dist[v], update dist[v] and push the pair <dist[v], v> into the min-heap.

4. Repeat step 3 until the min-heap is empty.

5. Return the distance array, which holds the shortest distance from the source to all nodes.

Time Complexity: O(E log V), where E is the number of edges and V is the number of vertices.
Auxiliary Space: O(V), where V is the number of vertices.
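
A minimal Python sketch of these steps, assuming the graph is given as an adjacency list adj where adj[u] is a list of (neighbor, weight) pairs (the function and variable names are illustrative):

```python
import heapq

def dijkstra(V, adj, src):
    INF = float('inf')
    dist = [INF] * V                 # dist[v] = best known distance from src to v
    dist[src] = 0
    pq = [(0, src)]                  # min-heap of (distance, node) pairs
    while pq:
        d, u = heapq.heappop(pq)     # unprocessed node with the smallest distance
        if d > dist[u]:
            continue                 # stale heap entry; dist[u] was already improved
        for v, w in adj[u]:          # relax every edge (u, v)
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                heapq.heappush(pq, (dist[v], v))
    return dist
```

For an undirected graph, each edge (u, v, w) should appear in both adj[u] and adj[v].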

Bellman-Ford Algorithm:

Given a weighted graph with V vertices and E edges, along with a source vertex src, the task is to compute the
shortest distances from the source to all other vertices. If a vertex is unreachable from the source, its distance
should be marked as 10⁸. In the presence of a negative weight cycle, return -1 to signify that shortest path
calculations are not feasible.

Approach: Bellman-Ford Algorithm - O(V*E) Time and O(V) Space

Negative weight cycle:

A negative weight cycle is a cycle in a graph whose sum of edge weights is negative. If you traverse the cycle, the
total weight accumulated would be less than zero.

In the presence of a negative weight cycle in the graph, the shortest path doesn't exist, because with each
traversal of the cycle the total path weight keeps decreasing.

Limitation of Dijkstra's Algorithm:

Since we need to find the single-source shortest path, we might initially think of using Dijkstra's algorithm.
However, Dijkstra is not suitable when the graph contains negative edges. The reason is that it never revisits
nodes which have already been marked as visited. If a shorter path exists through a longer route with negative
edges, Dijkstra's algorithm will fail to handle it.

Principle of Relaxation of Edges

 Relaxation means updating the shortest distance to a node if a shorter path is found through another node.
For an edge (u, v) with weight w:
o If going through u gives a shorter path to v from the source node (i.e., distance[v] > distance[u] + w), we
update distance[v] as distance[u] + w.
 In the Bellman-Ford algorithm, this process is repeated (V - 1) times for all the edges.

Why Relaxing Edges (V - 1) times gives us Single Source Shortest Path?

A shortest path between two vertices can have at most (V - 1) edges. It is not possible to have a simple path with
more than (V - 1) edges (otherwise it would form a cycle). Therefore, repeating the relaxation process (V - 1)
times ensures that all possible paths between source and any other node have been covered.

Detection of a Negative Weight Cycle

 As discussed earlier, we need (V - 1) relaxations of all the edges to achieve the single-source shortest
paths. If one additional (V-th) round of relaxation still updates any edge, it means a cycle with overall
negative weight has been traversed once more. This indicates the presence of a negative weight cycle in the
graph.

Bellman-Ford is a single source shortest path algorithm. It effectively works in the cases of negative edges
and is able to detect negative cycles as well. It works on the principle of relaxation of the edges.
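
A short Python sketch of this approach, assuming edges is a list of (u, v, w) triples and using 10⁸ as the "unreachable" sentinel from the problem statement (names are illustrative):

```python
def bellman_ford(V, edges, src):
    INF = 10**8                      # sentinel for "unreachable"
    dist = [INF] * V
    dist[src] = 0
    # Relax all edges (V - 1) times
    for _ in range(V - 1):
        for u, v, w in edges:
            if dist[u] != INF and dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    # A V-th round that still relaxes some edge => negative weight cycle
    for u, v, w in edges:
        if dist[u] != INF and dist[u] + w < dist[v]:
            return [-1]              # negative cycle: shortest paths undefined
    return dist
```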
Floyd-Warshall Algorithm (Dynamic Programming approach):

Given a matrix dist[][] of size n x n, where dist[i][j] represents the weight of the edge from node i to node j. If
there is no direct edge, dist[i][j] is set to a large value (e.g., 10⁸) to represent infinity. The diagonal
entries dist[i][i] are 0, since the distance from a node to itself is zero. The graph may contain negative edge
weights, but it does not contain any negative weight cycles.
Your task is to determine the shortest path distance between all pairs of nodes i and j in the graph.
The Floyd–Warshall algorithm works by maintaining a two-dimensional array that represents the distances
between nodes. Initially, this array is filled using only the direct edges between nodes. Then, the algorithm
gradually updates these distances by checking if shorter paths exist through intermediate nodes.
This algorithm works for both the directed and undirected weighted graphs and can handle graphs with
both positive and negative weight edges.
Note: It does not work for the graphs with negative cycles (where the sum of the edges in a cycle is negative).

Why Floyd Warshall Works (Correctness Proof)?


The algorithm relies on the principle of optimal substructure, meaning:
 If the shortest path from i to j passes through some vertex k, then the path from i to k and the path from k to j
must also be shortest paths.
 The iterative approach ensures that by the time vertex k is considered, all shortest paths using only vertices 0
to k-1 have already been computed.
By the end of the algorithm, all shortest paths are computed optimally because each possible intermediate vertex
has been considered.

Why is the Floyd-Warshall Algorithm better for Dense Graphs than for Sparse Graphs?

 Dense Graph: A graph in which the number of edges is significantly higher than the number of
vertices.
 Sparse Graph: A graph in which the number of edges is very low.
 No matter how many edges the graph has, the Floyd-Warshall Algorithm runs in O(V³) time;
therefore it is best suited for dense graphs. In the case of sparse graphs, Johnson's Algorithm is more
suitable.
Step-by-step implementation
 Start by updating the distance matrix by treating each vertex as a possible intermediate node between all
pairs of vertices.
 Iterate through each vertex, one at a time. For each selected vertex k, attempt to improve the shortest paths
that pass through it.
 When we pick vertex number k as an intermediate vertex, we have already considered vertices {0, 1, 2, ...,
k-1} as intermediate vertices.
 For every pair (i, j) of the source and destination vertices respectively, there are two possible cases:
o k is not an intermediate vertex in the shortest path from i to j. We keep the value of dist[i][j] as it is.
o k is an intermediate vertex in the shortest path from i to j. We update the value of dist[i][j] to dist[i][k] +
dist[k][j] if dist[i][j] > dist[i][k] + dist[k][j].
 Repeat this process for each vertex k until all intermediate possibilities have been considered.
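
A compact Python sketch of this procedure, assuming dist is the n x n matrix described above with 10⁸ standing for infinity (names are illustrative):

```python
def floyd_warshall(dist):
    INF = 10**8
    n = len(dist)
    for k in range(n):               # allow vertex k as an intermediate
        for i in range(n):
            for j in range(n):
                # Guard against INF arithmetic (relevant with negative edges)
                if dist[i][k] < INF and dist[k][j] < INF \
                        and dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist
```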
Transitive Closure of a Graph:

Given a directed graph, determine if a vertex j is reachable from another vertex i for all vertex pairs (i, j) in the given
graph. Here reachable means that there is a path from vertex i to j. The reachability matrix is called the transitive
closure of a graph.
Example:
Input: Graph = [[0, 1, 1, 0], [0, 0, 1, 0], [1, 0, 0, 1], [0, 0, 0, 0]]
Output: [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 1]]
Explanation: Vertices 0, 1 and 2 all lie on a cycle (0 → 2 → 0, with 1 reachable via 0 → 1 and feeding back via 1 → 2 → 0), so each of them can reach every vertex, including 3. Vertex 3 has no outgoing edges, so it reaches only itself.

Approach:
The idea is to use the Floyd-Warshall algorithm to find the transitive closure of a directed graph. We
start by initializing a result matrix with the original adjacency matrix and set all diagonal elements to
1 (since every vertex can reach itself). Then for each intermediate vertex k, we check if vertex i can
reach vertex j either directly or through vertex k. If a path exists from i to k and from k to j, then we
mark that i can reach j in our result matrix.
Step by step approach:
1. Initialize result matrix with input graph and set all diagonal elements to 1.
2. For each intermediate vertex, check all possible vertex pairs.
3. If vertex i can reach intermediate vertex and intermediate vertex can reach vertex j, mark that i can reach j. This
iteratively builds paths of increasing length through intermediate vertices.
4. Return the final matrix where a value of 1 indicates vertex i can reach vertex j.

Time Complexity: O(V³), where V is the number of vertices in the given graph.

Auxiliary Space: O(V²) to store the result.
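
A minimal Python sketch of the Floyd-Warshall-style closure described in the steps above, assuming graph is the 0/1 adjacency matrix (names are illustrative):

```python
def transitive_closure_fw(graph):
    n = len(graph)
    # Copy the adjacency matrix and mark every vertex as reaching itself
    reach = [row[:] for row in graph]
    for i in range(n):
        reach[i][i] = 1
    for k in range(n):               # try every vertex as an intermediate
        for i in range(n):
            for j in range(n):
                if reach[i][k] and reach[k][j]:
                    reach[i][j] = 1  # i reaches k and k reaches j => i reaches j
    return reach

graph = [[0, 1, 1, 0], [0, 0, 1, 0], [1, 0, 0, 1], [0, 0, 0, 0]]
print(transitive_closure_fw(graph))
# [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 1]]
```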

The O(V³) solution above is based on the Floyd-Warshall Algorithm. An alternative is a DFS-based solution:
run a DFS from every vertex, which takes O(V·(V + E)) overall. For a dense graph this is still O(V³), while for a
sparse graph it reduces to about O(V²).
Below are the abstract steps of the algorithm.
 Create a matrix tc[V][V] that would finally have transitive closure of the given graph. Initialize all entries of
tc[][] as 0.
 Call DFS for every node of the graph to mark reachable vertices in tc[][]. In recursive calls to DFS, we don't call
DFS for an adjacent vertex if it is already marked as reachable in tc[][].
 The code uses adjacency list representation of input graph and builds a matrix tc[V][V] such that tc[u][v] would
be true if v is reachable from u.
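
A small Python sketch of this DFS approach, assuming adj is an adjacency list (names are illustrative; deep graphs may need a larger recursion limit):

```python
def transitive_closure_dfs(V, adj):
    tc = [[0] * V for _ in range(V)]   # tc[u][v] = 1 if v is reachable from u

    def dfs(root, u):
        tc[root][u] = 1
        for v in adj[u]:
            if not tc[root][v]:        # skip vertices already marked reachable
                dfs(root, v)

    for u in range(V):                 # run a DFS from every vertex
        dfs(u, u)
    return tc
```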
Topological Sorting:

Topological sorting for Directed Acyclic Graph (DAG) is a linear ordering of vertices such that for every directed
edge u-v, vertex u comes before v in the ordering.
Note: Topological Sorting for a graph is not possible if the graph is not a DAG.
Example:
Input: V = 6, edges = [[2, 3], [3, 1], [4, 0], [4, 1], [5, 0], [5, 2]]

Output: 5 4 2 3 1 0
Explanation: The first vertex in topological sorting is always a vertex with an in-degree of 0 (a vertex with
no incoming edges). A topological sorting of the following graph is "5 4 2 3 1 0". There can be more than
one topological sorting for a graph. Another topological sorting of the following graph is "4 5 2 3 1 0".

Topological Sorting vs Depth First Traversal (DFS):

In DFS, we print a vertex and then recursively call DFS for its adjacent vertices. In topological sorting, we need to
print a vertex before its adjacent vertices.
For example, in the graph above, vertex '5' should be printed before vertex '0', but unlike in DFS, vertex
'4' should also be printed before vertex '0'. So topological sorting is different from DFS. For example, a DFS
of the shown graph is "5 2 3 1 0 4", but it is not a topological sorting.

Topological Sorting in Directed Acyclic Graphs (DAGs)

DAGs are a special type of graph in which each edge is directed such that no cycle exists in the graph. Before
understanding why topological sort only exists for DAGs, let's first answer two questions:
 Why Topological Sort is not possible for graphs with undirected edges?
This is due to the fact that undirected edge between two vertices u and v means, there is an edge
from u to v as well as from v to u. Because of this both the nodes u and v depend upon each other
and none of them can appear before the other in the topological ordering without creating a
contradiction.
 Why Topological Sort is not possible for graphs having cycles?
Imagine a graph with 3 vertices and edges = {1 to 2 , 2 to 3, 3 to 1} forming a cycle. Now if we try
to topologically sort this graph starting from any vertex, it will always create a contradiction to our
definition. All the vertices in a cycle are indirectly dependent on each other hence topological
sorting fails.

Topological order may not be Unique: Topological sorting is a dependency problem in which completion of one
task depends upon the completion of several other tasks whose order can vary.
Algorithm for Topological Sorting using DFS:

Here’s a step-by-step algorithm for topological sorting using Depth First Search (DFS):
 Create a graph with n vertices and m directed edges.
 Initialize a stack and a visited array of size n.
 For each unvisited vertex in the graph, do the following:
o Call the DFS function with the vertex as the parameter.
o In the DFS function, mark the vertex as visited and recursively call the DFS function for all unvisited
neighbors of the vertex.
o Once all the neighbors have been visited, push the vertex onto the stack.
 After all vertices have been visited, pop elements from the stack and append them to the output list until the
stack is empty.
 The resulting list is the topologically sorted order of the graph.

Time Complexity: O(V+E). The above algorithm is simply DFS with an extra stack, so the time complexity is the same
as DFS.
Auxiliary Space: O(V), due to the creation of the stack.
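
A concise Python sketch of this DFS-based procedure, assuming adj is an adjacency list of the DAG (names are illustrative):

```python
def topological_sort(V, adj):
    visited = [False] * V
    stack = []                        # filled in reverse topological order

    def dfs(u):
        visited[u] = True
        for v in adj[u]:
            if not visited[v]:
                dfs(v)
        stack.append(u)               # push u only after everything reachable from it

    for u in range(V):
        if not visited[u]:
            dfs(u)
    return stack[::-1]                # reversed stack = topological order

# Example from above: V = 6, edges 2->3, 3->1, 4->0, 4->1, 5->0, 5->2
adj = [[], [], [3], [1], [0, 1], [0, 2]]
print(topological_sort(6, adj))       # [5, 4, 2, 3, 1, 0], matching the example
```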

Topological Sorting Using BFS:

The BFS-based algorithm for Topological Sort is called Kahn's Algorithm. Kahn's algorithm has the same time
complexity as the DFS-based algorithm discussed above.

Kahn's Algorithm works by repeatedly finding vertices with no incoming edges, removing them from the
graph, and updating the in-degrees of the vertices their outgoing edges point to. This process continues
until all vertices have been ordered.

Algorithm:

 Add all nodes with in-degree 0 to a queue.


 While the queue is not empty:
o Remove a node from the queue.
o For each outgoing edge from the removed node, decrement the in-degree of the destination node by 1.
o If the in-degree of a destination node becomes 0, add it to the queue.
 If the queue becomes empty while some nodes have never been added to it, the graph contains a cycle and
cannot be topologically sorted.
 The order in which nodes are removed from the queue is a topological ordering of the graph.
How to find the in-degree of each node?

Find the in-degree of each node by initially counting the number of incoming edges to each
node: iterate through all the edges in the graph and increment the in-degree of the destination
node of each edge. This way, you can determine the in-degree of each node before starting the
sorting process.
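
A Python sketch of Kahn's Algorithm following these steps, assuming adj is an adjacency list (names are illustrative):

```python
from collections import deque

def kahns_topological_sort(V, adj):
    indegree = [0] * V
    for u in range(V):               # count incoming edges of every node
        for v in adj[u]:
            indegree[v] += 1
    q = deque(u for u in range(V) if indegree[u] == 0)
    order = []
    while q:
        u = q.popleft()
        order.append(u)
        for v in adj[u]:             # "remove" u: drop its outgoing edges
            indegree[v] -= 1
            if indegree[v] == 0:
                q.append(v)
    if len(order) < V:               # some nodes never reached in-degree 0
        return None                  # the graph has a cycle; no topological order
    return order

adj = [[], [], [3], [1], [0, 1], [0, 2]]
print(kahns_topological_sort(6, adj))  # [4, 5, 0, 2, 3, 1], another valid ordering
```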

Advantages of Topological Sort:

 Helps in scheduling tasks or events based on dependencies.


 Detects cycles in a directed graph.
 Efficient for solving problems with precedence constraints.

Disadvantages of Topological Sort:

 Only applicable to directed acyclic graphs (DAGs), not suitable for cyclic graphs.
 May not be unique; multiple valid topological orderings can exist.

Applications of Topological Sort:

 Task scheduling and project management.


 In software deployment tools like Makefile.
 Dependency resolution in package management systems.
 Determining the order of compilation in software build systems.
 Deadlock detection in operating systems.
 Course scheduling in universities.
 It is used to find shortest paths in weighted directed acyclic graphs.

Network Flow Algorithms

The max flow problem is a classic optimization problem in graph theory that involves finding the maximum
amount of flow that can be sent through a network of pipes, channels, or other pathways, subject to capacity
constraints. The problem can be used to model a wide variety of real-world situations, such as transportation
systems, communication networks, and resource allocation.
In the max flow problem, we have a directed graph with a source node s and a sink node t, and each edge has a
capacity that represents the maximum amount of flow that can be sent through it. The goal is to find the
maximum amount of flow that can be sent from s to t, while respecting the capacity constraints on the edges.
One common approach to solving the max flow problem is the Ford-Fulkerson algorithm, which is based on the
idea of augmenting paths. The algorithm starts with an initial flow of zero, and iteratively finds a path from s to t
that has available capacity, and then increases the flow along that path by the maximum amount possible. This
process continues until no more augmenting paths can be found.
Another popular algorithm for solving the max flow problem is the Edmonds-Karp algorithm, which is a variant of
the Ford-Fulkerson algorithm that uses breadth-first search to find augmenting paths, and thus can be more
efficient in some cases.

Ford-Fulkerson Algorithm
The Ford-Fulkerson algorithm is a widely used algorithm to solve the maximum flow problem in a flow network.
The maximum flow problem involves determining the maximum amount of flow that can be sent from a source
vertex to a sink vertex in a directed weighted graph, subject to capacity constraints on the edges.
The algorithm works by iteratively finding an augmenting path, which is a path from the source to the sink in the
residual graph, i.e., the graph obtained by subtracting the current flow from the capacity of each edge. The
algorithm then increases the flow along this path by the maximum possible amount, which is the minimum
capacity of the edges along the path.

The following is the basic idea of the Ford-Fulkerson algorithm:


1. Start with initial flow as 0.
2. While there exists an augmenting path from the source to the sink:
 Find an augmenting path using any path-finding algorithm, such as breadth-first search or depth-first search.
 Determine the amount of flow that can be sent along the augmenting path, which is the minimum residual
capacity along the edges of the path.
 Increase the flow along the augmenting path by the determined amount.
3. Return the maximum flow.

Time Complexity: O(V·E²), where E is the number of edges and V is the number of vertices.
Space Complexity: O(V), for the BFS queue.
The above implementation of the Ford-Fulkerson Algorithm is called the Edmonds-Karp Algorithm. The idea of
Edmonds-Karp is to use BFS in the Ford-Fulkerson implementation, as BFS always picks a path with the minimum
number of edges. When BFS is used, the worst-case time complexity is O(V·E²).
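
A runnable Python sketch of the BFS-based (Edmonds-Karp) variant, assuming the network is given as an n x n capacity matrix with source s and sink t (names are illustrative):

```python
from collections import deque

def edmonds_karp(capacity, s, t):
    n = len(capacity)
    flow = [[0] * n for _ in range(n)]
    max_flow = 0
    while True:
        # BFS on the residual graph to find a shortest augmenting path
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and capacity[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:          # no augmenting path left: done
            return max_flow
        # Bottleneck = minimum residual capacity along the path found
        bottleneck = float('inf')
        v = t
        while v != s:
            u = parent[v]
            bottleneck = min(bottleneck, capacity[u][v] - flow[u][v])
            v = u
        # Augment: add flow forward, cancel flow backward (residual edges)
        v = t
        while v != s:
            u = parent[v]
            flow[u][v] += bottleneck
            flow[v][u] -= bottleneck
            v = u
        max_flow += bottleneck
```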
(MST and BFS/DFS: see separate notes.)
Unit 4

Tractable and Intractable problems:


In the realm of computer science, "tractable" and "intractable" problems refer to the difficulty of finding solutions
using algorithms, particularly in terms of time complexity. Tractable problems can be solved in a reasonable
amount of time, even for large inputs, while intractable problems require an impractically long (typically
exponential) time to solve, even for moderately sized inputs.

Tractable Problems:
A problem is considered tractable if there exists an algorithm that can solve it in polynomial time (O(n^k), where k
is a constant). This means the time required to solve the problem grows at most proportionally to a power of the
input size.

Characteristics:
Tractable problems are considered "easy" to solve because they can be solved in a reasonable amount of time,
even with large datasets. These are the "easy" problems — not necessarily easy in real life, but easy in terms of
time complexity.
Such problems belong to the complexity class P.

Intractable Problems:
A problem is considered intractable if there is no known algorithm that can solve it in polynomial time. This means
the time required to solve the problem grows exponentially with the input size.

Characteristics:
Intractable problems are considered "hard" to solve because the time required to find a solution becomes
prohibitively long as the input size increases.
These are the "hard" problems — no efficient solution is known for large inputs.
These often belong to classes like NP, NP-Complete, NP-Hard
These are problems that "can be" solved in principle, but the amount of time it takes to solve them is too large.
They can be solved in reasonable time only for small inputs, or cannot be solved at all; as their input grows
large, we are unable to solve them in reasonable time.
Real-World Analogy

Imagine solving a Rubik’s cube:

 Tractable problem: You follow a known algorithm; it finishes quickly (like solving with steps).
 Intractable problem: You randomly try all possible moves — takes a huge amount of time.

Understanding tractable and intractable problems helps in:

 Designing efficient software


 Predicting performance
 Choosing the right algorithm
 Knowing when to approximate instead of finding the perfect solution
Computability of Algorithms:
Computability is a branch of theoretical computer science that studies which problems can or cannot be solved using
algorithms. Computability refers to the ability to solve a problem using a computer.
It's closely related to the existence of algorithms, which are step-by-step procedures that, when followed, will
always lead to a correct solution.
The theory of computability focuses on what problems can be solved algorithmically and what problems cannot.

In simple terms: "Can we design an algorithm that will always give the correct result in a finite amount of time for
every valid input?"

What Is an Algorithm?

An algorithm is a step-by-step procedure or set of rules to perform a task or solve a problem. It must be:

 Definite (clear steps)


 Finite (stops at some point)
 Effective (works correctly)
 Takes input and produces output

Why Is Computability Important?

 It tells us the boundaries of what computers can and cannot do.


 Helps prevent wasting time trying to solve problems that are unsolvable.
 Connects deeply with real-world issues like decision-making, automation, AI, and cryptography.

Two Main Categories of Problems in Computability

Type: Description
Decidable Problems: Problems for which an algorithm exists that can solve every instance of the problem in a finite amount of time.
Undecidable Problems: Problems for which no algorithm exists that can solve all cases correctly in finite time.

Decidable Problems (Computable Problems)

These are problems we can solve with an algorithm.

Examples:

 Checking if a number is prime.


 Sorting a list of numbers.
 Calculating factorial of a number.
 Searching for an item in an array.

All problems in P and NP classes are decidable.

Undecidable Problems (Non-computable Problems)

These are problems for which no algorithm exists that will work for all inputs.

Famous Example: The Halting Problem


What is the Halting Problem?

Given a program and an input, determine whether the program will halt (stop) or run forever on that input.

Alan Turing proved that it is impossible to create a general algorithm that solves this problem for all programs and
inputs.

The halting problem is a fundamental issue in the theory of computation. The problem is to determine whether a
computer program will halt or run forever.
Definition: The Halting Problem asks whether a given program or algorithm will eventually halt (terminate) or
continue running indefinitely for a particular input. "Halting" means that the program will either accept or reject
the input and then terminate, rather than going into an infinite loop.
The Core Question: Can we create an algorithm that determines whether any given program will halt for a
specific input?
Answer: No, it is impossible to design a generalized algorithm that can accurately determine whether any
arbitrary program will halt. The only way to know if a specific program halts is to run it and observe the outcome.
This makes the Halting Problem an undecidable problem.
We can also rephrase the halting problem question this way: given a program written in any programming
language (C/C++, Java, etc.), will it enter an infinite loop or will it terminate? This question cannot be answered
by a general algorithm, making it undecidable.
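
Turing's proof is a proof by contradiction, often sketched as a short program. Suppose a correct function halts(p, x) existed that decides whether program p halts on input x; then the hypothetical Python program below (illustrative, not runnable, precisely because halts cannot exist) behaves inconsistently when fed its own source:

```python
def paradox(program):
    # Assume halts(p, x) always answers correctly (the assumption to refute).
    if halts(program, program):   # "program halts when run on itself"...
        while True:               # ...then loop forever
            pass
    else:                         # "program loops forever on itself"...
        return                    # ...then halt immediately

# Consider paradox(paradox): if halts says it halts, it loops forever;
# if halts says it loops, it halts. Either answer is wrong, so no such
# total, always-correct halts function can exist.
```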

Turing Machines and Computability

A Turing Machine is a mathematical model of computation invented by Alan Turing. It is used to formally define the
concept of algorithm and computability.

Turing machines are an important tool for studying the limits of computation and for understanding the
foundations of computer science. They provide a simple yet powerful model of computation that has been
widely used in research and has had a profound impact on our understanding of algorithms and computation.
While one might consider using programming languages like C to study computation, Turing Machines are
preferred because:
 They are simpler to analyze.
 They provide a clear, mathematical model of computation.
 They possess infinite memory, making them even more powerful than real-world computers.
A Turing Machine consists of a tape of infinite length on which read and write operations can be performed. The
tape consists of infinitely many cells, each of which contains either an input symbol or a special symbol called
blank. It also has a head pointer which points to the cell currently being read and can move in both directions.

Turing-Computable:

If a problem can be solved using a Turing machine, it is computable.

Church-Turing Thesis : A foundational theory that states:


"Anything that can be computed algorithmically can be computed by a Turing Machine."

This means all modern computers are equivalent in power to a Turing machine — though faster, they cannot
compute problems a Turing Machine can’t solve.

Examples of Computability in Real Life

Task Computable? Example


Sorting student grades ✅ Yes Use Merge Sort
Checking if a file exists ✅ Yes File system search
Proving if a computer program halts on all inputs ❌ No Halting Problem
Generating all possible chess games ❌ No (not in practical time) Exponential combinations
Translating languages accurately ❌ Not fully computable Requires understanding semantics

Implications of Computability

 Some problems will never be solvable with a computer.


 Even powerful computers (supercomputers, AI) cannot overcome computability limits.
 Understanding computability helps design smarter systems and avoid impossible goals.

Relation with Tractable vs. Intractable

Category Description
Intractable Problem Solvable, but takes too long (e.g., exponential time).
Undecidable Problem Not solvable by any algorithm, regardless of time.

All undecidable problems are intractable, but not all intractable problems are undecidable.

Deterministic Computation and the Class P

Deterministic computation always produces the same output for a given input when started from the same initial
state. There is no randomness involved—if the computation process and input do not change, the outcome will
always be the same.

Models of Deterministic Computation

One of the primary models for deterministic computation is the Deterministic Turing Machine (DTM).

Deterministic Turing Machine (DTM)

A deterministic one-tape Turing machine includes:

 A finite state control


 A read-write head
 A two-way tape with an infinite sequence of cells

This setup ensures a linear, predictable progression of states and transitions.

A program for a DTM specifies:


 A finite set of tape symbols (input symbols and a blank symbol)
 A finite set of states
 A transition function defining the behavior of the machine

The Class P

In algorithmic analysis, if a problem can be solved in polynomial time by a deterministic Turing machine, it is
classified under the Class P.

Nondeterministic Computation and the Class NP

To address certain computational problems more efficiently, another model called the Nondeterministic Turing
Machine (NDTM) is used.

Nondeterministic Turing Machine (NDTM)

While structurally similar to a DTM, an NDTM includes an extra component:

A guessing module with a write-only head

This allows the machine to "guess" a solution path, enabling it to explore multiple computation paths
simultaneously.

The Class NP

If a problem can be solved in polynomial time by a nondeterministic Turing machine, it belongs to the Class NP.

Computability classes – P, NP, NP complete and NP-hard:

In computer science, problems are divided into classes known as Complexity Classes. In complexity theory, a
Complexity Class is a set of problems with related complexity. With the help of complexity theory, we try to cover
the following.
 Problems that cannot be solved by computers.
 Problems that can be efficiently solved (solved in Polynomial time) by computers.
 Problems for which no efficient solution (only exponential time algorithms) exist.
The common resources required by a solution are time and space, meaning how much time the algorithm
takes to solve a problem and the corresponding memory usage.
 The time complexity of an algorithm is used to describe the number of steps required to solve a problem, but it
can also be used to describe how long it takes to verify the answer.
 The space complexity of an algorithm describes how much memory is required for the algorithm to operate.
 An algorithm having time complexity of the form O(nᵏ) for input size n and constant k is called a polynomial-time
solution. These solutions scale well. On the other hand, time complexity of the form O(kⁿ) is exponential time.
Complexity classes are useful in organizing similar types of problems.

1. P (Polynomial Time) — Easy Problems

P stands for problems that are easy to solve — your computer can solve them quickly, even for large inputs. The P in
the P class stands for Polynomial Time. It is the collection of decision problems (problems with a "yes" or "no"
answer) that can be solved by a deterministic machine (our computers) in polynomial time.

Key Idea: If you can solve it fast (in polynomial time, like n² or n³), it’s in P.

 The solution to P problems is easy to find.


 P is often a class of computational problems that are solvable and tractable. Tractable means that the
problems can be solved in theory as well as in practice. But the problems that can be solved in theory but not
in practice are known as intractable.

Example:

 Sorting a list
 Checking if a number is even or odd
 Finding the shortest path using Dijkstra’s Algorithm

You can solve it and also check the answer easily.

2. NP (Non-deterministic Polynomial Time) — Hard to Solve, Easy to Check

NP means: you might not know how to solve the problem quickly, but if someone gives you the answer, you can
verify it quickly. The NP in NP class stands for Non-deterministic Polynomial Time. It is the collection of decision
problems that can be solved by a non-deterministic machine (note that our computers are deterministic) in
polynomial time. Here we aren't asking for a way to find a solution, but only to verify that an alleged solution
really is correct. Every problem in this class can be solved in exponential time using exhaustive search.

Features:
 The solutions of the NP class might be hard to find since they are being solved by a non-deterministic machine
but the solutions are easy to verify.
 Problems of NP can be verified by a deterministic machine in polynomial time.
Example:
Let us consider an example to better understand the NP class. Suppose there is a company having a total
of 1000 employees having unique employee IDs. Assume that there are 200 rooms available for them. A selection
of 200 employees must be paired together, but the CEO of the company has the data of some employees who
can't work in the same room due to personal reasons.
This is an example of an NP problem. Since it is easy to check if the given choice of 200 employees proposed by a
coworker is satisfactory or not i.e. no pair taken from the coworker list appears on the list given by the CEO. But
generating such a list from scratch seems to be so hard as to be completely impractical.
It indicates that if someone can provide us with the solution to the problem, we can find the correct and incorrect
pair in polynomial time. Thus for the NP class problem, the answer is possible, which can be calculated in
polynomial time.

Example:

Sudoku puzzle:
Solving is hard.
But if I show you a filled Sudoku, you can check if it’s correct easily.

You can check the answer fast, but solving may take a long time.

Relationship Between P and NP:

Question Answer
Is P a subset of NP? ✅ Yes. Because if you can solve it fast, you can also check it fast.
Is NP a subset of P? ❓ We don't know. This is the famous P vs NP problem.

3. Co-NP — Opposite of NP

Co-NP includes problems where you can easily verify a "no" answer, instead of a "yes" one. Co-NP stands for the
complement of the NP class. It means that if the answer to a problem in Co-NP is "no", then there is a proof that
can be checked in polynomial time.
Features:
 If a problem X is in NP, then its complement X' is in Co-NP.
 For a problem to be in NP or Co-NP, there is no need to verify all the answers in polynomial time; it is
enough to verify one particular answer, "yes" or "no", in polynomial time.

Example:

 You are asked: "Does this equation NOT have a solution?"


 It might be easy to check that no solution exists, but hard to prove that a solution does exist.

NP = easy to check YES


Co-NP = easy to check NO

4. NP-Complete (NPC) — The Hardest in NP

NP-Complete problems are:


In NP
Every other NP problem can be converted to it

A problem is NP-complete if it is both in NP and NP-hard. NP-complete problems are the hardest problems in NP.

This means:

 If you solve one NP-Complete problem fast,


 You can solve all NP problems fast!

Features:
 NP-complete problems are special as any problem in NP class can be transformed or reduced into NP-complete
problems in polynomial time.
 If one could solve an NP-complete problem in polynomial time, then one could also solve any NP problem in
polynomial time.

Example:

 Sudoku (decision version)


 Traveling Salesman (with a limited distance)
 3-SAT (satisfiability problem)

NP-Complete = Hardest problems that are still in NP.

5. NP-Hard — Harder than NP

NP-Hard problems are even more difficult:

They may not even be in NP


Not necessarily easy to check the answer

An NP-hard problem is at least as hard as the hardest problem in NP and it is a class of problems such that every
problem in NP reduces to NP-hard.
Features:
 Not all NP-hard problems are in NP.
 Verifying them can take a long time: if a solution to an NP-hard problem is given, it may take a long time to
check whether it is correct.
 A problem A is NP-hard if, for every problem L in NP, there exists a polynomial-time reduction from L to A.

Some might not even have a "yes/no" answer (not decision problems).

Example

Finding the best way to schedule classes in college.

Chess: Will white win from this position?


Solving them fast would solve everything, but they can’t even be verified easily sometimes.

A problem is in the class NPC if it is in NP and is as hard as any problem in NP. A problem is NP-hard if all problems
in NP are polynomial time reducible to it, even though it may not be in NP itself.

If a polynomial-time algorithm exists for any of these problems, then all problems in NP would be polynomial-time
solvable. Such problems are called NP-complete. The phenomenon of NP-completeness is important for both
theoretical and practical reasons.

Definition of NP-Completeness
A language B is NP-complete if it satisfies two conditions

B is in NP

Every A in NP is polynomial time reducible to B.

If a language satisfies the second property, but not necessarily the first one, the language B is known as NP-Hard.
Informally, a search problem B is NP-Hard if there exists some NP-Complete problem A that Turing reduces to B.

NP-hard problems cannot be solved in polynomial time unless P = NP. If a problem is proved to be NPC, there is
no need to waste time trying to find an efficient exact algorithm for it. Instead, we can focus on designing
approximation algorithms.

Cook–Levin Theorem:
Cook’s Theorem is one of the most important results in computational complexity theory.
It says: “The Boolean Satisfiability Problem (SAT) is NP-Complete.”

This means:

 SAT is in NP (we can verify answers quickly).


 Every problem in NP can be converted (or reduced) to SAT in polynomial time using a deterministic Turing
machine.

Cook’s Theorem is a very important result in computer science, especially in the topic of P vs NP.

It was proved by Stephen Cook in 1971, and it says:

"The Boolean Satisfiability Problem (SAT) is NP-Complete."


Who proved it?

 Stephen Cook (in 1971) – published the idea in his famous paper "The complexity of theorem-proving
procedures."
 Leonid Levin (in 1973) – independently proved the same result in the Soviet Union.
 So, it's also known as the Cook–Levin Theorem.
 Later, Richard Karp extended this idea and showed that SAT can be reduced to 21 famous problems (like
Hamiltonian path, vertex cover, clique), which proves that they are also NP-Complete.

What is the SAT Problem?

SAT (Satisfiability Problem):

Given a Boolean expression (logic formula), is there any assignment of true/false values to variables that
makes the whole expression true?

✅ If such an assignment exists → Expression is satisfiable


❌ If no such assignment exists → Expression is unsatisfiable

SAT = Boolean Satisfiability Problem

You are given a logic statement (formula) like:

(A ∨ B) ∧ (¬A ∨ C) ∧ (¬B ∨ ¬C)

Can you assign True/False values to A, B, and C so that the whole formula becomes True?

That’s the SAT problem.

SAT asks: “Is there any combination of values that makes the formula true?” This is called Formula-SAT.
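
As a toy illustration, here is a brute-force Formula-SAT check of the example formula above in Python, trying all 2³ assignments; this is exactly the kind of exponential search that makes SAT hard in general:

```python
from itertools import product

def formula(A, B, C):
    # (A or B) and (not A or C) and (not B or not C)
    return (A or B) and (not A or C) and (not B or not C)

# Try all 2^3 = 8 true/false assignments
solutions = [bits for bits in product([True, False], repeat=3) if formula(*bits)]
print(solutions)   # non-empty list => the formula is satisfiable
```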

Important Terminologies in SAT

Term: Simple Explanation
Boolean Variable: A variable that can be either True or False
Literal: A variable like x or its negation ¬x
Clause: A group of literals connected by OR (∨), like (x1 ∨ x2 ∨ ¬x3)
Expression: Multiple clauses joined by AND (∧) to make a full formula
CNF (Conjunctive Normal Form): An expression where each clause is inside () with ORs, and all clauses are joined using ANDs. Example: (x1 ∨ ¬x2) ∧ (x2 ∨ x3 ∨ ¬x1)
3-CNF: A CNF formula where each clause has exactly 3 literals

Why is SAT So Hard?

For n variables, you may need to try all 2ⁿ possible combinations to see if the expression becomes true.
This is called brute-force, and it takes exponential time, which is very slow as n increases.

Cook and Levin proved that SAT is as hard as any other problem in NP, so SAT is the first NP-Complete problem.

Cook’s Reduction Idea

Cook's idea was simple but powerful: “If I can take any problem in NP and turn it into SAT (using polynomial time),
and SAT is in NP, then SAT is the most difficult of all NP problems.”
This is called a polynomial-time reduction.

Before Cook’s Theorem, we didn’t know which problems were the "hardest" in NP.

Cook proved that SAT is the first NP-Complete problem.


That means:

 SAT is in NP
 All other NP problems can be converted (reduced) to SAT in polynomial time

Reduction = Changing one problem into another.

If you have a hard NP problem (like Sudoku), and you can turn it into a SAT problem,
then solving SAT would also solve Sudoku.

Cook showed that every problem in NP can be changed into a SAT problem.

Cook’s Theorem — Full Statement:

“Every problem in NP can be reduced to SAT in polynomial time. So, SAT is NP-Complete.”

This was the first problem proved to be NP-Complete.


It opened the door to discovering other NP-Complete problems.

Variants of SAT

There are three main types of SAT problems used in theory:

1. Circuit-SAT

 You’re given a logic circuit made from AND, OR, NOT gates
 You have to check: Is there any combination of inputs that makes the circuit output TRUE?
 Needs 2ⁿ combinations → exponential time → hard problem

2. CNF-SAT

 You’re given a Boolean expression in Conjunctive Normal Form.


 Example: (x1 ∨ x2 ∨ ¬x3) ∧ (¬x1 ∨ x2 ∨ x3)
 You must check if there is a truth assignment that makes the whole formula true.

3. 3-CNF-SAT (3-SAT)

 A special case of CNF-SAT.


 Each clause must have exactly 3 literals.
 Also a hard problem and widely used to prove other problems are NP-Complete.

Why Is This Theorem Important?

Reason Explanation
First NP-Complete Problem SAT was the first problem ever proven to be NP-Complete.
Foundation of Reductions Cook showed how to reduce all NP problems to SAT.
Led to More Discoveries Karp used it to show many other problems are also NP-Complete.
Basis of Complexity Theory Central to the famous question "Is P = NP?"
Approximation Algorithms:

Some problems in computer science are very hard to solve exactly, especially when they belong to a class called
NP-Hard problems. This means that finding the perfect (optimal) solution takes too much time — possibly years or
centuries, even for a powerful computer.

So instead of finding the perfect answer, we try to find a "good enough" answer — and we do it quickly.
This is where Approximation Algorithms come in!

Definition: An Approximation Algorithm is an algorithm that gives near-optimal solutions to hard optimization
problems, usually in polynomial time.

So, you don't get the best solution, but you get a solution that's close enough, and you get it fast.

Features of Approximation Algorithm:

 Fast (Polynomial Time):


They run in polynomial time, meaning they work efficiently even for large inputs.
 Almost-Optimal Solutions:
The solution is not perfect but very close to the best — for example, within 1% of the optimum.
 Used for Hard Problems:
They are used to solve NP-Hard optimization problems when exact algorithms are too slow.

Example to Understand Easily

The Knapsack Problem:

Suppose you have:

 A bag that can hold 10 kg


 Items with weights and values

You need to maximize value, without going over 10 kg.

The exact solution checks all combinations of items, which takes a lot of time.

But an approximation algorithm might:

 Take the item with the highest value-to-weight ratio


 Add items greedily until the bag is full

This way, you get a very good solution fast, even if it's not the best possible.
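
A sketch of that greedy idea in Python (illustrative only; for the 0/1 knapsack this heuristic can miss the optimum, which is precisely why it is an approximation):

```python
def greedy_knapsack(items, capacity):
    # items: list of (value, weight) pairs
    # Sort by value-to-weight ratio, best first
    items = sorted(items, key=lambda it: it[0] / it[1], reverse=True)
    total_value, total_weight = 0, 0
    for value, weight in items:
        if total_weight + weight <= capacity:   # take the item if it still fits
            total_value += value
            total_weight += weight
    return total_value

print(greedy_knapsack([(60, 5), (50, 4), (40, 6)], 10))  # 110 (two best-ratio items)
```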

Approximation Ratio (or Factor):

This tells us how close the approximation algorithm’s solution is to the optimal one.

Let:

 C = cost of the approximation algorithm’s solution


 C* = cost of the optimal solution

Then the approximation ratio is:

 For minimization: C / C*
 For maximization: C* / C
If this ratio is close to 1, it means the solution is very close to optimal.

Example: If an algorithm has an approximation ratio of 1.5, it means the solution is at most 50% worse than the
best one.

Performance Ratios in Approximation Algorithms:

Performance ratio shows how close the approximation is to the optimal solution.

Scenario 1: General Definition

Suppose:

 C is the cost (value) of the solution found by the approximation algorithm.


 C* is the cost of the optimal (best possible) solution.

Then we define a performance ratio P(n) as:

max(C/C*, C*/C) ≤ P(n)

This means:
The approximation solution is at most P(n) times worse than the optimal.

Scenario 2: Types of Problems

For Maximization Problems:

You want the largest possible value.

0 < C < C*

Ratio = C* / C
(How much better the best is than the approximation)

For Minimization Problems:

You want the smallest possible value.

0 < C* < C

Ratio = C / C*
(How much worse the approximation is than the best)

If an algorithm always stays within this ratio, we call it a P(n)-approximation algorithm.
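
For example (with hypothetical numbers), if a minimization algorithm returns a solution of cost C = 120 and the optimal cost is C* = 100, then max(C/C*, C*/C) = 120/100 = 1.2, so on this input the algorithm behaves as a 1.2-approximation: its answer is at most 20% worse than optimal.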

Advantages

 Faster than exact algorithms


 Useful for real-world large problems
 Provides a guarantee on how close the result is to optimal

Disadvantages

 Does not give the perfect answer


 The approximation ratio may vary depending on the input
 Sometimes still complex (though better than exact solutions)
Types of Approximation Algorithms

Greedy Algorithms

 Make the best choice at each step


 Example: Fractional Knapsack

Local Search: Start with a solution and try to improve it step by step

Dynamic Programming Approximation: Modify DP algorithms to make them faster but slightly less accurate

Randomized Algorithm: Use randomness to find a good solution

Real-Life Examples of Approximation Algorithms:

1. Vertex Cover Problem

 Optimization Goal: Find the smallest number of vertices that cover all edges in a graph.
 Approximation Goal: Find a small number of such vertices (not necessarily the smallest).

2. Travelling Salesman Problem (TSP)

 Optimization Goal: Find the shortest possible tour that visits every city once.
 Approximation Goal: Find a tour that is close to the shortest.

3. Set Cover Problem

 You have a set of items and many subsets.


 Optimization Goal: Use the smallest number of subsets to cover all items.
 Approximation Goal: Cover all items using a small number of subsets (approximation uses logarithmic factor).

4. Subset Sum Problem

 Given numbers {x₁, x₂, x₃, ..., xₙ} and a target value t.
 Optimization Goal: Find a subset whose sum is as large as possible but ≤ t.
 Approximation Goal: Find a subset that is close to the target.

Vertex Cover Problem:

A vertex cover of a graph is a set of vertices such that:

Every edge in the graph has at least one end (vertex) inside this set.

In other words, for every edge (u, v), either u is in the set or v is in the set (or both).

Even though it’s called “vertex cover,” it actually covers all edges by selecting the right vertices.
Goal of the Problem

Given a graph, the goal is to find the smallest possible vertex cover — i.e., use the minimum number of vertices to
cover all edges.

Example

Imagine this simple graph:

A --- B
|
|
C

Edges:

A-B

A-C

✅ A valid vertex cover is: {A}


(Because both edges are connected to A)

✅ Another one is: {B, C}


(This also covers both edges — A-B is covered by B, A-C is covered by C)

❌ But {B} alone is not valid, because A-C is not covered.

Why Is This Problem Hard?

The Vertex Cover Problem is an NP-Complete problem.

That means:

 There is no fast (polynomial-time) method to solve it exactly, unless P = NP.


 We can solve small graphs exactly by trying every combination.
 But for large graphs, it takes too long, so we need approximation algorithms.

Naive (Brute Force) Approach

Here’s how the naive solution works:

List all possible subsets of vertices.

For each subset, check:

Does it cover all the edges?

Out of all the valid subsets, choose the one with the smallest number of vertices.

✅ Example: For a graph with vertices 0, 1, 2:

 Subsets = {0}, {1}, {2}, {0,1}, {0,2}, {1,2}, {0,1,2}


 Check which subset covers all the edges.
 Pick the smallest such subset.
Problem: This method is very slow, especially when the graph has many vertices (exponential time).

Approximate Algorithm for Vertex Cover (from the CLRS book):

This is a fast algorithm that gives an answer close to the best, even if not perfect.

Steps (Easy Explanation):

 Start with an empty set → Result = {}


 Create a set E of all edges in the graph
 While there are still edges left in E:

a) Pick any edge (u, v)

b) Add both u and v to Result

c) Remove all edges connected to u or v from E

 When no edges are left, return the Result set as the vertex cover
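
A direct Python sketch of these steps, with the edge list playing the role of E (skipping edges whose endpoints are already in the result is equivalent to removing them):

```python
def approx_vertex_cover(edges):
    # edges: list of (u, v) pairs
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:   # edge not yet covered
            cover.add(u)                        # take BOTH endpoints
            cover.add(v)
    return cover

print(approx_vertex_cover([('A', 'B'), ('A', 'C')]))  # {'A', 'B'}
```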

Example

Let’s say you have a graph:

A --- B
|
|
C

Edges = {(A-B), (A-C)}

Pick edge (A-B)

Add A and B to result → Result = {A, B}

Remove edges touching A or B → E becomes empty

Done ✅

Result = {A, B}

Note: {A} is a better (smaller) solution, but {A, B} is not more than 2 times worse. That’s acceptable in
approximation.

How Good is This Algorithm?


 The size of the solution is never more than 2× the size of the minimum cover
 This is called an approximation ratio of 2
 It’s fast and works well for large graphs

Time & Space Complexity

Aspect Value
Time Complexity O(V + E)
Space Complexity O(V) (for visited array)

Set Cover Problem:
