Unit 34 Algo
Dijkstra's Algorithm:
Given a weighted undirected graph represented as an edge list and a source vertex src, find the shortest path
distances from the source vertex to all other vertices in the graph. The graph contains V vertices, numbered
from 0 to V - 1.
Note: The given graph does not contain any negative edge.
In Dijkstra's Algorithm, the goal is to find the shortest distance from a given source node to all other nodes
in the graph. As the source node is the starting point, its distance is initialized to zero. From there, we
iteratively pick the unprocessed node with the minimum distance from the source; this is where a
min-heap (priority queue) or a set is typically used for efficiency. For each picked node u, we update the
distance to its neighbors v using the formula: dist[v] = dist[u] + weight[u][v], but only if this new path
offers a shorter distance than the current known one. This process continues until all nodes have been
processed.
Step-by-Step Implementation
1. Set dist[source] = 0 and all other distances to infinity.
2. Push the source node into the min heap as a pair <distance, node> → i.e., <0, source>.
3. Pop the top element (node with the smallest distance) from the min heap.
4. For the popped node u, relax every neighbor v: if dist[u] + weight(u, v) < dist[v], update dist[v] and push <dist[v], v> into the heap. Repeat steps 3-4 until the heap is empty.
5. Return the distance array, which holds the shortest distance from the source to all nodes.
Time Complexity: O(E * log V), where E is the number of edges and V is the number of vertices.
Auxiliary Space: O(V), where V is the number of vertices.
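A minimal Python sketch of this heap-based approach (the function name and the (u, v, w) edge-triple input format are assumptions for illustration):

import heapq

def dijkstra(V, edges, src):
    # Build an adjacency list from the (u, v, w) edge list (undirected graph).
    adj = [[] for _ in range(V)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))

    dist = [float('inf')] * V
    dist[src] = 0
    heap = [(0, src)]                  # pairs of <distance, node>

    while heap:
        d, u = heapq.heappop(heap)     # node with the smallest known distance
        if d > dist[u]:
            continue                   # stale heap entry; a shorter path was already found
        for v, w in adj[u]:
            if dist[u] + w < dist[v]:  # relaxation step
                dist[v] = dist[u] + w
                heapq.heappush(heap, (dist[v], v))
    return dist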
Illustration:
Bellman-Ford Algorithm:
Given a weighted graph with V vertices and E edges, along with a source vertex src, the task is to compute the
shortest distances from the source to all other vertices. If a vertex is unreachable from the source, its distance
should be marked as 10⁸. In the presence of a negative weight cycle, return -1 to signify that shortest path
calculations are not feasible.
A negative weight cycle is a cycle in a graph, whose sum of edge weights is negative. If you traverse the cycle, the
total weight accumulated would be less than zero.
In the presence of a negative weight cycle in the graph, the shortest path does not exist, because each traversal
of the cycle decreases the total path weight further.
Since, we need to find the single source shortest path, we might initially think of using Dijkstra's algorithm.
However, Dijkstra is not suitable when the graph consists of negative edges. The reason is, it doesn't revisit those
nodes which have already been marked as visited. If a shorter path exists through a longer route with negative
edges, Dijkstra's algorithm will fail to handle it.
Relaxation means updating the shortest distance to a node if a shorter path is found through another node.
For an edge (u, v) with weight w:
o If going through u gives a shorter path to v from the source node (i.e., distance[v] > distance[u] + w), we
update the distance[v] as distance[u] + w.
In the Bellman-Ford algorithm, this process is repeated (V - 1) times for all the edges.
A shortest path between two vertices can have at most (V - 1) edges. It is not possible to have a simple path with
more than (V - 1) edges (otherwise it would form a cycle). Therefore, repeating the relaxation process (V - 1)
times ensures that all possible paths between source and any other node have been covered.
As discussed above, we need (V - 1) rounds of relaxation of all the edges to obtain the single source shortest
paths. If an additional (V-th) relaxation still improves some distance, it indicates that a set of edges with overall
negative weight has been traversed once more. This indicates the presence of a negative weight cycle in the
graph.
Bellman-Ford is a single source shortest path algorithm. It effectively works in the cases of negative edges
and is able to detect negative cycles as well. It works on the principle of relaxation of the edges.
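A minimal Python sketch of Bellman-Ford as described above, assuming the edge list is given as (u, v, w) triples and using 10⁸ as the "unreachable" sentinel from the problem statement:

def bellman_ford(V, edges, src):
    INF = 10**8                        # sentinel for "unreachable", as in the problem statement
    dist = [INF] * V
    dist[src] = 0

    # Relax every edge (V - 1) times.
    for _ in range(V - 1):
        for u, v, w in edges:
            if dist[u] != INF and dist[u] + w < dist[v]:
                dist[v] = dist[u] + w

    # A V-th round that still improves some distance means a
    # negative weight cycle is reachable from the source.
    for u, v, w in edges:
        if dist[u] != INF and dist[u] + w < dist[v]:
            return [-1]
    return dist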
Floyd-Warshall Algorithm (Dynamic Programming approach):
Given a matrix dist[][] of size n x n, where dist[i][j] represents the weight of the edge from node i to node j. If
there is no direct edge, dist[i][j] is set to a large value (e.g., 10⁸) to represent infinity. The diagonal
entries dist[i][i] are 0, since the distance from a node to itself is zero. The graph may contain negative edge
weights, but it does not contain any negative weight cycles.
Your task is to determine the shortest path distance between every pair of nodes (i, j) in the graph.
The Floyd–Warshall algorithm works by maintaining a two-dimensional array that represents the distances
between nodes. Initially, this array is filled using only the direct edges between nodes. Then, the algorithm
gradually updates these distances by checking if shorter paths exist through intermediate nodes.
This algorithm works for both the directed and undirected weighted graphs and can handle graphs with
both positive and negative weight edges.
Note: It does not work for the graphs with negative cycles (where the sum of the edges in a cycle is negative).
Why is the Floyd-Warshall Algorithm better suited to dense graphs than to sparse graphs?
Dense Graph: A graph in which the number of edges is significantly higher than the number of
vertices.
Sparse Graph: A graph in which the number of edges is very low.
No matter how many edges the graph has, the Floyd-Warshall Algorithm runs in O(V³) time,
therefore it is best suited for dense graphs. In the case of sparse graphs, Johnson's Algorithm is more
suitable.
Step-by-step implementation
Start by updating the distance matrix by treating each vertex as a possible intermediate node between all
pairs of vertices.
Iterate through each vertex, one at a time. For each selected vertex k, attempt to improve the shortest paths
that pass through it.
When we pick vertex number k as an intermediate vertex, we already have considered vertices {0, 1, 2, ..
k-1} as intermediate vertices.
For every pair (i, j) of the source and destination vertices respectively, there are two possible cases.
o k is not an intermediate vertex in shortest path from i to j. We keep the value of dist[i][j] as it is.
o k is an intermediate vertex in shortest path from i to j. We update the value of dist[i][j] as dist[i][k] +
dist[k][j], if dist[i][j] > dist[i][k] + dist[k][j]
Repeat this process for each vertex k until all intermediate possibilities have been considered.
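A minimal Python sketch of these steps, assuming dist is the n x n matrix described above with 10⁸ standing in for infinity:

def floyd_warshall(dist):
    INF = 10**8                        # large value standing in for infinity
    n = len(dist)
    # Try every vertex k as an intermediate vertex between every pair (i, j).
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] < INF and dist[k][j] < INF and \
                   dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist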
Illustration:
Transitive closure of a graph:
Given a directed graph, determine if a vertex j is reachable from another vertex i for all vertex pairs (i, j) in the given
graph. Here reachable means that there is a path from vertex i to j. The reachability matrix is called the transitive
closure of a graph.
Example:
Input: Graph = [[0, 1, 1, 0], [0, 0, 1, 0], [1, 0, 0, 1], [0, 0, 0, 0]]
Output: [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 1]]
Approach:
The idea is to use the Floyd-Warshall algorithm to find the transitive closure of a directed graph. We
start by initializing a result matrix with the original adjacency matrix and set all diagonal elements to
1 (since every vertex can reach itself). Then for each intermediate vertex k, we check if vertex i can
reach vertex j either directly or through vertex k. If a path exists from i to k and from k to j, then we
mark that i can reach j in our result matrix.
Step by step approach:
1. Initialize result matrix with input graph and set all diagonal elements to 1.
2. For each intermediate vertex, check all possible vertex pairs.
3. If vertex i can reach intermediate vertex and intermediate vertex can reach vertex j, mark that i can reach j. This
iteratively builds paths of increasing length through intermediate vertices.
4. Return the final matrix where a value of 1 indicates vertex i can reach vertex j.
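A minimal Python sketch of this approach, assuming the graph is given as a 0/1 adjacency matrix:

def transitive_closure(graph):
    n = len(graph)
    reach = [row[:] for row in graph]     # copy the adjacency matrix
    for i in range(n):
        reach[i][i] = 1                   # every vertex can reach itself
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if reach[i][k] and reach[k][j]:
                    reach[i][j] = 1       # i reaches j through k
    return reach

On the example input above this returns [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 1]].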
An O(V³) solution based on the Floyd-Warshall Algorithm was discussed above. A DFS-based solution is discussed
next: for a dense graph it still costs O(V³), while for a sparse graph it improves to about O(V²).
Below are the abstract steps of the algorithm.
Create a matrix tc[V][V] that would finally have transitive closure of the given graph. Initialize all entries of
tc[][] as 0.
Call DFS for every node of the graph to mark reachable vertices in tc[][]. In recursive calls to DFS, we don't call
DFS for an adjacent vertex if it is already marked as reachable in tc[][].
The code uses adjacency list representation of input graph and builds a matrix tc[V][V] such that tc[u][v] would
be true if v is reachable from u.
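A minimal Python sketch of these steps, assuming the graph is given as an adjacency list (recursion depth limits are ignored for simplicity):

def transitive_closure_dfs(V, adj):
    # adj is an adjacency list; tc[u][v] becomes 1 if v is reachable from u.
    tc = [[0] * V for _ in range(V)]

    def dfs(start, u):
        tc[start][u] = 1
        for v in adj[u]:
            if not tc[start][v]:          # skip vertices already marked reachable
                dfs(start, v)

    for u in range(V):
        dfs(u, u)
    return tc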
Topological Sorting:
Topological sorting for Directed Acyclic Graph (DAG) is a linear ordering of vertices such that for every directed
edge u-v, vertex u comes before v in the ordering.
Note: Topological Sorting for a graph is not possible if the graph is not a DAG.
Example:
Input: V = 6, edges = [[2, 3], [3, 1], [4, 0], [4, 1], [5, 0], [5, 2]]
Output: 5 4 2 3 1 0
Explanation: The first vertex in topological sorting is always a vertex with an in-degree of 0 (a vertex with
no incoming edges). A topological sorting of the following graph is "5 4 2 3 1 0". There can be more than
one topological sorting for a graph. Another topological sorting of the following graph is "4 5 2 3 1 0".
In DFS, we print a vertex and then recursively call DFS for its adjacent vertices. In topological sorting, we need to
print a vertex before its adjacent vertices.
For example, In the above given graph, the vertex '5' should be printed before vertex '0', but unlike DFS, the
vertex '4' should also be printed before vertex '0'. So Topological sorting is different from DFS. For example, a DFS
of the shown graph is "5 2 3 1 0 4", but it is not a topological sorting.
DAGs are a special type of graph in which every edge is directed and no cycle exists. Before
understanding why topological sort only exists for DAGs, let's first answer two questions:
Why Topological Sort is not possible for graphs with undirected edges?
This is due to the fact that undirected edge between two vertices u and v means, there is an edge
from u to v as well as from v to u. Because of this both the nodes u and v depend upon each other
and none of them can appear before the other in the topological ordering without creating a
contradiction.
Why Topological Sort is not possible for graphs having cycles?
Imagine a graph with 3 vertices and edges = {1 to 2 , 2 to 3, 3 to 1} forming a cycle. Now if we try
to topologically sort this graph starting from any vertex, it will always create a contradiction to our
definition. All the vertices in a cycle are indirectly dependent on each other hence topological
sorting fails.
Topological order may not be Unique: Topological sorting is a dependency problem in which completion of one
task depends upon the completion of several other tasks whose order can vary.
Algorithm for Topological Sorting using DFS:
Here’s a step-by-step algorithm for topological sorting using Depth First Search (DFS):
Create a graph with n vertices and m directed edges.
Initialize a stack and a visited array of size n.
For each unvisited vertex in the graph, do the following:
o Call the DFS function with the vertex as the parameter.
o In the DFS function, mark the vertex as visited and recursively call the DFS function for all unvisited
neighbors of the vertex.
o Once all the neighbors have been visited, push the vertex onto the stack.
After all vertices have been visited, pop elements from the stack and append them to the output list until the
stack is empty.
The resulting list is the topologically sorted order of the graph.
Time Complexity: O(V+E). The above algorithm is simply DFS with an extra stack. So time complexity is the same
as DFS.
Auxiliary space: O(V), due to the creation of the stack.
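A minimal Python sketch of this DFS-based algorithm, assuming an adjacency-list representation of the graph:

def topological_sort(V, adj):
    visited = [False] * V
    stack = []

    def dfs(u):
        visited[u] = True
        for v in adj[u]:
            if not visited[v]:
                dfs(v)
        stack.append(u)                   # push u only after all its neighbours are done

    for u in range(V):
        if not visited[u]:
            dfs(u)
    return stack[::-1]                    # popping the stack gives the topological order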
The BFS based algorithm for Topological Sort is called Kahn's Algorithm. The Kahn's algorithm has same time
complexity as the DFS based algorithm discussed above.
Kahn's Algorithm works by repeatedly finding vertices with no incoming edges, removing them from the
graph, and updating the in-degrees of the vertices that the removed vertex points to. This
process continues until all vertices have been ordered.
Algorithm:
First, compute the in-degree of each node by counting its incoming edges: iterate through all the edges
in the graph and increment the in-degree of the destination node of each edge. This way, you determine
the in-degree of each node before starting the sorting process. Then repeatedly pick a node whose
in-degree is 0, append it to the ordering, and decrement the in-degrees of its neighbours, adding any
neighbour that drops to in-degree 0 to the queue (see the sketch below).
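A minimal Python sketch of Kahn's Algorithm as outlined above (the adjacency-list input format is an assumption for illustration):

from collections import deque

def kahn_topological_sort(V, adj):
    indegree = [0] * V
    for u in range(V):
        for v in adj[u]:
            indegree[v] += 1              # count incoming edges of each node

    queue = deque(u for u in range(V) if indegree[u] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            indegree[v] -= 1              # "remove" the edge u -> v
            if indegree[v] == 0:
                queue.append(v)

    # Fewer than V vertices in the order means the graph contains a cycle.
    return order if len(order) == V else []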
Only applicable to directed acyclic graphs (DAGs), not suitable for cyclic graphs.
May not be unique, multiple valid topological orderings can exist.
The max flow problem is a classic optimization problem in graph theory that involves finding the maximum
amount of flow that can be sent through a network of pipes, channels, or other pathways, subject to capacity
constraints. The problem can be used to model a wide variety of real-world situations, such as transportation
systems, communication networks, and resource allocation.
In the max flow problem, we have a directed graph with a source node s and a sink node t, and each edge has a
capacity that represents the maximum amount of flow that can be sent through it. The goal is to find the
maximum amount of flow that can be sent from s to t, while respecting the capacity constraints on the edges.
One common approach to solving the max flow problem is the Ford-Fulkerson algorithm, which is based on the
idea of augmenting paths. The algorithm starts with an initial flow of zero, and iteratively finds a path from s to t
that has available capacity, and then increases the flow along that path by the maximum amount possible. This
process continues until no more augmenting paths can be found.
Another popular algorithm for solving the max flow problem is the Edmonds-Karp algorithm, which is a variant of
the Ford-Fulkerson algorithm that uses breadth-first search to find augmenting paths, and thus can be more
efficient in some cases.
Ford-Fulkerson Algorithm
The Ford-Fulkerson algorithm is a widely used algorithm to solve the maximum flow problem in a flow network.
The maximum flow problem involves determining the maximum amount of flow that can be sent from a source
vertex to a sink vertex in a directed weighted graph, subject to capacity constraints on the edges.
The algorithm works by iteratively finding an augmenting path, which is a path from the source to the sink in the
residual graph, i.e., the graph obtained by subtracting the current flow from the capacity of each edge. The
algorithm then increases the flow along this path by the maximum possible amount, which is the minimum
capacity of the edges along the path.
Time Complexity: O(V * E²), where E is the number of edges and V is the number of vertices.
Space Complexity: O(V), as we create a queue for BFS.
The above implementation of the Ford-Fulkerson Algorithm is called the Edmonds-Karp Algorithm. The idea of
Edmonds-Karp is to use BFS in the Ford-Fulkerson implementation, as BFS always picks a path with the minimum
number of edges. When BFS is used, the worst case time complexity can be reduced to O(V * E²).
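A minimal Python sketch of the Ford-Fulkerson method using BFS for augmenting paths (i.e., Edmonds-Karp), assuming the network is given as an n x n capacity matrix; names are illustrative:

from collections import deque

def bfs_augmenting_path(capacity, source, sink, parent):
    n = len(capacity)
    visited = [False] * n
    visited[source] = True
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in range(n):
            if not visited[v] and capacity[u][v] > 0:   # residual capacity left
                parent[v] = u
                visited[v] = True
                if v == sink:
                    return True
                queue.append(v)
    return False

def edmonds_karp(capacity, source, sink):
    # capacity is an n x n matrix of edge capacities; it is updated in place
    # to hold the residual capacities as flow is pushed.
    n = len(capacity)
    parent = [-1] * n
    max_flow = 0
    while bfs_augmenting_path(capacity, source, sink, parent):
        # Bottleneck: minimum residual capacity along the augmenting path.
        path_flow = float('inf')
        v = sink
        while v != source:
            u = parent[v]
            path_flow = min(path_flow, capacity[u][v])
            v = u
        # Update residual capacities of forward and reverse edges.
        v = sink
        while v != source:
            u = parent[v]
            capacity[u][v] -= path_flow
            capacity[v][u] += path_flow
            v = u
        max_flow += path_flow
    return max_flow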
(MST And BFS/DFS copy me)
Unit 4
Tractable Problems:
A problem is considered tractable if there exists an algorithm that can solve it in polynomial time (O(n^k), where k
is a constant). This means the time required to solve the problem grows at most proportionally to a power of the
input size.
Characteristics:
Tractable problems are considered "easy" to solve because they can be solved in a reasonable amount of time,
even with large datasets. These are the "easy" problems — not necessarily easy in real life, but easy in terms of
time complexity.
Such problems belong to the complexity class P.
Intractable Problems:
A problem is considered intractable if there is no known algorithm that can solve it in polynomial time. This means
the time required to solve the problem grows exponentially with the input size.
Characteristics:
Intractable problems are considered "hard" to solve because the time required to find a solution becomes
prohibitively long as the input size increases.
These are the "hard" problems — no efficient solution is known for large inputs.
These often belong to classes like NP, NP-Complete, NP-Hard
Problems that “can be” solved but the amount of time it takes to solve is too large.
Can be solved in reasonable time only for small inputs. Or, can not be solved at all.
As their input grows large, we are unable to solve them in reasonable time.
Real-World Analogy
Tractable problem: You follow a known algorithm; it finishes quickly (like solving with steps).
Intractable problem: You randomly try all possible moves — takes a huge amount of time.
In simple terms: "Can we design an algorithm that will always give the correct result in a finite amount of time for
every valid input?"
What Is an Algorithm?
An algorithm is a step-by-step procedure or set of rules to perform a task or solve a problem. It must be finite
(it terminates after a finite number of steps), well-defined (each step is unambiguous), and effective (each step
can actually be carried out).
Decidable Problems: Problems for which an algorithm exists that can solve every instance of the problem in a finite
amount of time.
Undecidable Problems: Problems for which no algorithm exists that can solve all cases correctly in finite time.
Examples:
These are problems for which no algorithm exists that will work for all inputs.
Given a program and an input, determine whether the program will halt (stop) or run forever on that input.
Alan Turing proved that it is impossible to create a general algorithm that solves this problem for all programs and
inputs.
The halting problem is a fundamental issue in theory and computation. The problem is to determine whether a
computer program will halt or run forever.
Definition: The Halting Problem asks whether a given program or algorithm will eventually halt (terminate) or
continue running indefinitely for a particular input. "Halting" means that the program will either accept or reject
the input and then terminate, rather than going into an infinite loop.
The Core Question: Can we create an algorithm that determines whether any given program will halt for a
specific input?
Answer: No, it is impossible to design a generalized algorithm that can accurately determine whether any
arbitrary program will halt. The only way to know if a specific program halts is to run it and observe the outcome.
This makes the Halting Problem an undecidable problem.
We can rephrase the halting problem question in such a way also: Given a program written in any programming
language (C/C++, Java, etc.), will it enter an infinite loop or will it terminate? This question cannot be answered
by a general algorithm, making it undecidable.
A Turing Machine is a mathematical model of computation invented by Alan Turing. It is used to formally define the
concept of algorithm and computability.
Turing machines are an important tool for studying the limits of computation and for understanding the
foundations of computer science. They provide a simple yet powerful model of computation that has been
widely used in research and has had a profound impact on our understanding of algorithms and computation.
While one might consider using programming languages like C to study computation, Turing Machines are
preferred because:
They are simpler to analyze.
They provide a clear, mathematical model of computation.
They possess infinite memory, making them even more powerful than real-world computers.
A Turing Machine consists of a tape of infinite length on which read and write operations can be performed. The
tape consists of infinite cells, each of which contains either an input symbol or a special symbol called blank. It
also has a head pointer which points to the cell currently being read, and the head can move in both directions.
Turing-Computable:
A problem is Turing-computable if it can be solved by a Turing Machine. This means all modern computers are
equivalent in power to a Turing machine: though faster, they cannot compute problems a Turing Machine can't solve.
Implications of Computability
Intractable Problem: Solvable, but takes too long (e.g., exponential time).
Undecidable Problem: Not solvable by any algorithm, regardless of time.
All undecidable problems are intractable, but not all intractable problems are undecidable.
Deterministic computation always produces the same output for a given input when started from the same initial
state. There is no randomness involved—if the computation process and input do not change, the outcome will
always be the same.
One of the primary models for deterministic computation is the Deterministic Turing Machine (DTM).
The Class P
In algorithmic analysis, if a problem can be solved in polynomial time by a deterministic Turing machine, it is
classified under the Class P.
To address certain computational problems more efficiently, another model called the Nondeterministic Turing
Machine (NDTM) is used.
This allows the machine to "guess" a solution path, enabling it to explore multiple computation paths
simultaneously.
The Class NP
If a problem can be solved in polynomial time by a nondeterministic Turing machine, it belongs to the Class NP.
In computer science, problems are divided into classes known as Complexity Classes. In complexity theory, a
Complexity Class is a set of problems with related complexity. With the help of complexity theory, we try to cover
the following.
Problems that cannot be solved by computers.
Problems that can be efficiently solved (solved in Polynomial time) by computers.
Problems for which no efficient solution (only exponential time algorithms) exist.
The common resources required by a solution are time and space, meaning how much time the algorithm
takes to solve a problem and the corresponding memory usage.
The time complexity of an algorithm is used to describe the number of steps required to solve a problem, but it
can also be used to describe how long it takes to verify the answer.
The space complexity of an algorithm describes how much memory is required for the algorithm to operate.
An algorithm having time complexity of the form O(n^k) for input n and constant k is called a polynomial time
solution. These solutions scale well. On the other hand, time complexity of the form O(k^n) is exponential time.
Complexity classes are useful in organizing similar types of problems.
P stands for problems that are easy to solve: your computer can solve them quickly, even for large inputs. The P in
the P class stands for Polynomial Time. It is the collection of decision problems (problems with a "yes" or "no"
answer) that can be solved by a deterministic machine (our computers) in polynomial time.
Key Idea: If you can solve it fast (in polynomial time, like n² or n³), it’s in P.
Example:
Sorting a list
Checking if a number is even or odd
Finding the shortest path using Dijkstra’s Algorithm
NP means: You might not know how to solve the problem quickly, but if someone gives you the answer, you can
verify it quickly. The NP in NP class stands for Non-deterministic Polynomial Time. It is the collection of decision
problems that can be solved by a non-deterministic machine (note that our computers are deterministic) in
polynomial time. Here we aren't asking for a way to find a solution, but only to verify that an alleged solution really
is correct. Every problem in this class can be solved in exponential time using exhaustive search.
Features:
The solutions of the NP class might be hard to find since they are being solved by a non-deterministic machine
but the solutions are easy to verify.
Problems of NP can be verified by a deterministic machine in polynomial time.
Example:
Let us consider an example to better understand the NP class. Suppose there is a company having a total
of 1000 employees having unique employee IDs. Assume that there are 200 rooms available for them. A selection
of 200 employees must be paired together, but the CEO of the company has the data of some employees who
can't work in the same room due to personal reasons.
This is an example of an NP problem. Since it is easy to check if the given choice of 200 employees proposed by a
coworker is satisfactory or not i.e. no pair taken from the coworker list appears on the list given by the CEO. But
generating such a list from scratch seems to be so hard as to be completely impractical.
It indicates that if someone provides us with a proposed solution to the problem, we can find the correct and
incorrect pairs in polynomial time. Thus, for an NP-class problem, a proposed answer can be verified in
polynomial time.
Example:
Sudoku puzzle:
Solving is hard.
But if I show you a filled Sudoku, you can check if it’s correct easily.
You can check the answer fast, but solving may take a long time.
Is P a subset of NP? ✅ Yes, because if you can solve it fast, you can also check it fast.
Is NP a subset of P? ❓ We don't know. This is the famous P vs NP problem.
3. Co-NP — Opposite of NP
Co-NP includes problems where you can easily verify a "no" answer, instead of a "yes" one. Co-NP stands for the
complement of the NP Class. It means that if the answer to a problem in Co-NP is No, then there is a proof that can be
checked in polynomial time.
Features:
If a problem X is in NP, then its complement X' is also in CoNP.
For an NP and CoNP problem, there is no need to verify all the answers at once in polynomial time, there is a
need to verify only one particular answer "yes" or "no" in polynomial time for a problem to be in NP or CoNP.
Example:
A problem is NP-complete if it is both in NP and NP-hard. NP-complete problems are the hard problems in NP.
This means:
Features:
NP-complete problems are special as any problem in NP class can be transformed or reduced into NP-complete
problems in polynomial time.
If one could solve an NP-complete problem in polynomial time, then one could also solve any NP problem in
polynomial time.
Example:
An NP-hard problem is at least as hard as the hardest problem in NP and it is a class of problems such that every
problem in NP reduces to NP-hard.
Features:
Not all NP-hard problems are in NP.
It takes a long time to check them. This means if a solution for an NP-hard problem is given then it takes a long
time to check whether it is right or not.
A problem A is in NP-hard if, for every problem L in NP, there exists a polynomial-time reduction from L to A.
Some might not even have a "yes/no" answer (not decision problems).
Example
A problem is in the class NPC if it is in NP and is as hard as any problem in NP. A problem is NP-hard if all problems
in NP are polynomial time reducible to it, even though it may not be in NP itself.
If a polynomial time algorithm exists for any of these problems, all problems in NP would be polynomial time
solvable. These problems are called NP-complete. The phenomenon of NP-completeness is important for both
theoretical and practical reasons.
Definition of NP-Completeness
A language B is NP-complete if it satisfies two conditions:
1. B is in NP.
2. Every language A in NP is polynomial time reducible to B.
If a language satisfies the second property, but not necessarily the first one, the language B is known as NP-Hard.
Informally, a search problem B is NP-Hard if there exists some NP-Complete problem A that Turing reduces to B.
A problem in NP-Hard cannot be solved in polynomial time unless P = NP. If a problem is proved to be NPC, there is
no need to waste time trying to find an efficient exact algorithm for it. Instead, we can focus on designing
approximation algorithms.
Cook–Levin Theorem:
Cook’s Theorem is one of the most important results in computational complexity theory.
It says: “The Boolean Satisfiability Problem (SAT) is NP-Complete.”
This means:
Cook’s Theorem is a very important result in computer science, especially in the topic of P vs NP.
Stephen Cook (in 1971) – published the idea in his famous paper "The complexity of theorem-proving
procedures."
Leonid Levin (in 1973) – independently proved the same result in the Soviet Union.
So, it's also known as the Cook–Levin Theorem.
Later, Richard Karp extended this idea and showed that 21 famous problems (like Hamiltonian path, vertex
cover, clique) are also NP-Complete, by reducing SAT (and problems already shown NP-Complete) to them.
Given a Boolean expression (logic formula), is there any assignment of true/false values to variables that
makes the whole expression true?
Can you assign True/False values to A, B, and C so that the whole formula becomes True?
SAT asks: “Is there any combination of values that makes the formula true?” This is called Formula-SAT.
For n variables, you may need to try all 2ⁿ possible combinations to see if the expression becomes true.
This is called brute-force, and it takes exponential time, which is very slow as n increases.
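A minimal Python sketch of this brute-force check for a formula in CNF (the +i/-i literal encoding and the function name are assumptions for illustration):

from itertools import product

def brute_force_sat(num_vars, clauses):
    # clauses: CNF formula as a list of clauses; literal +i means variable i
    # is true, -i means variable i is false (variables numbered from 1).
    for assignment in product([False, True], repeat=num_vars):   # all 2^n assignments
        if all(any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return assignment          # satisfying assignment found
    return None                        # formula is unsatisfiable

For example, brute_force_sat(3, [[1, -2], [2, 3], [-1, -3]]) returns the first satisfying assignment it finds, after trying up to 2³ = 8 combinations.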
Cook and Levin proved that SAT is as hard as any other problem in NP, so SAT is the first NP-Complete problem.
Cook's idea was simple but powerful: “If I can take any problem in NP and turn it into SAT (using polynomial time),
and SAT is in NP, then SAT is the most difficult of all NP problems.”
This is called a polynomial-time reduction.
Before Cook’s Theorem, we didn’t know which problems were the "hardest" in NP.
SAT is in NP
All other NP problems can be converted (reduced) to SAT in polynomial time
If you have a hard NP problem (like Sudoku), and you can turn it into a SAT problem,
then solving SAT would also solve Sudoku.
Cook showed that every problem in NP can be changed into an SAT problem.
“Every problem in NP can be reduced to SAT in polynomial time. So, SAT is NP-Complete.”
Variants of SAT
1. Circuit-SAT
You’re given a logic circuit made from AND, OR, NOT gates
You have to check: Is there any combination of inputs that makes the circuit output TRUE?
Needs 2ⁿ combinations → exponential time → hard problem
2. CNF-SAT: the formula is given in Conjunctive Normal Form (an AND of clauses, where each clause is an OR of literals); is there an assignment that satisfies every clause?
3. 3-CNF-SAT (3-SAT): a CNF formula in which every clause has exactly three literals; this restricted form is still NP-Complete.
First NP-Complete Problem: SAT was the first problem ever proven to be NP-Complete.
Foundation of Reductions: Cook showed how to reduce all NP problems to SAT.
Led to More Discoveries: Karp used it to show many other problems are also NP-Complete.
Basis of Complexity Theory: Central to the famous question "Is P = NP?"
Some problems in computer science are very hard to solve exactly, especially when they belong to a class called
NP-Hard problems. This means that finding the perfect (optimal) solution takes too much time — possibly years or
centuries, even for a powerful computer.
So instead of finding the perfect answer, we try to find a "good enough" answer — and we do it quickly.
This is where Approximation Algorithms come in!
Definition: An Approximation Algorithm is an algorithm that gives near-optimal solutions to hard optimization
problems, usually in polynomial time.
So, you don't get the best solution, but you get a solution that's close enough, and you get it fast.
The exact solution checks all combinations of items, which takes a lot of time.
This way, you get a very good solution fast, even if it's not the best possible.
This tells us how close the approximation algorithm’s solution is to the optimal one.
Let C be the cost of the solution produced by the approximation algorithm and C* be the cost of an optimal solution. The approximation ratio is:
For minimization: C / C*
For maximization: C* / C
If this ratio is close to 1, it means the solution is very close to optimal.
Example: If an algorithm has an approximation ratio of 1.5, it means the solution is at most 50% worse than the
best one.
Performance ratio shows how close the approximation is to the optimal solution.
Suppose:
max(C/C*, C*/C) ≤ P(n)
This means:
The approximation solution is at most P(n) times worse than the optimal.
If 0 < C < C* (maximization): Ratio = C* / C (how much better the best is than the approximation).
If 0 < C* < C (minimization): Ratio = C / C* (how much worse the approximation is than the best).
Advantages
Disadvantages
Greedy Algorithms
Local Search: Start with a solution and try to improve it step by step
Dynamic Programming Approximation: Modify DP algorithms to make them faster but slightly less accurate
Vertex Cover
Optimization Goal: Find the smallest number of vertices that cover all edges in a graph.
Approximation Goal: Find a small number of such vertices (not necessarily the smallest).
Travelling Salesman Problem (TSP)
Optimization Goal: Find the shortest possible tour that visits every city once.
Approximation Goal: Find a tour that is close to the shortest.
Subset Sum
Given numbers {x₁, x₂, x₃, ..., xₙ} and a target value t.
Optimization Goal: Find a subset whose sum is as large as possible but ≤ t.
Approximation Goal: Find a subset that is close to the target.
A vertex cover of a graph is a set of vertices such that every edge in the graph has at least one end (vertex) inside this set.
In other words, for every edge (u, v), either u is in the set or v is in the set (or both).
Even though it’s called “vertex cover,” it actually covers all edges by selecting the right vertices.
Goal of the Problem
Given a graph, the goal is to find the smallest possible vertex cover — i.e., use the minimum number of vertices to
cover all edges.
Example
A --- B
|
|
C
Edges:
A-B
A-C
That means:
Out of all the valid subsets, choose the one with the smallest number of vertices.
This is a fast algorithm that gives an answer close to the best, even if not perfect: while edges remain, pick any
remaining edge (u, v), add both u and v to the Result set, and remove every edge incident to u or v.
When no edges are left, return the Result set as the vertex cover.
Example
A --- B
|
|
C
Done ✅
Result = {A, B}
Note: {A} is a better (smaller) solution, but {A, B} is not more than 2 times worse. That’s acceptable in
approximation.
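A minimal Python sketch of this 2-approximation, assuming the graph is given as a list of edge pairs (names are illustrative):

def approx_vertex_cover(edges):
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:   # edge not yet covered
            cover.add(u)                        # take both endpoints
            cover.add(v)
    return cover

On the example above, approx_vertex_cover([('A', 'B'), ('A', 'C')]) picks edge (A, B) first and returns {'A', 'B'}, matching the walkthrough.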
Time Complexity: O(V + E)
Space Complexity: O(V) (for the visited array)