Greedy Approximate Algorithm for Set Cover Problem
Last Updated :
14 Jun, 2023
Given a universe U of n elements and a collection S = {S1, S2, ..., Sm} of subsets of U, where every subset Si has an associated cost, find a minimum-cost subcollection of S that covers all elements of U.
Example:
U = {1,2,3,4,5}
S = {S1,S2,S3}
S1 = {4,1,3}, Cost(S1) = 5
S2 = {2,5}, Cost(S2) = 10
S3 = {1,4,3,2}, Cost(S3) = 3
Output: Minimum cost of set cover is 13 and
set cover is {S2, S3}
Two covers contain no redundant set: {S1, S2} with cost 15
and {S2, S3} with cost 13.
Why is it useful?
It was one of Karp’s NP-complete problems, shown to be so in 1972.
Other applications: edge covering, vertex cover
Interesting example: IBM finds computer viruses (Wikipedia)
Elements: 5000 known viruses
Sets: 9000 substrings of 20 or more consecutive bytes from viruses, not found in ‘good’ code.
A set cover of 180 was found. It suffices to search for these 180 substrings to verify the existence of known computer viruses.
Another example:
Suppose General Motors needs to buy a certain amount of varied supplies, and suppliers offer various deals for different combinations of materials (Supplier A: 2 tons of steel + 500 tiles for $x; Supplier B: 1 ton of steel + 2000 tiles for $y; etc.). Set covering can be used to find the best way to get all the materials while minimizing cost.
Source: http://math.mit.edu/~goemans/18434S06/setcover-tamara.pdf
Set Cover is NP-Hard: No polynomial-time algorithm is known for this problem, and none exists unless P = NP. There is, however, a polynomial-time greedy algorithm that gives a log n approximation, i.e., the cost of its cover is at most about ln n times the optimal cost.
Log n-Approximate Greedy Algorithm: Let U be the universe of elements, {S1, S2, ..., Sm} be a collection of subsets of U, and Cost(S1), Cost(S2), ..., Cost(Sm) be the costs of the subsets.
1) Let I represent the set of elements covered so far. Initialize I = {}
2) Do the following while I is not the same as U.
    a) Find the set Si in {S1, S2, ... Sm} that is most cost-effective,
       i.e., for which the ratio of Cost(Si) to the number of newly
       covered elements is minimum. Sets that add no new element are skipped.
       In other words, pick the set for which the following value is minimum.
           Cost(Si) / |Si - I|
    b) Add the elements of the picked Si to I, i.e., I = I U Si
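The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `greedy_weighted_set_cover` and the dictionary-based input format are assumptions made for the example, and ties between equally cost-effective sets go to whichever set is listed first.

```python
def greedy_weighted_set_cover(universe, subsets, cost):
    """Greedy cover using the cost-per-new-element rule.

    universe: set of elements; subsets: dict name -> set; cost: dict name -> number.
    Returns (list of chosen set names in pick order, total cost).
    """
    uncovered = set(universe)   # U - I in the notation of the steps above
    chosen = []
    total = 0
    while uncovered:
        best_name = None
        best_ratio = None
        for name, members in subsets.items():
            new_elements = len(members & uncovered)
            if new_elements == 0:
                continue        # adds nothing new, so the ratio is undefined
            ratio = cost[name] / new_elements
            if best_ratio is None or ratio < best_ratio:
                best_name, best_ratio = name, ratio
        if best_name is None:
            raise ValueError("the subsets do not cover the universe")
        uncovered -= subsets[best_name]
        chosen.append(best_name)
        total += cost[best_name]
    return chosen, total


universe = {1, 2, 3, 4, 5}
subsets = {"S1": {4, 1, 3}, "S2": {2, 5}, "S3": {1, 4, 3, 2}}
cost = {"S1": 5, "S2": 10, "S3": 3}
print(greedy_weighted_set_cover(universe, subsets, cost))  # prints (['S3', 'S2'], 13)
```

The usage lines run the sketch on the example from the start of the article; it picks S3 first and then S2, for a total cost of 13.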
Example: Let us consider the above example to understand Greedy Algorithm.
First Iteration: I = {}
The per new element cost for S1 = Cost(S1)/|S1 - I| = 5/3
The per new element cost for S2 = Cost(S2)/|S2 - I| = 10/2
The per new element cost for S3 = Cost(S3)/|S3 - I| = 3/4
Since S3 has the minimum value, S3 is added and I becomes {1,4,3,2}.
Second Iteration: I = {1,4,3,2}
The per new element cost for S1 = Cost(S1)/|S1 - I| = 5/0, which is undefined. S1 doesn't add any new element to I, so it is skipped.
The per new element cost for S2 = Cost(S2)/|S2 - I| = 10/1. Note that S2 adds only 5 to I.
S2 is the only set that adds a new element, so it is picked and I becomes {1,2,3,4,5} = U. The algorithm stops with the cover {S3, S2} of cost 13.
The greedy algorithm provides the optimal solution for the above example, but it may not provide an optimal solution all the time.
Consider the following example.
S1 = {1, 2}
S2 = {2, 3, 4, 5}
S3 = {6, 7, 8, 9, 10, 11, 12, 13}
S4 = {1, 3, 5, 7, 9, 11, 13}
S5 = {2, 4, 6, 8, 10, 12, 13}
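Let the universe be U = {1, 2, ..., 13} and let every set have the same cost.
The greedy algorithm produces {S3, S2, S1}, which uses three sets (S4 could replace S1 in the last step).
The optimal solution is {S4, S5}, which uses only two sets.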
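The claim that two sets are enough, and that no single set suffices, can be checked by brute force on an instance this small. The sketch below is only a verification aid, not part of the greedy algorithm: the helper name `smallest_cover` is made up for this example, and the search takes exponential time in the number of sets.

```python
from itertools import combinations


def smallest_cover(universe, subsets):
    """Try subcollections in order of increasing size; return the first that covers."""
    names = list(subsets)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            covered = set().union(*(subsets[name] for name in combo))
            if covered >= universe:      # superset test: every element is covered
                return list(combo)
    return None                          # the subsets do not cover the universe


universe = set(range(1, 14))
subsets = {
    "S1": {1, 2},
    "S2": {2, 3, 4, 5},
    "S3": {6, 7, 8, 9, 10, 11, 12, 13},
    "S4": {1, 3, 5, 7, 9, 11, 13},
    "S5": {2, 4, 6, 8, 10, 12, 13},
}
print(smallest_cover(universe, subsets))  # prints ['S4', 'S5']
```

Because sizes are tried in increasing order, the first cover found is a smallest one, which confirms that greedy's three-set answer is not optimal here.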
Proof that the above greedy algorithm is a log n approximation.
Let OPT be the cost of an optimal solution. Number the elements in the order in which the greedy algorithm covers them, and define the cost of an element as the cost of the set that first covers it divided by the number of new elements added by that set. Then the cost of the k'th element <= OPT / (n-k+1).
How did we get this result? Consider the iteration in which the k'th element gets covered. At the start of that iteration at most (k-1) elements are covered, so |U-I| >= n-k+1 (note that U-I is the set of elements not yet covered by the greedy algorithm). The sets of the optimal solution cover all of U-I at a total cost of at most OPT, so at least one set Si in the optimal solution covers its share of the uncovered elements at a per-element cost of at most OPT/|U-I|. Since the greedy algorithm picks the most cost-effective set, the per-element cost of the picked set is no larger than that. Therefore cost of k'th element <= OPT/|U-I| <= OPT/(n-k+1).
Cost of Greedy Algorithm = Sum of costs of n elements
[putting k = 1, 2, ..., n in the above formula]
<= OPT/n + OPT/(n-1) + ... + OPT/1
= OPT * (1 + 1/2 + ... + 1/n)
[Since 1 + 1/2 + ... + 1/n <= 1 + ln n]
<= OPT * (1 + ln n), which is O(OPT * log n)
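To get a feel for how the sum 1 + 1/2 + ... + 1/n used in the proof compares with the natural logarithm, a few values can be computed directly. This is only a numeric illustration; the helper name `harmonic` is an assumption of this example.

```python
import math


def harmonic(n):
    """Return 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))


# Print n, the harmonic sum, and ln n side by side for a few sizes.
for n in (5, 13, 1000):
    print(n, round(harmonic(n), 3), round(math.log(n), 3))
```

For every n >= 1 the sum lies between ln n and ln n + 1, which is why the guarantee is usually quoted as a log n factor.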
Source: http://math.mit.edu/~goemans/18434S06/setcover-tamara.pdf
In its unweighted form, the Set Cover problem asks for the minimum number of sets that cover all elements of a given universe. In other words, given a universe U and a collection S of subsets of U, find a subcollection C of S such that every element of U is contained in at least one set in C and the size of C is minimized. This is the problem described above with every cost equal to 1.
For this version, the greedy algorithm iteratively selects the set that covers the most uncovered elements until all elements are covered. Here's how it works:
1) Initialize an empty collection C to be the cover.
2) While there are uncovered elements:
    a) Select the set Si that covers the most uncovered elements.
    b) Add Si to C.
    c) Remove the elements of Si from the set of uncovered elements.
3) Return C as the cover.
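The unweighted procedure can be sketched in a few lines of Python. The function name `greedy_set_cover` and the dictionary input are assumptions made for illustration, and ties go to the set listed first because `max` returns the first maximal item.

```python
def greedy_set_cover(universe, subsets):
    """Repeatedly pick the set that covers the most still-uncovered elements.

    universe: set of elements; subsets: dict name -> set.
    Returns the list of chosen set names in pick order.
    """
    uncovered = set(universe)
    cover = []
    while uncovered:
        # Choose the set with the largest overlap with the uncovered elements.
        best = max(subsets, key=lambda name: len(subsets[name] & uncovered))
        if not subsets[best] & uncovered:
            raise ValueError("the subsets do not cover the universe")
        cover.append(best)
        uncovered -= subsets[best]
    return cover


subsets = {
    "S1": {1, 2},
    "S2": {2, 3, 4, 5},
    "S3": {6, 7, 8, 9, 10, 11, 12, 13},
    "S4": {1, 3, 5, 7, 9, 11, 13},
    "S5": {2, 4, 6, 8, 10, 12, 13},
}
print(greedy_set_cover(set(range(1, 14)), subsets))  # prints ['S3', 'S2', 'S1']
```

The usage lines rerun the 13-element example from earlier in the article, where greedy picks three sets even though {S4, S5} would do.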
This algorithm provides an approximate solution to the Set Cover problem. The approximation factor is H(n) = 1 + 1/2 + ... + 1/n <= ln(n) + 1, where n is the number of elements in the universe U. In other words, the greedy algorithm always finds a cover that uses at most about ln(n) + 1 times as many sets as an optimal cover.
Advantages:
- The greedy algorithm is simple and easy to implement.
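- It runs in polynomial time. A straightforward implementation rescans all m sets in each of at most min(n, m) iterations, which costs O(n * m * min(n, m)), where n is the number of elements in U and m is the number of sets in S. With more careful bookkeeping the running time can be brought down to the total size of the input sets, which is at most O(nm).
- The approximation factor of about ln(n) is a proven guarantee, so the solution is never more than roughly ln(n) + 1 times the size of an optimal solution.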
Disadvantages:
- The greedy algorithm may not always find the optimal solution, so it is only an approximation algorithm.
- The result depends on how ties between equally good sets are broken (for example, by the order in which the sets are listed), which can affect the quality of the solution.
- The approximation factor of ln(n) grows with n, so the guarantee becomes weak for large universes.