Greedy Note
An algorithm that always takes the best immediate, or local, choice while finding an answer. Greedy algorithms find the overall, or globally, optimal solution for some optimization problems, but may find less-than-optimal solutions for some instances of other problems.
A greedy algorithm always makes the choice that looks best at the moment. It makes a locally optimal choice in the hope that this choice will lead to a globally optimal solution. Greedy algorithms yield optimal solutions for many (but not all) problems.
Alternatively, we could have fashioned our optimal substructure with a greedy choice in mind.
Designing a greedy algorithm
1. Cast the optimization problem as one in which we make a choice and are left with one subproblem to solve.
2. Prove that there is always an optimal solution to the original problem that makes the greedy choice, so that the greedy choice is always safe.
3. Demonstrate that, having made the greedy choice, what remains is a subproblem with the property that if we combine an optimal solution to the subproblem with the greedy choice we have made, we arrive at an optimal solution to the original problem.
There is no general way to tell whether a greedy algorithm is optimal, but two key ingredients are:
Greedy-choice property: a globally optimal solution can be arrived at by making a locally optimal (greedy) choice.
Optimal substructure: an optimal solution to the problem contains within it optimal solutions to subproblems.
Greedy Algorithm
Start with a solution to a small subproblem. Build up to a solution to the whole problem. Make choices that look good in the short term.
Disadvantages: Greedy algorithms don't always work (short-term choices can be disastrous in the long term), and they are hard to prove correct.
Advantages: Greedy algorithms are fast when they work; the algorithms are simple and easy to implement.
Initially the set of chosen items (the solution set) is empty. At each step:
o an item is added to the solution set using a selection function;
o IF the resulting set would no longer be feasible, reject the item under consideration (it is never considered again);
o ELSE IF the set is still feasible, THEN add the current item.
Definitions of feasibility
A feasible set (of candidates) is promising if it can be extended to produce not merely a solution, but an optimal solution to the problem. In particular, the empty set is always promising. Why? Because an optimal solution always exists. Unlike dynamic programming, which solves the subproblems bottom-up, a greedy strategy usually progresses in a top-down fashion, making one greedy choice after another and reducing each problem to a smaller one.
Greedy-Choice Property
The "greedy-choice property" and "optimal substructure" are the two ingredients in a problem that lend themselves to a greedy strategy.
Greedy-Choice Property
It says that a globally optimal solution can be arrived at by making a locally optimal choice.
Activities i and j are compatible if the half-open intervals [si, fi) and [sj, fj) do not overlap; that is, i and j are compatible if si >= fj or sj >= fi.
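This compatibility test can be checked with a tiny Python function (an illustrative helper, not part of the notes' pseudocode):

```python
def compatible(a, b):
    """Activities are (start, finish) pairs occupying the half-open
    interval [s, f); they are compatible when the intervals do not overlap."""
    (sa, fa), (sb, fb) = a, b
    return sa >= fb or sb >= fa

print(compatible((1, 4), (4, 7)))  # True: they meet only at t = 4
print(compatible((1, 5), (4, 7)))  # False: they overlap on [4, 5)
```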
The Activity Selection Problem - Consider the problem of scheduling activities in a room or auditorium over some period of time. Let X = the set of activities that have requested time in the auditorium, where xi denotes the ith such request, delineated by its start and finish times: xi = (si, fi).
Let S be the set of scheduled activities, and P the problem of scheduling the largest number of requests in the given time interval from T0 to Tf. P(xi) will be true if, after ordering (or selecting) according to a greedy-choice property, the ith activity selected can be scheduled without causing a conflict with any previously scheduled activity. The first requirement is to determine an appropriate greedy-choice property and then to show that selection based solely upon this property will always yield an optimal solution. There are three possible bases for the greedy-choice property:
1. select according to earliest start time,
2. select according to earliest finish time,
3. select according to shortest duration.
Although the third choice is intuitively appealing, the second choice -- select according to earliest finish time -- is the correct one. We need to prove that this is so.
Proof by contradiction - Suppose S is a maximal solution set and it does not contain the object with the earliest finish time, x1. Then a maximal set S' can be formed by removing the first item from S and replacing it with x1 (where by "first" we mean the object in S with the earliest finish time). No conflict with any of the other objects in S can occur, since they must all start later than the end time of the object that was removed. Since nothing better can be done in scheduling the first event, the problem then becomes the task of scheduling activities in the reduced time interval from f1 to Tf (the interval from the end of the first activity to the end of the scheduling period) from among the remaining choices with start times later than f1. (Note that all of the activities that start before x1 finishes can be eliminated, because at best all they can do is replace the activity x1 with one that ends later.) The problem now is the same one as before with a shorter time interval and a reduced set of requests, so we apply the same argument in showing that the non-overlapping activity with the earliest end time can be included in any maximal set for this sub-problem, and continue until the set of requests has been exhausted.
Part II.
GREEDY-ACTIVITY-SELECTOR(s, f)
1. n = length[s]
2. A = {1}
3. j = 1
4. for i = 2 to n do
5.     if si >= fj then
6.         A = A U {i}
7.         j = i
8. return set A
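Part II translates directly to Python. This is a sketch that assumes, as the pseudocode does, that the activities are already sorted by finish time; the sample data is for illustration only:

```python
def greedy_activity_selector(s, f):
    """Select a maximum set of mutually compatible activities.
    s[i], f[i]: start and finish times, already sorted so that
    f[0] <= f[1] <= ... <= f[n-1].  Returns the selected indices."""
    n = len(s)
    A = [0]                 # always pick the first activity (earliest finish)
    j = 0                   # index of the last activity selected
    for i in range(1, n):
        if s[i] >= f[j]:    # compatible with the last selected activity
            A.append(i)
            j = i
    return A

# sample data: 11 activities sorted by finish time
s = [1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12]
f = [4, 5, 6, 7, 9, 9, 10, 11, 12, 14, 16]
print(greedy_activity_selector(s, f))  # [0, 3, 7, 10]
```

The single pass over the sorted activities is what gives Part II its linear running time.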
Example
[Figure: seven activities with their start and finish times drawn as intervals on a time axis from 0 to 15. Scanning the activities in order of finish time, the algorithm keeps each activity whose start time is at or after the finish time of the last activity selected.]
Analysis
Part I requires O(n lg n) time (use merge sort or heap sort). Part II, i.e. lines 4 to 7, requires Θ(n) time, assuming that the activities were already sorted in Part I by their finish time.
Proof that the greedy solution is optimal
1. It is easy to see that there is an optimal solution to the problem that makes the greedy choice.
Proof of 1.
Let A be an optimal solution. Let activity 1 be the greedy choice. If 1 ∈ A the proof is done. If 1 ∉ A, we will show that A' = A - {a} ∪ {1} is another optimal solution that includes 1.
Let a be the activity with minimum finish time in A. Since activities are sorted by finish time in the algorithm, f(1) <= f(a). If f(1) <= s(a), we could add 1 to A, so A could not have been optimal. So s(1) < f(a), and 1 and a overlap. Since f(1) <= f(a), if we remove a and add 1 we get another compatible solution A' = A - {a} ∪ {1} with |A'| = |A|.
2. If we combine the optimal solution of the remaining subproblem with the greedy choice, we have an optimal solution to the original problem.
Proof of 2.
Let activity 1 be the greedy choice. Let S be the subset of activities that do not overlap with 1: S = {i | i = 1, ..., n and si >= f(1)}. Let B be an optimal solution for S. From the definition of S, A = {1} ∪ B is compatible, and a solution to the original problem.
2. If we combine the optimal solution of the remaining subproblem with the greedy choice, we have an optimal solution to the original problem.
Proof of 2, continued. The proof is by contradiction. Assume that A is not an optimal solution to the original problem. Let A' be an optimal solution that contains 1. Then |A| < |A'|, and |A' - {1}| > |A - {1}| = |B|. But A' - {1} is also a solution to the problem for S, contradicting the assumption that B is an optimal solution for S.
Knapsack problem
Given some items, pack the knapsack to get the maximum total value. Each item has a weight and a value. The total weight that we can carry is no more than some fixed number W, so we must consider the weights of items as well as their values.
Statement
A thief robbing a store can carry a maximum weight of W in their knapsack. There are n items; the ith item weighs wi and is worth pi dollars. Which items should the thief take?
Fractional knapsack problem: The setup is the same, but the thief can take fractions of items, meaning that the items can be broken into smaller pieces, so the thief may decide to carry only a fraction xi of item i, where 0 <= xi <= 1. This version exhibits the greedy-choice property.
0-1 knapsack problem: The setup is the same, but the items may not be broken into smaller pieces, so the thief may decide either to take an item or to leave it (binary choice), but may not take a fraction of an item.
The Fractional Knapsack Problem (By Greedy Method)
There are n items in a store. For i = 1, 2, . . . , n, item i has weight wi > 0 and profit pi > 0. The thief can carry a maximum weight of M pounds in the knapsack. In this version of the problem the items can be broken into smaller pieces, so the thief may decide to carry only a fraction xi of object i, where 0 <= xi <= 1. Item i then contributes xiwi to the total weight in the knapsack, and xipi to the profit of the load.
Given: We are given n objects and a knapsack. Object i has weight wi and the knapsack has capacity M. If a fraction xi, 0 <= xi <= 1, of object i is placed into the knapsack, then a profit of pixi is earned.
Goal: Choose items with maximum total profit but with weight at most M. The objective is to obtain a filling of the knapsack that maximizes the total profit while keeping the total weight of all chosen objects at most M:
maximize Σ (1 <= i <= n) pi xi
subject to the constraint
Σ (1 <= i <= n) wi xi <= M
and 0 <= xi <= 1, 1 <= i <= n
It is clear that an optimal solution must fill the knapsack exactly, for otherwise we could add a fraction of one of the remaining objects and increase the value of the load. Thus in an optimal solution Σ wi xi = M.
What is the run time? O(n · 2^n) for the brute-force approach of trying all subsets.
GREEDY APPROACH WITH EXAMPLE
There are 5 objects with the prices and weights listed below; the knapsack can hold at most 100 lbs.

Price   Weight
20      10
30      20
66      30
40      40
60      50
Method 1: choose the least weight first. Total Weight = 10 + 20 + 30 + 40 = 100; Total Price = 20 + 30 + 66 + 40 = 156.
Method 2: choose the most expensive first. Total Weight = 30 + 50 + 20 = 100; Total Price = 66 + 60 + 20 = 146.
Method 3: choose the highest price/weight ratio first.
Price   Weight  Price/Weight
20      10      2
30      20      1.5
66      30      2.2
40      40      1
60      50      1.2
Of the three methods, Method 3 gives the optimal solution: Total Weight = 30 + 10 + 20 + 40 = 100; Total Price = 66 + 20 + 30 + (40/50)·60 = 164.
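The three methods can be replayed in Python. This sketch (the function and variable names are mine, not from the notes) allows a fraction of the last item under each strategy, matching the fractional-knapsack rules:

```python
def fill(capacity, items, key):
    """Fill the knapsack greedily, taking (price, weight) items in the
    order given by `key`; a fraction of the last item is allowed."""
    total_price, cap = 0.0, capacity
    for p, w in sorted(items, key=key):
        take = min(w, cap)            # whole item, or the fraction that fits
        total_price += p * take / w
        cap -= take
        if cap == 0:
            break
    return total_price

items = [(20, 10), (30, 20), (66, 30), (40, 40), (60, 50)]
M = 100
print(fill(M, items, key=lambda it: it[1]))           # Method 1: 156.0
print(fill(M, items, key=lambda it: -it[0]))          # Method 2: 146.0
print(fill(M, items, key=lambda it: -it[0] / it[1]))  # Method 3: 164.0
```

Only the third ordering, by price/weight ratio, is guaranteed optimal for the fractional problem.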
The greedy algorithm uses the maximum benefit per unit weight selection criterion:
1. Sort items in decreasing order of pi/wi.
2. Add items to the knapsack (starting at the first) until there are no more items, or the next item to be added would exceed M.
3. If the knapsack is not yet full, fill it with a fraction of the next unselected item.
Algorithm GREEDY_KNAPSACK(m, n)
1. // P[1:n] and W[1:n] contain the profits and weights respectively of the n objects, ordered so that P[i]/W[i] >= P[i+1]/W[i+1]. m is the knapsack size and X[1:n] is the solution vector //
2. real P(1:n), W(1:n), X(1:n), U, m;
3. {
4.     for i = 1 to n do X[i] = 0.0;
5.     U = m;
6.     for i = 1 to n do
7.     {
8.         if (W[i] > U) then break;
9.         X[i] = 1.0; U = U - W[i];
10.    }
11.    if (i <= n) then X[i] = U / W[i];
12. }
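A Python rendering of GREEDY_KNAPSACK, assuming as the pseudocode does that the items arrive sorted by nonincreasing P[i]/W[i]; the sample data reuses the earlier 5-object example:

```python
def greedy_knapsack(P, W, m):
    """Fractional knapsack.  P, W: profits and weights, already sorted
    so that P[i]/W[i] is nonincreasing.  m: knapsack capacity.
    Returns the solution vector X of fractions taken."""
    n = len(P)
    X = [0.0] * n
    U = m                       # remaining capacity
    i = 0
    while i < n and W[i] <= U:  # take whole items while they fit
        X[i] = 1.0
        U -= W[i]
        i += 1
    if i < n:                   # top up with a fraction of the next item
        X[i] = U / W[i]
    return X

# items sorted by profit/weight ratio: 2.2, 2, 1.5, 1.2, 1
P = [66, 20, 30, 60, 40]
W = [30, 10, 20, 50, 40]
print(greedy_knapsack(P, W, 100))  # [1.0, 1.0, 1.0, 0.8, 0.0]
```

The resulting load has profit 66 + 20 + 30 + 0.8·60 = 164, matching Method 3 above.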
Running time: Given a collection S of n items, such that each item i has benefit pi and weight wi, we can construct a maximum-benefit subset of S, allowing for fractional amounts, with total weight at most W, in O(n log n) time.
1. Use a heap-based priority queue to store S.
2. Removing the item with the highest value takes O(log n) time.
3. In the worst case, we need to remove all items.
Part I requires O(n lg n) time for sorting the pi/wi in nonincreasing order (use merge sort or heap sort). Part II, i.e. lines 4 to 5 and lines 6 to 10, requires O(n) and O(n) time respectively, so combined it is O(n) time, assuming that the items were already sorted in Part I by their profit/weight ratios.
Correctness: Suppose there were a better solution. Then there would be an item i with a higher profit/weight ratio than a chosen item j, but with xi < wi and xj > 0. If we substitute an amount min{wi - xi, xj} of item j with item i, we get a better solution. Since the greedy algorithm leaves no such substitution available, there is no better solution than the greedy one.
Proving Optimality
Let p1/w1 >= p2/w2 >= ... >= pn/wn. Let X = (x1, x2, ..., xn) be the solution generated by GREEDY_KNAPSACK, and let Y = (y1, y2, ..., yn) be any feasible solution.

If all the xi are 1, then the solution is clearly optimal (it is the only solution). Otherwise, let k be the least index with xk < 1. Then xi = 1 for i < k and xi = 0 for i > k:

X = (1, 1, ..., 1, xk, 0, ..., 0)

Because the greedy solution fills the knapsack exactly, Σ(i=1..n) xi wi = M, while any feasible Y has Σ(i=1..n) yi wi <= M, so

Σ(i=1..n) (xi - yi) wi >= 0.

Now compare the profits, writing pi = (pi/wi) wi:

Σ(i=1..n) (xi - yi) pi = Σ(i=1..k-1) (xi - yi) wi (pi/wi) + (xk - yk) wk (pk/wk) + Σ(i=k+1..n) (xi - yi) wi (pi/wi)

For i < k we have xi = 1, so xi - yi >= 0, and pi/wi >= pk/wk; hence (xi - yi) wi (pi/wi) >= (xi - yi) wi (pk/wk).
For i > k we have xi = 0, so xi - yi <= 0, and pi/wi <= pk/wk; hence again (xi - yi) wi (pi/wi) >= (xi - yi) wi (pk/wk).
Therefore

Σ(i=1..n) (xi - yi) pi >= (pk/wk) Σ(i=1..n) (xi - yi) wi >= 0

since pk/wk > 0. Thus Σ xi pi >= Σ yi pi for every feasible Y, and X is optimal.
A greedy algorithm:
Iteratively construct the schedule by adding, at each step, the job with the highest profit pi among those not yet considered, provided that the resulting set of jobs remains feasible.
Definition: A set of jobs is feasible if there exists one (feasible) sequence that allows all jobs in the set to meet their respective deadlines.
To find the optimal solution and feasibility of jobs we are required to find a subset J such that each job in the subset can be completed by its deadline. The value of a feasible solution J is the sum of the profits of all the jobs in J. The steps in finding the subset J are as follows:
a. Σ pi, i ∈ J, is the objective function chosen as the optimization measure.
b. Using this measure, the next job to be included should be the one which most increases Σ pi, i ∈ J.
c. Begin with J = ∅ and Σ pi = 0.
d. Add to J the job which has the largest profit.
e. Add another job to J, keeping in mind the following conditions:
   i. Search for the job which has the next maximum profit.
   ii. See whether the union of this job with J is feasible or not.
   iii. If yes, add it and go to step (e) to continue; else go to (iv).
   iv. Search for the job with the next maximum profit and go to step (ii).
f. Terminate when the addition of no more jobs is feasible.
Illustration
Consider 5 jobs with profits (p1,p2,p3,p4,p5) = (20,15,10,5,1) and deadlines (d1,d2,d3,d4,d5) = (2,2,1,3,3). The maximum number of jobs that can be completed is Min(n, max(di)) = Min(5, 3) = 3. Hence there is a possibility of doing 3 jobs, in 3 units of time.

Job    [0-1]   [1-2]   [2-3]   Profit
1      -       yes     -       20
2      yes     -       -       15
3      cannot accommodate      -
4      -       -       yes     5
5      cannot accommodate      -
Total                          40

In the first unit of time job 2 is done and a profit of 15 is gained; in the second unit job 1 is done and a profit of 20 is obtained; finally, in the third unit, since job 3 is not available, job 4 is done and a profit of 5 is obtained. Jobs 3 and 5 could not be accommodated due to their deadlines.

Algorithm:
Step 1: Sort the pi into nonincreasing order. After sorting, p1 >= p2 >= p3 >= ... >= pn.
Step 2: Add the next job i to the solution set if i can be completed by its deadline. Assign i to time slot [r-1, r], where r is the largest integer such that 1 <= r <= di and [r-1, r] is free.
Step 3: Stop if all jobs are examined. Otherwise, go to Step 2.

Algorithm JS(d, j, n)
1. // d[i] >= 1, 1 <= i <= n are the deadlines, n >= 1. The jobs are ordered such that p[1] >= p[2] >= ... >= p[n]. j[i] is the ith job in the optimal solution, 1 <= i <= k. Also, at termination, d[j[i]] <= d[j[i+1]], 1 <= i < k. //
2. {
3.     d[0] = j[0] = 0;
4.     j[1] = 1;
5.     k = 1;
6.     for i = 2 to n do
7.     {
8.         // consider jobs in nonincreasing order of p[i]; find a position for i and check feasibility of insertion //
9.         r = k;
10.        while ((d[j[r]] > d[i]) and (d[j[r]] != r)) do r = r - 1;
11.        if ((d[j[r]] <= d[i]) and (d[i] > r)) then
12.        {
13.            // insert i into j[] //
14.            for q = k to (r+1) step -1 do j[q+1] = j[q];
15.            j[r+1] = i; k = k + 1;
16.        }
17.    }
18.    return k;
19. }
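The slot-assignment idea of algorithm JS can be sketched in Python with a simple array of time slots instead of the sorted list j[] (a simplification of the pseudocode above, not a literal translation):

```python
def job_sequencing(profits, deadlines):
    """Greedy job sequencing with deadlines: consider jobs in
    nonincreasing profit order and place each job in the latest free
    unit slot no later than its deadline.
    Returns (scheduled jobs in time order, total profit)."""
    n = len(profits)
    jobs = sorted(range(n), key=lambda i: -profits[i])
    max_d = max(deadlines)
    slot = [None] * max_d                 # slot[t] covers time [t, t+1)
    for i in jobs:
        # latest free slot at or before this job's deadline
        for t in range(min(deadlines[i], max_d) - 1, -1, -1):
            if slot[t] is None:
                slot[t] = i
                break
    scheduled = [j for j in slot if j is not None]
    return scheduled, sum(profits[j] for j in scheduled)

# the 5-job example: profits (20,15,10,5,1), deadlines (2,2,1,3,3)
print(job_sequencing([20, 15, 10, 5, 1], [2, 2, 1, 3, 3]))
# ([1, 0, 3], 40): job 2, then job 1, then job 4 (0-based indices 1, 0, 3)
```

The inner scan over slots makes this O(n²) in the worst case, matching the analysis below; union-find replaces that scan to get the faster bound.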
Analysis
In the above algorithm we have two parameters: n, the number of jobs, and s, the number of jobs included in the solution j. The while loop on line 10 iterates at most k times, and each iteration takes Θ(1) time. If the condition on line 11 is true, then lines 14 and 15 are executed; these lines take Θ(k - r) time to insert job i. Hence the total time for each iteration of the for loop of line 6 is Θ(k). This loop iterates n - 1 times. If s is the final value of k, i.e. s is the number of jobs in the final solution, then the total time needed by the algorithm is Θ(sn). Since s <= n, the worst-case time as a function of n is Θ(n²). The computing time of the above algorithm can be reduced from O(n²) to nearly O(n) by using the disjoint-set union and find algorithms.

Proof of Correctness
Suppose that the greedy method chooses the set I and the optimal set is J with J ≠ I, and consider the two feasible sequences SI and SJ. Suppose that a job a is scheduled in SI and the corresponding position of SJ is a gap. Then J ∪ {a} is feasible with more profit, which is impossible. Similarly, if b is scheduled in SJ, then the corresponding position of SI must be scheduled as well. Now suppose there are two jobs a and b, with a ∈ J and b ∈ I, at the same position of SI and SJ, with profits ga and gb. The case gb > ga is impossible, since otherwise J - {a} ∪ {b} would be feasible and more profitable than the optimal J. If ga > gb, then the greedy algorithm would have considered and chosen a before b. Therefore only the case ga = gb remains. This shows that the greedy algorithm works well.
GENERAL METHOD
Growing an MST (Generic Algorithm)
GENERIC_MST(G, w)
1. A := {}
2. while A does not form a spanning tree do
3.     find an edge (u, v) that is safe for A
4.     A := A ∪ {(u, v)}
5. return A
Set A is always a subset of some minimum spanning tree. This property is called the invariant property. An edge (u, v) is a safe edge for A if adding the edge to A does not destroy the invariant. A safe edge is just the CORRECT edge to choose to add to T.
Greedy Algorithms
Kruskal's algorithm: Start with T = ∅. Consider edges in ascending order of cost. Insert edge e into T unless doing so would create a cycle.
Prim's algorithm: Start with some root node s and greedily grow a tree T from s outward. At each step, add the cheapest edge e to T that has exactly one endpoint in T.
The algorithms of Kruskal and Prim
The two algorithms are elaborations of the generic algorithm. Each uses a specific rule to determine a safe edge in line 3 of GENERIC_MST.
In Kruskal's algorithm, the set A is a forest. The safe edge added to A is always a least-weight edge in the graph that connects two distinct components.
In Prim's algorithm, the set A forms a single tree. The safe edge added to A is always a least-weight edge connecting the tree to a vertex not in the tree.

Kruskal's Algorithm
Basics of Kruskal's Algorithm: Attempt to add edges to A in increasing order of weight (lightest edge first). If the next edge does not induce a cycle among the current set of edges, then it is added to A. If it does, then this edge is passed over, and we consider the next edge in order. As this algorithm runs, the edges of A will induce a forest on the vertices, and the trees of this forest are merged together until we have a single tree containing all vertices.

Detecting a Cycle: We could perform a DFS on the subgraph induced by the edges of A, but this takes too much time. Instead, use the disjoint-set UNION-FIND data structure. This data structure supports 3 operations:
Make-Set(u): create a set containing u.
Find-Set(u): find the set that contains u.
Union(u, v): merge the sets containing u and v.
Each can be performed in O(lg n) time. The vertices of the graph are the elements to be stored in the sets; the sets are the vertices in each tree of A (stored as a simple list of edges).

MST-Kruskal(G, w)
1. A ← ∅                               // initially A is empty
2. for each vertex v ∈ V[G]            // lines 2-3 take O(V) time
3.     do MAKE_SET(v)                  // make a set for each vertex
4. sort the edges of E by nondecreasing weight w
5. for each edge (u, v) ∈ E, in order by nondecreasing weight
6.     do if FIND_SET(u) ≠ FIND_SET(v) // u and v in different trees
7.         then A ← A ∪ {(u, v)}
8.              UNION(u, v)
9. return A

The total running time is O(|E| log |E|). Our implementation uses a disjoint-set data structure to maintain several disjoint sets of elements.
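A compact Python sketch of MST-Kruskal with the union-find structure described above (path compression and union by rank are one common implementation choice; the sample graph is illustrative):

```python
def kruskal(n, edges):
    """MST by Kruskal's algorithm.  n vertices 0..n-1, edges as
    (weight, u, v) tuples.  Returns (tree edges, total weight)."""
    parent = list(range(n))
    rank = [0] * n

    def find(x):                       # FIND_SET with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):                   # UNION by rank
        x, y = find(x), find(y)
        if rank[x] < rank[y]:
            x, y = y, x
        parent[y] = x
        if rank[x] == rank[y]:
            rank[x] += 1

    A, total = [], 0
    for w, u, v in sorted(edges):      # nondecreasing weight
        if find(u) != find(v):         # u and v in different trees
            A.append((u, v))
            total += w
            union(u, v)
    return A, total

edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
print(kruskal(4, edges))  # ([(0, 1), (1, 3), (1, 2)], 6)
```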
Each set contains the vertices in a tree of the current forest. The operation FIND_SET(u) returns a representative element from the set that contains u.
Thus, we can determine whether two vertices u and v belong to the same tree by testing whether FIND_SET(u) equals FIND_SET(v). The combining of trees is accomplished by the UNION procedure. Running time O(|E| lg (|E|)). (The analysis is not required.)
EXAMPLE The edges are considered by the algorithm in sorted order by weight. The edge under consideration at each step is shown with a bold line edge.
Analysis of Kruskal
Lines 1-3 (initialization): O(V). Line 4 (sorting): O(E lg E). Lines 6-8 (set operations): O(E log E). Total: O(E log E).
Correctness
Consider the edge (u, v) that the algorithm seeks to add next, and suppose that this edge does not induce a cycle in A. Let A' denote the tree of the forest A that contains vertex u. Consider the cut (A', V - A'). Every edge crossing the cut is not in A, and so this cut respects A, and (u, v) is the light edge across the cut (because any lighter edge would have been considered earlier by the algorithm). Thus, by the MST Lemma, (u, v) is safe.
Example with disjoint-set data structure
Prim's Algorithm
Prim's algorithm constructs the minimum-cost spanning tree by selecting edges one at a time, like Kruskal's. The greedy criterion: from the remaining edges, select a least-cost edge whose addition to the set of selected edges forms a tree. Consequently, at each stage the set of selected edges forms a tree.
Prim's algorithm:
Step 1: For some x ∈ V, let A = {x}, B = V - {x}.
Step 2: Select (u, v) ∈ E with u ∈ A, v ∈ B, such that (u, v) has the smallest weight between A and B.
Step 3: Put (u, v) in the tree. A = A ∪ {v}, B = B - {v}.
Step 4: If B = ∅, stop; otherwise, go to Step 2.
Time complexity: O(n²), n = |V|.
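The four steps above, run on an adjacency matrix, give the O(n²) version of Prim's algorithm. This Python sketch (variable names are mine) grows the tree from vertex 0; the sample graph is the same illustrative one used for Kruskal:

```python
INF = float("inf")

def prim(W):
    """O(n^2) Prim's algorithm on an adjacency matrix W, where W[u][v]
    is the edge weight (INF if no edge).  Grows the tree from vertex 0.
    Returns (tree edges, total weight)."""
    n = len(W)
    in_tree = [False] * n
    dist = [INF] * n          # cheapest edge from each vertex into the tree
    near = [0] * n            # the tree endpoint of that cheapest edge
    dist[0] = 0
    edges, total = [], 0
    for _ in range(n):
        # pick the non-tree vertex with the cheapest connecting edge
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: dist[v])
        in_tree[u] = True
        if u != 0:
            edges.append((near[u], u))
            total += dist[u]
        for v in range(n):    # relax edges out of the new tree vertex
            if not in_tree[v] and W[u][v] < dist[v]:
                dist[v] = W[u][v]
                near[v] = u
    return edges, total

W = [[INF, 1, 4, INF],
     [1, INF, 3, 2],
     [4, 3, INF, 5],
     [INF, 2, 5, INF]]
print(prim(W))  # ([(0, 1), (1, 3), (1, 2)], 6)
```

The two nested O(n) loops inside the single O(n) pass give the stated O(n²) bound.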
WRITE DOWN THE ALGORITHM OF PRIM'S FROM SAHNI AND ITS ANALYSIS. EXAMPLE