Unit I - DAA Notes
INTRODUCTION
Algorithm
● An algorithm is a set of rules for carrying out calculations either by hand or on a machine.
● An algorithm is a finite step-by-step procedure to achieve a required result.
● An algorithm is a sequence of computational steps that transform the input into the
output.
● An algorithm is a sequence of operations performed on data that have to be organized
in data structures.
● An algorithm is an abstraction of a program to be executed on a physical machine
(model of Computation).
Algorithms
● An algorithm does not depend on a particular programming language or machine.
● Algorithms are mathematical entities that can be thought of as running on an idealized computer with an infinite random-access memory.
● Algorithm design is all about the mathematical theory behind the design of good
programs.
● Algorithmics is the branch of computer science that consists of designing and analyzing computer algorithms.
● The “design” part pertains to
● the description of the algorithm at an abstract level by means of a pseudo-language, and
● a proof of correctness, that is, showing that the algorithm solves the given problem in all cases.
● The “analysis” deals with performance evaluation (complexity analysis), usually with respect to the Random Access Machine (RAM) model of computation.
Euclid’s Algorithm
Problem: Find gcd(m,n), the greatest common divisor of two nonnegative, not both zero
integers m and n.
● Examples: gcd(60,24) = 12, gcd(60,0) = 60, gcd(0,0) = ?
● Euclid’s algorithm is based on repeated application of equality
- gcd(m,n) = gcd(n, m mod n)
until the second number becomes 0, which makes the problem trivial.
● Example: gcd(60,24) = gcd(24,12) = gcd(12,0) = 12.
ALGORITHM Euclid(m, n)
//Computes gcd(m, n) by Euclid’s algorithm
//Input: Two nonnegative, not-both-zero integers m and n
//Output: Greatest common divisor of m and n
while n ≠ 0 do
    r ← m mod n
    m ← n
    n ← r
return m
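A direct C rendering of this pseudocode, as a minimal sketch (the function name euclid_gcd is chosen here for illustration):

#include <stdio.h>

/* Computes gcd(m, n) by Euclid's algorithm for nonnegative,
   not-both-zero integers m and n. */
unsigned int euclid_gcd(unsigned int m, unsigned int n) {
    while (n != 0) {
        unsigned int r = m % n;
        m = n;
        n = r;
    }
    return m;
}

int main(void) {
    printf("%u\n", euclid_gcd(60, 24));  /* prints 12 */
    return 0;
}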
Proving correctness: the algorithm must produce the correct output for every legitimate input in finite time. Tracing an algorithm on sample inputs can prove incorrectness, but it cannot prove correctness in the general case.
Coding: how are the objects and operations of the algorithm represented in the chosen programming language?
Can every problem be solved by an algorithm?
● An algorithm that solves a problem exactly is called an exact algorithm; an algorithm that solves it only approximately is called an approximation algorithm. Problems that typically can be solved only approximately include
• extracting square roots
• solving non-linear equations
• evaluating definite integrals
For some problems, an exact algorithm exists but is unacceptably slow because of the intrinsic complexity of the problem; the travelling salesman problem, which asks for the shortest tour through n cities, is a typical example.
● Present-day practice: algorithms are specified in pseudocode.
● Earlier practice: flowcharts.
● Modus ponens
Analysis of algorithms
● time efficiency
● space efficiency
1.3 IMPORTANT PROBLEM TYPES
● Sorting
● Searching
● String processing
● Graph problems
● Combinatorial problems
● Geometric problems
● Numerical problems
Sorting
The sorting problem is to rearrange the items of a given list in nondecreasing order.
A sorting algorithm is called stable if it preserves the relative order of any two equal elements
in its input.
In other words, if an input list contains two equal elements in positions i and j where i < j, then in the sorted list they have to be in positions i′ and j′, respectively, such that i′ < j′.
The second notable feature of a sorting algorithm is the amount of extra memory the algorithm
requires. An algorithm is said to be in-place if it does not require extra memory, except,
possibly, for a few memory units.
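To make these two properties concrete, here is a minimal C sketch of insertion sort, which is both stable (the strict > comparison never moves an element past an equal one) and in-place (it uses only a few extra memory units); insertion sort itself is discussed again later in these notes:

#include <stdio.h>

/* Insertion sort: in-place (only a few extra variables) and stable
   (an element is never moved past an equal element, because the
   inner loop uses a strict '>' comparison). */
void insertion_sort(int a[], int n) {
    for (int i = 1; i < n; i++) {
        int v = a[i];
        int j = i - 1;
        while (j >= 0 && a[j] > v) {
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = v;
    }
}

int main(void) {
    int a[] = {5, 2, 4, 2, 1};
    insertion_sort(a, 5);
    for (int i = 0; i < 5; i++) printf("%d ", a[i]);  /* prints 1 2 2 4 5 */
    printf("\n");
    return 0;
}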
Searching
The searching problem deals with finding a given value, called a search key, in a
given set.
In applications where the underlying data may change frequently relative to the number of searches, searching has to be considered in conjunction with two other operations: addition of an item to and deletion of an item from the data set. In such situations, data structures and algorithms should be chosen to strike a balance among the requirements of each operation.
String processing
A string is a sequence of characters from an alphabet. Strings of particular interest are text
strings, which comprise letters, numbers, and special characters; bit strings, which comprise
zeros and ones; and gene sequences, which can be modeled by strings of characters from the
four-character alphabet {A,C, G, T}. It should be pointed out, however, that string-processing
algorithms have been important for computer science for a long time in conjunction with
computer languages and compiling issues.
One particular problem—that of searching for a given word in a text—has attracted special
attention from researchers. They call it string matching. Several algorithms that exploit the
special nature of this type of searching have been invented.
Graph problems:
A graph can be thought of as a collection of points called vertices, some of which are connected by line segments called edges.
The traveling salesman problem (TSP) is the problem of finding the shortest tour through n
cities that visits every city exactly once.
The graph-coloring problem seeks to assign the smallest number of colors to the vertices of
a graph so that no two adjacent vertices are the same color. This problem arises in several
applications, such as event scheduling: if the events are represented by vertices that are
connected by an edge if and only if the corresponding events cannot be scheduled at the same
time, a solution to the graph-coloring problem yields an optimal schedule.
Combinatorial problems
The travelling salesman problem and the graph-coloring problem are examples of
combinatorial problems.
Combinatorial problems are the most difficult problems in computing, from both a theoretical
and practical standpoint. Their difficulty stems from the following facts. First, the number of
combinatorial objects typically grows extremely fast with a problem’s size, reaching
unimaginable magnitudes even for moderate-sized instances. Second, there are no known
algorithms for solving most such problems exactly in an acceptable amount of time.
Some combinatorial problems can be solved by efficient algorithms, but they should be
considered fortunate exceptions to the rule. The shortest-path problem mentioned earlier is
among such exceptions.
Geometric problems:
Geometric algorithms deal with geometric objects such as points, lines, and polygons.
Two classic problems of computational geometry:
The closest-pair problem is self-explanatory: given n points in the plane, find the closest pair
among them.
The convex-hull problem asks to find the smallest convex polygon that would include all the
points of a given set.
Numerical problems:
Numerical problems, another large special area of applications, are problems that involve
mathematical objects of continuous nature: solving equations and systems of equations,
computing definite integrals, evaluating functions, and so on.
Let’s start with the obvious observation that almost all algorithms run longer on larger inputs.
For example, it takes longer to sort larger arrays, multiply larger matrices, and so on. Therefore,
it is logical to investigate an algorithm’s efficiency as a function of some parameter n indicating
the algorithm’s input size. In most cases, selecting such a parameter is quite straightforward.
For example, it will be the size of the list for problems of sorting, searching, finding the list’s
smallest element, and most other problems dealing with lists. For the problem of evaluating a
polynomial p(x) = aₙxⁿ + . . . + a₀ of degree n, it will be the polynomial’s degree or the number
of its coefficients, which is larger by 1 than its degree. You’ll see from the discussion that such
a minor difference is inconsequential for the efficiency analysis.
There are situations, of course, where the choice of a parameter indicating an input size does
matter. One such example is computing the product of two n × n matrices. There are two natural
measures of size for this problem. The first and more frequently used is the matrix order n. But
the other natural contender is the total number of elements N in the matrices being multiplied.
(The latter is also more general since it is applicable to matrices that are not necessarily square.)
The choice of an appropriate size metric can be influenced by operations of the algorithm
in question. For example, how should we measure an input’s size for a spell-checking
algorithm? If the algorithm examines individual characters of its input, we should measure the
size by the number of characters; if it works by processing words, we should count their number
in the input.
We should make a special note about measuring input size for algorithms solving
problems such as checking primality of a positive integer n. Here, the input is just one number,
and it is this number’s magnitude that determines the input size. In such situations, it is
preferable to measure size by the number b of bits in n’s binary representation: b = ⌊log₂ n⌋ + 1.
This metric usually gives a better idea about the efficiency of algorithms in question.
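As a small illustration, the following C helper (hypothetical, for illustration only) computes b for a positive integer n by repeated halving:

#include <stdio.h>

/* Number of bits b in the binary representation of a positive integer n:
   b = floor(log2 n) + 1, computed by repeated halving. */
int bit_length(unsigned long n) {
    int b = 0;
    while (n > 0) {
        n >>= 1;
        b++;
    }
    return b;
}

int main(void) {
    printf("%d\n", bit_length(25));   /* 25 = 11001 in binary, so b = 5 */
    return 0;
}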
Asymptotic order of growth: a way of comparing functions that ignores constant factors and small input sizes.
O(g(n)): the set of all functions f(n) with a smaller or the same order of growth as g(n) (to within a constant multiple, as n → ∞).
Ω(g(n)): the set of all functions f(n) with a larger or the same order of growth as g(n) (to within a constant multiple, as n → ∞).
Θ(g(n)): the set of all functions f(n) with the same order of growth as g(n) (to within a constant multiple, as n → ∞).
Big-Oh Notation
t(n) ∈ O(g(n)): a function t(n) is said to be in O(g(n)) if it is bounded above by some constant multiple of g(n) for all large n, i.e., if there exist a positive constant c and a non-negative integer n₀ such that t(n) ≤ c·g(n) for every n ≥ n₀.
t(n) ∈ Ω(g(n)): a function t(n) is said to be in Ω(g(n)) if it is bounded below by some constant multiple of g(n) for all large n, i.e., if there exist a positive constant c and a non-negative integer n₀ such that t(n) ≥ c·g(n) for every n ≥ n₀.
t(n) ∈ Θ(g(n)): a function t(n) is said to be in Θ(g(n)) if it is bounded both above and below by constant multiples of g(n) for all large n, i.e., if there exist positive constants c₁ and c₂ and a non-negative integer n₀ such that c₂·g(n) ≤ t(n) ≤ c₁·g(n) for every n ≥ n₀.
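As a quick worked example (a standard illustration, not from these notes): to verify that 100n + 5 ∈ O(n²), note that for all n ≥ 5 we have 100n + 5 ≤ 100n + n = 101n ≤ 101n², so the definition is satisfied with c = 101 and n₀ = 5.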
Computing the limit of the ratio of the two functions under consideration (the limit-based approach) is often more convenient than arguing directly from the definitions.
Conditional asymptotic notation: let f be an eventually nonnegative sequence of real numbers and let X be a set. Then O(f | X) yields a nonempty set of functions from N to R; Θ(f | X) is defined analogously as a nonempty set of functions from N to R.
For a sequence f of real numbers and a natural number b, the associated sequence f_b is eventually nondecreasing and satisfies f_b ∈ O(f).
Smoothness rule: if f is smooth, b ≥ 2, and t ∈ O(f | {bⁿ : n ranges over the natural numbers}), then t ∈ O(f).
This notation is now frequently also used in computational complexity theory to describe an
algorithm's usage of computational resources: the worst case or average case running time or
memory usage of an algorithm is often expressed as a function of the length of its input using
big O notation. This allows algorithm designers to predict the behavior of their algorithms and
to determine which of multiple algorithms to use, in a way that is independent of computer
architecture or clock rate. Big O notation is also used in many other fields to provide similar
estimates.
A description of a function in terms of big O notation usually only provides an upper bound on
the growth rate of the function; associated with big O notation are several related notations,
using the symbols o, Ω, ω, and Θ, to describe other kinds of bounds on asymptotic growth
rates.
If a function f(n) can be written as a finite sum of other functions, then the fastest growing one
determines the order of f(n).
When we defined the Rational class, we didn't reduce the fractions, so we ended up with
numbers like 10/8 and 4/6. To avoid this, we could reduce the fractions in the constructor by
dividing both the numerator and the denominator by the GCD of the two.
The greatest common divisor (GCD) of two integers m and n is the greatest integer that
divides both m and n with no remainder.
Given integers m and n such that m>= n> 0, find the GCD of m and n.
To solve this problem, the mathematical definition isn't enough. It tells us what we want, but
not how to get it. Before we can write a procedure, we need to know how to compute the
value.
[Diagram relating PROCEDURES, ALGORITHMS, and PROCESSES omitted]
Before considering possible GCD algorithms, let's design algorithms for some simpler
problems.
Example: factorial
Recall that n! = n*(n-1)*(n-2)*...*2*1, and that 0! = 1. In other words,
int factorial(int n) {
if (n == 0)
return 1;
else
return (n * factorial(n-1));
}
factorial(0)
=> 1

factorial(3)
=> 3 * factorial(2)
=> 3 * 2 * factorial(1)
=> 3 * 2 * 1 * factorial(0)
=> 3 * 2 * 1 * 1
=> 6
This looks good. We've done a reduction of the factorial problem to a smaller instance of the
same problem. This idea of reducing a problem to itself is known as recursion.
int factorial(int n) {
    if (n == 0)                      // Termination condition (base case)
        return 1;                    // Result for base case
    else
        return (n * factorial(n-1)); // Combining step: multiply n by the
                                     // result of the recursive call
}
Another way to think about the execution of a recursive procedure is with the "actors" model
(or dataflow model). In this simple model, we draw a box for each procedure invocation and
show the flow of data into and out of each box. To do this, we put the output arrow on the left
with the input.
Run-time analysis
Run-time analysis is a theoretical classification that estimates and anticipates the increase in
running time (or run-time) of an algorithm as its input size (usually denoted as n) increases.
Run-time efficiency is a topic of great interest in computer science: A program can take
seconds, hours or even years to finish executing, depending on which algorithm it implements
(see also performance analysis, which is the analysis of an algorithm's run-time in practice).
Take as an example a program that looks up a specific entry in a sorted list of size n. Suppose
this program were implemented on Computer A, a state-of-the-art machine, using a linear
search algorithm, and on Computer B, a much slower machine, using a binary search algorithm.
Benchmark testing on the two computers running their respective programs might look
something like the following:
List size n    Computer A run time (ns)    Computer B run time (ns)
15             7                           100,000
65             32                          150,000
Based on these metrics, it would be easy to jump to the conclusion that Computer A is
running an algorithm that is far superior in efficiency to that of Computer B. However, if the
size of the input-list is increased to a sufficient number, that conclusion is dramatically
demonstrated to be in error:
List size n    Computer A run time (ns)    Computer B run time (ns)
15             7                           100,000
65             32                          150,000
1,000,000      500,000                     500,000
4,000,000      2,000,000                   550,000
16,000,000     8,000,000                   600,000
Computer A, running the linear search program, exhibits a linear growth rate. The program's
run-time is directly proportional to its input size. Doubling the input size doubles the run
time, quadrupling the input size quadruples the run-time, and so forth. On the other hand,
Computer B, running the binary search program, exhibits a logarithmic growth rate. Doubling
the input size only increases the run time by a constant amount (in this example, 50,000 ns).
Even though Computer A is ostensibly a faster machine, Computer B will inevitably surpass
Computer A in run-time because it's running an algorithm with a much slower growth rate.
Orders of growth
Informally, an algorithm can be said to exhibit a growth rate on the order of a mathematical
function if beyond a certain input size n, the function f(n) times a positive constant provides an
upper bound or limit for the run-time of that algorithm. In other words, for a given input size n
greater than some n0 and a constant c, the running time of that algorithm will never be larger
than c × f(n). This concept is frequently expressed using Big O notation. For example, since
the run-time of insertion sort grows quadratically as its input size increases, insertion sort can
be said to be of order O(n²).
Big O notation is a convenient way to express the worst-case scenario for a given algorithm,
although it can also be used to express the average-case — for example, the worst-case scenario
for quicksort is O(n²), but the average-case run-time is O(n log n).[7]
Empirical orders of growth
Assuming the execution time follows the power rule, t ≈ k·nᵃ, the coefficient a can be found [8] by taking empirical measurements of run time {t₁, t₂} at some problem-size points {n₁, n₂} and calculating t₂/t₁ = (n₂/n₁)ᵃ, so that a = log(t₂/t₁) / log(n₂/n₁). If the order of growth indeed follows the power rule, the empirical value of a will stay constant at different ranges, and if not, it will change; but it can still serve for comparison of any two given algorithms as to their empirical local orders of growth behaviour. Applied to the above table:
n              Computer A run time (ns)    Local order of growth (nᵃ)    Computer B run time (ns)    Local order of growth (nᵃ)
15             7                                                         100,000
1,000,000      500,000                     1.00                          500,000                     0.10
4,000,000      2,000,000                   1.00                          550,000                     0.07
16,000,000     8,000,000                   1.00                          600,000                     0.06
It is clearly seen that the first algorithm exhibits a linear order of growth indeed following the
power rule. The empirical values for the second one are diminishing rapidly, suggesting it
follows another rule of growth and in any case has much lower local orders of growth (and
improving further still), empirically, than the first one.
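A small C sketch of this calculation (the helper name local_order is hypothetical), applied to two consecutive rows of the table:

#include <stdio.h>
#include <math.h>

/* Empirical local order of growth a, assuming the power rule t = k * n^a:
   a = log(t2/t1) / log(n2/n1) for two measurements (n1, t1) and (n2, t2). */
double local_order(double n1, double t1, double n2, double t2) {
    return log(t2 / t1) / log(n2 / n1);
}

int main(void) {
    /* Computer A between n = 1,000,000 and n = 4,000,000 (times in ns) */
    printf("%.2f\n", local_order(1e6, 500000.0, 4e6, 2000000.0));  /* 1.00 */
    /* Computer B over the same range */
    printf("%.2f\n", local_order(1e6, 500000.0, 4e6, 550000.0));   /* ~0.07 */
    return 0;
}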
Evaluating run-time complexity
The run-time complexity for the worst-case scenario of a given algorithm can sometimes be
evaluated by examining the structure of the algorithm and making some simplifying
assumptions. Consider the following pseudocode:
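The pseudocode itself is not reproduced in these notes. Based on the step numbers used in the analysis below (steps 1, 2, and 7 run once, step 3 is conditional, and steps 4 to 6 form nested loops in which the inner loop runs i times), it has roughly the following shape, rendered here as a C sketch; the specific input/output statements are assumptions for illustration:

#include <stdio.h>

int main(void) {
    int n;
    scanf("%d", &n);                            /* step 1: get a positive integer n from input */
    if (n > 10)                                 /* step 2 */
        printf("This might take a while...\n"); /* step 3: runs only in the worst case */
    for (int i = 1; i <= n; i++) {              /* step 4: outer loop test runs n + 1 times */
        for (int j = 1; j <= i; j++) {          /* step 5: inner loop, j runs from 1 to i */
            printf("%d\n", i * j);              /* step 6: inner loop body */
        }
    }
    printf("Done!\n");                          /* step 7 */
    return 0;
}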
A given computer will take a discrete amount of time to execute each of the instructions
involved with carrying out this algorithm. The specific amount of time to carry out a given
instruction will vary depending on which instruction is being executed and which computer is
executing it, but on a conventional computer, this amount will be deterministic.[9] Say that the
actions carried out in step 1 are considered to consume time T1, step 2 uses time T2, and so
forth.
In the algorithm above, steps 1, 2 and 7 will only be run once. For a worst-case evaluation, it
should be assumed that step 3 will be run as well. Thus the total amount of time to run steps
1-3 and step 7 is:
The loops in steps 4, 5 and 6 are trickier to evaluate. The outer loop test in step 4 will execute
( n + 1 ) times (note that an extra step is required to terminate the for loop, hence n + 1 and
not n executions), which will consume T4(n + 1) time. The inner loop, on the other hand, is governed by the value of i, which iterates from 1 to n. On the first pass through the outer loop,
j iterates from 1 to 1: The inner loop makes one pass, so running the inner loop body (step 6)
consumes T6 time, and the inner loop test (step 5) consumes 2T5 time. During the next pass
through the outer loop, j iterates from 1 to 2: the inner loop makes two passes, so running the
inner loop body (step 6) consumes 2T6 time, and the inner loop test (step 5) consumes 3T5
time.
Altogether, the total time required to run the inner loop body can be expressed as an arithmetic progression:

T6 + 2T6 + 3T6 + . . . + (n − 1)T6 + nT6 = [1 + 2 + 3 + . . . + n]T6 = [(n² + n)/2]T6

The total time required to run the inner loop test can be evaluated similarly:

2T5 + 3T5 + 4T5 + . . . + nT5 + (n + 1)T5

which reduces to

[(n² + 3n)/2]T5

Therefore, the total running time of the algorithm is

f(n) = T1 + T2 + T3 + T7 + (n + 1)T4 + [(n² + n)/2]T6 + [(n² + 3n)/2]T5.
As a rule-of-thumb, one can assume that the highest-order term in any given function
dominates its rate of growth and thus defines its run-time order. In this example, n² is the
highest-order term, so one can conclude that f(n) = O(n²). Formally this can be proven as
follows:
Prove that f(n) = [(n² + n)/2]T6 + [(n² + 3n)/2]T5 + (n + 1)T4 + T1 + T2 + T3 + T7 ≤ c·n² for some constant c and all n ≥ n0:

[(n² + n)/2]T6 + [(n² + 3n)/2]T5 + (n + 1)T4 + T1 + T2 + T3 + T7
≤ (n² + n)T6 + (n² + 3n)T5 + (n + 1)T4 + T1 + T2 + T3 + T7   (for n ≥ 0)

Let k be a constant greater than or equal to each of T1, . . . , T7. Then

(n² + n)T6 + (n² + 3n)T5 + (n + 1)T4 + T1 + T2 + T3 + T7 ≤ k(n² + n) + k(n² + 3n) + k(n + 1) + 4k
= 2kn² + 5kn + 5k ≤ 2kn² + 5kn² + 5kn² = 12kn²   (for n ≥ 1)

Therefore f(n) ≤ c·n² for c = 12k and n ≥ n0 = 1, that is, f(n) = O(n²).
A more elegant approach to analyzing this algorithm would be to declare that [T1..T7] are all equal to one unit of time, in a system of units chosen so that one unit is greater than or equal to the actual times for these steps. This would mean that the algorithm's running time breaks down as follows:[11]

4 + ∑_{i=1}^{n} i ≤ 4 + ∑_{i=1}^{n} n = 4 + n² ≤ 5n²   (for n ≥ 1),

which again gives f(n) = O(n²).
Empirical Analysis of Algorithms
There are several different goals one can pursue in analyzing algorithms empirically. They include checking the accuracy of a theoretical assertion about the algorithm’s efficiency, comparing the efficiency of several algorithms for solving the same problem or different implementations of the same algorithm, developing a hypothesis about the algorithm’s efficiency class, and ascertaining the efficiency of the program implementing the algorithm on a particular machine. Obviously, an experiment’s design should depend on the question the experimenter seeks to answer.
In particular, the goal of the experiment should influence, if not dictate, how the algorithm’s
efficiency is to be measured. The first alternative is to insert a counter (or counters) into a
program implementing the algorithm to count the number of times the algorithm’s basic
operation is executed. This is usually a straightforward operation; you should only be mindful
of the possibility that the basic operation is executed in several places in the program and that
all its executions need to be accounted for. As straightforward as this task usually is, you should
always test the modified program to ensure that it works correctly, in terms of both the problem
it solves and the counts it yields.
The second alternative is to time the program implementing the algorithm in question. The easiest way to do this is to use a system’s command, such as the time command in UNIX. Alternatively, one can measure the running time of a code fragment by asking for the system time right before the fragment’s start (tstart) and just after its completion (tfinish), and then computing the difference between the two (tfinish − tstart). In C and C++, you can use the function clock for this purpose; in Java, the method currentTimeMillis() in the System class is available.
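A minimal C sketch of this timing idiom; run_algorithm stands for whatever code fragment is being measured and is not part of the original text:

#include <stdio.h>
#include <time.h>

/* Placeholder for the code fragment whose running time is being measured. */
static void run_algorithm(void) {
    volatile long s = 0;
    for (long i = 0; i < 10000000L; i++) s += i;
}

int main(void) {
    clock_t tstart = clock();          /* system time just before the fragment */
    run_algorithm();
    clock_t tfinish = clock();         /* system time just after completion */
    double seconds = (double)(tfinish - tstart) / CLOCKS_PER_SEC;
    printf("elapsed CPU time: %f s\n", seconds);
    return 0;
}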
It is important to keep several facts in mind, however. First, a system’s time is typically not
very accurate, and you might get somewhat different results on repeated runs of the same
program on the same inputs. An obvious remedy is to make several such measurements and
then take their average (or the median) as the sample’s observation point. Second, given the
high speed of modern computers, the running time may fail to register at all and be reported
as zero. The standard trick to overcome this obstacle is to run the program in an extra loop
many times, measure the total running time, and then divide it by the number of the loop’s
repetitions. Third, on a computer running under a time-sharing system such as UNIX, the
reported time may include the time spent by the CPU on other programs, which obviously
defeats the purpose of the experiment. Therefore, you should take care to ask the system for
the time devoted specifically to execution of your program. (In UNIX, this time is called the
“user time,” and it is automatically provided by the time command.)
Thus, measuring the physical running time has several disadvantages, both principal
(dependence on a particular machine being the most important of them) and technical, not
shared by counting the executions of a basic operation. On the other hand, the physical running
time provides very specific information about an algorithm’s performance in a particular
computing environment, which can be of more importance to the experimenter than, say, the
algorithm’s asymptotic efficiency class. In addition, measuring time spent on different
segments of a program can pinpoint a bottleneck in the program’s performance that can be
missed by an abstract deliberation about the algorithm’s basic operation. Getting such data—
called profiling—is an important resource in the empirical analysis of an algorithm’s running
time; the data in question can usually be obtained from the system tools available in most
computing environments.
Whether you decide to measure the efficiency by basic operation counting or by time clocking,
you will need to decide on a sample of inputs for the experiment. Often, the goal is to use a
sample representing a “typical” input; so the challenge is to understand what a “typical” input
is. For some classes of algorithms—e.g., for algorithms for the traveling salesman problem that
we are going to discuss later in the book—researchers have developed a set of instances they
use for benchmarking. But much more often than not, an input sample has to be developed by
the experimenter. Typically, you will have to make decisions about the sample size (it is
sensible to start with a relatively small sample and increase it later if necessary), the range of
instance sizes (typically neither trivially small nor excessively large), and a procedure for
generating instances in the range chosen. The instance sizes can either adhere to some pattern
(e.g., 1000, 2000, 3000, . . . , 10,000 or 500, 1000, 2000, 4000, . . . , 128,000) or be generated
randomly within the range chosen.
The principal advantage of size changing according to a pattern is that its impact is easier to
analyze. For example, if a sample’s sizes are generated by doubling, you can compute the ratios
M(2n)/M(n) of the observed metric M (the count or the time) to see whether the ratios exhibit
a behavior typical of algorithms in one of the basic efficiency classes discussed in Section 2.2.
The major disadvantage of nonrandom sizes is the possibility that the algorithm under
investigation exhibits atypical behavior on the sample chosen. For example, if all the sizes in
a sample are even and your algorithm runs much more slowly on odd-size inputs, the empirical
results will be quite misleading.
Much more often than not, an empirical analysis requires generating random numbers. Even if
you decide to use a pattern for input sizes, you will typically want instances themselves
generated randomly. Generating random numbers on a digital computer is known to present a
difficult problem because, in principle, the problem can be solved only approximately. This is
the reason computer scientists prefer to call such numbers pseudorandom. As a practical
matter, the easiest and most natural way of getting such numbers is to take advantage of a
random number generator available in computer language libraries. Typically, its output will
be a value of a (pseudo)random variable uniformly distributed in the interval between 0 and 1.
If a different (pseudo)random variable is desired, an appropriate transformation needs to be made. For example, if x is a continuous random variable uniformly distributed on the interval 0 ≤ x < 1, the variable y = l + ⌊x(r − l)⌋ will be uniformly distributed among the integer values between integers l and r − 1 (l < r).
Alternatively, you can implement one of several known algorithms for generating
(pseudo)random numbers. The most widely used and thoroughly studied of such algorithms is
the linear congruential method.
ALGORITHM Random(n, m, seed, a, b)
//Input: A positive integer n and positive integer parameters m, seed, a, b
//Output: A sequence r1, . . . , rn of n pseudorandom integers uniformly distributed among integer values between 0 and m − 1
//Note: Pseudorandom numbers between 0 and 1 can be obtained by treating the integers generated as digits after the decimal point
r0 ← seed
for i ← 1 to n do
    ri ← (a ∗ ri−1 + b) mod m
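A small C sketch of the linear congruential method; the parameter values in main are illustrative only (compare the recommendations discussed next):

#include <stdio.h>

/* Linear congruential generator: r[i] = (a * r[i-1] + b) mod m. */
static unsigned long lcg_state;

void lcg_seed(unsigned long seed) { lcg_state = seed; }

unsigned long lcg_next(unsigned long a, unsigned long b, unsigned long m) {
    lcg_state = (a * lcg_state + b) % m;
    return lcg_state;
}

int main(void) {
    unsigned long a = 1103515245UL, b = 12345UL, m = 1UL << 31;  /* illustrative parameters */
    lcg_seed(42UL);
    for (int i = 1; i <= 5; i++)
        printf("r%d = %lu\n", i, lcg_next(a, b, m));
    /* A value in [0, 1) can be obtained as lcg_next(a, b, m) / (double)m. */
    return 0;
}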
The simplicity of this pseudocode is misleading because the devil lies in the details of choosing
the algorithm’s parameters. Here is a partial list of recommen-dations based on the results of a
sophisticated mathematical analysis (see [KnuII, pp. 184–185] for details): seed may be chosen
arbitrarily and is often set to the current date and time; m should be large and may be
conveniently taken as 2w, where w is the computer’s word size; a should be selected as an
integer between 0.01m and 0.99m with no particular pattern in its digits but such that a mod 8
= 5; and the value of b can be chosen as 1.
The empirical data obtained as the result of an experiment need to be recorded and then
presented for an analysis. Data can be presented numerically in a table or graphically in a
scatterplot, i.e., by points in a Cartesian coordinate system. It is a good idea to use both these
options whenever it is feasible because both methods have their unique strengths and
weaknesses.
The principal advantage of tabulated data lies in the opportunity to manipulate it easily. For example, one can compute the ratios M(n)/g(n) where g(n) is a candidate to represent the efficiency class of the algorithm in question. If the algorithm is indeed in Θ(g(n)), most likely these ratios will converge to some positive constant as n gets large. (Note that careless novices sometimes assume that this constant must be 1, which is, of course, incorrect according to the definition of Θ(g(n)).) Or one can compute the ratios M(2n)/M(n) and see how the running time
reacts to doubling of its input size. As we discussed in Section 2.2, such ratios should change
only slightly for logarithmic algorithms and most likely converge to 2, 4, and 8 for linear,
quadratic, and cubic algorithms, respectively—to name the most obvious and convenient cases.
On the other hand, the form of a scatterplot may also help in ascertaining the algorithm’s probable efficiency class. For a logarithmic algorithm, the scatterplot will have a concave shape (Figure 2.7a); this fact distinguishes it from all the other basic efficiency classes. For a linear algorithm, the points will tend to aggregate around a straight line or, more generally, to be contained between two straight lines (Figure 2.7b). Scatterplots of functions in Θ(n lg n) and Θ(n²) will have a convex shape (Figure 2.7c), making them difficult to differentiate. A scatterplot of a cubic algorithm will also have a convex shape, but it will show a much more rapid increase in the metric’s values. An exponential algorithm will most probably require a logarithmic scale for the vertical axis, in which the values of log_a M(n) rather than those of M(n) are plotted. (The commonly used logarithm base is 2 or 10.) In such a coordinate system, a scatterplot of a truly exponential algorithm should resemble a linear function because M(n) ≈ c·aⁿ implies log_b M(n) ≈ log_b c + n log_b a, and vice versa.
One of the possible applications of the empirical analysis is to predict the algorithm’s performance on an instance not included in the experiment sample. For example, if you observe that the ratios M(n)/g(n) are close to some constant c for the sample instances, it could be sensible to approximate M(n) by the product c·g(n) for other instances, too. This approach should be used with caution, especially for values of n outside the sample range. (Mathematicians call such predictions extrapolation, as opposed to interpolation, which deals with values within the sample range.) Of course, you can try unleashing the standard techniques of statistical data analysis and prediction. Note, however, that the majority of such
techniques are based on specific probabilistic assumptions that may or may not be valid for the
experimental data in question.
It seems appropriate to end this section by pointing out the basic differences between mathematical and empirical analyses of algorithms. The principal strength of the mathematical
analysis is its independence of specific inputs; its principal weakness is its limited applicability,
especially for investigating the average-case efficiency. The principal strength of the empirical
analysis lies in its applicability to any algorithm, but its results can depend on the particular
sample of instances and the computer used in the experiment.
Exercises 2.6
1. Consider the following well-known sorting algorithm, which is studied later in the
book, with a counter inserted to count the number of key comparisons.
count ← 0
for i ← 1 to n − 1 do
    v ← A[i]; j ← i − 1
    while j ≥ 0 and A[j] > v do
        count ← count + 1; A[j + 1] ← A[j]; j ← j − 1
    A[j + 1] ← v
Is the comparison counter inserted in the right place? If you believe it is, prove it; if you believe
it is not, make an appropriate correction.
2. a. Run the program of Problem 1, with a properly inserted counter (or counters) for the number of key comparisons, on 20 random arrays of sizes 1000, 2000, 3000, . . . , 20,000.
b. Analyze the data obtained to form a hypothesis about the algorithm’s average-case efficiency.
c. Estimate the number of key comparisons we should expect for a randomly generated array of size 25,000 sorted by the same algorithm.
5. What scale transformation will make a logarithmic scatterplot look like a linear one?
6. How can one distinguish a scatterplot for an algorithm in Θ(lg lg n) from a scatterplot for an algorithm in Θ(lg n)?
7. a. Find empirically the largest number of divisions made by Euclid’s algorithm for computing gcd(m, n) for 1 ≤ n ≤ m ≤ 100.
b. For each positive integer k, find empirically the smallest pair of integers 1 ≤ n ≤ m ≤ 100 for which Euclid’s algorithm needs to make k divisions in order to find gcd(m, n).
(Returning to the recursion trace shown earlier: that expansion corresponds very closely to what actually happens on the execution stack in the computer’s memory.)
Nonrecursive Algorithms
Example (selection sort):
for i ← 0 to n − 2 do
    min ← i
    for j ← i + 1 to n − 1 do
        if A[j] < A[min]
            min ← j
    swap A[i] and A[min]
Inner loop:
S(i) = ∑_{j=i+1}^{n−1} 1 = (n − 1) − (i + 1) + 1 = n − 1 − i

Outer loop:
Op = ∑_{i=0}^{n−2} S(i) = ∑_{i=0}^{n−2} (n − 1 − i) = ∑_{i=0}^{n−2} (n − 1) − ∑_{i=0}^{n−2} i

Basic formula:
∑_{i=0}^{n} i = n(n + 1)/2
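Carrying the computation through (a completion not written out above): ∑_{i=0}^{n−2} (n − 1) = (n − 1)² and ∑_{i=0}^{n−2} i = (n − 2)(n − 1)/2, so Op = (n − 1)² − (n − 2)(n − 1)/2 = n(n − 1)/2 ∈ Θ(n²); selection sort performs about n²/2 basic operations.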
Recursive Algorithms
T(n) = T(n-1) + c
T(1) = d
Solution: T(n) = (n-1)c + d
T(n) = T(n-1) + cn
T(1) = d
Solution: T(n) = [n(n+1)/2 – 1] c + d
T(n) = T(n/2) + c
T(1) = d
Solution: T(n) = c log n + d
Example 1
n! = n*(n-1)!
0! = 1
T(n) = T(n-1) + 1
T(1) = 1
Telescoping:
T(n) = T(n-1) + 1
T(n-1) = T(n-2) + 1
T(n-2) = T(n-3) + 1
…
T(2) = T(1 ) + 1
T(n) = T(n/2) + 1
T(n/2) = T(n/4) + 1
…
T(2) = T(1) + 1
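Adding the telescoped equations (a completion of the argument above, with T(1) = 1 in both cases): in the first chain all intermediate terms cancel, giving T(n) = T(1) + (n − 1) = n, so computing n! takes Θ(n) multiplications; in the second chain there are log₂ n equations, giving T(n) = T(1) + log₂ n = 1 + log₂ n ∈ Θ(log n).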
Homogeneous equations
Let us first determine the number of solutions. It appears that we must know the values of a1, a2, . . . , ak to compute the values of the sequence according to the recurrence. In the absence of these, there can be different solutions based on different boundary conditions. Given the k boundary conditions, we can uniquely determine the values of the sequence. Note that this is not true for non-linear recurrences.
This observation (of a unique solution) makes it somewhat easier for us to guess a solution and verify it. Let us guess a solution of the form ar = A·α^r, where A is some constant. This may be justified from the solution of Example A.1. By substituting this into the homogeneous linear recurrence and simplifying, we obtain the following equation:

c0·α^k + c1·α^(k−1) + . . . + ck = 0

This is called the characteristic equation of the recurrence relation, and this degree-k equation has k roots, say α1, α2, . . . , αk. If these are all distinct, then the following is a solution to the recurrence:

ar(h) = A1·α1^r + A2·α2^r + . . . + Ak·αk^r

which is also called the homogeneous solution to the linear recurrence. The values of A1, A2, . . . , Ak can be determined from the k boundary conditions (by solving k simultaneous equations).
When the roots are not all distinct, i.e., some roots have multiplicity, then for a root α of multiplicity m, the associated solutions are α^n, n·α^n, n²·α^n, . . . , n^(m−1)·α^n. This follows from the fact that if α is a multiple root of the characteristic equation, then it is also a root of the derivative of the equation.
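As a worked illustration (not part of the original notes): for the Fibonacci recurrence ar = ar−1 + ar−2, the characteristic equation is α² − α − 1 = 0, whose roots are α1 = (1 + √5)/2 and α2 = (1 − √5)/2, so the general solution is ar = A1·α1^r + A2·α2^r, with A1 and A2 determined from the two boundary conditions (for a0 = 0, a1 = 1 one obtains A1 = 1/√5 and A2 = −1/√5).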
Inhomogeneous equations
If f(n) ≠ 0, then there is no general methodology. Solutions are known for some particular cases, known as particular solutions. Let an(h) be the solution obtained by ignoring f(n) and let an(p) be a particular solution; then it can be verified that an = an(h) + an(p) is a solution of the inhomogeneous recurrence.
Some standard particular solutions:

f(n)              Particular solution
d (a constant)    B (a constant)
d·n               B1·n + B0

Here B, B0, B1, B2 are constants to be determined from initial conditions. When f(n) = f1(n) + f2(n) is a sum of the above functions, then we solve the equation for f1(n) and f2(n) separately and then add the results to obtain a particular solution for f(n).
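As a worked illustration (not part of the original notes): for T(n) = 2T(n − 1) + 1, f(n) = 1 is a constant, so we try a constant particular solution B: B = 2B + 1 gives B = −1. The homogeneous solution is A·2^n, so T(n) = A·2^n − 1, with A fixed by the initial condition; for T(1) = 1 this gives A = 1 and T(n) = 2^n − 1 (the Tower of Hanoi recurrence).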
A recurrence equation specifies how a term a[n] is related to other terms in the sequence. RSolve takes recurrence equations and solves them to get explicit formulas for a[n].
This takes the solution and makes an explicit table of the first ten a[n].
Solving a recurrence equation.
RSolve can be thought of as a discrete analog of DSolve. Many of the same functions
generated in solving differential equations also appear in finding symbolic solutions to
recurrence equations.
RSolve does not require you to specify explicit values for terms such as a[1]. Like DSolve, it
automatically introduces undetermined constants C[i] to give a general solution.
RSolve can solve equations that do not depend only linearly on a[n]. For nonlinear equations,
however, there are sometimes several distinct solutions that must be given. Just as for
differential equations, it is a difficult matter to find symbolic solutions to recurrence
equations, and standard mathematical functions only cover a limited set of cases.
Here is the general solution to a nonlinear recurrence equation.
In[9]:= RSolve[{a[n] == a[n + 1] a[n - 1]}, a[n], n]
This gives two distinct solutions.
In[10]:= RSolve[a[n] == (a[n + 1] a[n - 1])^2, a[n], n]
RSolve can solve not only ordinary difference equations in which the arguments of a differ
by integers, but also q-difference equations in which the arguments of a are related by
multiplicative factors.
Just as one can set up partial differential equations that involve functions of several variables,
so one can also set up partial recurrence equations that involve multidimensional sequences.
Just as in the differential equations case, general solutions to partial recurrence equations can
involve undetermined functions.
1.9 ALGORITHM VISUALIZATION
In addition to the mathematical and empirical analyses of algorithms, there is yet a third way to study algorithms. It is called algorithm visualization and can be defined as the use of images to convey some useful information about algorithms. That information can be a visual illustration of an algorithm’s operation, of its performance on different kinds of inputs, or of its execution speed versus that of other algorithms for the same problem. To accomplish this goal, an algorithm visualization uses graphic elements (points, line segments, two- or three-dimensional bars, and so on) to represent some “interesting events” in the algorithm’s operation.
Static algorithm visualization shows an algorithm’s progress through a series of still images.
Algorithm animation, on the other hand, shows a continuous, movie-like presentation of an
algorithm’s operations. Animation is an arguably more sophisticated option, which, of course,
is much more difficult to implement.
Early efforts in the area of algorithm visualization go back to the 1970s. The watershed event
happened in 1981 with the appearance of a 30-minute color sound film titled Sorting Out
Sorting. This algorithm visualization classic was produced at the University of Toronto by
Ronald Baecker with the assistance of D. Sherman [Bae81, Bae98]. It contained visualizations
of nine well-known sorting algorithms (more than half of them are discussed later in the book)
and provided quite a convincing demonstration of their relative speeds.
The success of Sorting Out Sorting made sorting algorithms a perennial favorite for algorithm animation. Indeed, the sorting problem lends itself quite naturally to visual presentation via
vertical or horizontal bars or sticks of different heights or lengths, which need to be rearranged
according to their sizes (Figure 2.8). This presentation is convenient, however, only for
illustrating actions of a typical sorting algorithm on small inputs. For larger files, Sorting Out
Sorting used the ingenious idea of presenting data by a scatterplot of points on a coordinate
plane, with the first coordinate representing an item’s position in the file and the second one
representing the item’s value; with such a representation, the process of sorting looks like a
transformation of a “random” scatterplot of points into the points along a frame’s diagonal
(Figure 2.9). In addition, most sorting algorithms
work by comparing and exchanging two given items at a time—an event that can be animated
relatively easily.
Since the appearance of Sorting Out Sorting, a great number of algorithm animations have been
created, especially after the appearance of Java and the
World Wide Web in the 1990s. They range in scope from one particular algorithm to a group
of algorithms for the same problem (e.g., sorting) or the same application area (e.g., geometric algorithms) to general-purpose animation systems. At the end of 2010, a catalog of links to existing visualizations, maintained under the NSF-supported AlgoViz Project, contained over 500 links. Unfortunately, a survey of existing visualizations found most of them to be of low quality, with the content heavily skewed toward easier topics such as sorting [Sha07].
There are two principal applications of algorithm visualization: research and education. Potential benefits for researchers are based on expectations that algorithm visualization may help uncover some unknown features of algorithms. For example, one researcher used a visualization of the recursive Tower of Hanoi algorithm in which odd- and even-numbered disks were colored in two different colors. He noticed that two disks of the same color never came in direct contact during the algorithm’s execution. This observation helped him in developing a better nonrecursive version of the classic algorithm. To give another example, Bentley and McIlroy [Ben93] mentioned using an algorithm animation system in their work on improving a library implementation of a leading sorting algorithm.
To summarize, although some successes in both research and education have been reported in
the literature, they are not as impressive as one might expect. A deeper understanding of human
perception of images will be required before the true potential of algorithm visualization is
fulfilled.