Dip Maths Pre
Dr. Hariprasad.S
Asst. Professor
Dept. of CINTEL
• A is square if m = n.
• A is diagonal if all off-diagonal elements are 0, and not all
diagonal elements are 0.
• A is the identity matrix ( I ) if it is diagonal and all diagonal
elements are 1.
• A is the zero or null matrix ( 0 ) if all its elements are 0.
• The trace of A equals the sum of the elements along its main
diagonal.
• Two matrices A and B are equal iff they have the same
number of rows and columns, and aij = bij for all i and j.
Definitions (Con’t)
is defined as
We are interested particularly in real vector spaces of real m×1 column matrices.
We denote such spaces by ℝ^m, with vector addition and multiplication by scalars
being as defined earlier for matrices. Vectors (column matrices) in ℝ^m are written as
$$\mathbf{x} = [x_1, x_2, \ldots, x_m]^T$$
Vectors and Vector Spaces (Con’t)
Example
The vector space with which we are most familiar is the two-dimensional real vector
space ℝ^2, in which we make frequent use of graphical representations for operations
such as vector addition, subtraction, and multiplication by a scalar. For instance,
consider the two vectors
Example (Con’t)
The following figure shows the familiar graphical representation of the preceding
vector operations, as well as multiplication of vector a by scalar c = −0.5.
Vectors and Vector Spaces (Con’t)
There are numerous norms that are used in practice. In our work, the norm most
often used is the so-called 2-norm, which, for a vector x in real ℝ^m space, is defined
as
$$\|\mathbf{x}\| = \left[x_1^2 + x_2^2 + \cdots + x_m^2\right]^{1/2}$$
which is recognized as the Euclidean distance from the origin to point x; this gives the
expression the familiar name Euclidean norm. The expression also is recognized as
the length of a vector x, with origin at point 0. From earlier discussions, the norm
also can be written as
$$\|\mathbf{x}\| = \left[\mathbf{x}^T\mathbf{x}\right]^{1/2}$$
Vector Norms (Con’t)
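$$\cos\theta = \frac{\mathbf{x}^T\mathbf{y}}{\|\mathbf{x}\|\,\|\mathbf{y}\|}$$
where θ is the angle between vectors x and y. From these expressions it follows
that the inner product of two vectors can be written as
$$\mathbf{x}^T\mathbf{y} = \|\mathbf{x}\|\,\|\mathbf{y}\|\cos\theta$$
Thus, the inner product can be expressed as a function of the norms of the
vectors and the angle between the vectors.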
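The relation between the inner product, the norms, and the angle can be checked numerically. The following is a minimal sketch in plain Python; the helper names (`inner`, `norm2`, `angle`) and the two example vectors are illustrative assumptions, not part of the original slides.

```python
import math

def inner(x, y):
    """Inner (dot) product x^T y of two equal-length vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))

def norm2(x):
    """Euclidean (2-) norm, ||x|| = (x^T x)^(1/2)."""
    return math.sqrt(inner(x, x))

def angle(x, y):
    """Angle in radians between two nonzero vectors."""
    c = inner(x, y) / (norm2(x) * norm2(y))
    return math.acos(max(-1.0, min(1.0, c)))  # clamp to guard against round-off

# Hypothetical example vectors, chosen to be perpendicular
x = [3.0, 4.0]
y = [4.0, -3.0]
theta = angle(x, y)
print(norm2(x), inner(x, y), theta)
```

Here the inner product is zero, so the computed angle is a right angle, in agreement with the cosine expression.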
Vector Norms (Con’t)
From the preceding results, two vectors in ℝ^m are orthogonal if and only if their
inner product is zero. Two vectors are orthonormal if, in addition to being
orthogonal, the length of each vector is 1.
From the concepts just discussed, we see that an arbitrary nonzero vector a is turned into a
vector a_n of unit length by performing the operation a_n = a/||a||. Clearly, then,
||a_n|| = 1.
A set of vectors is said to be an orthogonal set if every two vectors in the set are
orthogonal. A set of vectors is orthonormal if every two vectors in the set are
orthonormal.
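As a small illustration of normalization and of testing a set for orthonormality, consider the sketch below. The function names and the tolerance `tol` are assumptions made for this example only.

```python
import math

def inner(a, b):
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

def normalize(a):
    """Return a_n = a / ||a||; the vector a must be nonzero."""
    length = math.sqrt(inner(a, a))
    return [ai / length for ai in a]

def is_orthonormal(vectors, tol=1e-12):
    """True if every vector has unit length and every distinct pair is orthogonal."""
    for i, u in enumerate(vectors):
        for j, v in enumerate(vectors):
            expected = 1.0 if i == j else 0.0
            if abs(inner(u, v) - expected) > tol:
                return False
    return True

a = [1.0, 1.0]
b = [1.0, -1.0]
print(is_orthonormal([a, b]))                        # orthogonal, but not unit length
print(is_orthonormal([normalize(a), normalize(b)]))  # unit length after normalization
```

The set {a, b} is orthogonal but becomes orthonormal only after each vector is divided by its norm.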
Some Important Aspects of Orthogonality
Definition: The eigenvalues of a real matrix M are the real numbers λ for which
there is a nonzero vector e such that
Me = λe.
The eigenvectors of M are the nonzero vectors e for which there is a real number λ
such that Me = λe.
Numerous theoretical and truly practical results in the application of matrices and
vectors stem from this beautifully simple definition.
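The definition can be verified directly for a candidate pair, writing the scalar as λ. The sketch below checks Me = λe for a small symmetric matrix; the matrix, the candidate pairs, and the helper names are hypothetical choices made for illustration.

```python
def matvec(M, v):
    """Matrix-vector product Mv for a matrix stored as a list of rows."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]

def is_eigenpair(M, lam, e, tol=1e-12):
    """True if e is nonzero and M e equals lam * e within tol."""
    if all(abs(ei) <= tol for ei in e):
        return False  # the zero vector is excluded by definition
    Me = matvec(M, e)
    return all(abs(a - lam * b) <= tol for a, b in zip(Me, e))

M = [[2.0, 1.0],
     [1.0, 2.0]]

print(is_eigenpair(M, 3.0, [1.0, 1.0]))   # M e = [3, 3] = 3 e
print(is_eigenpair(M, 1.0, [1.0, -1.0]))  # M e = [1, -1] = 1 e
print(is_eigenpair(M, 2.0, [1.0, 0.0]))   # M e = [2, 1], not a multiple of e
```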
Eigenvalues & Eigenvectors (Con’t)
and
The following properties, which we give without proof, are essential background in
the use of vectors and matrices in digital image processing. In each case, we
assume a real matrix of order m×m although, as stated earlier, these results are
equally applicable to complex numbers.
Example
Suppose that we have a random population of vectors, denoted by {x}, with
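covariance matrix (see the review of probability):
$$\mathbf{C}_x = E\{(\mathbf{x} - \mathbf{m}_x)(\mathbf{x} - \mathbf{m}_x)^T\}$$
Suppose that each vector is transformed by y = Ax, where the rows of A are the orthonormal eigenvectors of Cx.
From Property 6, we know that Cy = A Cx A^T is a diagonal matrix with the eigenvalues of Cx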
along its main diagonal. The elements along the main diagonal of a covariance matrix
are the variances of the components of the vectors in the population. The off diagonal
elements are the covariances of the components of these vectors.
The fact that Cy is diagonal means that the elements of the vectors in the population {y}
are uncorrelated (their covariances are 0). Thus, we see that application of the linear
transformation y = Ax involving the eigenvectors of Cx decorrelates the data, and the
elements of Cy along its main diagonal give the variances of the components of the y's
along the eigenvectors. Basically, what has been accomplished here is a coordinate
transformation that aligns the data along the eigenvectors of the covariance matrix
of the population.
The preceding concepts are illustrated in the following figure. Part (a) shows a
data population {x} in two dimensions, along with the eigenvectors of Cx (the black
dot is the mean). The result of performing the transformation y=A(x − mx) on the
x's is shown in Part (b) of the figure.
The fact that we subtracted the mean from the x's caused the y's to have zero
mean, so the population is centered on the coordinate system of the transformed
data. It is important to note that all we have done here is make the eigenvectors
the new coordinate system (y1,y2). Because the covariance matrix of the y's is
diagonal, this in fact also decorrelated the data. The fact that the main data
spread is along e1 is due to the fact that the rows of the transformation matrix A
were chosen according to the order of the eigenvalues, with the first row being the
eigenvector corresponding to the largest eigenvalue.
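The decorrelating effect of y = A(x − mx) can be reproduced on a toy population. The sketch below is an illustration under stated assumptions: the six data points are invented, the covariance is the population covariance (division by the number of points), and the eigenvectors are obtained from the closed-form solution for a symmetric 2×2 matrix with nonzero off-diagonal element.

```python
import math

def mean_and_cov(points):
    """Mean vector and population covariance matrix of a list of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    cxx = sum((p[0] - mx) ** 2 for p in points) / n
    cyy = sum((p[1] - my) ** 2 for p in points) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return (mx, my), [[cxx, cxy], [cxy, cyy]]

def eig_sym_2x2(C):
    """Eigenvalues (largest first) and unit eigenvectors of [[a, b], [b, c]], b != 0."""
    a, b, c = C[0][0], C[0][1], C[1][1]
    if b == 0.0:
        raise ValueError("matrix is already diagonal")
    centre = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    pairs = []
    for lam in (centre + radius, centre - radius):
        v = (b, lam - a)  # solves (a - lam) v1 + b v2 = 0
        length = math.hypot(v[0], v[1])
        pairs.append((lam, (v[0] / length, v[1] / length)))
    return pairs

# Hypothetical population {x}
x_points = [(1.0, 1.0), (2.0, 3.0), (3.0, 2.0), (4.0, 5.0), (5.0, 4.0), (6.0, 6.0)]
m_x, C_x = mean_and_cov(x_points)
(lam1, e1), (lam2, e2) = eig_sym_2x2(C_x)
A = [e1, e2]  # rows are eigenvectors, largest eigenvalue first

# y = A (x - m_x) for every x in the population
y_points = [tuple(row[0] * (p[0] - m_x[0]) + row[1] * (p[1] - m_x[1]) for row in A)
            for p in x_points]
m_y, C_y = mean_and_cov(y_points)
print(C_y)
```

Up to round-off, the off-diagonal elements of `C_y` vanish and its diagonal holds the two eigenvalues of `C_x`, with the larger one first.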
Eigenvalues & Eigenvectors (Con’t)
30-01-2025 Dr. HARIPRASAD S, AP/CINTEL
Objective
With reference to the following figure, we define a system as a unit that converts an
input function f(x) into an output (or response) function g(x), where x is an independent
variable, such as time or, as in the case of images, spatial position. We assume for
simplicity that x is a continuous variable, but the results that will be derived are equally
applicable to discrete variables.
Some Definitions (Con’t)
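It is required that the system output be determined completely by the input, the
system properties, and a set of initial conditions. From the figure on the previous page,
we write
$$g(x) = H[f(x)]$$
where H is the system operator. The operator H is said to be linear for a class of inputs {f(x)} if
$$H[a_i f_i(x) + a_j f_j(x)] = a_i H[f_i(x)] + a_j H[f_j(x)] = a_i g_i(x) + a_j g_j(x)$$
for all fi(x) and fj(x) belonging to {f(x)}, where the a's are arbitrary constants and
gi(x) = H[fi(x)] is the output for input fi(x).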
The system described by a linear operator is called a linear system (with respect to
the same class of inputs as the operator). The property that performing a linear
process on the sum of inputs is the same as performing the operations individually
and then summing the results is called the property of additivity. The property that
the response of a linear system to a constant times an input is the same as the
response to the original input multiplied by a constant is called the property of
homogeneity.
Some Definitions (Con’t)
An operator H is called time invariant (if x represents time), spatially invariant (if x is
a spatial variable), or simply fixed parameter, for some class of inputs {f(x)} if
$$g_i(x) = H[f_i(x)] \quad\text{implies that}\quad g_i(x + x_0) = H[f_i(x + x_0)]$$
for all fi(x) ∈ {f(x)} and for all x0. A system described by a fixed-parameter operator is
said to be a fixed-parameter system. Basically all this means is that offsetting the
independent variable of the input by x0 causes the same offset in the independent
variable of the output. Hence, the input-output relationship remains the same.
Some Definitions (Con’t)
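Finally, a linear system H is said to be stable if its response to any bounded input is
bounded. That is, if
$$|f(x)| \le K \quad\text{implies that}\quad |g(x)| \le cK$$
where K and c are constants.
Example: Suppose that operator H is the integral operator between the limits −∞ and
x. Then, the output in terms of the input is given by
$$g(x) = \int_{-\infty}^{x} f(w)\,dw$$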
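Example: Consider now the system operator whose output is the inverse (reciprocal) of the input
so that
$$g(x) = \frac{1}{f(x)}$$
In this case,
$$H[a_i f_i(x) + a_j f_j(x)] = \frac{1}{a_i f_i(x) + a_j f_j(x)} \ne a_i H[f_i(x)] + a_j H[f_j(x)]$$
so this system is not linear. The system, however, is fixed parameter and causal.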
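The two examples can be contrasted numerically on sampled inputs. In the sketch below, a running sum stands in for the integral operator (a discrete analogue, assuming the input is zero before the first sample), and an element-wise reciprocal stands in for the second operator. The function names and test signals are assumptions for illustration; passing the check for one pair of inputs does not prove linearity, whereas failing it once does disprove it.

```python
from itertools import accumulate

def running_sum(f):
    """Discrete stand-in for the integral from -infinity to x."""
    return list(accumulate(f))

def reciprocal(f):
    """Operator whose output is 1 / input, sample by sample."""
    return [1.0 / v for v in f]

def passes_linearity_check(H, fi, fj, ai, aj, tol=1e-9):
    """Compare H[ai*fi + aj*fj] with ai*H[fi] + aj*H[fj] for one pair of inputs."""
    combined = H([ai * u + aj * v for u, v in zip(fi, fj)])
    separate = [ai * u + aj * v for u, v in zip(H(fi), H(fj))]
    return all(abs(p - q) <= tol for p, q in zip(combined, separate))

# Hypothetical sampled inputs and constants
fi = [1.0, 2.0, 4.0]
fj = [3.0, 1.0, 2.0]
ai, aj = 2.0, -0.5

print(passes_linearity_check(running_sum, fi, fj, ai, aj))
print(passes_linearity_check(reciprocal, fi, fj, ai, aj))
```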
Linear System Characterization-Convolution
From the previous sections, the output of a system is given by g(x) = H[f(x)]. But, we
can express f(x) in terms of the impulse just defined, so
$$g(x) = H\left[\int_{-\infty}^{\infty} f(\alpha)\,\delta(x - \alpha)\,d\alpha\right]$$
System Characterization (Con’t)
The term
$$h(x, \alpha) = H[\delta(x - \alpha)]$$
is called the impulse response of H. In other words, h(x, α) is the response of the linear
system to a unit impulse located at coordinate x (the origin of the impulse is the value
of α that produces δ(0); in this case, this happens when α = x).
System Characterization (Con’t)
The expression
$$g(x) = \int_{-\infty}^{\infty} f(\alpha)\,h(x, \alpha)\,d\alpha$$
is called the superposition (or Fredholm) integral of the first kind. This expression is
a fundamental result that is at the core of linear system theory. It states that, if the
response of H to a unit impulse [i.e., h(x, α)] is known, then the response to any input f
can be computed using the preceding integral. In other words, the response of a
linear system is characterized completely by its impulse response.
System Characterization (Con’t)
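For a fixed-parameter operator, the impulse response depends only on the difference
x − α, so that h(x, α) = h(x − α) and the superposition integral becomes
$$g(x) = \int_{-\infty}^{\infty} f(\alpha)\,h(x - \alpha)\,d\alpha$$
This expression is called the convolution integral. It states that the response of a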
linear, fixed-parameter system is completely characterized by the convolution of the
input with the system impulse response. As will be seen shortly, this is a powerful and
most practical result.
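For sampled signals the convolution integral becomes a sum. The following is a minimal sketch of that discrete analogue, assuming finite-length sequences that are zero outside their listed samples; the function name and the example sequences are illustrative.

```python
def convolve(f, h):
    """Full discrete convolution g[n] = sum over k of f[k] * h[n - k]."""
    g = [0.0] * (len(f) + len(h) - 1)
    for k, fk in enumerate(f):
        for j, hj in enumerate(h):
            g[k + j] += fk * hj
    return g

f = [1.0, 2.0, 3.0]  # hypothetical input samples
h = [1.0, 1.0]       # hypothetical impulse response

print(convolve(f, h))      # [1.0, 3.0, 5.0, 3.0]
print(convolve([1.0], h))  # a unit impulse returns the impulse response itself
```

The second call shows the sense in which the impulse response characterizes the system: feeding in a unit impulse reproduces h.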
System Characterization (Con’t)
In other words,
System Characterization (Con’t)
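The term inside the inner brackets is the Fourier transform of the term h(x − α). But,
$$\mathfrak{F}[h(x - \alpha)] = H(u)\,e^{-j2\pi u\alpha}$$
System Characterization (Con’t)
so,
$$\mathfrak{F}[f(x) * h(x)] = \int_{-\infty}^{\infty} f(\alpha)\,H(u)\,e^{-j2\pi u\alpha}\,d\alpha = H(u)\int_{-\infty}^{\infty} f(\alpha)\,e^{-j2\pi u\alpha}\,d\alpha = H(u)F(u)$$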
We have succeeded in proving the important result that the Fourier transform of the
convolution of two functions is the product of their Fourier transforms. As noted
below, this result is the foundation for linear filtering.
System Characterization (Con’t)
Following a similar development, it is not difficult to show that the inverse Fourier
transform of the convolution of H(u) and F(u) [i.e., H(u)*F(u)] is the product f(x)h(x).
This result is known as the convolution theorem, typically written as
$$f(x) * h(x) \Leftrightarrow H(u)F(u)$$
and
$$f(x)h(x) \Leftrightarrow H(u) * F(u)$$
where "⇔" is used to indicate that the quantity on the right is obtained by taking the
Fourier transform of the quantity on the left, and, conversely, the quantity on the left
is obtained by taking the inverse Fourier transform of the quantity on the right.
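A discrete counterpart of the convolution theorem can be verified with a direct discrete Fourier transform. The sketch below is an illustration, not the continuous result itself: it assumes circular (periodic) convolution of equal-length sequences, for which the DFT of the convolution equals the product of the DFTs. The sequences are invented.

```python
import cmath

def dft(x):
    """Direct discrete Fourier transform with kernel exp(-j 2 pi u k / n)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * u * k / n) for k in range(n))
            for u in range(n)]

def circular_convolve(f, h):
    """Circular convolution of two sequences of the same length."""
    n = len(f)
    return [sum(f[k] * h[(m - k) % n] for k in range(n)) for m in range(n)]

f = [1.0, 2.0, 0.0, -1.0]
h = [0.5, 0.25, 0.0, 0.0]

lhs = dft(circular_convolve(f, h))                 # transform of the convolution
rhs = [Fu * Hu for Fu, Hu in zip(dft(f), dft(h))]  # product of the transforms
max_err = max(abs(a - b) for a, b in zip(lhs, rhs))
print(max_err)  # at the level of floating-point round-off
```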
System Characterization (Con’t)
The mechanics of convolution are explained in detail in the book. We have just filled in
the details of the proof of validity in the preceding paragraphs.
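Because the output of a linear, fixed-parameter system is the convolution g(x) = f(x)*h(x),
if we take the Fourier transform of both sides of this expression, it follows from the
convolution theorem that
$$G(u) = H(u)F(u)$$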
System Characterization (Con’t)
These results are the basis for all the filtering work done in Chapter 4, and some of the
work in Chapter 5 of Digital Image Processing. Those chapters extend the results to
two dimensions, and illustrate their application in considerable detail.
Objective
The set of all integers less than 10 is specified using the notation
$$C = \{c \mid c \text{ is an integer},\ c < 10\}$$
which we read as "C is the set of integers such that each member of the set is less
than 10." The "such that" condition is denoted by the symbol "|". As shown in the
previous two equations, the elements of the set are enclosed by curly brackets.
The set with no elements is called the empty or null set, denoted in this review by the
symbol Ø.
Sets and Set Operations (Con’t)
Two sets A and B are said to be equal if and only if they contain the same
elements. Set equality is denoted by
$$A = B$$
If the elements of two sets are not the same, we say that the sets are not equal,
and denote this by
$$A \ne B$$
Finally, we consider the concept of a universal set, which we denote by U and define
to be the set containing all elements of interest in a given situation. For example, in
an experiment of tossing a coin, there are two possible (realistic) outcomes: heads or
tails. If we denote heads by H and tails by T, the universal set in this case is {H,T}.
Similarly, the universal set for the experiment of throwing a single die has six possible
outcomes, which normally are denoted by the face value of the die, so in this case U
= {1,2,3,4,5,6}. For obvious reasons, the universal set is frequently called the sample
space, which we denote by S. It then follows that, for any set A, we assume that Ø ⊆
A ⊆ S, and for any element a, a ∈ S and a ∉ Ø.
Some Basic Set Operations
The operations on sets associated with basic probability theory are straightforward.
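The union of two sets A and B, denoted by A ∪ B, is the set of elements that belong
to A, to B, or to both.
The difference of two sets A and B, denoted A − B, is the set of elements that
belong to A, but not to B. In other words,
$$A - B = \{z \mid z \in A,\ z \notin B\}$$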
Set Operations (Con’t)
The union operation is applicable to multiple sets. For example, the union of sets
A1,A2,…,An is the set of points that belong to at least one of these sets. Similar
comments apply to the intersection of multiple sets.
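These operations map directly onto Python's built-in `set` type. The sketch below uses the single-die sample space from the earlier example; the particular sets A, B, and the list `A_list` are hypothetical.

```python
S = {1, 2, 3, 4, 5, 6}  # sample space for one throw of a die
A = {1, 2, 3}
B = {3, 4}

union = A | B           # elements in A, in B, or in both
intersection = A & B    # elements in both A and B
difference = A - B      # elements in A but not in B
complement_A = S - A    # elements of S that are not in A

# Union and intersection of several sets A1, A2, ..., An
A_list = [{1, 2}, {2, 3}, {2, 6}]
union_all = set().union(*A_list)
intersection_all = set.intersection(*A_list)

print(union, intersection, difference, complement_A)
print(union_all, intersection_all)
```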
It often is quite useful to represent sets and set operations in a so-called Venn
diagram, in which S is represented as a rectangle, sets are represented as areas
(typically circles), and points are associated with elements. The following
example shows various uses of Venn diagrams.
Example: The following figure shows various examples of Venn diagrams. The
shaded areas are the result (sets of points) of the operations indicated in the figure.
The diagrams in the top row are self explanatory. The diagrams in the bottom row
are used to prove the validity of the expression
The term nH/n is called the relative frequency of the event we have denoted by H,
and similarly for nT/n. If we performed the tossing experiment a large number of
times, we would find that each of these relative frequencies tends toward a stable,
limiting value. We call this value the probability of the event, and denote it by
P(event).
Relative Frequency & Prob. (Con’t)
In the current discussion the probabilities of interest are P(H) and P(T). We know in
this case that P(H) = P(T) = 1/2. Note that the event of an experiment need not signify
a single outcome. For example, in the tossing experiment we could let D denote the
event "heads or tails," (note that the event is now a set) and the event E, "neither
heads nor tails." Then, P(D) = 1 and P(E) = 0.
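The tendency of the relative frequency nH/n toward a limiting value can be observed in a simulated tossing experiment. This is a sketch under the assumption of a fair coin modeled by Python's pseudo-random generator; the function name, the seed, and the numbers of tosses are arbitrary choices.

```python
import random

def relative_frequency_of_heads(n, seed=0):
    """Toss a simulated fair coin n times and return nH / n."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    n_heads = sum(1 for _ in range(n) if rng.random() < 0.5)
    return n_heads / n

for n in (10, 1000, 100000):
    print(n, relative_frequency_of_heads(n))
```

As n grows, the printed values settle near 1/2, which is the value taken as P(H).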
That is, the probability of an event is a number between 0 and 1, inclusive. For
the certain event, S,
$$P(S) = 1$$
Relative Frequency & Prob. (Con’t)
Here the certain event means that the outcome is from the universal or sample set,
S. Similarly, we have that for the impossible event, S^c,
$$P(S^c) = 0$$
This is the probability of an event being outside the sample set. In the example
given at the end of the previous paragraph, S = D and S^c = E.
Relative Frequency & Prob. (Con’t)
The event that either event A or B, or both, has occurred is simply the union of A
and B (recall that events can be sets). Earlier, we denoted the union of two sets by
A ∪ B. One often finds the equivalent notation A+B used interchangeably in
discussions on probability. Similarly, the event that both A and B occurred is given by
the intersection of A and B, which we denoted earlier by A ∩ B. The equivalent
notation AB is used much more frequently to denote the occurrence of both events
in an experiment.
Relative Frequency & Prob. (Con’t)
Suppose that we conduct our experiment n times. Let n1 be the number of times
that only event A occurs; n2 the number of times that only event B occurs; n3 the number of
times that AB occurs; and n4 the number of times that neither A nor B occurs.
Clearly, n1+n2+n3+n4=n. Using these numbers we obtain the following relative
frequencies:
$$\frac{n_A}{n} = \frac{n_1 + n_3}{n}, \qquad \frac{n_B}{n} = \frac{n_2 + n_3}{n}, \qquad \frac{n_{AB}}{n} = \frac{n_3}{n}$$
Relative Frequency & Prob. (Con’t)
and
If A and B are mutually exclusive it follows that the set AB is empty and,
consequently, P(AB) = 0.
Relative Frequency & Prob. (Con’t)
The relative frequency of event A occurring, given that event B has occurred, is given
by
$$\frac{n_3}{n_2 + n_3}$$
This conditional probability is denoted by P(A/B), where we note the use of the
symbol “ / ” to denote conditional occurrence. It is common terminology to refer
to P(A/B) as the probability of A given B.
Relative Frequency & Prob. (Con’t)
We call this relative frequency the probability of B given A, and denote it by P(B/A).
Relative Frequency & Prob. (Con’t)
and
$$P(A/B) = \frac{P(B/A)\,P(A)}{P(B)}$$
which is known as Bayes' theorem, so named after the 18th-century mathematician
Thomas Bayes.
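Bayes' theorem can be checked with the counts used in the relative-frequency argument. The sketch below assumes n1 counts trials where only A occurs, n2 trials where only B occurs, n3 trials where both occur, and n4 trials where neither occurs; the specific counts are invented, and exact fractions are used so that the identity holds without round-off.

```python
from fractions import Fraction

# Hypothetical counts: only A, only B, both A and B, neither
n1, n2, n3, n4 = 30, 20, 10, 40
n = n1 + n2 + n3 + n4

P_A = Fraction(n1 + n3, n)            # A occurs alone or together with B
P_B = Fraction(n2 + n3, n)
P_AB = Fraction(n3, n)
P_A_given_B = Fraction(n3, n2 + n3)   # P(A/B)
P_B_given_A = Fraction(n3, n1 + n3)   # P(B/A)

print(P_A_given_B, P_B_given_A * P_A / P_B)  # both sides of Bayes' theorem
print(P_AB, P_A * P_B_given_A)               # P(AB) = P(A) P(B/A)
```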
Relative Frequency & Prob. (Con’t)
so,
If A and B are statistically independent, then P(B/A) = P(B) and it follows that
$$P(A/B) = P(A)$$
and
$$P(AB) = P(A)P(B)$$
It was stated earlier that if sets (events) A and B are mutually exclusive, then A ∩ B
= Ø from which it follows that P(AB) = P(A ∩ B) = 0. As was just shown, the two
sets are statistically independent if P(AB)=P(A)P(B), which we assume to be
nonzero in general. Thus, we conclude that for two events to be statistically
independent, they cannot be mutually exclusive.
Relative Frequency & Prob. (Con’t)
and
Relative Frequency & Prob. (Con’t)
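In general, for N events to be statistically independent, it must be true that, for all
combinations 1 ≤ i < j < k < . . . ≤ N,
$$P(A_iA_j) = P(A_i)P(A_j)$$
$$P(A_iA_jA_k) = P(A_i)P(A_j)P(A_k)$$
$$\vdots$$
$$P(A_1A_2\cdots A_N) = P(A_1)P(A_2)\cdots P(A_N)$$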
Relative Frequency & Prob. (Con’t)
Example: (a) An experiment consists of throwing a single die twice. The probability
of any of the six faces, 1 through 6, coming up in either experiment is 1/6. Suppose
that we want to find the probability that a 2 comes up, followed by a 4. These two
events are statistically independent (the second event does not depend on the
outcome of the first). Thus, letting A represent a 2 and B a 4,
$$P(AB) = P(A)P(B) = \frac{1}{6}\cdot\frac{1}{6} = \frac{1}{36}$$
We would have arrived at the same result by defining "2 followed by 4" to be a
single event, say C. The sample set of all possible outcomes of two throws of a
die contains 36 elements. Then, P(C)=1/36.
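Both routes to the answer can be confirmed by enumerating the sample set. The sketch below assumes all 36 ordered outcomes are equally likely; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes (first throw, second throw) of two throws of a die
outcomes = list(product(range(1, 7), repeat=2))
total = len(outcomes)

P_A = Fraction(sum(1 for first, _ in outcomes if first == 2), total)    # 2 on first throw
P_B = Fraction(sum(1 for _, second in outcomes if second == 4), total)  # 4 on second throw
P_C = Fraction(sum(1 for o in outcomes if o == (2, 4)), total)          # "2 followed by 4"

print(total, P_A, P_B, P_C)
```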
Relative Frequency & Prob. (Con’t)
Example (Con’t): (b) Consider now an experiment in which we draw one card from
a standard card deck of 52 cards. Let A denote the event that a king is drawn, B
denote the event that a queen or jack is drawn, and C the event that a diamond-
face card is drawn. A brief review of the previous discussion on relative frequencies
would show that
and
Relative Frequency & Prob. (Con’t)
and
Events A and B are mutually exclusive (we are drawing only one card, so it would be
impossible to draw a king and a queen or jack simultaneously). Thus, it follows from
the preceding discussion that P(AB) = P(A ∩ B) = 0 [and also that P(AB) ≠ P(A)P(B)].
Relative Frequency & Prob. (Con’t)
Example (Con’t): (c) As a final experiment, consider the deck of 52 cards again,
and let A1, A2, A3, and A4 represent the events of drawing an ace in each of four
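successive draws. If we replace the card drawn before drawing the next card, then
the events are statistically independent and it follows that
$$P(A_1A_2A_3A_4) = P(A_1)P(A_2)P(A_3)P(A_4) = \left(\frac{4}{52}\right)^4 \approx 3.5\times10^{-5}$$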
Relative Frequency & Prob. (Con’t)
Example (Con’t): Suppose now that we do not replace the cards that are drawn.
The events then are no longer statistically independent. With reference to the
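results in the previous example, we write
$$P(A_1A_2A_3A_4) = P(A_1)P(A_2/A_1)P(A_3/A_1A_2)P(A_4/A_1A_2A_3) = \frac{4}{52}\cdot\frac{3}{51}\cdot\frac{2}{50}\cdot\frac{1}{49} \approx 3.7\times10^{-6}$$
Thus we see that not replacing the drawn card reduced our chances of drawing
four successive aces by a factor of close to 10. This significant difference is perhaps
larger than might be expected from intuition.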
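The two probabilities and their ratio can be computed exactly. This is a short sketch using exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

# Four aces in a row, replacing the card after each draw (independent events)
with_replacement = Fraction(4, 52) ** 4

# Four aces in a row without replacement (chain of conditional probabilities)
without_replacement = (Fraction(4, 52) * Fraction(3, 51)
                       * Fraction(2, 50) * Fraction(1, 49))

ratio = with_replacement / without_replacement

print(float(with_replacement))     # about 3.5e-05
print(float(without_replacement))  # about 3.7e-06
print(float(ratio))                # about 9.5
```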
Random Variables
Random variables often are a source of confusion when first encountered. This
need not be so, as the concept of a random variable is in principle quite simple. A
random variable, x, is a real-valued function defined on the events of the sample
space, S. In words, for each event in S, there is a real number that is the
corresponding value of the random variable. Viewed yet another way, a random
variable maps each event in S onto the real line. That is it. A simple, straightforward
definition.
Random Variables (Con’t)
Part of the confusion often found in connection with random variables is the fact
that they are functions. The notation also is partly responsible for the problem. In
other words, although typically the notation used to denote a random variable is as
we have shown it here, x, or some other appropriate variable, to be strictly formal, a
random variable should be written as a function x(·) where the argument is a
specific event being considered. However, this is seldom done, and, in our
experience, trying to be formal by using function notation adds more complication
than clarity. Thus, we will opt for the less formal notation,
with the warning that it must be kept clearly in mind that random variables are
functions.
Random Variables (Con’t)
Example: Consider again the experiment of drawing a single card from a standard
deck of 52 cards. Suppose that we define the following events. A: a heart; B: a
spade; C: a club; and D: a diamond, so that S = {A, B, C, D}. A random variable is
easily defined by letting x = 1 represent event A, x = 2 represent event B, and so on.
Note the important fact in the examples just given that the probabilities of the events
have not changed; all a random variable does is map events onto the real line.
Random Variables (Con’t)
Thus far we have been concerned with random variables whose values are discrete.
To handle continuous random variables we need some additional tools. In the
discrete case, the probabilities of events are numbers between 0 and 1. When
dealing with continuous quantities (which are not denumerable) we can no longer
talk about the "probability of an event" because that probability is zero. This is not
as unfamiliar as it may seem. For example, given a continuous function we know
that the area of the function between two limits a and b is the integral from a to b of
the function. However, the area at a point is zero because the integral from, say, a to
a is zero. We are dealing with the same concept in the case of continuous random
variables.
Random Variables (Con’t)
Thus, instead of talking about the probability of a specific value, we talk about the
probability that the value of the random variable lies in a specified range. In
particular, we are interested in the probability that the random variable is less than
or equal to (or, similarly, greater than or equal to) a specified constant a. We write
this as
$$F(a) = P(x \le a)$$
If this function is given for all values of a (i.e., −∞ < a < ∞), then the values of random
variable x have been defined. Function F is called the cumulative probability
distribution function or simply the cumulative distribution function (cdf). The
shortened term distribution function also is used.
Random Variables (Con’t)
Observe that the notation we have used makes no distinction between a random
variable and the values it assumes. If confusion is likely to arise, we can use more
formal notation in which we let capital letters denote the random variable and
lowercase letters denote its values. For example, the cdf using this notation is
written as
$$F_X(x) = P(X \le x)$$
When confusion is not likely, the cdf often is written simply as F(x). This notation
will be used in the following discussion when speaking generally about the cdf of a
random variable.
Random Variables (Con’t)
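Because it is a probability, the cdf has the following properties:
• F(−∞) = 0
• F(∞) = 1
• 0 ≤ F(x) ≤ 1
• F(x1) ≤ F(x2) if x1 < x2
• P(x1 < x ≤ x2) = F(x2) − F(x1)
The probability density function (pdf) of random variable x is defined as the derivative
of the cdf:
$$p(x) = \frac{dF(x)}{dx}$$
The term density function is commonly used also. The pdf satisfies the following
properties:
• p(x) ≥ 0 for all x
• $\int_{-\infty}^{\infty} p(x)\,dx = 1$
• $F(x) = \int_{-\infty}^{x} p(\alpha)\,d\alpha$
• $P(x_1 < x \le x_2) = \int_{x_1}^{x_2} p(x)\,dx$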
Random Variables (Con’t)
The preceding concepts are applicable to discrete random variables. In this case,
there is a finite number of events and we talk about probabilities, rather than
probability density functions. Integrals are replaced by summations and,
sometimes, the random variables are subscripted. For example, in the case of a
discrete variable with N possible values we would denote the probabilities by P(xi),
i=1, 2,…, N.
Random Variables (Con’t)
In Sec. 3.3 of the book we used the notation p(rk), k = 0,1,…, L - 1, to denote the
histogram of an image with L possible gray levels, rk, k = 0,1,…, L - 1, where p(rk) is the
probability of the kth gray level (random event) occurring. The discrete random
variables in this case are gray levels. It generally is clear from the context whether one is
working with continuous or discrete random variables, and whether the use of
subscripting is necessary for clarity. Also, uppercase letters (e.g., P) are frequently used
to distinguish between probabilities and probability density functions (e.g., p) when
they are used together in the same discussion.
Random Variables (Con’t)
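If a random variable x is transformed by a monotonic function T(x) to produce a new
random variable y = T(x), the pdf of y is obtained from the pdf of x as
$$p_y(y) = p_x(x)\left|\frac{dx}{dy}\right|$$
where the subscripts on the p's are used to denote the fact that they are different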
functions, and the vertical bars signify the absolute value. A function T(x) is
monotonically increasing if T(x1) < T(x2) for x1 < x2, and monotonically decreasing if
T(x1) > T(x2) for x1 < x2. The preceding equation is valid if T(x) is an increasing or
decreasing monotonic function.
Expected Value and Moments
The expected value is one of the operations used most frequently when working with
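random variables. For example, the expected value of random variable x is obtained by
letting g(x) = x in the expected value of a function g(x) of the random variable:
$$E[x] = \bar{x} = m = \int_{-\infty}^{\infty} x\,p(x)\,dx$$
when x is continuous and
$$E[x] = \bar{x} = m = \sum_{i=1}^{N} x_i P(x_i)$$
when x is discrete. The expected value of x is equal to its average (or mean) value,
hence the use of the equivalent notation x̄ and m.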
Expected Value & Moments (Con’t)
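The variance of x, denoted by σ², is given by
$$\sigma^2 = E[(x - m)^2] = \int_{-\infty}^{\infty} (x - m)^2\,p(x)\,dx$$
and
$$\sigma^2 = E[(x - m)^2] = \sum_{i=1}^{N} (x_i - m)^2 P(x_i)$$
for continuous and discrete random variables, respectively. The square root of the
variance is called the standard deviation, and is denoted by σ.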
Expected Value & Moments (Con’t)
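We can continue along this line of thought and define the nth central moment of a
continuous random variable by letting g(x) = (x − m)^n:
$$\mu_n = E[(x - m)^n] = \int_{-\infty}^{\infty} (x - m)^n\,p(x)\,dx$$
and
$$\mu_n = \sum_{i=1}^{N} (x_i - m)^n P(x_i)$$
for discrete variables, where we assume that n ≥ 0. Clearly, µ0 = 1, µ1 = 0, and µ2 = σ². The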
term central when referring to moments indicates that the mean of the random variables
has been subtracted out. The moments defined above in which the mean is not
subtracted out sometimes are called moments about the origin.
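The discrete form of the central moments can be evaluated directly. The sketch below uses the outcome of a fair die as a hypothetical discrete random variable and exact fractions; the function name `central_moment` is an assumption for this example.

```python
from fractions import Fraction

def central_moment(values, probs, n):
    """nth central moment: sum over i of (x_i - m)^n P(x_i), with m the mean."""
    m = sum(x * p for x, p in zip(values, probs))
    return sum((x - m) ** n * p for x, p in zip(values, probs))

# Fair die: six equally likely values
values = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6

mean = sum(x * p for x, p in zip(values, probs))
mu0 = central_moment(values, probs, 0)
mu1 = central_moment(values, probs, 1)
mu2 = central_moment(values, probs, 2)  # the variance
mu3 = central_moment(values, probs, 3)  # zero for a symmetric distribution

print(mean, mu0, mu1, mu2, mu3)
```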
Expected Value & Moments (Con’t)
In image processing, moments are used for a variety of purposes, including histogram
processing, segmentation, and description. In general, moments are used to
characterize the probability density function of a random variable. For example, the
second, third, and fourth central moments are intimately related to the shape of the
probability density function of a random variable. The second central moment (the
centralized variance) is a measure of spread of values of a random variable about its
mean value, the third central moment is a measure of skewness (bias to the left or
right) of the values of x about the mean value, and the fourth moment is a relative
measure of flatness. In general, knowing all the moments of a density specifies that
density.
Expected Value & Moments (Con’t)
In this case, we define a random variable directly as the value of the distances in our
sample set. Computing the mean of the random variable indicates whether, on
average, we are shooting high or low. If the mean is zero, we know that the average of
our shots is on the line. However, the mean does not tell us how far our shots
deviated from the horizontal. The variance (or standard deviation) will give us an idea
of the spread of the shots. A small variance indicates a tight grouping (with respect to
the mean, and in the vertical position); a large variance indicates the opposite. Finally,
a third moment of zero would tell us that the spread of the shots is symmetric about
the mean value, a positive third moment would indicate a high bias, and a negative
third moment would tell us that we are shooting low more than we are shooting high
with respect to the mean location.
The Gaussian Probability Density Function
Because of its importance, we will focus in this tutorial on the Gaussian probability
density function to illustrate many of the preceding concepts, and also as the basis for
generalization to more than one random variable. The reader is referred to Section
5.2.2 of the book for examples of other density functions.
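A random variable is called Gaussian if it has a probability density of the form
$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-(x - m)^2/2\sigma^2}$$
where m and σ are as defined in the previous section. The term normal also is used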
to refer to the Gaussian density. A plot and properties of this density function are
given in Section 5.2.2 of the book.
The Gaussian PDF (Con’t)
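The cdf corresponding to the Gaussian density is
$$F(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-(\alpha - m)^2/2\sigma^2}\,d\alpha$$
which, as before, we interpret as the probability that the random variable lies
between minus infinity and an arbitrary value x. This integral has no known
closed-form solution, and it must be solved by numerical or other approximation methods.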
Extensive tables exist for the Gaussian cdf.
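In practice the Gaussian cdf is usually evaluated through the error function, which numerical libraries provide. The sketch below compares `math.erf` from the Python standard library with a simple trapezoidal integration of the density; the function names, the starting point of 10 standard deviations below the mean, and the step count are assumptions chosen for illustration.

```python
import math

def gaussian_pdf(x, m=0.0, sigma=1.0):
    """Gaussian density with mean m and standard deviation sigma."""
    return math.exp(-((x - m) ** 2) / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)

def gaussian_cdf(x, m=0.0, sigma=1.0):
    """Gaussian cdf expressed through the error function."""
    return 0.5 * (1.0 + math.erf((x - m) / (sigma * math.sqrt(2.0))))

def gaussian_cdf_trapezoid(x, m=0.0, sigma=1.0, steps=20000):
    """Approximate the cdf by the trapezoidal rule, starting 10 sigma below the mean."""
    lower = m - 10.0 * sigma
    if x <= lower:
        return 0.0
    h = (x - lower) / steps
    total = 0.5 * (gaussian_pdf(lower, m, sigma) + gaussian_pdf(x, m, sigma))
    total += sum(gaussian_pdf(lower + k * h, m, sigma) for k in range(1, steps))
    return total * h

print(gaussian_cdf(0.0))            # 0.5: half the probability lies below the mean
print(gaussian_cdf(1.0), gaussian_cdf_trapezoid(1.0))
```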
Several Random Variables
In the previous example, we used a single random variable to describe the behavior
of rifle shots with respect to a horizontal line passing through the bull's-eye in the
target. Although this is useful information, it certainly leaves a lot to be desired in
terms of telling us how well we are shooting with respect to the center of the target.
In order to do this we need two random variables that will map our events onto the
xy-plane. It is not difficult to see that, if we wanted to describe events in 3-D space,
we would need three random variables. In general, we consider in this section the
case of n random variables, which we denote by x1, x2,…, xn (the use of n here is not
related to our use of the same symbol to denote the nth moment of a random
variable).
Several Random Variables (Con’t)
It is convenient to use vector notation when dealing with several random variables.
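Thus, we represent a vector random variable x as
$$\mathbf{x} = [x_1, x_2, \ldots, x_n]^T$$
Then, for example, the cdf introduced earlier becomes
$$F(\mathbf{a}) = F(a_1, a_2, \ldots, a_n) = P(x_1 \le a_1, x_2 \le a_2, \ldots, x_n \le a_n)$$
when using vectors. As before, when confusion is not likely, the cdf of a random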
variable vector often is written simply as F(x). This notation will be used in the
following discussion when speaking generally about the cdf of a random variable
vector.
As in the single variable case, the probability density function of a random variable
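vector is defined in terms of derivatives of the cdf; that is,
$$p(\mathbf{x}) = p(x_1, x_2, \ldots, x_n) = \frac{\partial^n F(x_1, x_2, \ldots, x_n)}{\partial x_1\,\partial x_2 \cdots \partial x_n}$$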
Several Random Variables (Con’t)
When working with any two random variables (any two elements of x) it is common
practice to simplify the notation by using x and y to denote the random variables. In
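this case the joint moment just defined becomes
$$\eta_{kq} = E[x^k y^q] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^k y^q\,p(x, y)\,dx\,dy$$
It is easy to see that η_k0 is the kth moment of x and η_0q is the qth moment of y, as
defined earlier.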
Several Random Variables (Con’t)
The moment η_11 = E[xy] is called the correlation of x and y. As discussed in Chapters 4
and 12 of the book, correlation is an important concept in image processing. In fact, it
is important in most areas of signal processing, where typically it is given a special
symbol, such as Rxy:
$$R_{xy} = \eta_{11} = E[xy] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\,p(x, y)\,dx\,dy$$
Several Random Variables (Con’t)
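If the condition
$$R_{xy} = E[x]E[y]$$
holds, then the two random variables are said to be uncorrelated. From our earlier
discussion, we know that if x and y are statistically independent, then p(x, y) = p(x)p(y),
in which case we write
$$R_{xy} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\,p(x)p(y)\,dx\,dy = E[x]E[y]$$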
Thus, we see that if two random variables are statistically independent then they
are also uncorrelated. The converse of this statement is not true in general.
Several Random Variables (Con’t)
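The joint central moment of order kq involving random variables x and y is defined
as
$$\mu_{kq} = E[(x - m_x)^k (y - m_y)^q]$$
where mx = E[x] and my = E[y] are the means of x and y, as defined earlier. We note
that
$$\mu_{20} = E[(x - m_x)^2]$$
and
$$\mu_{02} = E[(y - m_y)^2]$$
are the variances of x and y, respectively. The moment µ11 = E[(x − mx)(y − my)] is
called the covariance of x and y.
By direct expansion of the terms inside the expected value brackets, and recalling that
mx = E[x] and my = E[y], it is straightforward to show that
$$\mu_{11} = E[xy] - m_x m_y = R_{xy} - E[x]E[y]$$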
From our discussion on correlation, we see that the covariance is zero if the random
variables are either uncorrelated or statistically independent. This is an important
result worth remembering.
Several Random Variables (Con’t)
If we divide the covariance by the square root of the product of the variances we
obtain
$$\gamma = \frac{\mu_{11}}{\sqrt{\mu_{20}\,\mu_{02}}} = \frac{\mu_{11}}{\sigma_x \sigma_y}$$
The quantity γ is called the correlation coefficient of random variables x and y. It can
be shown that γ is in the range −1 ≤ γ ≤ 1 (see Problem 12.5). As discussed in Section
12.2.1, the correlation coefficient is used in image processing for matching.
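The covariance and the correlation coefficient can be estimated from paired samples by replacing expected values with averages. The sketch below writes the coefficient as `gamma` and uses invented data; it also illustrates that a zero coefficient does not imply independence, because the second data set is a deterministic function of x yet has zero covariance.

```python
import math

def moments(xs, ys):
    """Sample means and second-order central moments (averages over n samples)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    mu11 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n  # covariance
    mu20 = sum((x - mx) ** 2 for x in xs) / n                    # variance of x
    mu02 = sum((y - my) ** 2 for y in ys) / n                    # variance of y
    return mx, my, mu11, mu20, mu02

def correlation_coefficient(xs, ys):
    """gamma = mu11 / sqrt(mu20 * mu02); both variances must be nonzero."""
    _, _, mu11, mu20, mu02 = moments(xs, ys)
    return mu11 / math.sqrt(mu20 * mu02)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
gamma_line = correlation_coefficient(xs, [2.0 * x + 1.0 for x in xs])        # y rises with x
gamma_parabola = correlation_coefficient(xs, [(x - 3.0) ** 2 for x in xs])   # symmetric about the mean

print(gamma_line, gamma_parabola)
```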
The Multivariate Gaussian Density
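The multivariate Gaussian probability density function is defined as
$$p(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}|\mathbf{C}|^{1/2}} \exp\left[-\tfrac{1}{2}(\mathbf{x} - \mathbf{m})^T \mathbf{C}^{-1} (\mathbf{x} - \mathbf{m})\right]$$
where n is the dimensionality of x, m = E[x] is the mean vector, and
C = E[(x − m)(x − m)^T] is the covariance matrix, with elements
$$c_{ij} = E[(x_i - m_i)(x_j - m_j)]$$
where, for example, xi is the ith component of x and mi is the ith component of m.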
The Multivariate Gaussian Density (Con’t)
Covariance matrices are real and symmetric (see the review of matrices and vectors).
The elements along the main diagonal of C are the variances of the elements of x, such
that cii = σ²_xi. When all the elements of x are uncorrelated or statistically independent,
cij = 0 for i ≠ j, and the covariance matrix becomes a diagonal matrix. If all the variances are
equal, then the covariance matrix becomes proportional to the identity matrix, with
the constant of proportionality being the variance of the elements of x.
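When the covariance matrix is diagonal, the multivariate Gaussian density factors into a product of one-dimensional Gaussian densities. The sketch below checks this for n = 2, assuming the standard bivariate form with normalizing constant 1/(2π|C|^(1/2)); the function names and the numerical values are illustrative.

```python
import math

def gaussian_1d(x, m, var):
    """One-dimensional Gaussian density with mean m and variance var."""
    return math.exp(-((x - m) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def gaussian_2d(x, m, C):
    """Bivariate Gaussian density; C is a 2x2 covariance matrix with positive determinant."""
    det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
    inv = [[C[1][1] / det, -C[0][1] / det],
           [-C[1][0] / det, C[0][0] / det]]
    d0, d1 = x[0] - m[0], x[1] - m[1]
    # Quadratic form (x - m)^T C^{-1} (x - m)
    q = d0 * (inv[0][0] * d0 + inv[0][1] * d1) + d1 * (inv[1][0] * d0 + inv[1][1] * d1)
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))

# Hypothetical mean vector and diagonal covariance (uncorrelated components)
m = (1.0, -2.0)
C = [[4.0, 0.0],
     [0.0, 9.0]]
x = (2.0, 0.0)

joint = gaussian_2d(x, m, C)
product_of_marginals = gaussian_1d(x[0], m[0], C[0][0]) * gaussian_1d(x[1], m[1], C[1][1])
print(joint, product_of_marginals)
```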
The Multivariate Gaussian Density (Con’t)
with
and
The Multivariate Gaussian Density (Con’t)