
Few Basics in Probability, Random Variables, Matrices and Optimization


Probability and random variables
• Signals are classified as deterministic or random.
• Random: unpredictable; the value at time t cannot be predicted with certainty.

• Consider $x(t) = A \cos(2\pi f_1 t + \theta)$. If $A$, $f_1$, and $\theta$ are known, x(t) is known for all t, so the signal is deterministic.
• If x(t) is generated by an oscillator with poor frequency stability, the output is random.

• Another example: tossing a coin (the outcomes are uncertain).
• Mathematical tools are available for the analysis and characterization of random phenomena.
• For this we need the concepts of probability, random variables, and random processes (the last is not discussed here).
Probability Concepts
• Some definitions:
Random experiment: an experiment whose outcomes are not known in advance
Outcome: the result of an experiment
Event: an outcome or a set of outcomes
Sample space: the set of all possible outcomes
Sample point: an outcome of an experiment
Mutually exclusive (ME) or disjoint events: A and B are ME if they cannot occur together
• Null event: an event with no sample points. If A and B are ME then $AB$ is the null event.
• Certain event: consists of all outcomes

• Definition of probability:
1. Relative frequency
2. Classical
• Relative frequency definition:
n = number of times the random experiment is repeated
$n_A$ = number of times event A occurs
$$P(A) = \lim_{n \to \infty} \frac{n_A}{n}$$
For small n the ratio $n_A/n$ fluctuates, but it tends to a limiting value for large n. This is the empirical approach.
• Classical definition:
No experiment required.
$$P(A) = \frac{N_A}{N}$$
N = number of possible outcomes
$N_A$ = number of outcomes favourable to A
Assumption: the outcomes are equally likely!
• Axioms (statements accepted without proof) of probability: properties that a probability assignment has to satisfy.
• For each event A we assign a real number P(A) that should satisfy:
1. $P(A) \geq 0$
2. $P(S) = 1$
3. If $A \subseteq S$, $B \subseteq S$, and $AB = \emptyset$, then
$$P(A \cup B) = P(A) + P(B)$$
• Suppose A and B are not mutually exclusive, then?
$$P(A \cup B) = P(A) + P(B) - P(AB)$$
If A, B and C are not ME:
$$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(AB) - P(BC) - P(AC) + P(ABC)$$
Conditional Probability
$$P(A|B) = P(AB)/P(B)$$
$$P(B|A) = P(AB)/P(A)$$
$$P(A|B) = \frac{P(B|A)\,P(A)}{P(B)} \quad \text{known as Bayes' theorem}$$
• Bayes' theorem is used extensively when solving problems in the ML area.
• Maximum-likelihood (ML) and maximum a posteriori (MAP) estimation are based on it.
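As a quick illustration, here is a minimal Python sketch of Bayes' theorem on made-up numbers (the sensitivity, false-positive rate, and prior below are hypothetical, chosen only for the example):

    # Bayes' theorem: a test with 99% sensitivity, 5% false-positive
    # rate, and a 1% prior probability of disease (illustrative values).
    p_pos_given_d = 0.99   # P(B|A): positive test given disease
    p_pos_given_nd = 0.05  # P(B|A'): positive test given no disease
    p_d = 0.01             # P(A): prior

    # Total probability: P(B) = P(B|A)P(A) + P(B|A')P(A')
    p_pos = p_pos_given_d * p_d + p_pos_given_nd * (1 - p_d)

    # Bayes' theorem: P(A|B) = P(B|A)P(A) / P(B)
    p_d_given_pos = p_pos_given_d * p_d / p_pos
    print(p_d_given_pos)  # ~0.167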


Independent events
• If A and B are independent,
P(A|B) = P(A)
P(B|A) = P(B)
That makes P(AB) = P(A)P(B).
• If A, B, and C are independent, we should have
P(AB) = P(A)P(B)
P(BC) = P(B)P(C)
P(CA) = P(C)P(A) and P(ABC) = P(A)P(B)P(C)
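A minimal simulation sketch, assuming numpy is available, that checks the product rule P(AB) = P(A)P(B) for two independent coin tosses:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    coin1 = rng.integers(0, 2, n)  # event A: first coin shows heads
    coin2 = rng.integers(0, 2, n)  # event B: second coin shows heads

    p_a = np.mean(coin1 == 1)
    p_b = np.mean(coin2 == 1)
    p_ab = np.mean((coin1 == 1) & (coin2 == 1))
    print(p_ab, p_a * p_b)  # both close to 0.25, so P(AB) ~ P(A)P(B)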
Random variable
• A random variable (RV) is a function whose domain is the sample space of a random experiment and whose range is the real line:
$$X : S \to \mathbb{R}$$
• So an RV assigns a number X(f) to every outcome f of the experiment.
• Let the experiment be that of throwing a die.
• The sample space consists of the 6 faces of the die, i.e., 6 sample points.
• Let the transformation be $X(f_i) = 10i$. Then the random variable, denoted by X, takes the 6 values 10, 20, ..., 60 (see the sketch below).
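A small numpy sketch of this mapping (the seed and sample size are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    faces = rng.integers(1, 7, size=10)  # outcomes f_i of the die experiment
    X = 10 * faces                       # the RV X(f_i) = 10 i
    print(faces)  # e.g. faces 1..6
    print(X)      # corresponding values in {10, 20, ..., 60}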
Distribution and density functions
• One can study the behaviour of a random variable by using its CDF and PDF.
• Cumulative distribution function:
$$F_X(x) = P(X \leq x)$$
• $F_X(\cdot)$ is a function whose domain is the real line and whose range is [0, 1].
• A density function is obtained by differentiating the CDF.
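A minimal sketch, assuming numpy, that estimates the CDF empirically from samples (the empirical CDF at x is simply the fraction of samples not exceeding x):

    import numpy as np

    rng = np.random.default_rng(0)
    samples = rng.normal(0.0, 1.0, 10_000)  # standard Gaussian samples

    def ecdf(samples, x):
        """Empirical CDF: fraction of samples with value <= x."""
        return np.mean(samples <= x)

    print(ecdf(samples, 0.0))   # ~0.5
    print(ecdf(samples, 1.96))  # ~0.975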

• Some of the distributions used in practice:
• Discrete: Binomial, Poisson, Geometric, etc.
• Continuous: Uniform, Normal or Gaussian, Gamma, Beta, Cauchy, Rayleigh, etc.
Gaussian random variable
• For a single random variable, the univariate pdf is given by
$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
• When we have multiple random variables (n), the multivariate pdf is given by
$$f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}\,|C_X|^{1/2}} \exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\mathbf{m})^T C_X^{-1} (\mathbf{x}-\mathbf{m})\right)$$
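A small numpy sketch that evaluates the multivariate density directly from this formula (the mean vector and covariance matrix below are arbitrary illustrative values):

    import numpy as np

    def gaussian_pdf(x, m, C):
        """Multivariate Gaussian density at x, with mean m and covariance C."""
        n = len(m)
        d = x - m
        norm = np.sqrt((2 * np.pi) ** n * np.linalg.det(C))
        return np.exp(-0.5 * d @ np.linalg.inv(C) @ d) / norm

    m = np.array([0.0, 0.0])
    C = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
    print(gaussian_pdf(np.array([0.0, 0.0]), m, C))  # density at the mean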
• Covariance matrix: for N RVs the matrix has size N×N (note: the previous slide used n)
$$C_X = \begin{bmatrix} \mathrm{Cov}(X_1, X_1) & \mathrm{Cov}(X_1, X_2) & \cdots & \mathrm{Cov}(X_1, X_N) \\ \mathrm{Cov}(X_2, X_1) & \mathrm{Cov}(X_2, X_2) & \cdots & \mathrm{Cov}(X_2, X_N) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}(X_N, X_1) & \mathrm{Cov}(X_N, X_2) & \cdots & \mathrm{Cov}(X_N, X_N) \end{bmatrix}$$
$$\mathrm{Cov}(X_1, X_1) = \mathrm{Var}(X_1) = E[(X_1 - m_{X_1})^2]$$
$$\mathrm{Cov}(X_1, X_2) = E[(X_1 - m_{X_1})(X_2 - m_{X_2})]$$
• E represents expectation or mean:
$$E(X) = \sum_i x_i \, P(X = x_i)$$

• If two RVs are uncorrelated, their covariance is 0. If they are independent they are uncorrelated, but not the other way around (a counterexample is sketched below).
• Independence requires the joint density function to equal the product of the marginal densities.
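A classic counterexample, sketched with numpy: X uniform on (-1, 1) and Y = X² are uncorrelated, yet Y is completely determined by X, so they are certainly not independent.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, 100_000)
    y = x ** 2  # y is a deterministic function of x, hence not independent

    # Covariance is (close to) zero even though x and y are dependent.
    print(np.cov(x, y)[0, 1])  # ~0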
• Properties of the covariance matrix / correlation matrix (for correlation, consider non-mean-subtracted RVs):
Symmetric: $C_X = C_X^T$
Positive semidefinite: $a^T C_X a \geq 0$
Eigenvalues are greater than or equal to zero
Eigenvectors are orthogonal
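A short numpy sketch that checks these properties on a sample covariance matrix (the data here are arbitrary Gaussian samples):

    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.normal(size=(3, 1000))   # 3 RVs, 1000 samples each
    C = np.cov(data)                    # 3x3 sample covariance matrix

    print(np.allclose(C, C.T))          # symmetric
    eigvals, eigvecs = np.linalg.eigh(C)
    print(eigvals)                      # all >= 0 (positive semidefinite)
    print(np.allclose(eigvecs.T @ eigvecs, np.eye(3)))  # orthonormal eigenvectors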
Some points on eigenvalues and eigenvectors
• Consider a square matrix A. Eigenvectors x are those which satisfy T[x] = a scalar multiple of x,
• i.e., $Ax = \lambda x$; x is the eigenvector corresponding to eigenvalue $\lambda$.
• How do we find $\lambda$? Make $|A - \lambda I| = 0$. Why? Because $Ax = \lambda x$ has a nonzero solution x only when $A - \lambda I$ is singular.
• Eigenvalues of a covariance matrix are always nonnegative. Why? Because a covariance matrix is positive semidefinite.
• The question then is: does the inverse of a covariance matrix always exist?
First check: what is the condition for the existence of the inverse of a matrix?
Is a covariance matrix diagonalizable?


Matrix Diagonalization
• One can represent a square matrix as:
$$A_{N \times N} = U_{N \times N}\, \Lambda_{N \times N}\, U^{-1}_{N \times N}$$
This is called diagonalization of the matrix A.
We have N eigenvalues and N eigenvectors, each of size N×1, so U is an eigenvector matrix.
Here $\Lambda$ is a diagonal matrix with the eigenvalues as its diagonal entries.
Can any square matrix be decomposed as above?
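A minimal numpy sketch of this decomposition for a symmetric matrix (the matrix entries are arbitrary; for a symmetric matrix $U^{-1} = U^T$, which the code exploits):

    import numpy as np

    C = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
    eigvals, U = np.linalg.eigh(C)      # symmetric matrix, so eigh applies
    Lam = np.diag(eigvals)

    # A = U Lam U^{-1}; for a symmetric matrix U^{-1} = U^T
    print(np.allclose(C, U @ Lam @ U.T))  # True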
Central Limit Theorem
• To understand this we need to know IID RVs.
• IID: independent and identically distributed.
To state the theorem, consider n IID random variables, each with mean $\mu$ and variance $\sigma^2$. Take their sum, i.e.,
$$S_n = X_1 + X_2 + \cdots + X_n$$
• The RV $S_n$ has mean $n\mu$ and variance $n\sigma^2$.
• Form the normalized RV
$$Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}}$$
As $n \to \infty$, this random variable has a Gaussian distribution with zero mean and variance 1.
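A quick simulation sketch, assuming numpy, using uniform(0, 1) RVs (mean 1/2, variance 1/12); the normalized sum comes out approximately standard Gaussian:

    import numpy as np

    rng = np.random.default_rng(0)
    n, trials = 100, 20_000
    mu, sigma = 0.5, np.sqrt(1 / 12)    # mean/std of uniform(0, 1)

    # Sum n IID uniform RVs, then normalize.
    S = rng.uniform(0, 1, size=(trials, n)).sum(axis=1)
    Z = (S - n * mu) / (sigma * np.sqrt(n))

    print(Z.mean(), Z.var())  # ~0 and ~1: approximately standard Gaussian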
Convex Optimization
(You may refer to the book / YouTube lectures by Stephen Boyd)
• In ML we often use optimization techniques to find the required parameters/weights of a function relating the input and output.
• Given a function f in the parameters, we minimize or maximize it to search for the optimum parameter values.
• Minimization/maximization can be done using optimization techniques.
• Optimization is necessary (to find optimal parameters) in many ML techniques, e.g., linear regression, logistic regression, and SVM, to name a few.
• In general, finding a global optimum of a function with many unknowns (as parameters) is difficult and computationally taxing. This is especially true for DNNs (we end up with a non-convex function).
• But for the class of ML problems involving convex functions, one can find the global solution, and with low computational complexity at that.
Convex Sets
• A set C is a convex set if for any two points x and y belonging to C and any $\alpha \in \mathbb{R}$ with $0 \leq \alpha \leq 1$, the combination also belongs to C:
$$\alpha x + (1 - \alpha) y \in C$$
• Intuitively: if we take any two points in C and draw the line segment between them, every point on that segment also belongs to C.
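A small numpy sketch of this definition for the unit ball (the two points below are arbitrary members of the set):

    import numpy as np

    # Two points in the unit ball {x : ||x|| <= 1}
    x = np.array([0.6, 0.0])
    y = np.array([0.0, -0.8])

    for alpha in np.linspace(0, 1, 11):
        z = alpha * x + (1 - alpha) * y
        assert np.linalg.norm(z) <= 1.0  # every combination stays in the set
    print("all convex combinations lie in the unit ball")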
Examples
• All of $\mathbb{R}^n$
• The norm ball $\{x : \|x\| \leq 1\}$
• Affine subspace: the set of all $x \in \mathbb{R}^n$ such that $Ax = b$
• The intersection of convex sets is also convex
• The set of positive semidefinite matrices
Convex function
• A function $f : \mathbb{R}^n \to \mathbb{R}$ is convex if its domain, denoted say $D(f)$, is a convex set and if for all $x, y \in D(f)$ and $\alpha \in \mathbb{R}$, $0 \leq \alpha \leq 1$:
$$f(\alpha x + (1 - \alpha) y) \leq \alpha f(x) + (1 - \alpha) f(y)$$
Intuitively, the value of the function at the combined point is less than or equal to the value on the straight line (chord) joining $f(x)$ and $f(y)$; a numeric check is sketched below.
• We say the function is strictly convex if equality is not allowed.
• If f is convex, -f is concave.
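A minimal sketch, assuming numpy, that tests the defining inequality for the (convex) exponential function at randomly chosen points:

    import numpy as np

    f = lambda x: np.exp(x)  # a convex function

    rng = np.random.default_rng(0)
    x, y = rng.normal(size=2)
    for alpha in np.linspace(0, 1, 11):
        lhs = f(alpha * x + (1 - alpha) * y)
        rhs = alpha * f(x) + (1 - alpha) * f(y)
        assert lhs <= rhs + 1e-12  # f(ax + (1-a)y) <= a f(x) + (1-a) f(y)
    print("convexity inequality holds at all tested points")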


First order condition for convexity
• Draw a tangent at any point. Every point on this tangent line should lie below the corresponding point on the function: $f(y) \geq f(x) + \nabla f(x)^T (y - x)$ for all x, y in the domain.
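A one-dimensional sketch of this condition for $f(x) = x^2$ (the tangent point and test grid are arbitrary):

    import numpy as np

    f = lambda x: x ** 2       # convex
    fprime = lambda x: 2 * x   # its derivative

    x0 = 1.5                   # point where the tangent is drawn
    for y in np.linspace(-3, 3, 13):
        tangent = f(x0) + fprime(x0) * (y - x0)
        assert f(y) >= tangent - 1e-12  # tangent lies below the function
    print("f(y) >= f(x0) + f'(x0)(y - x0) for all tested y")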
Second order condition
• The Hessian must be positive semidefinite. For example:
$$f(x_1, x_2) = x_1^2 + x_2^2$$
$$\begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} \\ \dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} \end{bmatrix} = \text{Hessian of } f(x_1, x_2) = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}$$
• For positive semidefiniteness we want
$$\begin{bmatrix} a_1 & a_2 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = 2a_1^2 + 2a_2^2 \geq 0$$
• If the Hessian is negative semidefinite, the function is concave.
• For a one-variable function, the second derivative must be greater than or equal to zero.
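A numpy sketch of the quadratic-form check $a^T H a \geq 0$ for this Hessian, at a few random vectors:

    import numpy as np

    # Hessian of f(x1, x2) = x1^2 + x2^2
    H = np.array([[2.0, 0.0],
                  [0.0, 2.0]])

    # a^T H a = 2 a1^2 + 2 a2^2 >= 0 for any a, so H is positive semidefinite
    rng = np.random.default_rng(0)
    for _ in range(5):
        a = rng.normal(size=2)
        print(a @ H @ a >= 0)  # True every time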
Examples of convex functions
• Exponential function $f(x) = e^{ax}$: the second derivative $a^2 e^{ax}$ is nonnegative for all x, so it is convex.
• $f(x) = -\log(x)$: the second derivative is $1/x^2 > 0$ on $x > 0$, so it is convex.
• Affine functions $f(x) = b^T x + c$: the Hessian is 0, so it is both positive and negative semidefinite. Hence affine functions are concave as well as convex.
• Quadratic functions $f(x) = \tfrac{1}{2} x^T A x + b^T x + c$: the Hessian is A, so convexity/non-convexity is determined by whether the Hessian is positive semidefinite.
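A small numpy sketch of that last test: check convexity of a quadratic by inspecting the eigenvalues of its Hessian A (the matrix below is an arbitrary PSD example):

    import numpy as np

    # f(x) = 0.5 x^T A x + b^T x + c has Hessian A;
    # f is convex iff A is positive semidefinite.
    A = np.array([[2.0, 0.0],
                  [0.0, 1.0]])
    eigvals = np.linalg.eigvalsh(A)
    print(eigvals)                 # [1. 2.]
    print(np.all(eigvals >= 0))    # True -> this quadratic is convex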
