IT_w1
Pre-requisites and course objectives
Essential:
Probability and random variables
Communication Systems/Digital Communication
Good to Have:
Optimization
Course Objectives
Define Information and its measures
State and apply Source and Channel Coding Theorems
Define and compute channel capacities of different types of channels
Apply information theory to other domains
History
This subject celebrates the work done by the “Father of the Information Age”, Claude Shannon [tWp01]. His contributions are deemed parallel to Einstein's. See this video. Everything digital traces back to him!
Brief recap of Random Variables
Random variables are mappings¹ from the sample space to the real numbers:
X : Ω → R
If Ω is countable, X is a discrete random variable and is characterized by its PMF: p_X(x) = P(X = x).²
Expectation: E[X] = \sum_{x \in \mathcal{X}} x p(x), where \mathcal{X} is the domain/support of X.
Function of a random variable: Z = g(X) is another random variable, with
E[Z] = E[g(X)] = \sum_{x \in \mathcal{X}} g(x) p(x).³
Linearity: E[aX + bY] = a E[X] + b E[Y].⁴
(A small numerical sketch follows the footnotes below.)
¹ In this course, we will treat them simply as functions, as you know; there are further technicalities which require a course in measure theory and are out of scope.
² We might omit the subscript X and write p(x) if it is clear from context.
³ Try to prove this if g is monotonic and invertible.
⁴ Prove this! Here a and b are deterministic while X and Y are random variables, not necessarily independent.
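As a quick numerical companion to the recap above, here is a minimal Python sketch (mine, not from the slides) that evaluates E[X] and E[g(X)] for a small, hypothetical PMF:

```python
# Minimal sketch: expectation of a discrete RV and of a function of it.
# The PMF below is a made-up example, not one from the course.
pmf = {1: 0.2, 2: 0.5, 3: 0.3}           # p_X(x) over the support {1, 2, 3}

def expectation(pmf, g=lambda x: x):
    """E[g(X)] = sum over the support of g(x) * p(x)."""
    return sum(g(x) * p for x, p in pmf.items())

print(expectation(pmf))                   # E[X]   = 2.1
print(expectation(pmf, lambda x: x * x))  # E[X^2] = 4.9
# Linearity: E[2X + 3] = 2 E[X] + 3 (a single-RV special case of footnote 4)
print(expectation(pmf, lambda x: 2 * x + 3), 2 * expectation(pmf) + 3)
```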
Information Theory [Cov99]
Measures of Information
Self-information
The self-information of an outcome x with probability p(x) is I(x) = −log p(x); the only functions that satisfy the axioms of an information measure are logarithmic.
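A small numerical sketch (mine, not from the slides), assuming base-2 logarithms so that self-information is measured in bits; rarer outcomes carry more information:

```python
import math

# Self-information I(x) = -log2 p(x), in bits (base-2 logarithm assumed).
for p in [1.0, 0.5, 0.25, 0.125]:
    print(f"p = {p:5.3f}  ->  I = {-math.log2(p):.1f} bits")
# A certain event carries 0 bits; halving the probability adds 1 bit.
```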
Joint Entropy
For a system, there is an input and an output, both of which can be treated as random. So, we extend the definition to two RVs:
H(X, Y) = −E[log p(X, Y)] = −\sum_x \sum_y p(x, y) log p(x, y)
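As an illustration, a short Python sketch (with a hypothetical joint PMF of my own choosing) that evaluates H(X, Y) directly from the definition:

```python
import math

# Joint entropy H(X, Y) = -sum_{x,y} p(x, y) log2 p(x, y), in bits.
# The joint PMF below is a made-up example.
p_xy = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}

H_XY = -sum(p * math.log2(p) for p in p_xy.values() if p > 0)
print(H_XY)   # 1.75 bits for this particular PMF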
Conditional Entropy
Evidently, as we extend the definition to two random variables, conditionals also come into play:
H(X | Y) = −E[log p(X | Y)] = −\sum_x \sum_y p(x, y) log p(x | y)
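Continuing the same hypothetical joint PMF, the sketch below evaluates H(X | Y) from the definition and numerically checks the chain rule H(X, Y) = H(Y) + H(X | Y), which these definitions imply:

```python
import math

# Conditional entropy H(X|Y) = -sum_{x,y} p(x, y) log2 p(x|y), in bits.
p_xy = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}  # hypothetical

p_y = {}
for (x, y), p in p_xy.items():
    p_y[y] = p_y.get(y, 0) + p            # marginal p(y)

H_XY = -sum(p * math.log2(p) for p in p_xy.values() if p > 0)
H_Y  = -sum(p * math.log2(p) for p in p_y.values() if p > 0)
H_X_given_Y = -sum(p * math.log2(p / p_y[y])
                   for (x, y), p in p_xy.items() if p > 0)

print(H_X_given_Y, H_XY - H_Y)            # both ~0.796: chain rule holds
```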
Convex Function
A function f is convex if for any two points x1, x2 ∈ Dom(f) and any t ∈ [0, 1], f(tx1 + (1 − t)x2) ≤ t f(x1) + (1 − t) f(x2).
Exercise: Can you show that if this condition is satisfied by a twice-differentiable function, then for any point x ∈ [x1, x2] it must be that f''(x) ≥ 0?
Concave Function
A function f is concave if −f is convex.
Convex Set
A set C is convex if for any two
points x1 , x2 ∈ C ,
tx1 + (1 − t)x2 ∈ C ∀t ∈ [0, 1]
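A quick numerical spot-check (my own sketch) of the two-point convexity inequality for f(x) = −log x, a function that appears throughout information theory:

```python
import math, random

# Check f(t*x1 + (1-t)*x2) <= t*f(x1) + (1-t)*f(x2) for f(x) = -log(x)
# at many random points; no assertion should fire if f is convex there.
f = lambda x: -math.log(x)
for _ in range(10_000):
    x1, x2 = random.uniform(0.01, 10.0), random.uniform(0.01, 10.0)
    t = random.uniform(0.0, 1.0)
    assert f(t * x1 + (1 - t) * x2) <= t * f(x1) + (1 - t) * f(x2) + 1e-9
print("no violations: -log x behaves convexly on (0, 10]")
```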
Why Convexity?
Local and Global Minima
x* is a global minimum of f(x) if f(x*) ≤ f(x) ∀ x ∈ Dom(f).
x* is a local minimum of f(x) if ∃ N_ϵ(x*) ⊆ Dom(f) s.t. f(x*) ≤ f(x) ∀ x ∈ N_ϵ(x*). N_ϵ(x*) is an ϵ-neighbourhood of x*.
Unconstrained Optimization
An unconstrained optimization problem is of the form min_x f(x). If f(x) is convex, every local minimum is a global minimum. The condition for a local minimum (for differentiable f) is f'(x) = 0.
Constrained Optimization
A constrained optimization problem is of the form min_x f(x) s.t. x ∈ C. If f(x) is convex and C is a convex set, every local minimum is a global minimum.
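To make the first-order condition concrete, here is a tiny sketch with a hypothetical convex function f(x) = (x − 3)²: solving f'(x) = 0 gives x* = 3, and a coarse grid search over the (convex) interval [−10, 10] lands on the same point.

```python
# Unconstrained case: f'(x) = 2(x - 3) = 0  =>  x* = 3 is the global minimum.
f = lambda x: (x - 3) ** 2
xs = [i / 100 for i in range(-1000, 1001)]   # grid on [-10, 10]
print(min(xs, key=f))                        # 3.0
```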
We know convexity using any 2 points. In general, we can also extend this to n points. Let x1, x2, . . . , xn be n points and f be a convex function; then
f(\sum_{i=1}^{n} θ_i x_i) ≤ \sum_{i=1}^{n} θ_i f(x_i), ∀ θ_i ∈ [0, 1] with \sum_i θ_i = 1.
Proof: We use mathematical induction. The base case, n = 2, is already covered: it is the definition of convexity. Assume the statement is true for n points; we need to show it is true for n + 1 as well. Let x_{n+1} be the new point, carrying weight 1 − λ, and write the remaining weights as λθ_1, . . . , λθ_n with \sum_{i=1}^{n} θ_i = 1. As the statement is true for n = 2, consider x̄ = \sum_{i=1}^{n} θ_i x_i to be the other point; applying convexity and then the induction hypothesis,
f(λx̄ + (1 − λ)x_{n+1}) ≤ λ f(x̄) + (1 − λ) f(x_{n+1}) ≤ λ \sum_{i=1}^{n} θ_i f(x_i) + (1 − λ) f(x_{n+1}),
which is exactly the (n + 1)-point inequality.
Taking θ_i = p(x_i) for a discrete random variable X, the same inequality gives Jensen's inequality: for a convex function f,
f(E[X]) ≤ E[f(X)].
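The sketch below (my own illustration) numerically checks the n-point inequality with random weights summing to 1 for the convex function f(x) = x²; reading the weights as a PMF gives the random-variable form as well.

```python
import random

# n-point Jensen check for the convex f(x) = x^2 with random weights theta.
f = lambda x: x * x
n = 5
x = [random.uniform(-5.0, 5.0) for _ in range(n)]
w = [random.random() for _ in range(n)]
theta = [wi / sum(w) for wi in w]                 # theta_i in [0, 1], sum to 1

lhs = f(sum(t * xi for t, xi in zip(theta, x)))   # f(sum theta_i x_i)
rhs = sum(t * f(xi) for t, xi in zip(theta, x))   # sum theta_i f(x_i)
print(lhs <= rhs + 1e-12)                         # True

# Reading theta_i as a PMF p(x_i), the same numbers say f(E[X]) <= E[f(X)].
```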
                  Actual
Prediction      Rain      No Rain
Rain            1/8       3/16
No Rain         1/16      10/16
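The entries of the table above sum to 1, so it can be read as a joint distribution of (prediction, actual). As an illustration (my own, not necessarily the exercise intended on the slide), the sketch below computes the joint entropy and the conditional entropy of the actual weather given the prediction:

```python
import math

# Joint PMF taken from the table: p(prediction, actual).
p = {("Rain", "Rain"): 1/8,     ("Rain", "No Rain"): 3/16,
     ("No Rain", "Rain"): 1/16, ("No Rain", "No Rain"): 10/16}

p_pred = {}
for (pred, actual), q in p.items():
    p_pred[pred] = p_pred.get(pred, 0) + q   # marginal over predictions

H_joint = -sum(q * math.log2(q) for q in p.values())
H_actual_given_pred = -sum(q * math.log2(q / p_pred[pred])
                           for (pred, actual), q in p.items())
print(f"H(Prediction, Actual)  = {H_joint:.3f} bits")
print(f"H(Actual | Prediction) = {H_actual_given_pred:.3f} bits")
```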
References
Thank You!