Lossless Data Compression
Lecture 1
Data Compression
Lossless data compression: Store/Transmit big files using few bytes so that the original files can be perfectly retrieved. Example: zip.
Lossy data compression: Store/Transmit big files using few bytes so that the original files can be approximately retrieved. Example: mp3.
Motivation: Save storage space and/or bandwidth.
Definition of Codec
Let Σ be an alphabet and let S ⊆ Σ* be a set of possible messages. A lossless codec (c, d) consists of
a coder c : S → {0,1}*
a decoder d : {0,1}* → Σ*
so that
∀ x ∈ S: d(c(x)) = x
Remarks
It is necessary for c to be an injective map. If we do not worry about efficiency, we don't have to specify d once we have specified c. Terminology: Sometimes we just say "code" rather than "codec".
Terminology: The set c(S) is called the set of code words of the codec. In examples to follow, we often just state the set of code words.
Proposition
Let S = {0,1}^n. Then, for any codec (c, d) there is some x ∈ S so that |c(x)| ≥ n. Compression is impossible!
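The reason is a counting argument: there are 2^n messages in {0,1}^n but only 2^n − 1 binary strings of length less than n, so an injective coder cannot give every message a code word shorter than n bits. A minimal sketch of that count (the helper name is ours):

```python
def count_short_strings(n: int) -> int:
    """Number of binary strings of length < n: 1 + 2 + ... + 2^(n-1) = 2^n - 1."""
    return sum(2**k for k in range(n))

n = 10
# Fewer short strings than the 2**n messages in {0,1}^n, so by the pigeonhole
# principle some message x must get a code word with |c(x)| >= n.
assert count_short_strings(n) == 2**n - 1 < 2**n
```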
Proposition
For any message x, there is a codec (c,d) so that |c(x)|=1. The Encyclopedia Britannica can be compressed to 1 bit.
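A minimal sketch of such a codec, assuming messages are binary strings and the favoured message is fixed in advance (the names below are ours, not from the lecture):

```python
# Messages are binary strings; BRITANNICA is a placeholder standing in for the one
# fixed message we want to favour (imagine the whole encyclopedia written out in bits).
BRITANNICA = "0110100101"

def c(x: str) -> str:
    """The favoured message gets the 1-bit code word '0'; everything else is escaped with '1'."""
    return "0" if x == BRITANNICA else "1" + x

def d(w: str) -> str:
    return BRITANNICA if w == "0" else w[1:]

assert d(c(BRITANNICA)) == BRITANNICA and len(c(BRITANNICA)) == 1
assert d(c("111000")) == "111000"   # every other message still decodes correctly, one bit longer
```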
Remarks
We cannot compress all data. Thus, we must concentrate on compressing relevant data. It is trivial to compress data known in advance. We should concentrate on compressing data about which there is uncertainty.
We will use probability theory as a tool to model uncertainty about relevant data.
Random data can be compressed well on the average!
... There is something fishy going on.
Prefix Codes
A prefix code is a code in which no code word is a prefix of another code word. Example: fixed-length codes (such as ASCII). Example: {0, 11, 10}. All codes in this course will be prefix codes.
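A small check of the prefix property, to make the definition concrete (the helper name is ours):

```python
def is_prefix_code(codewords: list[str]) -> bool:
    """True if no code word is a prefix of a different code word."""
    return not any(a != b and b.startswith(a) for a in codewords for b in codewords)

assert is_prefix_code(["0", "11", "10"])      # the example above
assert not is_prefix_code(["0", "01", "11"])  # "0" is a prefix of "01"
```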
Proposition
If c is a prefix code for S = Σ, then c^n is a prefix code for S = Σ^n, where c^n(x1 x2 … xn) = c(x1) c(x2) … c(xn).
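Because no code word is a prefix of another, the concatenation c(x1) c(x2) … c(xn) can be decoded greedily from left to right. A sketch of this, using an assumed symbol assignment for the code {0, 11, 10} above:

```python
CODE = {"a": "0", "b": "11", "c": "10"}   # assumed symbol assignment for the code {0, 11, 10}
DECODE = {w: s for s, w in CODE.items()}

def encode(msg: str) -> str:
    """c^n: concatenate the code words of the individual symbols."""
    return "".join(CODE[s] for s in msg)

def decode(bits: str) -> str:
    """Greedy left-to-right decoding; unambiguous because CODE is a prefix code."""
    out, word = [], ""
    for bit in bits:
        word += bit
        if word in DECODE:    # a complete code word can never be extended into another one
            out.append(DECODE[word])
            word = ""
    return "".join(out)

assert decode(encode("abcab")) == "abcab"
```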
[Figure: code words of a prefix code viewed as disjoint subintervals of [0,1), e.g. [0,1/2) and [3/4,1).]
Kraft-McMillan Inequality
Let m1, m2, … be the lengths of the code words of a prefix code. Then ∑_i 2^(-m_i) ≤ 1. Conversely, let m1, m2, … be integers with ∑_i 2^(-m_i) ≤ 1. Then there is a prefix code c so that {m_i} are the lengths of the code words of c.
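A sketch of the converse direction, assuming the standard interval construction: sort the lengths and give the i-th code word the binary expansion, with m_i bits, of the running sum ∑_{j<i} 2^(-m_j). The function name is ours.

```python
from fractions import Fraction

def prefix_code_from_lengths(lengths: list[int]) -> list[str]:
    """Build a prefix code with the given code word lengths (requires sum 2^-m_i <= 1)."""
    assert sum(Fraction(1, 2**m) for m in lengths) <= 1
    codewords, acc = [], Fraction(0)
    for m in sorted(lengths):
        # acc is a multiple of 2^-m, so acc * 2^m is an integer; write it with m bits.
        codewords.append(format(int(acc * 2**m), "b").zfill(m))
        acc += Fraction(1, 2**m)
    return codewords

print(prefix_code_from_lengths([1, 2, 2]))        # ['0', '10', '11']
print(prefix_code_from_lengths([2, 2, 2, 3, 3]))  # ['00', '01', '10', '110', '111']
```

This mirrors the interval picture above: each code word of length m claims a subinterval of [0,1) of length 2^(-m), and the inequality says the intervals fit without overlapping.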
Probability
A probability distribution p on S is a map p : S → [0,1] so that ∑_{x ∈ S} p(x) = 1. A U-valued stochastic variable is a map Y : S → U.
Self-entropy
Given a probability distribution p on S, the self-entropy of x ∈ S is defined as H(x) = −log2 p(x). The self-entropy of a message with probability 1 is 0 bits. The self-entropy of a message with probability 0 is +∞. The self-entropy of a message with probability ½ is 1 bit. We often measure entropy in units of bits.
Entropy
Given a probability distribution p on S, its entropy H[p] is defined as E[H], i.e. H[p] = −∑_{x ∈ S} p(x) log2 p(x). For a stochastic variable X, its entropy H[X] is the entropy of its underlying distribution: H[X] = −∑_i Pr[X=i] log2 Pr[X=i].
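A small sketch of the definition in code (the distributions below are assumed examples):

```python
from math import log2

def entropy(p: dict[str, float]) -> float:
    """H[p] = -sum_x p(x) * log2 p(x), using the convention 0 * log2 0 = 0."""
    return -sum(px * log2(px) for px in p.values() if px > 0)

uniform = {"00": 0.25, "01": 0.25, "10": 0.25, "11": 0.25}
skewed  = {"00": 0.5, "01": 0.25, "10": 0.125, "11": 0.125}
print(entropy(uniform))  # 2.0 bits, the maximum for a distribution on {0,1}^2
print(entropy(skewed))   # 1.75 bits
```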
Facts
The entropy of the uniform distribution on {0,1}^n is n bits. Any other distribution on {0,1}^n has strictly smaller entropy. If X1 and X2 are independent stochastic variables, then H(X1, X2) = H(X1) + H(X2). For any function f, H(f(X)) ≤ H(X).
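A quick numerical check of the last two facts on a small assumed example (the distributions and the function f below are ours):

```python
from itertools import product
from math import log2

def entropy(p: dict) -> float:
    return -sum(px * log2(px) for px in p.values() if px > 0)

# Two independent biased coin flips (an assumed example).
p1 = {"H": 0.5, "T": 0.5}
p2 = {"H": 0.75, "T": 0.25}
joint = {(a, b): p1[a] * p2[b] for a, b in product(p1, p2)}
assert abs(entropy(joint) - (entropy(p1) + entropy(p2))) < 1e-12   # H(X1, X2) = H(X1) + H(X2)

# f collapses outcomes ("did we see at least one head?"), which can only lose information.
pf = {"yes": joint[("H", "H")] + joint[("H", "T")] + joint[("T", "H")], "no": joint[("T", "T")]}
assert entropy(pf) <= entropy(joint)   # H(f(X)) <= H(X)
```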
Shannon's Theorem
Let S be a set of messages and let X be an S-valued stochastic variable. For all prefix codes c on S, E[|c(X)|] ≥ H[X]. There is a prefix code c on S so that E[|c(X)|] < H[X] + 1. In fact, for all x in S, |c(x)| < H(x) + 1.
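One standard way to achieve the upper bound is a Shannon code: give each message x a code word of length ⌈−log2 p(x)⌉, note that these lengths satisfy the Kraft-McMillan inequality, and build the code words with the interval construction above. The sketch and names below are ours, not taken verbatim from the lecture.

```python
from fractions import Fraction
from math import ceil, log2

def shannon_code(p: dict[str, float]) -> dict[str, str]:
    """Assign each x a code word of length ceil(-log2 p(x)); these lengths satisfy Kraft."""
    items = sorted(p.items(), key=lambda kv: -kv[1])         # most probable first => shortest first
    code, acc = {}, Fraction(0)
    for x, px in items:
        m = ceil(-log2(px))
        code[x] = format(int(acc * 2**m), "b").zfill(m)      # interval construction, as before
        acc += Fraction(1, 2**m)
    return code

p = {"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.10}
code = shannon_code(p)
avg = sum(p[x] * len(w) for x, w in code.items())
H = -sum(px * log2(px) for px in p.values())
print(code)              # e.g. {'a': '0', 'b': '10', 'c': '110', 'd': '1110'}
assert H <= avg < H + 1  # E[|c(X)|] lies between H[X] and H[X] + 1
```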