
03. Boltzmann Entropy, Gibbs Entropy, Shannon Information. I. Entropy in Statistical Mechanics.

Goal: To explain the behavior of macroscopic systems in terms of the dynamical laws governing their microscopic constituents. In particular: To provide a micro-dynamical explanation of the 2nd Law.

1. Boltzmann's Approach. Consider different "macrostates" of a gas:
Ludwig Boltzmann

Why does the gas prefer to be in the equilibrium macrostate (last one)?

Suppose the gas consists of N identical particles governed by Hamilton's equations of motion (the micro-dynamics). A microstate X of the gas is a specification of the position (3 values) and momentum (3 values) for each of its N particles.
Γ = phase space = the 6N-dimensional space of all possible microstates. Γ_E = the region of Γ that consists of all microstates with constant energy E.

Hamiltonian dynamics maps initial microstate Xi to final microstate Xf. Can 2nd Law be explained by recourse to this dynamics?


A macrostate of the gas is a specification of the gas in terms of macroscopic properties (pressure, temperature, volume, etc.). Relation between microstates and macrostates: Macrostates supervene on microstates!
To each microstate there corresponds exactly one macrostate. Many distinct microstates can correspond to the same macrostate.

So: Γ_E is partitioned into a finite number of regions corresponding to macrostates, with each microstate X belonging to exactly one macrostate Γ(X).

Claim #1: Different macrostates have vastly different sizes. Quantify this by the Boltzmann Entropy:

SB(Γ(X)) = k log |Γ(X)|,   where |Γ(X)| = the volume (size) of Γ(X)

What this means: The greater SB, the larger the region of phase space in which X is located.

Claim #2: The equilibrium macrostate Γ_eq is vastly larger than any other macrostate (in other words, SB obtains its maximum value for Γ_eq).
[Diagram: Γ_E is almost entirely filled by Γ_eq; the nonequilibrium macrostates are very small regions.]

Thus: SB increases over time because, for any initial microstate Xi, the dynamics will map Xi into Γ_eq very quickly, and then keep it there for an extremely long time.

Two Ways to Explain the Approach to Equilibrium:
(a) Appeal to Typicality. For large N, Γ_E is almost entirely filled up with equilibrium microstates. Thus: A system approaches equilibrium because equilibrium microstates are typical and nonequilibrium microstates are atypical.

But: What is it about the dynamics that evolves atypical states to typical states? "If a system is in an atypical microstate, it does not evolve into an equilibrium microstate just because the latter is typical." (Frigg 2009) Need to identify properties of the dynamics that guarantee atypical states evolve into typical states. And: Need to show that these properties are typical. Example (Frigg 2009): If the dynamics is chaotic (in an appropriate sense), then (under certain conditions), any initial microstate Xi will quickly be mapped into Γ_eq and remain there for long periods of time.

(b) Appeal to Probabilities. Associate probabilities with macrostates: the larger the macrostate, the greater the probability of finding a microstate in it. Thus: A system approaches equilibrium because it evolves from states of lower toward states of higher probability, and the equilibrium state is the state of highest probability.

"In most cases, the initial state will be a very unlikely state. From this state the system will steadily evolve towards more likely states until it has finally reached the most likely state, i.e., the state of thermal equilibrium."


Start with the phase space of a single particle. Partition it into cells w1, w2, ..., wk of size w. A state of an N-particle system is then given by N points in this single-particle phase space.
An arrangement is a specification of which points lie in which cells.
A distribution is a specification of how many points (regardless of which ones) lie in each cell. It takes the form (n1, n2, ..., nk), where ni = # of points in wi.
Note: More than one arrangement can correspond to the same distribution. For instance:
Arrangement #1: state of P6 in w1, state of P89 in w3, etc.
Arrangement #2: state of P89 in w1, state of P6 in w3, etc.
Both correspond to the same distribution (1, 0, 2, 0, 1, 1, ...).

How many arrangements G(Di) are compatible with a given distribution Di = (n1, n2, ..., nk)? Answer:

G(Di) = N! / (n1! n2! ⋯ nk!)

where n! = n(n−1)(n−2)⋯1 = the number of sequences in which n distinguishable objects can be arranged, and 0! = 1.

Check:
- Let D1 = (N, 0, ..., 0) and D2 = (N−1, 1, 0, ..., 0).
- G(D1) = N!/N! = 1. (Only one way for all N particles to be in w1.)
- G(D2) = N!/(N−1)! = N(N−1)(N−2)⋯1 / (N−1)(N−2)⋯1 = N. (There are N different ways w2 could have one point in it; namely, if P1 was in it, or if P2 was in it, or if P3 was in it, etc.)

"The probability of this distribution [Di] is then given by the number of permutations of which the elements of this distribution are capable, that is by the number [G(Di)]. As the most probable distribution, i.e., as the one corresponding to thermal equilibrium, we again regard that distribution for which this expression is maximal..."
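A minimal Python check of this counting formula (the values N = 10 and k = 5 are arbitrary illustrative choices):

```python
# Count the arrangements G(D) compatible with a distribution D = (n1, ..., nk),
# as a quick check of G(D) = N! / (n1! n2! ... nk!).
from math import factorial

def num_arrangements(distribution):
    """G(D) = N! / (n1! n2! ... nk!) for D = (n1, ..., nk)."""
    N = sum(distribution)
    G = factorial(N)
    for n_i in distribution:
        G //= factorial(n_i)
    return G

N, k = 10, 5
D1 = (N,) + (0,) * (k - 1)          # all N particles in cell w1
D2 = (N - 1, 1) + (0,) * (k - 2)    # one particle moved to cell w2

print(num_arrangements(D1))              # 1
print(num_arrangements(D2))              # N = 10
print(num_arrangements((2, 2, 2, 2, 2))) # evenly spread distribution: far larger (113400)
```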

Note: Each distribution Di corresponds to a macrostate Γ_Di.


Why? Because a system's macroscopic properties (volume, pressure, temp, etc) only depend on how many particles are in particular microstates, and not on which particles are in which microstates.

What is the size of this macrostate?

Recall: The size of a cell in single-particle phase space is w.
So: The size of a cell in the N-particle phase space Γ_E is w^N.
And: Each arrangement encodes one cell in Γ_E.

Thus: The size of Γ_Di is

|Γ_Di| = (number of arrangements compatible with Di) × (size of the cell corresponding to an arrangement) = G(Di) w^N

So: The Boltzmann entropy of Di is given by

SB(Di) = k log(G(Di) w^N) = k log(G(Di)) + Nk log(w) = k log(G(Di)) + const.

SB(Di) = k log(G(Di)) + const.
= k log( N! / (n1! n2! ⋯ nk!) ) + const.
= k log(N!) − k log(n1!) − ... − k log(nk!) + const.

Stirling's approximation: log n! ≈ n log n − n (for large n)

≈ k(N log N − N) − k(n1 log n1 − n1) − ... − k(nk log nk − nk) + const.
= −k Σ_j nj log nj + const.   (using n1 + ... + nk = N, and absorbing the terms that depend only on N into the constant)

Let pj = nj/N = the probability of finding a randomly chosen microstate in cell wj. Then:

SB(Di) = −Nk Σ_j pj log pj + const.

Intuitively: The biggest value of SB is obtained for the distribution Di in which the pj's are all equal, i.e., the particles are spread as evenly as possible over the cells; this is the equilibrium distribution.
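A minimal Python sketch (with illustrative values N = 100 particles, k = 4 cells, and Boltzmann's constant set to 1) comparing the exact k log G(Di) with the Stirling-form expression above, and checking that the most even distribution gives the largest entropy:

```python
# Compare the exact Boltzmann entropy k*log(G(D)) with the Stirling-form
# expression -N*k*sum_j p_j log p_j, for a few illustrative distributions.
from math import factorial, log

def S_exact(D, k_B=1.0):
    """k log G(D) with G(D) = N!/(n1!...nk!)."""
    N = sum(D)
    G = factorial(N)
    for n in D:
        G //= factorial(n)
    return k_B * log(G)

def S_stirling(D, k_B=1.0):
    """-N k sum_j p_j log p_j with p_j = n_j / N (empty cells contribute 0)."""
    N = sum(D)
    return -N * k_B * sum((n / N) * log(n / N) for n in D if n > 0)

for D in [(100, 0, 0, 0), (70, 10, 10, 10), (25, 25, 25, 25)]:
    print(D, round(S_exact(D), 2), round(S_stirling(D), 2))
# The even distribution (25, 25, 25, 25) gives the largest value in both columns,
# and the two columns agree up to the error of Stirling's approximation.
```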

2. Gibbs' Approach.

Boltzmann's Schtick: analysis of a single system. Each point of phase space represents a possible state of the system.
Gibbs' Schtick: analysis of an ensemble of infinitely many systems. Each point of phase space represents a possible state of one member of the ensemble.
The state of the entire ensemble is given by a density function ρ(x, t) on Γ. ρ(x, t)dx is the number of systems in the ensemble whose states lie in the region (x, x + dx).

Willard Gibbs

pt(R) = ∫_R ρ(x, t) dx = the probability at time t of finding the state of a system in region R

The Gibbs Entropy is then given by:

SG(ρ) = −k ∫ ρ(x, t) log(ρ(x, t)) dx
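A minimal Python sketch of the Gibbs entropy as a discretized integral, assuming (purely for illustration) a one-dimensional Gaussian ensemble density and k = 1:

```python
# Discretized Gibbs entropy S_G = -k * sum_x rho(x) log(rho(x)) dx for a
# one-dimensional Gaussian density; compared against the known closed form.
import math

k_B = 1.0
sigma = 2.0
dx = 0.01
xs = [i * dx for i in range(-3000, 3001)]   # grid covering [-30, 30] >> sigma

def rho(x):
    """Normalized Gaussian density with standard deviation sigma."""
    return math.exp(-x**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

S_G = -k_B * sum(rho(x) * math.log(rho(x)) * dx for x in xs)
print(S_G)                                              # numerical estimate
print(0.5 * math.log(2 * math.pi * math.e * sigma**2))  # closed form for a Gaussian
```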

Interpretive Issues:
- Why do low-probability states evolve into high-probability states? Characterizations of the dynamics are, again, required to justify this.
- How are the probabilities to be interpreted?
  Ontic probabilities = properties of physical systems. Long-run frequencies? Single-case propensities?
  Epistemic probabilities = measures of degrees of belief. Objective (rational) degrees of belief? Subjective degrees of belief?

II. Entropy in Information Theory.


Goal: To construct a measure for the amount of information associated with a message. The amount of info gained from the reception of a message depends on how likely it is.
Claude Shannon

The less likely a message is, the more info gained upon its reception! Let X = {x1, x2, ..., xk} = a set of messages. A probability distribution P on X is an assignment P = (p1, p2, ..., pk) of a probability pi = p(xi) to each message xi. Recall: This means pi ≥ 0 and p1 + p2 + ... + pk = 1.

A measure of information for X is a real-valued function H(X): {probability distributions on X} → R that satisfies:
Continuity. H(p1, ..., pk) is continuous.
Additivity. H(p1q1, ..., pkqk) = H(P) + H(Q), for probability distributions P, Q.
Monotonicity. Info increases with k for uniform distributions: if ℓ > k, then H(Q) > H(P), for any P = (1/k, ..., 1/k) and Q = (1/ℓ, ..., 1/ℓ).
Branching. H(p1, ..., pk) is independent of how the process is divided into parts.
Bit normalization. The average info gained for two equally likely messages is one bit: H(1/2, 1/2) = 1.

Claim (Shannon 1949): There is exactly one function that satisfies these criteria; namely, the Shannon Entropy (or "Shannon Information"):

H(X) = −Σ_i pi log pi

H(X) is maximal for p1 = p2 = ... = pk = 1/k. H(X) = 0 just when one pi is 1 and the rest are 0. The logarithm is to base 2: log x = y ⇔ x = 2^y.
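A minimal Python sketch of the Shannon entropy (the example distributions are illustrative), checking bit normalization, the maximum at the uniform distribution, and H = 0 for a certain message:

```python
# Shannon entropy H(X) = -sum_i p_i log2 p_i, in bits.
from math import log2

def shannon_entropy(P):
    """H(P) in bits; terms with p = 0 contribute nothing (p log p -> 0)."""
    return -sum(p * log2(p) for p in P if p > 0)

print(shannon_entropy([0.5, 0.5]))            # 1.0 bit (bit normalization)
print(shannon_entropy([0.25] * 4))            # 2.0 bits (maximal for uniform, = log2 k)
print(shannon_entropy([0.7, 0.1, 0.1, 0.1]))  # about 1.36 bits, less than 2.0
print(shannon_entropy([1.0, 0.0, 0.0]))       # 0.0 bits (one message is certain)
```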

In what sense is this a measure of information?

1. H(X) as Maximum Amount of Message Compression
Let X = {x1, ..., xn} be a set of letters from which we construct the messages. Suppose the messages have N letters apiece. The probability distribution P = (p1, ..., pn) is now over the letter set. Then: A typical sequence of letters will contain p1N occurrences of x1, p2N occurrences of x2, etc. Thus:

The number of distinct typical sequences of letters = N! / ((p1N)! (p2N)! ⋯ (pnN)!)

So:

log(the number of distinct typical sequences of letters) = log [ N! / ((p1N)! (p2N)! ⋯ (pnN)!) ]

Let's simplify the RHS...

log [ N! / ((p1N)! (p2N)! ⋯ (pnN)!) ] = log(N!) − {log((p1N)!) + ... + log((pnN)!)}

≈ (N log N − N) − {(p1N log p1N − p1N) + ... + (pnN log pnN − pnN)}

= N{log N − 1 − p1 log p1 − p1 log N + p1 − ... − pn log pn − pn log N + pn}

= −N Σ_{i=1}^n pi log pi   (using p1 + ... + pn = 1, so the log N terms and the ±1 terms cancel)

= NH(X)

Thus: log(the number of distinct typical sequences of letters) = NH(X)

So: the number of distinct typical sequences of letters = 2^{NH(X)}

So: There are only 2^{NH(X)} possible messages. This means we can encode them using only NH(X) bits.
Check:
- 2 possible messages require 1 bit: 0, 1.
- 4 possible messages require 2 bits: 00, 01, 10, 11.
- etc.

How many bits would be needed to encode an entire message?


First: How many bits are needed to encode n letters?
n = # letters → x = # bits
2 letters → 1 bit: 0, 1
4 letters → 2 bits: 00, 01, 10, 11
8 letters → 3 bits: 000, 001, 010, 011, 100, 101, 110, 111
So: n = 2^x, and thus x = log n.

So: If we need log n bits for each letter, we'll need N log n bits for a sequence of N letters. Thus: Instead of requiring N log n bits to encode our messages, we can get by with only NH(X) bits. Thus: H(X) represents the maximum amount that messages drawn from a given set of letters can be compressed.
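A minimal Python sketch comparing the fixed-length cost N log n with the compression limit NH(X), for an illustrative 4-letter distribution:

```python
# Compare the naive encoding cost N*log2(n) bits against the compressed cost
# N*H(X) bits. The alphabet size, probabilities, and N are illustrative choices.
from math import log2

P = [0.5, 0.25, 0.125, 0.125]   # probabilities over letters x1..x4 (n = 4)
n = len(P)
N = 1000                        # message length in letters

H = -sum(p * log2(p) for p in P)
print(N * log2(n))   # 2000.0 bits with a fixed-length 2-bit code per letter
print(N * H)         # 1750.0 bits: the compression limit N*H(X)
```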

2. H(X) as a Measure of Uncertainty
Suppose P = (p1, ..., pn) is a probability distribution over a set of values {x1, ..., xn} of a random variable X. The expected value E(X) of X is given by E(X) = Σ_{i=1}^n pi xi.

Let −log pi be the information gained if X is measured to have the value xi.
Recall: The greater pi is, the more certain xi is, and the less information should be associated with it.

Then the expected value of −log pi is just the Shannon information:

E(−log pi) = −Σ_{i=1}^n pi log pi = H(X)

What this means: H(X) tells us our expected information gain upon measuring X.
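A minimal Python sketch of this reading of H(X): estimate the expected surprisal E(−log pi) by averaging −log p(x) over random draws (the three-message distribution and sample size are illustrative):

```python
# Monte Carlo illustration that the average surprisal -log2 p(x) over many
# draws approaches the Shannon entropy H(X).
import random
from math import log2

values = ["x1", "x2", "x3"]
probs = [0.6, 0.3, 0.1]
p = dict(zip(values, probs))

random.seed(0)
samples = random.choices(values, weights=probs, k=100_000)
avg_surprisal = sum(-log2(p[x]) for x in samples) / len(samples)

H = -sum(q * log2(q) for q in probs)
print(avg_surprisal)  # close to 1.295...
print(H)              # exactly 1.295... bits
```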

3. Conditional Entropy
A communication channel is a device with a set of input states X = {x1, ..., xn} that are mapped to a set of output states Y = {y1, ..., yn}. Given probability distributions p(xi) and p(yj) on X and Y, and a joint distribution p(xi, yj), the conditional probability p(xi|yj) of input xi, given output yj, is defined by

p(xi|yj) = p(xi, yj) / p(yj)

The conditional Shannon Entropy of X given Y is

H(X|Y) = −Σ_j p(yj) Σ_i p(xi|yj) log p(xi|yj)

What this means: H(X|Y) measures the average uncertainty about the value of an input, given a particular output value.

Suppose we use our channel a very large number N of times.
Then: There will be 2^{NH(X)} typical input strings.
And: There will be 2^{NH(Y)} typical output strings.
Thus: There will be 2^{NH(X,Y)} typical strings of X and Y values.
So:

# of typical input strings that could result in a given output Y ≈ 2^{NH(X,Y)} / 2^{NH(Y)} = 2^{N(H(X,Y) − H(Y))}

Claim: H(X,Y) = H(Y) + H(X|Y)

So:

# of typical input strings that could result in a given output Y ≈ 2^{NH(X|Y)}

If one is trying to use a noisy channel to send a message, then the conditional entropy specifies the # of bits per letter that would need to be sent by an auxiliary noiseless channel in order to correct all the errors due to noise.

Check Claim:

H(X,Y) = −Σ_{i,j} p(xi, yj) log[p(xi, yj)]

= −Σ_{i,j} p(xi, yj) log[p(yj) p(xi|yj)]

= −Σ_{i,j} p(xi, yj) log[p(yj)] − Σ_{i,j} p(xi, yj) log[p(xi|yj)]

= −Σ_j p(yj) log[p(yj)] − Σ_{i,j} p(yj) p(xi|yj) log[p(xi|yj)]   (using Σ_i p(xi, yj) = p(yj))

= H(Y) + H(X|Y)
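A minimal numerical check of this chain rule in Python, using an illustrative joint distribution p(xi, yj) for a two-input, two-output channel:

```python
# Verify H(X,Y) = H(Y) + H(X|Y) on a small joint distribution p(x_i, y_j).
# The joint probabilities below are illustrative assumptions.
from math import log2

# p_xy[i][j] = p(x_i, y_j); rows index inputs, columns index outputs
p_xy = [[0.40, 0.10],
        [0.05, 0.45]]

p_y = [sum(p_xy[i][j] for i in range(2)) for j in range(2)]   # marginal p(y_j)

H_XY = -sum(p * log2(p) for row in p_xy for p in row if p > 0)
H_Y = -sum(p * log2(p) for p in p_y if p > 0)
H_X_given_Y = -sum(p_y[j] * (p_xy[i][j] / p_y[j]) * log2(p_xy[i][j] / p_y[j])
                   for i in range(2) for j in range(2) if p_xy[i][j] > 0)

print(H_XY)               # joint entropy
print(H_Y + H_X_given_Y)  # equals H_XY, confirming the chain rule
```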

Interpretive Issues:
1. How should the probabilities p(xi) be interpreted?
Emphasis is on uncertainty: The information content of a message xi is a function of how uncertain it is, with respect to the receiver.
So: Perhaps the probabilities are epistemic. In particular: p(xi) is a measure of the receiver's degree of belief in the accuracy of message xi.
But: The probabilities are set by the nature of the source. If the source is not probabilistic, then p(xi) can be interpreted epistemically. If the source is inherently probabilistic, then p(xi) can be interpreted as the ontic probability that the source produces message xi.

2. How is Shannon Information/Entropy related to other notions of entropy?

Thermodynamic entropy: ΔS = S_f − S_i = ∫ dQ_R / T
Boltzmann entropy: SB(Di) = −Nk Σ_j pj log pj + const.
Gibbs entropy: SG(ρ) = −k ∫ ρ(x, t) log(ρ(x, t)) dx
Shannon entropy: H(X) = −Σ_i pi log pi

Can statistical mechanics be given an information-theoretic foundation? Can the 2nd Law be given an information-theoretic foundation?
