Shannon Entropy
Contents
1 Introduction
2 Formal definitions
2.1 Relationship to thermodynamic entropy
2.2 Entropy as information content
2.3 Data compression
2.4 Limitations of entropy as information content
2.5 Data as a Markov process
2.6 Alternative definition
3 Efficiency
4 Derivation of Shannon's entropy
5 Properties of Shannon's information entropy
6 Extending discrete entropy to the continuous case: differential entropy
7 References
8 See also
9 External links
Introduction
The concept of entropy in information theory describes how much information there is in a signal or
event. Shannon introduced the idea of information entropy in his 1948 paper "A Mathematical Theory of
Communication".
An intuitive understanding of information entropy relates to the amount of uncertainty about an event
associated with a given probability distribution. As an example, consider a box containing many coloured
balls. If the balls are all of different colours and no colour predominates, then our uncertainty about the
colour of a randomly drawn ball is maximal. On the other hand, if the box contains more red balls than
any other colour, then there is somewhat less uncertainty about the result: the ball drawn from the box is
more likely to be red (if we were forced to place a bet, we would bet on a red ball). Telling someone
the colour of each newly drawn ball provides them with more information in the first case than it does in
the second case, because there is more uncertainty about what might happen in the first case than there is
in the second. Intuitively, if we know the number of balls remaining, and they are all of one color, then
there is no uncertainty about what the next ball drawn will be, and therefore there is no information
content from drawing the ball. As a result, the entropy of the "signal" (the sequence of balls drawn, as
calculated from the probability distribution) is higher in the first case than in the second.
Shannon, in fact, defined entropy as a measure of the average information content associated with a
random outcome.
Shannon's definition of information entropy makes this intuitive distinction mathematically precise. His
definition satisfies these desiderata:
The measure should be continuous, i.e., changing the value of one of the probabilities by a very
small amount should only change the entropy by a small amount.
If all the outcomes (ball colours in the example above) are equally likely, then entropy should be
maximal. In this case, the entropy increases with the number of outcomes.
If the outcome is a certainty, then the entropy should be zero.
The amount of entropy should be the same independently of how the process is regarded as being
divided into parts.
(Note: The Shannon/Weaver book makes reference to Tolman (1938) who in turn credits Pauli (1933)
with the definition of entropy Shannon used. Elsewhere in statistical mechanics, the literature includes
references to von Neumann having derived the same form of entropy in 1927, which may explain why
von Neumann favoured the use of the existing term 'entropy'.)
Formal definitions
Shannon defines entropy in terms of a discrete random variable X, with possible states (or outcomes)
x_1, \ldots, x_n, as:

H(X) = \sum_{i=1}^{n} p(x_i) \log_2 \frac{1}{p(x_i)} = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i) ,

where

p(x_i) = \Pr(X = x_i)

is the probability of the ith outcome of X.
That is, the entropy of the variable X is the sum, over all possible outcomes x_i of X, of the product of the
probability of outcome x_i times the log of the inverse of the probability of x_i (which is also called the
surprisal of x_i); the entropy of X is thus the expected value of its outcomes' surprisal. We can also apply this to a
general probability distribution, rather than a discrete-valued event.
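As an illustrative sketch (the function name shannon_entropy and the example probabilities are choices made here, not part of Shannon's text), the sum above can be evaluated directly in Python; zero-probability terms are skipped, since p log(1/p) tends to 0 as p tends to 0:

import math

def shannon_entropy(probs, base=2):
    # H = -sum(p * log_base(p)); outcomes with p == 0 contribute nothing,
    # because p * log(1/p) -> 0 as p -> 0.
    return -sum(p * math.log(p, base) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))   # fair coin: 1.0 bit
print(shannon_entropy([0.9, 0.1]))   # biased coin: about 0.469 bits
print(shannon_entropy([1.0]))        # certain outcome: 0 bits (printed as -0.0)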
Shannon shows that any definition of entropy satisfying his assumptions will be of the form:

H = -K \sum_{i=1}^{n} p_i \log p_i ,

where K is a positive constant that amounts to a choice of the unit of measurement.
Redundancy in language structure, and statistical regularities such as the occurrence frequencies of letter
or word pairs, triplets, etc., are examples of higher-order structure that lowers the entropy per character of
natural-language text. See Markov chain.
Data compression
Entropy effectively bounds the performance of the strongest lossless (or nearly lossless) compression
possible, which can be realized in theory by using the typical set or in practice using Huffman, Lempel-Ziv,
or arithmetic coding. The performance of existing data compression algorithms is often used as a
rough estimate of the entropy of a block of data.
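As a rough, hedged illustration of that last point (the text, the compressor settings, and the function names here are arbitrary choices), one can compare the empirical order-0 entropy of some data with the output size per byte of a general-purpose compressor such as zlib:

import math
import zlib
from collections import Counter

def order0_entropy_bits_per_byte(data):
    # Empirical entropy in bits per byte, treating bytes as independent symbols.
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

data = b"abracadabra " * 1000
compressed = zlib.compress(data, 9)

print(order0_entropy_bits_per_byte(data))   # order-0 estimate, bits per byte
print(8 * len(compressed) / len(data))      # zlib output, bits per byte
# zlib exploits repetition across bytes, so on this highly repetitive input it
# goes far below the order-0 estimate; neither figure is the "true" entropy.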
Consider, for example, a source that produces the string ABABABAB..., in which A is always followed by B
and vice versa. Treating the characters as independent symbols gives an estimated entropy rate of 1 bit per
character, but if we consider the symbols as two-character blocks, then the entropy rate is 0 bits per character.
However, if we use very large blocks, then the estimate of per-character entropy rate may become
artificially low. This is because in reality, the probability distribution of the sequence is not knowable
exactly; it is only an estimate. For example, suppose one considers the text of every book ever published
as a sequence, with each symbol being the text of a complete book. If there are N published books, and
each book is only published once, the estimate of the probability of each book is 1/N, and the entropy (in
bits) is -log2(1/N) = log2 N. As a practical code, this corresponds to assigning each book a unique identifier and using
it in place of the text of the book whenever one wants to refer to the book. This is enormously useful for
talking about books, but it is not so useful for characterizing the information content of an individual
book, or of language in general: it is not possible to reconstruct the book from its identifier without
knowing the probability distribution, that is, the complete text of all the books. The key idea is that the
complexity of the probabilistic model must be considered. Kolmogorov complexity is a theoretical
generalization of this idea that allows the consideration of the information content of a sequence
independent of any particular probability model; it considers the shortest program for a universal
computer that outputs the sequence. A code that achieves the entropy rate of a sequence for a given
model, plus the codebook (i.e. the probabilistic model), is one such program, but it may not be the
shortest.
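The block-size effect described above can be made concrete with a small sketch (the periodic test string and the block sizes are arbitrary choices): estimating bits per character from empirical frequencies of length-k blocks captures more structure as k grows, but for blocks approaching the length of the data the estimate becomes artificially low for any sequence, because each long block is seen only once.

import math
from collections import Counter

def entropy_rate_estimate(seq, k):
    # Bits per character estimated from empirical frequencies of length-k blocks.
    blocks = [seq[i:i + k] for i in range(0, len(seq) - k + 1, k)]
    counts = Counter(blocks)
    n = len(blocks)
    h_block = -sum(c / n * math.log2(c / n) for c in counts.values())
    return h_block / k

seq = "AB" * 500   # ABABAB...
for k in (1, 2, 4, 100):
    print(k, entropy_rate_estimate(seq, k))
# k = 1 gives 1.0 bit/character; k >= 2 gives 0.0, since every block is "ABAB...".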
Data as a Markov process

A common way to define entropy for text is based on a Markov model of the text. For an order-0 source
(each character selected independently of the preceding characters), the binary entropy is

H(\mathcal{S}) = -\sum_i p_i \log_2 p_i ,

where p_i is the probability of i. For a first-order Markov source (one in which the probability of selecting
a character depends only on the immediately preceding character), the entropy rate is

H(\mathcal{S}) = -\sum_i p_i \sum_j p_i(j) \log_2 p_i(j) ,
where i is a state (certain preceding characters) and p_i(j) is the probability of j given i as the previous
character(s).
For a second-order Markov source, the entropy rate is

H(\mathcal{S}) = -\sum_i p_i \sum_j p_i(j) \sum_k p_{i,j}(k) \log_2 p_{i,j}(k) .
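A minimal sketch of the first-order formula (the two-state transition matrix below is a made-up example, and the stationary probabilities p_i are approximated by fixed-point iteration rather than solved exactly):

import math

def stationary(P, iters=1000):
    # Approximate the stationary distribution of a row-stochastic matrix P.
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def first_order_entropy_rate(P):
    # H = -sum_i p_i sum_j p_i(j) log2 p_i(j)
    pi = stationary(P)
    n = len(P)
    return -sum(pi[i] * P[i][j] * math.log2(P[i][j])
                for i in range(n) for j in range(n) if P[i][j] > 0)

# Example: after 'A' the next character is 'A' with probability 0.9;
# after 'B' it is 'A' or 'B' with probability 0.5 each.
P = [[0.9, 0.1],
     [0.5, 0.5]]
print(first_order_entropy_rate(P))   # about 0.56 bits per character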
In general the b-ary entropy of a source \mathcal{S} = (S, P) with source alphabet S = \{a_1, \ldots, a_n\} and discrete
probability distribution P = \{p_1, \ldots, p_n\}, where p_i is the probability of a_i (say p_i = p(a_i)), is defined by:

H_b(\mathcal{S}) = -\sum_{i=1}^{n} p_i \log_b p_i .
Note: the b in "b-ary entropy" is the number of different symbols of the "ideal alphabet" which is being
used as the standard yardstick to measure source alphabets. In information theory, two symbols are
necessary and sufficient for an alphabet to be able to encode information, therefore the default is to let b =
2 ("binary entropy"). Thus, the entropy of the source alphabet, with its given empirical probability
distribution, is a number equal to the number (possibly fractional) of symbols of the "ideal alphabet", with
an optimal probability distribution, necessary to encode for each symbol of the source alphabet. Also note
that "optimal probability distribution" here means a uniform distribution: a source alphabet with n
symbols has the highest possible entropy (for an alphabet with n symbols) when the probability
distribution of the alphabet is uniform. This optimal entropy turns out to be

H_b(\mathcal{S}) = \log_b n .
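As a short sketch (function and variable names are arbitrary choices), the b-ary entropy can be computed directly, and the uniform distribution can be checked to attain the maximum value log_b n:

import math

def b_ary_entropy(probs, b=2):
    # H_b = -sum_i p_i log_b p_i
    return -sum(p * math.log(p, b) for p in probs if p > 0)

n, b = 4, 3
skewed = [0.7, 0.1, 0.1, 0.1]
uniform = [1.0 / n] * n

print(b_ary_entropy(skewed, b))                    # about 0.86, below the maximum
print(b_ary_entropy(uniform, b), math.log(n, b))   # both about 1.26 = log_3(4)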
Alternative definition
Another way to define the entropy function H (not using the Markov model) is by proving that H is
uniquely defined (as earlier mentioned) if and only if H satisfies the following conditions:
1. H(p_1, \ldots, p_n) is defined and continuous for all p_1, \ldots, p_n, where p_i \in [0,1] for all i = 1, \ldots, n and
p_1 + \cdots + p_n = 1. (Remark that the function solely depends on the probability distribution, not the alphabet.)

2. For all positive integers n, H satisfies

H\left(\tfrac{1}{n}, \ldots, \tfrac{1}{n}\right) < H\left(\tfrac{1}{n+1}, \ldots, \tfrac{1}{n+1}\right) .

3. For positive integers b_i with b_1 + \cdots + b_k = n, H satisfies

H\left(\tfrac{1}{n}, \ldots, \tfrac{1}{n}\right) = H\left(\tfrac{b_1}{n}, \ldots, \tfrac{b_k}{n}\right) + \sum_{i=1}^{k} \tfrac{b_i}{n}\, H\left(\tfrac{1}{b_i}, \ldots, \tfrac{1}{b_i}\right) .
This last functional relationship characterizes the entropy of a system with sub-systems and is in a sense
the most important of the three. It demands that the entropy of a system can be calculated from the
entropy of its sub-systems if we know how the sub-systems interact with each other.
Assume that we have an ensemble of n elements with a uniform distribution on them. If we mentally
divide this ensemble into k boxes (sub-systems) with bi elements in each, the entropy can be calculated as
a sum of the individual entropies of the boxes, weighted by the probability of finding oneself in that particular
box, plus the entropy of the system of boxes.
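This decomposition can be checked numerically; in the sketch below the box sizes b_i are an arbitrary example, and H is the binary Shannon entropy:

import math

def H(probs):
    # Shannon entropy in bits.
    return -sum(p * math.log2(p) for p in probs if p > 0)

boxes = [2, 3, 5]        # b_i: number of elements in each box
n = sum(boxes)           # 10 elements in total, uniformly distributed

total = H([1.0 / n] * n)                                   # H(1/n, ..., 1/n)
decomposed = H([b / n for b in boxes]) + sum(
    (b / n) * H([1.0 / b] * b) for b in boxes)             # boxes + within-box terms
print(total, decomposed)   # both equal log2(10), about 3.3219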
Efficiency
A source alphabet encountered in practice will generally have a probability distribution that is less
than optimal. If the source alphabet has n symbols, then it can be compared to an "optimized alphabet"
with n symbols, whose probability distribution is uniform. The ratio of the entropy of the source alphabet
to the entropy of its optimized version is the efficiency of the source alphabet, which can be expressed
as a percentage.
This implies that the efficiency of a source alphabet with n symbols can be defined simply as being equal
to its n-ary entropy. See also Redundancy (information theory).
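A brief sketch of the efficiency calculation (the four-symbol distribution below is an example chosen here): the ratio of the source's binary entropy to log2 n, the binary entropy of the uniform alphabet, coincides with the n-ary entropy of the source:

import math

def entropy(probs, base=2):
    return -sum(p * math.log(p, base) for p in probs if p > 0)

probs = [0.5, 0.25, 0.125, 0.125]    # example source alphabet with n = 4 symbols
n = len(probs)

efficiency = entropy(probs, 2) / math.log2(n)   # H_2(source) / H_2(uniform)
print(efficiency)                               # 0.875, i.e. 87.5 %
print(entropy(probs, n))                        # n-ary entropy: also 0.875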
Derivation of Shannon's entropy

Q. Given a roulette with n pockets which are all equally likely to be landed on by the ball, what is the
probability of obtaining a distribution (A_1, A_2, \ldots, A_n), where A_i is the number of times pocket i was
landed on and P = A_1 + \cdots + A_n is the total number of ball-landing events?

A. The probability is given by the multinomial distribution

\Pr(A_1, \ldots, A_n) = \frac{\Omega}{N} ,

where

\Omega = \frac{P!}{A_1!\,A_2!\cdots A_n!}

is the number of possible combinations of outcomes (for the events) which fit the given distribution, and

N = n^P

is the number of all possible combinations of outcomes for the set of P events.

Q. And what is the entropy?

A. The entropy of the distribution is obtained from the logarithm of this probability, taken per event:

\frac{1}{P}\log_2 \Pr = \frac{1}{P}\log_2 \Omega - \log_2 n = \frac{1}{P}\left(\log_2 P! - \sum_x \log_2 A_x!\right) - \log_2 n .

The factorials can be approximated well by Stirling's approximation, \log_2 x! \approx x\log_2 x - x\log_2 e, which
together with \sum_x A_x = P gives

\frac{1}{P}\log_2 \Omega \approx \log_2 P - \sum_x \frac{A_x}{P}\log_2 A_x = -\sum_x \frac{A_x}{P}\log_2 \frac{A_x}{P} .

So the entropy is

-\sum_x p_x \log_2 p_x + \log_2 \frac{1}{n} , \qquad \text{where } p_x = \frac{A_x}{P} ,

and the term \log_2(1/n) can be dropped since it is a constant, independent of the p_x distribution. The result is

H = -\sum_x p_x \log_2 p_x .

Thus, the Shannon entropy is a consequence of the equation

H = \frac{1}{P}\log_2 \Omega ,

which expresses the entropy per event as the logarithm of the number of ways \Omega in which the observed
distribution of outcomes can occur.
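The Stirling step can be sanity-checked numerically: for a large number of events P, the exact value of (1/P) log2 Ω is already close to -sum_x p_x log2 p_x. The pocket counts below are an arbitrary example:

import math

def per_event_log2_omega(counts):
    # (1/P) * log2( P! / (A_1! ... A_n!) ), using log-gamma: log(x!) = lgamma(x + 1).
    P = sum(counts)
    log2_omega = (math.lgamma(P + 1)
                  - sum(math.lgamma(a + 1) for a in counts)) / math.log(2)
    return log2_omega / P

def plug_in_entropy(counts):
    # -sum_x (A_x / P) log2 (A_x / P)
    P = sum(counts)
    return -sum(a / P * math.log2(a / P) for a in counts if a > 0)

counts = [500, 300, 150, 50]         # A_x for n = 4 pockets, P = 1000 events
print(per_event_log2_omega(counts))  # about 1.63
print(plug_in_entropy(counts))       # about 1.65; the gap shrinks as P grows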
Properties of Shannon's information entropy

Shannon's entropy satisfies the following properties.

For any n, H_n(p_1, \ldots, p_n) is a continuous and symmetric function of the variables p_1, p_2, \ldots, p_n.
An event of probability zero does not contribute to the entropy, i.e. for any n,

H_{n+1}(p_1, \ldots, p_n, 0) = H_n(p_1, \ldots, p_n) .
If the probabilities are all equal, p_i = 1/n, then H_n is maximal; in general,

H_n(p_1, \ldots, p_n) \le H_n\left(\tfrac{1}{n}, \ldots, \tfrac{1}{n}\right) = \log_2 n .
If we partition the mn outcomes of the random experiment into m groups, with each group containing n
elements, we can do the experiment in two steps: first, determine the group to which the actual outcome
belongs; then, find the outcome within that group. The probability that you will observe group i is
q_i = p_{i1} + \cdots + p_{in}, and the conditional probability distribution function for group i is
(p_{i1}/q_i, \ldots, p_{in}/q_i). Entropy then satisfies

H_{mn}(p_{11}, \ldots, p_{mn}) = H_m(q_1, \ldots, q_m) + \sum_{i=1}^{m} q_i\, H_n\left(\frac{p_{i1}}{q_i}, \ldots, \frac{p_{in}}{q_i}\right) ,

where the entropy H_n(p_{i1}/q_i, \ldots, p_{in}/q_i) is the entropy of the probability distribution conditioned on
group i. This property means that the total information is the sum of the information gained in the first
step, H_m(q_1, \ldots, q_m), and a weighted sum of the entropies conditioned on each group.
Khinchin in 1957 showed that the only function satisfying the above assumptions is of the form

H_n(p_1, \ldots, p_n) = -k \sum_{i=1}^{n} p_i \log p_i ,

where k is a positive constant representing the desired unit of measurement.
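The grouping property above can be verified numerically for a small example; the 2 x 3 table of joint probabilities p_ij below is arbitrary:

import math

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# p[i][j]: probability of outcome j inside group i (m = 2 groups, n = 3 outcomes each)
p = [[0.10, 0.20, 0.10],
     [0.25, 0.05, 0.30]]
q = [sum(row) for row in p]                    # group probabilities q_i

lhs = H([pij for row in p for pij in row])     # H_mn(p_11, ..., p_mn)
rhs = H(q) + sum(q[i] * H([pij / q[i] for pij in p[i]]) for i in range(len(p)))
print(lhs, rhs)                                # equal, up to floating-point rounding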
Extending discrete entropy to the continuous case: differential entropy

The formula

h[f] = -\int_{-\infty}^{\infty} f(x) \log_2 f(x)\, dx , \qquad (*)

where f denotes a probability density function on the real line, is analogous to the Shannon entropy and
could thus be viewed as an extension of the Shannon entropy to the domain of real numbers. Formula (*)
is usually referred to as the continuous entropy, or differential entropy. Although the analogy between
both functions is suggestive, the following question must be asked: is the Boltzmann entropy a valid
extension of the Shannon entropy? To answer this question, we must establish a connection between the
two functions:
We wish to obtain a generally finite measure as the bin size goes to zero. In the discrete case, the bin size
is the (implicit) width of each of the n (finite or infinite) bins whose probabilities are denoted by pn. As
we generalize to the continuous domain, we must make this width explicit.
To do this, start with a continuous function f discretized into bins of size \Delta. By the mean-value theorem
there exists a value x_i in each bin such that

f(x_i)\,\Delta = \int_{i\Delta}^{(i+1)\Delta} f(x)\,dx ,

and thus the integral of the function f can be approximated (in the Riemannian sense) by

\int_{-\infty}^{\infty} f(x)\,dx = \lim_{\Delta \to 0} \sum_{i=-\infty}^{\infty} f(x_i)\,\Delta ,

where this limit and "bin size goes to zero" are equivalent.
We will denote by H^{\Delta} the entropy of the discretized distribution, whose bin probabilities are f(x_i)\Delta:

H^{\Delta} := -\sum_{i=-\infty}^{\infty} f(x_i)\,\Delta \log_2\big(f(x_i)\,\Delta\big) = -\sum_{i=-\infty}^{\infty} f(x_i)\,\Delta \log_2 f(x_i) - \log_2 \Delta ,

where the last step uses \sum_i f(x_i)\,\Delta = \int_{-\infty}^{\infty} f(x)\,dx = 1. As \Delta \to 0, we have

\sum_{i=-\infty}^{\infty} f(x_i)\,\Delta \log_2 f(x_i) \to \int_{-\infty}^{\infty} f(x)\log_2 f(x)\,dx ,

and so

H^{\Delta} + \log_2 \Delta \to -\int_{-\infty}^{\infty} f(x)\log_2 f(x)\,dx = h[f] \qquad \text{as } \Delta \to 0 ,

which is, as said before, referred to as the differential entropy. This means that the differential entropy is
not a limit of the Shannon entropy for n \to \infty (equivalently, for bin size \Delta \to 0); rather, H^{\Delta} diverges,
differing from the differential entropy by the term -\log_2 \Delta \to \infty.
It turns out as a result that, unlike the Shannon entropy, the differential entropy is not in general a good
measure of uncertainty or information. For example, the differential entropy can be negative; also it is not
invariant under continuous co-ordinate transformations.
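The offset log2 Δ can be observed numerically. The sketch below (using a standard normal density as an example) bins the density, computes the discrete entropy of the binned distribution, and compares H^Δ + log2 Δ against the known closed-form differential entropy of a Gaussian, (1/2) log2(2 π e σ²):

import math

def gaussian_pdf(x, sigma=1.0):
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))

def binned_entropy(pdf, delta, lo=-10.0, hi=10.0):
    # Discrete entropy H^Delta of the density cut into bins of width delta,
    # with each bin's probability approximated by pdf(midpoint) * delta.
    h = 0.0
    x = lo
    while x < hi:
        p = pdf(x + delta / 2) * delta
        if p > 0:
            h -= p * math.log2(p)
        x += delta
    return h

closed_form = 0.5 * math.log2(2 * math.pi * math.e)   # differential entropy of N(0, 1)
for delta in (0.5, 0.1, 0.01):
    print(delta, binned_entropy(gaussian_pdf, delta) + math.log2(delta), closed_form)
# H^Delta itself grows like -log2(delta); the combination H^Delta + log2(delta)
# stays near the closed-form value of about 2.047 bits.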
More useful for the continuous case is the relative entropy of a distribution, defined as the Kullback-Leibler
divergence from the distribution to a reference measure m(x),

D_{\mathrm{KL}}(f \,\|\, m) = \int f(x) \log_2 \frac{f(x)}{m(x)}\, dx .
The relative entropy carries over directly from discrete to continuous distributions, and is invariant under
co-ordinate reparametrisations.
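As a small discrete-case sketch (the two distributions are example values), the relative entropy D_KL(p || m) = sum_i p_i log2(p_i / m_i) is non-negative and vanishes exactly when p = m:

import math

def kl_divergence(p, m):
    # D_KL(p || m) in bits; requires m_i > 0 wherever p_i > 0.
    return sum(pi * math.log2(pi / mi) for pi, mi in zip(p, m) if pi > 0)

p = [0.5, 0.3, 0.2]
m = [1/3, 1/3, 1/3]          # reference measure: uniform
print(kl_divergence(p, m))   # about 0.10 bits
print(kl_divergence(p, p))   # 0.0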
References
This article incorporates material from Shannon's entropy on PlanetMath, which is licensed under the
GFDL.
See also
Binary entropy function - the entropy of a Bernoulli trial with probability of success p
Conditional entropy
Cross entropy - a measure of the average number of bits needed to identify an event from a set of
possibilities when the coding scheme is based on one probability distribution while the events follow another
Joint entropy - the measure of how much entropy is contained in a joint system of two random
variables.
Entropy encoding - a coding scheme that assigns codes to symbols so as to match code lengths
with the probabilities of the symbols.
Kolmogorov-Sinai entropy in dynamical systems
Rényi entropy - a generalisation of information entropy; it is one of a family of functionals for
quantifying the diversity, uncertainty or randomness of a system.
Perplexity
Quantum relative entropy - a measure of distinguishability between two quantum states.
Theil index
External links