
DEVOTED TO RESEARCH STUDIES IN THE FIELD OF MASS COMMUNICATIONS

SPRING 1955

Information Theory and Mass Communication

BY WILBUR SCHRAMM

The Director of the Institute of Communications Research at the University of Illinois discusses the nature of information theory and some of its possible applications to research on mass communications. There is an appendix on formulas, how to compute them, and suggested readings.

FOR MOST OF US, information theory dates back to Claude Shannon's notable article in the Bell System Technical Journal in 1948, but its roots are far older. They reach back at least as far as the statistical mechanics of Boltzmann and Gibbs, Szilard's and von Neumann's treatments of information in physics, Nyquist's and Hartley's work with communication circuits, Wiener's development of cybernetics and Shannon's own earlier work on switching and mathematical logic. But since 1948 the theory has been responsible for significant advances in the design of electronic "brains" and governors, and for deeper understanding of electronic communication generally. It has been applied to the study of biological processes (Quastler and others), human mental processes (Wiener, McCulloch and others), mental tests (Cronbach and others), psycholinguistics (Miller, Osgood, Wilson and others) and the problem of readability (Taylor). Application has, so far, stopped short of mass communication. There has been a feeling that information theory might help us understand what goes on in a newspaper or a broadcast, but we have never been able to say just how.

It is proposed in this paper to take a brief overview of information theory itself, then to examine broadly its applicability to mass communication, and finally to look in more detail at some of the areas of mass communication in which it promises to be most helpful.

THE NATURE OF INFORMATION THEORY

Let us be clear at the outset that information theory is not a theory of information in the same sense in which that term is ordinarily used by social scientists. In fact, as Kellogg Wilson has cogently remarked (10), it might well be called a theory of signal transmission. Its highly ingenious mathematics are concerned chiefly with the entropy or uncertainty of sequences of events in a system or related systems. Therefore, let us begin by saying what information theory means by "system."
A system is any part of an information chain which is capable of existing in one or more states, or in which one or more events can occur. The vibrating metal diaphragm of a telephone or a microphone is a system. So is the radio frequency amplifier circuit of a radio receiver. So is a telegraph wire. So is the air which carries the pulsations of sound waves. So is the basilar membrane of the ear. So is the optic nerve. So, in a little different sense, is the semantic system of an individual. Each of these is capable of assuming different states or playing host to different events, and each can be coupled to other systems to make a communication chain.

If information is to be transferred, systems must obviously be coupled. We say systems are coupled when the state of one system depends to some degree on the state of the system that adjoins it. Thus when a microphone diaphragm is depressed so as to cause a coil to cut magnetic lines of force and generate a current in a wire, those systems are coupled. When light frequencies strike the eye and cause discharges in the optic nerve, those systems are coupled. A break in the coupling will obviously prevent any information from being transferred. That is what happens when a microwave link goes out during a television broadcast, or when a student's attention wanders in class.

Most human communication chains contain a large number of coupled systems, and they contain one kind of system which Dr. Shannon has not primarily dealt with: the functional, as opposed to the structural, system. A functional system is one that learns; its states depend on its own past operation. The air that carries sound waves, or the metal diaphragm of the microphone, is a structural system. So is the sensory system of a human being. But the central nervous system, and especially the aspect of it to which we refer as the semantic system, is a functional system. It is capable of learning. It codes and decodes information on the basis of past experience. Incidentally, this is one of the pitfalls in the way of applying information theory mathematics to human communication. These are probability formulas, and if the probabilities are altered--i.e., if any learning takes place--during the experiment, the events can no longer be regarded as a stochastic process and the formulas will not apply. It is therefore necessary rigidly to control the learning factor.¹

¹ Which may be accomplished either by keeping the periods of experimentation very short, or by using a response already over-learned.

Systems may be either corresponding or non-corresponding. Corresponding systems are capable of existing in identical states. Thus, the sound input of the microphone and the sound output of the loudspeaker are capable of existing in identical states--therefore, corresponding. But the air and the diaphragm are not corresponding. Neither are the diaphragm and the current, or the light signal and the central nervous system.

We can now say what information theory means by communication. Communication occurs when two corresponding systems, coupled together through one or more non-corresponding systems, assume identical states as a result of signal transfer along the chain. Unless the sound that goes into the telephone is reproduced by the sound that comes out of the telephone at the other end of the line, we do not have communication. Unless the concept in the semantic system of Mr. A. is reproduced in the semantic system of Mr. B., communication has not taken place. Begging the question of whether a meaning as seen by one individual can ever be reproduced exactly by another individual--or whether we can test it accurately enough to be sure--we have no great difficulty in adapting this definition to our common understanding of the term communication.
But when we define information in terms of information theory, then we have to get used to a somewhat different approach. We can, of course, measure the "information" transmitted along a communication chain in terms of many kinds of units--letters, morphemes, phonemes, facts (if we can satisfactorily define a fact). But none of these is satisfactory for the precise needs of information theory. Information is there defined in terms of its ability to reduce the uncertainty or disorganization of a situation at the receiving end.

Let us take an example. Suppose I tell you one "fact" about a coin toss and one "fact" about typewriter keys. I tell you that tails will not come up when the coin is next tossed, and that the letter G will not be struck when the next key is depressed on the typewriter. Now it is obvious that the information about the coin is more useful to you than the information about the typewriter in predicting what will happen. You will have no remaining doubt as to which side of the coin will come up, whereas you will still be uncertain which of the remaining 41 keys of the typewriter will be struck. In terms of information theory, more information has been transferred about the coin than about the typewriter. When a transmitted signal succeeds in reducing the number of equally probable outcomes at the receiving end by one-half, one bit of information is said to have been transferred. (Bit comes from binary digit.) Thus, when you reduce the two equally probable outcomes of a coin toss to one, you are using one bit of information. You can see that the computing of this information readily lends itself to using logarithms to the base 2, rather than our common base 10. In the case of the coin toss, log₂2 = 1 bit. But it would take log₂42, or about 5.4 bits of information, to predict which typewriter key would be struck at random, or log₂26 (4.7 bits) to predict which letter of the alphabet will come up, if one is chosen at random.
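The arithmetic here is nothing more than the base-2 logarithm, and it is easy to verify. A minimal sketch in Python (the labels and loop are only for illustration, not part of the original article):

    import math

    # Information needed to pick one outcome from n equally probable outcomes
    # is log2(n) bits.
    for name, n in [("coin toss", 2), ("typewriter keys", 42), ("alphabet", 26)]:
        print(f"{name}: log2({n}) = {math.log2(n):.1f} bits")

    # Prints 1.0 bits for the coin, about 5.4 bits for the 42 typewriter keys,
    # and about 4.7 bits for the 26 letters, as in the text.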

systems. Entropy will obviously be at HOW APPLICABLE IS THE THEORY?


its maximum when all states of the sys- The concepts of information theory
tem are equally probabl-that is, when have an insightful quality, an intuitive
they occur completely at random, as sort of fit, when they are applied freely
when a coin is tossed. The formula for to mass communication situations.
maximum entropy is therefore the same For one thing, it is obvious that hu-
as the formula for information, man communication, like any other
Hmax = logn kind of communication, is merely a
where n is the number of equally prob- chain of coupled systems-thus,
able outcomes.
Most situations with which we deal
in human communication do not have
equally probable outcomes. For exam- In mass communication, these chains
ple, the letters of a language do not oc- take on certain remarkable character-
cur completely at random. If they did, istics. They are often very long. The
the language would be complete chaos. account of a news event in India must
Because we know that e in the English pass through very many coupled sys-
language occurs oftener than any other tems before it reaches a reader in Indi-
letter, we give it the simplest symbol in ana. Again, some of the systems have
the radio-telegraph c o d m n e dit, or phenomenally high rates of output
short. Because there are certain combi- compared t o their input. Shannon
nations of letters and sounds more like- would call them high-gain amplifiers.
ly than others to occur together, we These are the mass media, which have
find it possible to learn to spell and the power to produce many simulta-
understand speech. Therefore, in most neous and identical messages. Also, in
communication situations, we use en- this kind of chain, we have certain net-
tropy formulas which measure the de- works of systems within systems. Two
gree of predictability and randomness. of these are very important. The mass
In order not to clutter up the path media themselves are networks of sys-
at this point, the principal formulas of tems coupled in a complicated way so
information theory have been put in a as to do the job of decoding, interpret-
brief appendix to this paper. The non- ing, storage and encoding which we
mathematical reader need remember associate with all communicators.
only that mathematical tools are avail- Likewise, the individual who receives a
able to measure, among other things, mass media message is a part of a net-
the amount of uncertainty in a system work of group relationships, and the
(observed entropy), the degree of cer- workings of this network help to deter-
tainty or predictability in a system (re- mine how he responds to the message.
dundancy), the degree of uncertainty But each system in the mass commu-
in a system (relative entropy), the un- nication chain, whatever the kind of
certainty of occurrence of pairs of system, is host to a series of events
events (joint entropy), the uncertainty which are constrained by their environ-
of occurrence of events in sequence ments and by each other, and therefore
(conditional entropy), the amount of to a certain degree predictable and sub-
information transmitted under various ject to information theory measure-
conditions, and the capacity of a chan- ments. Much of the scholarly study of
nel t o transmit information. mass communication consists of an ex-

HOW APPLICABLE IS THE THEORY?

The concepts of information theory have an insightful quality, an intuitive sort of fit, when they are applied freely to mass communication situations.

For one thing, it is obvious that human communication, like any other kind of communication, is merely a chain of coupled systems. In mass communication, these chains take on certain remarkable characteristics. They are often very long. The account of a news event in India must pass through very many coupled systems before it reaches a reader in Indiana. Again, some of the systems have phenomenally high rates of output compared to their input. Shannon would call them high-gain amplifiers. These are the mass media, which have the power to produce many simultaneous and identical messages. Also, in this kind of chain, we have certain networks of systems within systems. Two of these are very important. The mass media themselves are networks of systems coupled in a complicated way so as to do the job of decoding, interpreting, storage and encoding which we associate with all communicators. Likewise, the individual who receives a mass media message is a part of a network of group relationships, and the workings of this network help to determine how he responds to the message.

But each system in the mass communication chain, whatever the kind of system, is host to a series of events which are constrained by their environments and by each other, and therefore to a certain degree predictable and subject to information theory measurements. Much of the scholarly study of mass communication consists of an examination of the constraints on these events, and discovering the dependency of events in one of these systems on events in another system.

For example, a large part of what we call "effects" study is the comparison of events in one system with events in another. A readership study compares the events in a newspaper with the events in an individual's reading behavior. A retention study compares the events in a medium with the events in an individual's recall. And so forth. We have every reason to suspect, therefore, that a mathematical theory for studying electronic communication systems ought to have some carry-over to human communication systems.

ENTROPY AND REDUNDANCY

The term entropy is still strange to students of social communication, but redundancy is an old and familiar idea. The redundancy concept of information theory gives us no great trouble. Redundancy is a measure of certainty or predictability. In information theory, as in social communication, the more redundant a system is, the less information it is carrying in a given time. On the other hand, any language or any code without redundancy would be chaos. In many cases, increasing the redundancy will make for more efficient communication.

For example, on a noisy teletype line, it helps considerably to have certain probabilities controlling what letters follow others. If a q (in English) occurs followed by two obvious errors, the operator at the receiving end can be quite sure that the q is followed by a u and that the next letter will be another vowel. When a circuit is bad, operators arbitrarily repeat key words twice. Remember the familiar cable language--THIS IS NOT--REPEAT, NOT. . . .

The amount of redundancy--using that term freely--is therefore one of the great strategy questions confronting mass communication. The most economical writing is not always the most effective writing. We could write this entire paper in the terse, economical language of mathematics, but that would not necessarily communicate most to the people who will read this paper. A newspaper reporter may choose to explain the term photosynthesis in twenty words, which is redundancy unnecessary to a scientist but highly necessary to a layman. There is a kind of rule of thumb, in preparing technical training materials, that two or more examples or illustrations should be given for each important rule or term. There is another rule of thumb, in broadcast commercials, that product names should be repeated three times. All these are strategy decisions, aimed at using the optimum of redundancy. And indeed, finding the optimum degree of redundancy for any given communication purpose is one of the chief problems of encoding.

Relative entropy, as we have pointed out, is merely the other side of the coin from redundancy. The lower the redundancy, the higher the relative entropy.

One of the aspects of human communication where entropy and redundancy measures have already proved their usefulness is in the study of language. Morphemes, phonemes, letters and other linguistic units obviously do not occur in a language completely at random; they are bound by certain sequential relationships, and therefore subject to measures of entropy and redundancy. We know, among other things, that the relative entropy of English is slightly less than 50%.²

² This is calculated as follows: The maximum entropy of 26 English letters is log₂26, or about 4.7 bits per letter. The sequential entropy of groups of eight letters as they occur in English usage is about 2.35 bits per letter. Therefore, the relative entropy is 2.35/4.7, or about .5. It would be lower if we figured sequential entropy for sequences longer than eight letters.
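The footnote's calculation can be written out directly; a minimal sketch, taking the 2.35-bit sequential estimate as given:

    import math

    maximum = math.log2(26)    # about 4.7 bits per letter for 26 letters
    sequential = 2.35          # estimated bits per letter in eight-letter groups
    relative = sequential / maximum
    redundancy = 1 - relative

    print(f"relative entropy = {relative:.2f}")    # about .50
    print(f"redundancy       = {redundancy:.2f}")  # about .50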

Shannon has estimated, incidentally, that if the relative entropy of the language were only 20%--if the next letter in a sequence were, on the average, 80% predictable--then it would be impossible to construct interesting crossword puzzles. But if the relative entropy were 70%--if the structure were only 30% redundant--then it would be easily possible to construct three-dimensional crossword puzzles (1). This information about crossword puzzles, of course, is not intended to represent the results of modern linguistic scholarship. For a more representative example, see Jakobson, Halle and Cherry on the entropy of Russian phonemes (9).

Wilson Taylor's "Cloze" procedure³ is one of the interesting ways we have available for use in estimating the entropy or redundancy of prose. Taylor deletes every nth word in a passage, and asks readers to supply the missing words. The scatter of different words suggested for each of the missing terms provides a measure of the predictability of the passage to that particular audience. For example, suppose we present two paragraphs to the same group of 20 readers, and on the average this is the score they make:

  Paragraph A
    16 specify word A (correct)
     2 specify word B
     2 specify word C

  Paragraph B
     6 specify word A (correct)
     4 specify word B
     4 specify word C
     3 specify word D
     1 specify word E
     1 specify word F
     1 specify word G

If we get this result, it is clear that the uncertainty or relative entropy of Paragraph B is considerably greater for this audience than is that of Paragraph A. Paragraph A is apparently more redundant than B. Taylor has gone into this use of information theory in his doctoral dissertation, and it is clear that the redundancy or relative entropy of a passage is closely related to its readability.

³ Wilson L. Taylor, "Cloze Procedure: A New Tool for Measuring Readability," Journalism Quarterly, 30:415-33 (Fall 1953).
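The scores above are already frequency distributions, so the observed entropy of each paragraph can be computed with the basic formula given in the appendix, H = -Σ p(i) log₂ p(i). A minimal sketch of that computation (an illustration only, not Taylor's own scoring procedure):

    import math

    def observed_entropy(counts):
        """H = -sum of p(i) * log2(p(i)), with p(i) estimated from frequencies."""
        n = sum(counts)
        return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

    paragraph_a = [16, 2, 2]               # words supplied by the 20 readers
    paragraph_b = [6, 4, 4, 3, 1, 1, 1]

    print(f"Paragraph A: {observed_entropy(paragraph_a):.2f} bits")  # about 0.92
    print(f"Paragraph B: {observed_entropy(paragraph_b):.2f} bits")  # about 2.51

The higher figure for Paragraph B is the greater uncertainty, or lower redundancy, that the text describes.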

If we consider an entire mass medium as a system, then it is evident that the maximum entropy of a newspaper or a broadcasting station is immensely greater than that of a semaphore, a calling card, a personal letter or a sermon. The paper or the station has a very great freedom to do different things and produce strikingly different products. A large newspaper, like the New York Times, has higher maximum entropy than a smaller newspaper. If we could devise any way to make a valid comparison, I think we should find that the relative entropy of radio and television would be less than that of newspapers. If this is indeed the case, it may be that the tremendous wordage of broadcasting puts a burden on originality, and the scant feedback to a broadcasting station puts a premium on any formula which has proved popular. A successful formula is soon imitated. A popular program promptly spawns a whole family that look like it. A joke passes quickly from comedian to comedian. We might say that for comedians, joint and conditional entropy are quite low. For comic strips, relative entropy is obviously very low, and redundancy very high.

But it is also evident that no medium uses as much of the available freedom as it could. Complete freedom would mean random content. The art of being an editor or a program manager consists in no small degree of striking the right balance between predictability and uncertainty--right balance being defined as the best combination of satisfied anticipation and surprise. From time to time we have tried to quantify this amount of organization or predictability in a mass medium. One of the simpler ways to approach it is to tabulate the news sources in a paper.

[FIGURE A. Sources of News by Cities--New York Times, 11/8/54.]

For example, Figure A is a typical distribution of news items by source in a metropolitan newspaper for one day. The usual way we handle figures like this is by means of the statistics of central tendency--mean, standard deviation, etc. Suppose we were to handle it by information theory mathematics. If relative entropy were at a maximum, each of these news sources would be represented equally. Actually, the relative entropy of news sources in the Times for that day was about 52%. Throughout that week it hung around 50%, minus or plus 5. This seems quite typical of large newspapers. Four Chicago papers, two other New York papers, and the Washington Post and Times Herald all were between 41 and 57% for the same period. The London Times and Paris Figaro were a little over 40%. During the same period, a radio news wire averaged about 45% relative entropy.

This rather remarkable order of agreement represents a pattern of constraint which, if we understood it completely, would tell us a great deal about mass media. Why do large papers, on the average, use about half the freedom open to them to represent different news sources? Availability is one reason, but the chief reason is simply that this is the editors' definition of what their clientele want, and can absorb, and should have, and can be given within the bounds of physical limits and custom. Information theory appears to offer us a new way to study this characteristic of the media.
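The counts behind Figure A are not reproducible here, but the procedure is simple: estimate the probability of each news source from its share of the day's items, compute the observed entropy, and divide by the maximum entropy, log₂n. A sketch with invented counts (the cities and figures below are hypothetical, chosen only to show the method):

    import math

    # Hypothetical one-day tally of news items by source (invented data).
    items = {"Washington": 30, "New York": 20, "Local": 40,
             "Chicago": 5, "London": 3, "Paris": 2}

    n = sum(items.values())
    observed = -sum((c / n) * math.log2(c / n) for c in items.values())
    maximum = math.log2(len(items))   # every source equally represented

    print(f"relative entropy = {observed / maximum:.0%}")

    # A result like the 52% reported for the Times would mean the paper was
    # using about half the freedom open to it.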
THE IDEA OF NOISE

The idea of noise is another information theory concept which intuitively makes sense in the study of mass communication. Noise, as we have said, is anything in the channel other than what the communicator puts there. Noise may be competing stimuli from inside--AC hum in the radio, print visible through a thin page in a magazine, day-dreaming during a class lecture--or from outside--competing headline cues on a newspaper page, reading a book while listening to a newscast, the buzz of conversation in the library. In general, the strategy of a mass communicator is to reduce noise as much as possible in his own transmission, and to allow for expected noise in the reception. An increase in redundancy, as we have already suggested, may combat noise; a radio announcer may be well advised to repeat a highly important announcement.

The information theory formula for maximum transmission capacity in the face of noise also furnishes some guides as to what can be done. This formula is

  C = W log₂ ((P + N) / N)

in which W is band width, P is power of transmission, and N is noise. In other words, you can approach maximum efficiency by reducing noise, increasing band width or increasing power. Two of the great problems of mass communication, of course, are to understand exactly what is meant by band width and power, for any given situation. Is the band width of a talking picture or television greater than that of a silent picture or radio?⁴ Is the band width of a sight-sound medium like television greater than that of print? You can certainly increase band width by using a high fidelity phonograph, but can you also increase it by buying time on more radio stations? Similarly, what constitutes power, in this sense, within persuasive communication? Supposedly, the nature of the arguments and the source will contribute to it. Will talking louder contribute to power, so defined? Will buying bigger ads?

⁴ Putting simultaneous and reinforcing cues in a single band of communication supposedly adds to the "power" of a message. For example, a speaker may emphasize a point by choice of words, by speaking louder, by pausing just before the key word, by gestures, by facial expression, etc. But suppose one of these cues is not congruent with the others. For example, suppose the speaker winks in the midst of all this seriousness. Or suppose his voice trails up when it should go down. This seems to be the way we use simultaneous cues in a wide band to represent satire or humor or irony.
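As a numerical illustration of the capacity formula (the values are invented, not drawn from any actual channel):

    import math

    def capacity(W, P, N):
        """Maximum transmission rate of a noisy channel: C = W log2((P + N) / N)."""
        return W * math.log2((P + N) / N)

    # A band width of 3,000 cycles with signal power 15 times the noise power:
    print(capacity(W=3000, P=15, N=1))   # 12000.0 bits per second

    # Doubling band width doubles capacity; doubling power helps far less,
    # because power enters only through the logarithm.
    print(capacity(W=6000, P=15, N=1))   # 24000.0
    print(capacity(W=3000, P=30, N=1))   # about 14863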
To basic questions like these, information theory is unlikely to contribute except by stimulating insights, but it should be pointed out that there are formulas for calculating noise which may well prove to be useful in tests of learning and retention from mass communication, and in rumor analysis and other functions of communication chains.

COUPLING

That brings us to talk of coupling, which is another point at which information theory comes very close to our usual way of thinking about human communication. We are accustomed to think of "gatekeepers." Strictly speaking, every system that couples two other systems is a gatekeeper. But there are certain especially important gatekeepers in mass communication: the reporter walking his beat, the telegraph editor deciding on what to select from the wire, the press association wire filer deciding what stories to pass on, the commentator deciding what aspect of current life to focus on, the magazine editor deciding what parts of the environment to represent, and others. All these are subject to the stability and fidelity measures of information theory: how likely are they to pass on the information that comes to them? How faithfully are they likely to reproduce it?

Even the terms used to talk about fidelity in electronic systems sound familiar to us in light of our experience with mass communication. How much of the information do the gatekeepers filter out? How much fading and booming do they introduce (by changing the emphasis of the message)? How much systematic distortion are they responsible for (through bias)? How much random distortion (through carelessness or ignorance)?

The newspaper itself--if we want to consider it as a system--is a gatekeeper of the greatest importance. The daily life of a city presents itself to the paper's input. Selected bits of the daily life of the rest of the world enter the input through the telegraph desk. What comes out? What is the stability of the paper for reproducing local news as compared with national news, civic news as compared with crime news, news of one presidential candidate as compared with news of another? And what about fidelity? To what extent does the paper change its input by cutting, by rewriting, by choosing a headline which affects the meaning, by giving one story better position than another?

Think of the reporter walking his beat. Everything he sees and hears is news for someone, but he must make a selection in terms of what his editors and--supposedly--his readers want. His stability is necessarily low. But how is his fidelity? Does he get the quotes from a speech right? Does he report an accident accurately?

Or think of the receiver at the end of the mass communication chain. What stories from the Reader's Digest, what items from the newspaper, does he pass on to his friends? And how accurately does he represent the content? Does he reproduce the part of the content which reinforces his previous attitudes? Does he get the point of an article?

Rumor analysis is a fascinating use for the coupling concepts of information theory. What kinds of rumors encourage the stability of the chain--that is, what kinds of rumors will tend to be passed on? And what factors govern how faithfully a rumor is passed on?

Content analysis codes are subject to study for stability and fidelity. How much of the information in the measured content do they respond to? How faithfully do they reproduce it? As a matter of fact, many of the concepts of information theory are stimulating to content study. For example, the heavy redundancy of Communist propaganda shows up from almost any content study, as does the relatively low entropy of the semantic systems within which the Communist propagandist works. The close coupling of units in the Communist propaganda chain is striking. And the stability and fidelity of the Communist gatekeepers, transmitting the official line, are very high. If they are not, the Party gets a new gatekeeper.

Measures of stability and fidelity are available, in information theory, and relatively easy to use. When they are applied to a long chain--such as the one, previously referred to, which carries news from "India to Indiana" and back--it becomes apparent that the stability of the main points along the chain is quite high: that is, a bureau like London is quite likely to pass along a story that comes from New Delhi. The closer one gets to the source of news, the lower the stability, because the input is large, the output capacity relatively small. Bloomington, for example, regularly publishes about 65 local stories, but can only put two or three on the wire. Delhi, likewise, can send London only a small part of the Indian news. Chicago, on the other hand, can send out more than half the stories available. The problem in measuring the fidelity of this kind of chain is to define measurable units. Using length as one criterion, it becomes apparent that the greatest loss is near the source of news. Using rewriting as a criterion, it seems that the chief rewriting is done at the first wire points and the chief national bureaus.
CHANNEL CAPACITY

Channel capacity is another important concept which is common both to information theory and to mass communication. All channels, human, electronic or mechanical, have an upper limit on their ability to assume different states or carry different events. We can estimate, for example, the amount of information the eye is capable of transmitting to the optic nerve, and it is less than the information available to the eye, although apparently more than the semantic system can handle. We can estimate the capacity of a telephone line or a microphone, and have very good formulas for doing so. But when we consider the characteristics of a chain and recall that the chain is no stronger than its weakest link, then our chief interest turns to the channel capacity of man, who is the weakest link in most communication chains.

Perceptual experiments have told us a great deal about the ability of man to transmit information through some of his systems. In general, we can say that man's ability to handle information is faster than most mechanical systems (such as smoke signals and flags), but far slower than that of most electronic devices (e.g., the electronic computers). We still have a great deal to find out about man's capacity for handling language and pictorial information.

Many of the capacity problems of mass communications, of course, find man at the mercy of his works. The reporter who has only 30 minutes to write his story before deadline, the editor who is permitted to file only 200 words on the wire, the radio news bureau desk which has room for only 13 minutes of copy and must select from 300 stories, the editor who finds a big advertising day crowding out his news--all these are communicators suffering from capacity problems they have helped to make. It is also obvious that the channel capacity of the New York Times is greater than that of a small daily. But for the Times and its smaller brothers there is an even greater channel restriction: the reader. The reader of a daily can spend, on the average, about 40 minutes on his paper. And he reads rather slowly. Even so, he can read faster than he can listen, so to speak. A radio speaker usually stays under 150 words a minute, not because he cannot talk faster, but because he fears he will crowd the channel capacity of his listeners.

Shannon has developed a theorem for a channel with noise which is both remarkable in itself and highly meaningful for persons concerned with mass communication (1). His theorem says, in effect, that for rates of transmission less than the capacity of a channel it is possible to reduce noise to any desired level by improving the coding of the information to be transmitted; but that for rates of transmission greater than channel capacity it is never possible to reduce noise below the amount by which the rate of transmission exceeds channel capacity. In other words, as Wilson notes, error can be reduced as much as desired if only the rate of transmission is kept below the total capacity of the channel; but if we overload the channel, then error increases very swiftly.

Information theory thus promises us real assistance in studying the capacity of channels. For example, in a recent publication (10) an information theory model is proposed to measure an individual's channel capacity for semantic decoding. Verbal information is to be fed the individual at increasing rates. This information consists of a group of adjectives describing an object. The receiver is asked to respond in each case by touching the corresponding object, in a group of objects, in front of him. (He has already over-learned this response, so supposedly no learning takes place during the experiment.) The time from the stimulus until the subject touches an object is taken as the total time for decoding and encoding. It is hypothesized that as the rate increases this total time will decrease until it becomes stable. As the rate increases further, the number of errors will begin to increase, until at a certain rate the time will become highly variable and the process will break down. The rate at which the total time becomes stable is taken as the optimum channel capacity, because it is there that the largest amount of accurate information is being transmitted.
This experiment has not yet been done with the accurate controls which would be required, but some striking confirmation of it comes out of experiments with retention of newscasts. Subjects were presented newscasts of increasing density but constant length--5, 10, 20, 30, 40, 50 items. The average subject's ability to recall the subject of these items leveled off very sharply between 10 and 20. There was practically no additional learning between 20 and 30. After 20, the number of errors began to increase rather sharply. In other words, the amount of information transmitted behaved about as hypothesized above, and the resulting curve was strikingly like those typically resulting from experiments on the capacity of subjects to discriminate among stimuli--as shown in Figure B.

[FIGURE B. Ability to Repeat Information from Newscasts: items recalled, plotted against information input.]
NETWORKS

Of all the potential contributions of information theory to mass communication, perhaps the most promising is in the study of communication networks. Networks are as important in mass communication as in electronic communication. Every functional group is a communication network. The staff of a newspaper or a broadcasting station, a film production crew, the group with which a member of the mass communication audience talks over what he reads, hears and sees--all these are communication networks. The intercommunication within the network is measurable, whether it consists of conversation, print, gestures or electronic currents.

Osgood and Wilson, in a mimeographed publication,⁵ have suggested a series of measures derived from information theory, for dealing with groups. In addition to the common entropy, redundancy, noise, fidelity and capacity measures, they suggest traffic (what members do the most talking, and how much talking is done?), closure (to what extent is the group a closed corporation?), and congruence (to what extent do members participate equally in the communication of the group, or to what extent are there members who are chiefly talkers and others who are chiefly listeners?). All these formulations can be dealt with mathematically. Measures like these suggest a quite different and stimulating way of studying small groups, and in particular they commend themselves for use in studying the important groups within mass communication.

⁵ "A Vocabulary for Talking about Communication," colloquium paper, Institute of Communications Research, University of Illinois.

Suppose, for example, we want to study some part of the world news network. Suppose that we take the chief newspapers of the leading cities in half a dozen countries--for example, the United States, Great Britain, France, Germany, Italy and the Soviet Union--and tabulate for one week the stories which the papers in each city carry from the other cities in the network. This has been done in a small way, with interesting results. Washington has the greatest output traffic, New York the greatest input traffic. Moscow has the greatest degree of individual closure: that is, it is most likely to talk, if at all, to itself. Within a country, there are startling differences in the amount and distribution of input. In general there appears to be a little more organization (redundancy) in the pattern of input than in the pattern of output: that is, source entropy is higher than destination entropy. And the congruence (the correlation between source and destination frequencies of points in the network) varies markedly with political conditions and cultural relationships at a particular time.
Let us take a simpler example of group communication. Here is a record of telephone calls amongst four boys (who telephoned incessantly). The calls were tabulated at periods two months apart--20 calls while the boys were organizing a school newspaper, and 20 calls two months later after the paper was well launched.

Twenty Telephone Calls by Four Boys
(Rows show calls made; columns show calls received.)

A. In process of organizing a school newspaper:

             Mike   Bud   Mike T.   John   Total
  Mike         -     4       4        2      10
  Bud          3     -       1        2       6
  Mike T.      1     1       -        0       2
  John         1     1       0        -       2
  Total        5     6       5        4      20

B. After the school newspaper had been published two months:

             Mike   Bud   Mike T.   John   Total
  Mike         -     3       1        1       5
  Bud          7     -       1        0       8
  Mike T.      5     1       -        0       6
  John         1     0       0        -       1
  Total       13     4       2        1      20
It is clear that the relative transitional entropy of this group became less in the two months--that is, it became better organized--and also that the congruence had changed so that increasingly one pattern could be predicted: i.e., the boys would call Mike. It seems that whereas Mike must have been the organizer at first, he became the leader later, and the other boys turned to him for advice or instructions.
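The article does not give its computing procedure for these tables, but one straightforward reading is to treat each period's 20 calls as a frequency distribution over the twelve possible caller-receiver pairs and to compare observed entropy with its maximum, log₂12. A sketch on that assumption:

    import math

    def relative_entropy(counts):
        """Observed entropy of the distribution, divided by its maximum."""
        n = sum(counts)
        observed = -sum((c / n) * math.log2(c / n) for c in counts if c > 0)
        return observed / math.log2(len(counts))

    # Off-diagonal cells of the call matrices, row by row (12 ordered pairs).
    period_a = [4, 4, 2, 3, 1, 2, 1, 1, 0, 1, 1, 0]   # organizing the paper
    period_b = [3, 1, 1, 7, 1, 0, 5, 1, 0, 1, 0, 0]   # two months later

    print(f"period A: {relative_entropy(period_a):.2f}")   # about .86
    print(f"period B: {relative_entropy(period_b):.2f}")   # about .70

The drop is the increase in organization described above: in the second period it is much easier to predict who will call whom.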
This kind of result suggests the hypothesis that the entropy of communication within a functional group decreases as the group becomes more fully organized into work roles and better perceives the existence of leadership. By way of testing this and preparing the way for studying actual media staffs, some experiments have been done with groups of five journalism students who were given assignments that simulated the work of an actual newspaper staff, including reporting, reference, editing, copyreading and setting in type. All their intercommunications were recorded. Not enough groups have yet been put through the procedure to reveal all the variables, but the pattern so far is very clear and interesting. Some of the groups were started on their assignments entirely unstructured--that is, no roles were assigned. In others a leader was appointed. In still others, every person was assigned a job. Inasmuch as some measure of leadership almost always appeared, regardless of assignment, participants were asked at the end whether they perceived a leader or leaders, and if so, whom. This, in general, seems to be the pattern:

(a) As the perception of leadership increases, the relative transitional entropy of communication in the group decreases--that is, it becomes easier to predict who will talk to whom.

(b) As the degree of initial organization is increased, the total amount of communication decreases and the total time required to do the job decreases.

(c) However, between the group in which a leader is appointed and the group in which all members are assigned roles, these measures change much less than between the other groups and the unstructured group. In some cases, the group in which a leader only was appointed actually finished the job more quickly than the group in which all roles were assigned. This suggests that there may be a stage in which increasing organization does not contribute to efficiency; and also, that it must make a difference who is appointed leader, even in these previously unacquainted groups.

These results are presented only to suggest that the approach is a promising one for group study, and especially for the study of the kind of functional groups that play such an important part in mass communication.
FINALLY

How can we sum up the import of all this for the study of mass communication?

Even such a brief overview as this must make it clear that information theory is suggestive and stimulating for students of human communication. It must be equally clear that the power of the theory and its stimulating analogic quality are greatly at variance with the puny quality of the mathematical examples I have been able to cite--that is, examples of the use so far made of information theory mathematics in studying mass communication. Why should this be?

The theory is new--1948, as I have said, for most of us. Its application is fringed with dangers. One of these has been indicated--the danger of working with stochastic processes in functional systems which may learn and thereby change the probabilities. It should also be said that we do not as yet know much about the sampling distributions of these entropy formulas, and it is therefore not always wise to use them for hypothesis testing and statistical inference. Finally, we must admit frankly the difficulty of bridging the gap between the formula's concept of information (which is concerned only with the number of binary choices necessary to specify an event in a system) and our concept of information in human communication (which is concerned with the relation of a fact to outside events--e.g., how "informative" is it?).

This is not to say that the transfer cannot be made. Certainly I have no intention of saying that the theory has only analogic value, and that the contribution of its mathematical tools is necessarily small. These tools seem to me to be extremely promising in the study of language, channel capacities, couplings, and network groups, if nowhere else. It will be to our advantage to explore these uses and others.

Appendix

THE BASIC FORMULAS

It may be helpful to explain the basic entropy formula here in order to give a better idea of what information theory has to offer mathematically.

Let us begin with an event which we call i within a system which we can call I.⁶ (For example, i may be the yellow light on a traffic light I.) Then let us call p(i) the probability of event i occurring within the system. This is equivalent to saying that p(i) equals 1/a, in which a is a certain number of equally probable classes. (For example, the yellow light in a traffic light occurs two times in four events, so that its probability is 1/2.) The information we need to predict the occurrence of event i is therefore log₂a. By algebraic transformation, we can say that, since p(i) equals 1/a, a equals 1/p(i). Therefore the information necessary to specify the one event i is log₂(1/p(i)). Since the logarithm of x/y always equals log x - log y, we have the information necessary to specify event i equal to log₂1 - log₂p(i). The log of 1 is always zero, and therefore we arrive at an equation which states the amount of information necessary to specify one event in a system (let us call this information h(i)):

  h(i) = -log₂ p(i)

⁶ This explanation of the formula for observed entropy in general follows the approach of Wilson in bibliography item (10). Wilson's treatment of the subject is easy to read and still both solid and stimulating, and is recommended to beginners in this field.

Now what we need is an estimate of the average amount of entropy associated with all the states of a system. The average of a sample of numbers can be expressed as

  Σᵢ i f(i) / n
where i is the numerical value of any class of numbers, f(i) is the frequency of occurrence of that class, n is the sample size, and Σᵢ is the term for the sum over all the i's. But f(i)/n is the same as an estimate of probability, which we called p(i), and which we can here substitute in the term for an average as follows:

  Σᵢ i p(i)

Therefore, if we want the average amount of information needed to predict the occurrence of the states of a system I, we can use this term and substitute the information symbol for the numerical value, thus:

  H(I) = Σᵢ h(i) p(i), or
  H(I) = -Σᵢ p(i) log₂ p(i)

This last expression is the basic formula for observed entropy.

It is clear that this formula will equal zero when the probability of one event is unity and the probability of all other events is zero: in other words, when there is no uncertainty in the system. It is also clear that the formula will approach Hmax (which, you will remember, is log₂n) as the events in the system become more nearly equally probable, so that there is maximum uncertainty in the system. In a coin toss, for example, observed entropy is the same as maximum entropy (log₂2, or 1) because the events are equally probable. However, the more events in a system, the higher the observed entropy is likely to grow. Therefore, it becomes useful to have a measure by which to compare systems which have different numbers of states. This is the formula for relative entropy, which is simply the observed entropy of a system divided by its maximum entropy:

  Hrel = H(I) / Hmax

From the basic formula, we get the formula for joint entropy, which is simply the entropy for the occurrence of pairs of events (for example, q and u together in a sample of English words) and which is written

  H(I,J) = -Σᵢ,ⱼ p(i,j) log₂ p(i,j)

This is read exactly like the basic entropy formula except that (i,j) stands for the occurrence of events i and j together. We also get a formula for conditional entropy, which deals with the occurrence of two events in sequence (for example, the occurrence of u after q in a sample of English words). This is written

  H_I(J) = -Σᵢ,ⱼ p(i,j) log₂ pᵢ(j)

in which pᵢ(j) represents the probability of the occurrence of j after i has occurred.

Among the other formulas available are those for redundancy (basically, 1 - H/Hmax), amount of information transmitted, channel capacity, noise and maximum effective coding in the face of noise. It is not believed necessary to speak in any greater detail of these measures at this point, inasmuch as the purpose of these pages is to give a general idea of the theory rather than a complete description of it.

COMPUTING THE COMMONER MEASURES

Maximum entropy, of course, may be computed simply by taking the log (to the base 2) of the total number of events in the system.

In computing the other entropy measures from social data, it will be necessary to estimate the probabilities of events within systems by counting frequencies of occurrence over some uniform time period. For example, if a traffic light were a new phenomenon to us, we might count the occurrence of red, yellow and green events for a certain length of time and get, say, 10 each for red and green, 20 for yellow, out of 40 events. From these we should estimate the probability of red and green as 1/4 each, of yellow as 1/2.
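Carried through in code, the traffic-light example gives (a minimal sketch using the appendix's own formulas):

    import math

    # Probabilities estimated from 40 observed events: 10 red, 20 yellow, 10 green.
    p = {"red": 0.25, "yellow": 0.50, "green": 0.25}

    observed = -sum(pi * math.log2(pi) for pi in p.values())
    maximum = math.log2(len(p))

    print(f"H(I) = {observed:.2f} bits")        # 1.50
    print(f"Hmax = {maximum:.2f} bits")         # about 1.58
    print(f"Hrel = {observed / maximum:.2f}")   # about 0.95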
To estimate the probabilities of events in two systems or of sequential events in one system, it is helpful to use a table like this one:

             j₁          j₂         . . .     p(i)
  i₁      p(i₁,j₁)    p(i₁,j₂)      . . .    p(i₁)
  i₂      p(i₂,j₁)    p(i₂,j₂)      . . .    p(i₂)
  . . .
  p(j)     p(j₁)       p(j₂)        . . .

Having computed the values of H(I), H(J) and H(I,J) from this kind of table, the values of the conditional entropies may be obtained, if desired, by the following relationships:

  H_I(J) = H(I,J) - H(I)
  H_J(I) = H(I,J) - H(J)

It is not necessary to do all the calculation which would seem to be required to turn the probabilities into entropy scores, if one uses such a table as that of Dolansky and Dolansky (see bibliography), which is recommended to anyone making extensive use of these formulas.
extensive use of these formulas. American Journal o f Psychology, 6 4 : 2 S 2 4 2
(1951).
A Short List of Readings on
Examples o f Appllcations:
Information Theory
The basic theory: 8. Garner, W. H., and Hake. H. W., “Th:
Amount of Information in Absolute Judgments,
1. Shannon, C. E., and Weaver, Warren, The Psychological Review, 58:44&S9 (1951). (Joint
Mathematical Theory o f Communlcation. Urbana, and conditional entropy used to measure stimu-
1949. (Contains the classical article by Shannon, lus-response relationships.)
with comments by Weaver. For readers without
a good mathematical background, Weaver is a 9. Jakobson, R., Halle, H., and Cherty, E. C.,
better beginning article than Shannon.) “Towards the Logical Description of Languages
2. Weiner, Norbert, Cybernefics. New York, in Their Phonemic Aspect,’’ Language, 2 9 : 3 4 4 6
1948. (Stimulating in that it contalns much of the (1953).
viewpoint for Shannon’s later development.) 10. Osgwd, C .E.,editor, Sebock. T. A., and
3. Fano, R. M.,The Transmission of Informa- others, Psycholinguistics: A Survey o f Theory and
tion. MIT Technical Reports 65 and 149. Cam- Research Problem. Supplement to Infernattonal
bridge, 1949-50. (Highly mathematical.) Journal of American Linguisttcs, 20:4 (1954).
4. Goldman. Stanford, Informuffon Theory. (Sections on information theory are stimulating
New York, 19S3. (Textbook for graduate stu- and easy to read; they are written mostly by K.
dents in electrical engineering.) Wilson.)

"The duty of the press at this moment is to show the way to recovery from the blight of fear and cautious conformity.

"There has been far too much defeatism. . . . I believe that the press in America today has the greatest opportunity it has ever had to help point the way back to sanity.

"What is more, I believe it has begun to live up to this challenging opportunity, and on this 50th anniversary we can look forward to another half century in which this School of Journalism and the University will play their part in educating other generations for the grave responsibility of freedom. . . .

"The press of America cut its teeth on controversy and controversy has been its lifeblood.

"What is most extraordinary is to find in press and radio those who seem so frightened and insecure that they would drive out every opinion that does not conform to their own narrowly reactionary standard. They spend a great deal of time vilifying those who do not agree with them as though this were in itself a crime.

"I would not want to deprive the radicals of either the right or the left of their privileges of saying within the bounds of constitutional freedom what they please. And I might say in passing that we sometimes forget what we should have learned from the example of Italy and Germany: that there are radicals of the extreme right just as ready to destroy existing institutions as are the extremists of the left. The Jacobin can wear black as well as red."--MARQUIS CHILDS, at University of Wisconsin.
