PRINCIPLES OF
COMPUTER
HARDWARE
Alan Clements
School of Computing
University of Teesside
Fourth Edition
Great Clarendon Street, Oxford OX2 6DP
Oxford University Press is a department of the University of Oxford.
It furthers the University’s objective of excellence in research, scholarship,
and education by publishing worldwide in
Oxford New York
Auckland Cape Town Dar es Salaam Hong Kong Karachi
Kuala Lumpur Madrid Melbourne Mexico City Nairobi
New Delhi Shanghai Taipei Toronto
With offices in
Argentina Austria Brazil Chile Czech Republic France Greece
Guatemala Hungary Italy Japan Poland Portugal Singapore
South Korea Switzerland Thailand Turkey Ukraine Vietnam
Oxford is a registered trade mark of Oxford University Press
in the UK and certain other countries
Published in the United States
by Oxford University Press Inc., New York
© Alan Clements, 2006
The moral rights of the author have been asserted
Database right Oxford University Press (maker)
First published 1985
Second edition 1991
Third edition 2000
Fourth edition 2006
All rights reserved. No part of this publication may be reproduced,
stored in a retrieval system, or transmitted, in any form or by any means,
without the prior permission in writing of Oxford University Press,
or as expressly permitted by law, or under terms agreed with the appropriate
reprographics rights organization. Enquiries concerning reproduction
outside the scope of the above should be sent to the Rights Department,
Oxford University Press, at the address above
You must not circulate this book in any other binding or cover
and you must impose the same condition on any acquirer
British Library Cataloguing in Publication Data
Data available
Library of Congress Cataloging in Publication Data
Data available
Typeset by Newgen Imaging Systems (P) Ltd., Chennai, India.
Printed in Great Britain
on acid-free paper by
Bath Press Ltd, Bath
10 9 8 7 6 5 4 3 2 1
PREFACE
Principles of Computer Hardware is aimed at students taking an introductory course in electronics, computer science, or information technology. The approach is one of breadth before depth and we cover a wide range of topics under the general umbrella of computer hardware.

I have written Principles of Computer Hardware to achieve two goals. The first is to teach students the basic concepts on which the stored-program digital computer is founded. These include the representation and manipulation of information in binary form, the structure or architecture of a computer, the flow of information within a computer, and the exchange of information between its various peripherals. We answer the questions, 'How does a computer work?' and 'How is it organized?' The second goal is to provide students with a foundation for further study. In particular, the elementary treatment of gates and Boolean algebra provides a basis for a second-level course in digital design, and the introduction to the CPU and assembly-language programming provides a basis for advanced courses on computer architecture/organization or microprocessor systems design.

This book is written for those with no previous knowledge of computer architecture. The only background information needed by the reader is an understanding of elementary algebra. Because students following a course in computer science or computer technology will also be studying a high-level language, we assume that the reader is familiar with the concepts underlying a high-level language.

When writing this book, I set myself three objectives. By adopting an informal style, I hope to increase the enthusiasm of students who may be put off by the formal approach of more traditional books. I have also tried to give students an insight into computer hardware by explaining why things are as they are, instead of presenting them with information to be learned and accepted without question. I have included subjects that would seem out of place in an elementary first-level course. Topics like advanced computer arithmetic, timing diagrams, and reliability have been included to show how the computer hardware of the real world often differs from that of the first-level course in which only the basics are taught. I've also broadened the range of topics normally found in first-level courses in computer hardware and provided sections introducing operating systems and local area networks, as these two topics are so intimately related to the hardware of the computer. Finally, I have discovered that stating a formula or a theory is not enough—many students like to see an actual application of the formula. Wherever possible I have provided examples.

Like most introductory books on computer architecture, I have chosen a specific microprocessor as a vehicle to illustrate some of the important concepts in computer architecture. The ideal computer architecture is rich in features and yet easy to understand without exposing the student to a steep learning curve. Some microprocessors have very complicated architectures that confront the students with too much fine detail early in their course. We use Motorola's 68K microprocessor because it is easy to understand and incorporates many of the most important features of a high-performance architecture. This book isn't designed to provide a practical assembly language programming course. It is intended only to illustrate the operation of a central processing unit by means of a typical assembly language. We also take a brief look at other microprocessors to show the range of computer architectures available.

You will see the words computer, CPU, processor, microprocessor, and microcomputer in this and other texts. The part of a computer that actually executes a program is called a CPU (central processing unit) or more simply a processor. A microprocessor is a CPU fabricated on a single chip of silicon. A computer that is constructed around a microprocessor can be called a microcomputer. To a certain extent, these terms are frequently used interchangeably.
CONTENTS
1.6 The stored program computer—an overview 19
1.7 The PC—a naming of parts 22
SUMMARY 23
PROBLEMS 23

2 Gates, circuits, and combinational logic 25
2.1 Analog and digital systems 26
2.2 Fundamental gates 28
2.2.1 The AND gate 28
2.2.2 The OR gate 30
2.2.3 The NOT gate 31
2.2.4 The NAND and NOR gates 31
2.2.5 Positive, negative, and mixed logic 32
2.3 Applications of gates 34
2.4 Introduction to Digital Works 40
2.4.1 Creating a circuit 41
2.4.2 Running a simulation 45
2.4.3 The clock and sequence generator 48
2.4.4 Using Digital Works to create embedded circuits 50
2.4.5 Using a macro 52

3.1 The RS flip-flop 103
3.1.1 Analyzing a sequential circuit by assuming initial conditions 104
3.1.2 Characteristic equation of an RS flip-flop 105
3.1.3 Building an RS flip-flop from NAND gates 106
3.1.4 Applications of the RS flip-flop 106
3.1.5 The clocked RS flip-flop 108
3.2 The D flip-flop 109
3.2.1 Practical sequential logic elements 110
3.2.2 Using D flip-flops to create a register 110
3.2.3 Using Digital Works to create a register 111
3.2.4 A typical register chip 112
3.3 Clocked flip-flops 113
3.3.1 Pipelining 114
3.3.2 Ways of clocking flip-flops 115
3.3.3 Edge-triggered flip-flops 116
3.3.4 The master–slave flip-flop 117
3.3.5 Bus arbitration—an example 118
3.4 The JK flip-flop 120
3.5 Summary of flip-flop types 121
4.1 Bits, bytes, words, and characters 146
4.2 Number bases 148
4.3 Number base conversion 150
4.3.1 Conversion of integers 150
4.3.2 Conversion of fractions 152
4.4 Special-purpose codes 153
4.4.1 BCD codes 153
4.4.2 Unweighted codes 154
4.5 Error-detecting codes 156
4.5.1 Parity EDCs 158
4.5.2 Error-correcting codes 158
4.5.3 Hamming codes 160
4.5.4 Hadamard codes 161
4.6 Data-compressing codes 163
4.6.1 Huffman codes 164
4.6.2 Quadtrees 167
4.7 Binary arithmetic 169
4.7.1 The half adder 170
4.7.2 The full adder 171
4.7.3 The addition of words 173
4.8 Signed numbers 175
4.8.1 Sign and magnitude representation 176
4.8.2 Complementary arithmetic 176
4.8.3 Two's complement representation 177
4.8.4 One's complement representation 180
4.9 Floating point numbers 181
4.9.1 Representation of floating point numbers 182
4.9.2 Normalization of floating point numbers 183
4.9.3 Floating point arithmetic 186
4.9.4 Examples of floating point calculations 188
4.10 Multiplication and division 189
4.10.1 Multiplication 189
4.10.2 Division 194
SUMMARY 198
PROBLEMS 198

5.4.1 Status flags 217
5.4.2 Data movement instructions 218
5.4.3 Arithmetic instructions 218
5.4.4 Compare instructions 220
5.4.5 Logical instructions 220
5.4.6 Bit instructions 221
5.4.7 Shift instructions 221
5.4.8 Branch instructions 223
SUMMARY 226
PROBLEMS 226

6 Assembly language programming 228
6.1 Structure of a 68K assembly language program 228
6.1.1 Assembler directives 229
6.1.2 Using the cross-assembler 232
6.2 The 68K's registers 234
6.2.1 Data registers 235
6.2.2 Address registers 236
6.3 Features of the 68K's instruction set 237
6.3.1 Data movement instructions 237
6.3.2 Using arithmetic operations 241
6.3.3 Using shift and logical operations 244
6.3.4 Using conditional branches 244
6.4 Addressing modes 249
6.4.1 Immediate addressing 249
6.4.2 Address register indirect addressing 250
6.4.3 Relative addressing 259
6.5 The stack 262
6.5.1 The 68K stack 263
6.5.2 The stack and subroutines 266
6.5.3 Subroutines, the stack, and parameter passing 271
6.6 Examples of 68K programs 280
6.6.1 A circular buffer 282
SUMMARY 287
PROBLEMS 287
11.4 Color displays and printers 457
11.4.1 Theory of color 457
11.4.2 Color CRTs 458
11.4.3 Color printers 460
11.5 Other peripherals 461
11.5.1 Measuring position and movement 461
11.5.2 Measuring temperature 463
11.5.3 Measuring light 464
11.5.4 Measuring pressure 464
11.5.5 Rotation sensors 464
11.5.6 Biosensors 465
11.6 The analog interface 466
11.6.1 Analog signals 466
11.6.2 Signal acquisition 467
11.6.3 Digital-to-analog conversion 473
11.6.4 Analog-to-digital conversion 477
11.7 Introduction to digital signal processing 486
11.7.1 Control systems 486
11.7.2 Digital signal processing 488
SUMMARY 491
PROBLEMS 492

12 Computer memory 493
12.1 Memory hierarchy 493
12.2 What is memory? 496

12.7.3 RAID systems 531
12.7.4 The floppy disk drive 532
12.7.5 Organization of data on disks 533
12.8 Optical memory technology 536
12.8.1 Storing and reading information 537
12.8.2 Writable CDs 540
SUMMARY 543
PROBLEMS 543

13 The operating system 547
13.1 The operating system 547
13.1.1 Types of operating system 548
13.2 Multitasking 550
13.2.1 What is a process? 551
13.2.2 Switching processes 551
13.3 Operating system support from the CPU 554
13.3.1 Switching states 555
13.3.2 The 68K's two stacks 556
13.4 Memory management 561
13.4.1 Virtual memory 563
13.4.2 Virtual memory and the 68K family 565
SUMMARY 568
PROBLEMS 568
INTRODUCTION
In this chapter we set the scene for the rest of the book. We define what we mean by computer
hardware, explain just why we teach computer hardware to computer science students, provide a
very brief history of computing, and look at the role of the computer.
abstraction of the computer. A computer's organization describes how the architecture is implemented; that is, it defines the hardware used to implement the architecture.

Let's look at a simple example that distinguishes between architecture and organization. A computer with a 32-bit architecture performs operations on numbers that are 32 bits wide. You could build two versions of this computer. One is a high-performance device that adds two 32-bit numbers in a single operation. The other is a low-cost processor that gets a 32-bit number by bringing two 16-bit numbers from memory one after the other. Both computers end up with the same result, but one takes longer to get there. They have the same architecture but different organizations.
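To make the distinction concrete, here is a minimal C sketch (my illustration, not the book's; the function names are mine). Both functions produce the same 32-bit sum, so they implement the same architecture; the second mimics the low-cost organization by working on 16-bit halves and propagating the carry by hand.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* High-performance organization: a full 32-bit ALU adds in one operation. */
static uint32_t add32_wide(uint32_t a, uint32_t b)
{
    return a + b;
}

/* Low-cost organization: a 16-bit ALU adds the low halves first,
   then adds the high halves together with the carry out of the low half. */
static uint32_t add32_narrow(uint32_t a, uint32_t b)
{
    uint32_t lo = (a & 0xFFFFu) + (b & 0xFFFFu);       /* carry appears in bit 16 */
    uint32_t hi = (a >> 16) + (b >> 16) + (lo >> 16);  /* high halves plus carry */
    return (hi << 16) | (lo & 0xFFFFu);
}

int main(void)
{
    uint32_t a = 0x0001FFFFu, b = 0x00000001u;
    printf("%08" PRIX32 " %08" PRIX32 "\n",
           add32_wide(a, b), add32_narrow(a, b));  /* prints the same sum twice */
    return 0;
}

Seen from the programmer's side the two are indistinguishable; only the timing reveals the organization underneath.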
Although hardware and software are different entities, there is often a trade-off between them. Some operations can be carried out either by a special-purpose hardware system or by means of a program stored in the memory of a general-purpose computer. The fastest way to execute a given task is to build a circuit dedicated exclusively to the task. Writing a program to perform the same task on an existing computer may be much cheaper, but the task will take longer, as the computer's hardware wasn't optimized to suit the task.

Developments in computer technology in the late 1990s further blurred the distinction between hardware and software. Digital circuits are composed of gates that are wired together. From the mid-1980s onward manufacturers were producing large arrays of gates that could be interconnected electronically to create a particular circuit. As technology progressed it became possible to reconfigure the connections between gates while the circuit was operating. We now have the technology to create computers that can repair errors, restructure themselves as the state of the art advances, or even evolve.

1.2 Why do we teach computer hardware?

A generation ago, school children in the UK had to learn Latin in order to enter a university. Clearly, at some point it was thought that Latin was a vital prerequisite for everyone going to university. When did they realize that students could still benefit from a university education without a prior knowledge of Latin? Three decades ago students taking a degree in electronics had to study electrodynamics, the dance of electrons in magnetic fields, a subject so frightening that older students passed on its horrors to the younger ones in hushed tones. Today, electrodynamics is taught only to students on specialist courses.

We can watch a television program without understanding how a cathode ray tube operates, or fly in a Jumbo jet without ever knowing the meaning of thermodynamics. Why then should the lives of computer scientists and programmers be made miserable by forcing them to learn what goes on inside a computer?

If topics in the past have fallen out of the curriculum with no obviously devastating effect on the education of students, what about today's curriculum? Do we still need to teach computer science students about the internal operation of the computer?

Computer architecture is the oldest component of the computer curriculum. The very first courses on computer science were concerned with the design and construction of computers. At that time programming was in its infancy and compilers, operating systems, and databases did not exist. In the 1940s, working with computers meant building computers. By the 1960s computer science had emerged as a discipline. With the introduction of courses in programming, numerical methods, operating systems, compilers, and databases, the then curriculum reflected the world of the mainframe.

In the 1970s computer architecture was still, to a considerable extent, an offshoot of electronics. Texts were more concerned with the circuits in a computer than with the fundamental principles of computer architecture as now encapsulated by the expression instruction set architecture (ISA).

Computer architecture experienced a renaissance in the 1980s. The advent of low-cost microprocessor-based systems and the single-board computer meant that computer science students could study and even get hands-on experience of microprocessors. They could build simple systems, test them, interface them to peripherals such as LEDs and switches, and write programs in machine code. Bill Gates himself is a product of this era.

Assembly language programming courses once mirrored high-level language programming courses—students were taught algorithms such as sorting and searching in assembly language, as if assembly language were no more than the poor person's C. Such an approach to computer architecture is now untenable. If assembly language is taught at all today, it is used as a vehicle to illustrate instruction sets, addressing modes, and other aspects of a processor's architecture.

In the late 1980s and early 1990s computer architecture underwent another change. The rise of the RISC microprocessor turned the focus of attention from complex instruction set computers to the new high-performance, highly pipelined, 32-bit processors. Moreover, the increase in the performance of microprocessors made it harder and harder for classes to give students the hands-on experience they had a few years earlier. In the 1970s a student could construct a computer with readily available components and simple electronic construction techniques. By the 1990s clock rates rose to well over 100 MHz and buses were 32 bits wide, making it difficult for students to construct microprocessor-based systems as they did in the 1980s. High clock rates require special construction techniques and complex chips have hundreds of connections rather than the 40- or 64-pin packages of the 8086/68K era.
In the 1990s computer architecture was largely concerned with the instruction set architecture, pipelining, hazards, superscalar processors, and cache memories. Topics such as microprocessor systems design at the chip level and microprocessor interfacing had largely vanished from the CS curriculum. These topics belonged to the CEng and EE curricula. In the 1990s a lot was happening in computer science; for example, the introduction of new subject areas such as object-oriented programming, communications and networks, and the Internet/WWW. The growth of the computer market, particularly for those versed in the new Internet-based skills, caused students to look at their computing curricula in a rather pragmatic way. Many CS students will join companies using the new technologies, but very few of them indeed will ever design chips or become involved with cutting-edge work in computer architecture. At my own university, the demand for courses in Internet-based computing has risen and fewer students have elected to take computer architecture when it is offered as an elective.

1.2.1 Should computer architecture remain in the CS curriculum?

Developments in computer science have put pressure on course designers to remove old material to make room for the new. The fraction of students that will ever be directly involved in computer design is declining. Universities provide programs in multimedia-based computing and visualization at both undergraduate and postgraduate levels. Students on such programs do not see the point of studying computer architecture.

Some have suggested that computer architecture is a prime candidate for pruning. It is easy to argue that computer architecture is as irrelevant to computer science as, say, Latin is to the study of contemporary English literature. If a student never writes an assembly language program or designs an instruction set, or interfaces a memory to a processor, why should we burden them with a course in computer architecture? Does the surgeon study metallurgy in order to understand how a scalpel operates?

It's easy to say that an automobile driver does not have to understand the internal combustion engine to drive an automobile. However, it is patently obvious that a driver who understands mechanics can drive in such a way as to enhance the life of the engine and to improve its performance. The same is true of computer architecture; understanding computer systems can improve the performance of software if the software is written to exploit the underlying hardware.

The digital computer lies at the heart of computer science. Without it, computer science would be little more than a branch of theoretical mathematics. The very idea of a computer science program that did not provide students with an insight into the computer would be strange in a university that purports to educate students rather than to merely train them.

Those supporting the continued teaching of computer architecture employ several traditional arguments. First, education is not the same as training and CS students are not simply being shown how to use commercial computer packages. A course leading to a degree in computer science should also cover the history and the theoretical basis for the subject. Without an appreciation of computer architecture, the computer scientist cannot understand how computers have developed and what they are capable of.

However, there are concrete reasons why computer architecture is still relevant in today's world. Indeed, I would maintain that computer architecture is as relevant to the needs of the average CS student today as it was in the past. Suppose a graduate enters the industry and is asked to select the most cost-effective computer for use throughout a large organization. Understanding how the elements of a computer contribute to its overall performance is vital—is it better to spend $50 on doubling the size of the cache or $100 on increasing the clock speed by 500 MHz?

Computer architecture cannot be divorced entirely from software. The majority of processors are found not in PCs or workstations but in embedded¹ applications. Those designing multiprocessors and real-time systems have to understand fundamental architectural concepts and limitations of commercially available processors. Someone developing an automobile electronic ignition system may write their code in C, but might have to debug the system using a logic analyzer that displays the relationship between interrupt requests from engine sensors and the machine-level code.

There are two other important reasons for teaching computer architecture. The first reason is that computer architecture incorporates a wealth of important concepts that appear in other areas of the CS curriculum. This point is probably least appreciated by computer scientists who took a course in architecture a long time ago and did little more than learn about bytes, gates, and assembly language. The second reason is that computer architecture covers more than the CPU; it is concerned with the entire computer system. Because so many computer users now have to work with the whole system (e.g. by configuring hard disks, by specifying graphics cards, by selecting a SCSI or FireWire interface), a course covering the architecture of computer systems is more a necessity than a luxury.

Some computer architecture courses cover the architecture and organization of the processor but make relatively little reference to buses, memory systems, and high-performance peripherals such as graphics processors.

¹ An embedded computer is part of a product (digital camera, cell phone, washing machine) that is not normally regarded as a computing device. The end user does not know about the computer and does not have to program it.
Yet, if you scan the pages of journals devoted to personal/workstation computing, you will rapidly discover that much attention is focused on aspects of the computer system other than the CPU itself.

Computer technology was once driven by the paperless-office revolution with its demand for low-cost mass storage, sufficient processing power to rapidly recompose large documents, and low-cost printers. Today, computer technology is being driven by the multimedia revolution with its insatiable demand for pure processing power, high bandwidths, low latencies, and massive storage capacities.

These trends have led to important developments in computer architecture such as special hardware support for multimedia applications. The demands of multimedia are being felt in areas other than computer architecture. Hard disks must provide a continuous stream of data because people can tolerate a degraded picture much better than a picture with even the shortest discontinuities. Such demands require efficient track-seeking algorithms, data buffering, and high-speed, real-time error correction and detection algorithms. Similarly, today's high data densities require frequent recalibration of tracking mechanisms due to thermal effects. Disk drives now include SMART technologies from the AI world that are able to predict disk failure before it occurs. These developments have as much right to be included in the architecture curriculum as developments in the CPU.

1.2.2 Supporting the CS curriculum

It is in the realm of software that you can most easily build a case for the teaching of assembly language. During a student's career, they will encounter abstract concepts in areas ranging from programming languages to operating systems to real-time programming to AI. The foundation of many of these concepts lies in assembly language programming and computer architecture. Computer architecture provides bottom-up support for the top-down methodology taught in high-level languages. Consider some of the areas where computer architecture can add value to the CS curriculum.

The operating system Computer architecture provides a firm basis for students taking operating system courses. In computer architecture students learn about the hardware that the operating system controls and the interaction between hardware and software; for example, in cache systems. Consider the following two examples of the way in which the underlying architecture provides support for operating system facilities.

Some processors operate in either a privileged or a user mode. The operating system runs in the privileged or protected mode and all applications run in the user mode. This mechanism creates a secure environment in which the effects of an error in an application program can be prevented from crashing the operating system or other applications. Covering these topics in an architecture course makes the student aware of the support the processor provides for the operating system and enables those teaching operating system courses to concentrate more on operating system facilities than on the mechanics of the hardware.

High-level languages make it difficult to access peripherals directly. By using an assembly language we can teach students how to write device drivers that directly control interfaces. Many real interfaces are still programmed at machine level by accessing registers within them. Understanding computer architecture and assembly language can facilitate the design of high-performance interfaces.

Programming and data structures Students encounter the notion of data types and the effect of strong and weak data typing when they study high-level languages. Because computer architecture deals with information in its most primitive form, students rapidly become familiar with the advantages and disadvantages of weak typing. They learn the power that you have over the hardware by being able to apply almost any operations to binary data. Equally, they learn the pitfalls of weak typing as they discover the dangers of inappropriate operations on data.

Computer architecture is concerned with both the type of operations that act on data and the various ways in which the location of an operand can be accessed in memory. Computer addressing modes and the various means of accessing data naturally lead on to the notion of pointers. Students learn about how pointers function at machine level and the support offered for pointers by various architectures. This aspect is particularly important if the student is to become a C programmer.

An understanding of procedure call and parameter passing mechanisms is vital to anyone studying processor performance. Programming in assembly language readily demonstrates the passing of parameters by value and by reference. Similarly, assembly language programming helps you to understand concepts such as the use of local variables and re-entrant programming.

Students sometimes find the concept of recursion difficult. You can use an assembly language to demonstrate how recursion operates by tracing through the execution of a program. The student can actually observe how the stack grows as procedures are called.
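A high-level language can give the same demonstration. The short C program below (my illustration, not the book's) prints the address of a local variable in each recursive call; on most machines the addresses step steadily in one direction, letting the student watch a new stack frame appear for every call.

#include <stdio.h>

/* Recursive factorial that reports where each call's local variable lives.
   Each recursive call pushes a new frame, so the printed addresses
   typically descend on machines whose stacks grow downwards. */
static unsigned long factorial(unsigned int n)
{
    unsigned int local = n;   /* one local variable per stack frame */
    printf("n = %u, frame at %p\n", n, (void *)&local);
    return (n <= 1) ? 1UL : n * factorial(n - 1);
}

int main(void)
{
    printf("5! = %lu\n", factorial(5));
    return 0;
}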
Computer science fundamentals Computer architecture is awash with concepts that are fundamental to computer science generally and which do not appear in other parts of the undergraduate curriculum. A course in computer architecture can provide a suitable forum for incorporating fundamental principles in the CS curriculum. For example, a first course in computer architecture introduces the student to bits and binary encoding techniques. A few years ago much time would have been spent on special-purpose codes for BCD arithmetic.
Today, the professor is more likely to introduce error-correcting codes (important in data communications systems and secure storage mechanisms) and data-compression codes (used by everyone who has ever zipped a file or used a JPEG-encoded image).

1.3 An overview of the book

It's difficult to know just what should be included in an introductory course on computer architecture, organization, and hardware—and what should be excluded. Any topic can be expanded to an arbitrary extent; if we begin with gates and Boolean algebra, do we go on to semiconductor devices and then semiconductor physics? In this book, we cover the material specified by typical computer curricula. However, I have included a wider range of material because the area of influence encompassed by the digital computer has expanded greatly in recent years. The major subject areas dealt with in this book are outlined below.

Computer arithmetic Our system of arithmetic using the base 10 has evolved over thousands of years. The computer carries out its internal operations on numbers represented in the base two. This anomaly isn't due to some magic power inherent in binary arithmetic but simply because it would be uneconomic to design a computer to operate in denary (base 10) arithmetic. At this point I must make a comment. Time and time again, I read in the popular press that the behavior of digital computers and their characteristics are due to the fact that they operate on bits using binary arithmetic whereas we humans operate on digits using decimal arithmetic. That idea is nonsense. Because there is a simple relationship between binary and decimal numbers, the fact that computers represent information in binary form is a mere detail of engineering. It's the architecture and organization of a computer that makes it behave in such a different way to the brain.

Basic logic elements and Boolean algebra Today's technology determines what a computer can do. We introduce the basic logic elements, or gates, from which a computer is made up and show how these can be put together to create more complex units such as arithmetic units. The behavior of these gates determines both the way in which the computer carries out arithmetic operations and the way in which the functional parts of a computer interact to execute a program. We need to understand gates in order to appreciate why the computer has developed in the way it has. The operation of circuits containing gates can be described in terms of a formal notation called Boolean algebra. An introduction to Boolean algebra is provided because it enables designers to build circuits with the least number of gates.

As well as gates, computers require devices called flip-flops, which can store a single binary digit. The flip-flop is the basic component of many memory units. We provide an introduction to flip-flops and their application to sequential circuits such as counters, timers, and sequencers.

Computer architecture and assembly language The primitive instructions that directly control the operation of a computer are called machine-code instructions and are composed of sequences of binary values stored in memory. As programming in machine code is exceedingly tedious, an aid to machine-code programming called assembly language has been devised. Assembly language is shorthand permitting the programmer to write machine-code instructions in a simple abbreviated form of plain language. High-level languages (Java, C, Pascal, BASIC) are sometimes translated into a series of assembly-language instructions by a compiler as an intermediate step on the way to pure machine code. This intermediate step serves as a debugging tool for programmers who wish to examine the operation of the compiler and the output it produces. Computer architecture is the assembly language programmer's view of a computer.

Programmers writing in assembly language require a detailed knowledge of the architecture of their machines, unlike the corresponding programmers operating in high-level languages. At this point I must say that we introduce assembly language to explain the operation of the central processing unit. Apart from certain special exceptions, programs should be written in a high-level language whenever possible.

Computer organization This topic is concerned with how a computer is arranged in terms of its building blocks (i.e. the logic and sequential circuits made from gates and flip-flops). We introduce the architecture of a simple hypothetical computer and show how it can be organized in terms of functional units. That is, we show how the computer goes about reading an instruction from memory, decoding it, and then executing it.

Input/output It's no good having a computer unless it can take in new information (programs and data) and output the results of its calculations. In this section we show how information is moved into and out of the computer. The operation of three basic input/output devices is described: the keyboard, the display, and the printer.

We also examine the way in which analog signals can be converted into digital form, processed digitally by a computer, and then converted back into analog form. Until the mid-1990s it was uneconomical to process rapidly changing analog signals (e.g. speech, music, video) digitally. The advent of high-speed low-cost digital systems has opened up a new field of computing called digital signal processing (DSP). We introduce DSP and outline some of the basic principles.

Memory devices A computer needs memory to hold programs, data, and any other information it may require at some point in the future. We look at the immediate access store and the secondary store (sometimes called backing store). An immediate access store provides a computer with the data it requires in approximately the same time as it takes the computer to execute one of its machine-level operations.
The secondary store is very much slower and it takes thousands of times longer to access data from a secondary store than from an immediate access store. However, secondary storage is used because it is immensely cheaper than an immediate access store and it is also non-volatile (i.e. the data isn't lost when you switch the computer off). The most popular form of secondary store is the disk drive, which relies on magnetizing a moving magnetic material to store data. Optical storage technology in the form of the CD and DVD became popular in the 1990s because it combines the relatively fast access time of the disk with the large capacity and low cost of the tape drive.

Operating systems and the computer An operating system coordinates all the functional parts of the computer and provides an interface for the user. We can't cover the operating system in detail here. However, because the operating system is intimately bound up with the computer's hardware, we do cover two of its aspects—multiprogramming and memory management. Multiprogramming is the ability of a computer to appear to run two or more programs simultaneously. Memory management permits several programs to operate as though each alone occupied the computer's memory and enables a computer with a small, high-speed random access memory and a large, low-speed serial access memory (i.e. hard disk) to appear as if it had a single large high-speed random access memory.

Computer communications Computers are networked when they are connected together. Networking computers has many advantages, not least of which is the ability to share peripherals such as printers and scanners. Today we have two types of network—the local area network (LAN), which interconnects computers within a building, and the wide area network, which interconnects computers over much greater distances (e.g. the Internet). Consequently, we have devoted a section to showing how computers communicate with each other. Three aspects of computer communications are examined. The first is the protocols or rules that govern the way in which information is exchanged between systems in an orderly fashion. The second is the way in which digital information in a computer is encoded in a form suitable for transmission over a serial channel, the various types of channel, the characteristics of the physical channel, and how data is reconstituted at the receiver. The third provides a brief overview of both local area and wide area networks.

1.4 History of computing

The computer may be a marvel of our age, but it has had a long and rich history. Writing a short introduction to computer history is difficult because there is so much to cover. Here we provide some of the milestones in the computer's development.

1.4.1 Navigation and mathematics

The development of navigation in the eighteenth century was probably the most important driving force behind automated computation. It's easy to tell how far north or south of the equator you are—you measure the height of the sun above the horizon at midday and then use the elevation to work out your latitude. Unfortunately, calculating your longitude relative to the prime meridian through Greenwich in England is very much more difficult. Longitude is determined by comparing your local time (obtained by observing the angle of the sun) with the time at Greenwich.

The mathematics of navigation uses trigonometry, which is concerned with the relationship between the sides and angles of a triangle. In turn, trigonometry requires an accurate knowledge of the sine, cosine, and tangent of an angle. Those who originally devised tables of sines and other mathematical functions (e.g. square roots and logarithms) had to do a lot of calculation by hand. If x is expressed in radians (where 2π radians = 360°) and x < 1, the expression for sin(x) can be written as an infinite series of the form

sin(x) = x - x^3/3! + x^5/5! - x^7/7! + ... + (-1)^n x^(2n+1)/(2n+1)! + ...

Although the calculation of sin(x) requires the summation of an infinite number of terms, we can obtain a reasonably accurate approximation to sin(x) by adding just a handful of terms together because x^n tends towards zero as n increases for x < 1.

An important feature of the formula for sin(x) is that it involves nothing more than the repetition of fundamental arithmetic operations (addition, subtraction, multiplication, and division). The first term in the series is x itself. The second term is -x^3/3!, which is derived from the first term by multiplying it by x^2 and dividing it by 1 × 2 × 3. Each new term is formed by multiplying the previous term by x^2 and dividing it by 2n(2n + 1), where n is the number of the term. It would eventually occur to people that this process could be mechanized.
orderly fashion. The second is the way in which digital
information in a computer is encoded in a form suitable for 1.4.2 The era of mechanical computers
transmission over a serial channel, the various types of
channel, the characteristics of the physical channel, and how During the seventeenth century major advances were made in
data is reconstituted at the receiver. The third provides a watch making; for example, in 1656 Christiaan Huygens
brief overview of both local area and wide area networks. designed the first pendulum clock. The art of watch making
helped develop the gear wheels required by the first mechanical
calculators. In 1642 the French scientist Blaise Pascal designed
1.4 History of computing a simple mechanical adder and subtracter using gear wheels
with 10 positions marked on them. One complete rotation of
The computer may be a marvel of our age, but it has had a long a gear wheel caused the next wheel on its left to move one posi-
and rich history. Writing a short introduction to computer tion (a bit like the odometer used to record an automobile’s
history is difficult because there is so much to cover. Here we mileage). Pascal’s most significant contribution was the use of
provide some of the milestones in the computer’s development. a ratchet device that detected a carry (i.e. a rotation of a wheel
In other words, if two wheels show 58 and the right-hand wheel is rotated two positions forward, it moves to the 0 position and advances the 5 to 6 to get 60. Pascal's calculator, the Pascaline, could perform addition only.

In fact, Wilhelm Schickard, rather than Pascal, is now generally credited with the invention of the first mechanical calculator. His device, created in 1623, was more advanced than Pascal's because it could also perform partial multiplication. Schickard died in a plague and his invention didn't receive the recognition it merited. Such near simultaneous developments in computer hardware have been a significant feature of the history of computer hardware.

Within a few decades, mechanical computing devices advanced to the stage where they could perform addition, subtraction, multiplication, and division—all the operations required by armies of clerks to calculate the trigonometric functions we mentioned earlier.

The industrial revolution and early control mechanisms

If navigation provided a requirement for mechanized computing, other developments provided important steps along the path to the computer. By about 1800 the industrial revolution in Europe was well under way. Weaving was one of the first industrial processes to be mechanized. A weaving loom passes a shuttle pulling a horizontal thread to and fro between vertical threads held in a frame. By changing the color of the thread pulled by the shuttle and selecting whether the shuttle passes in front of or behind the vertical threads, you can weave a particular pattern. Controlling the loom manually is tedious and time consuming. In 1801 Joseph Jacquard designed a loom that could automatically weave a predetermined pattern. The information necessary to control the loom was stored in the form of holes cut in cards—the presence or absence of a hole at a certain point controlled the behavior of the loom. Information was read by rods that pressed against the card and either went through a hole or were stopped by the card. Some complex patterns required as many as 10 000 cards strung together in the form of a tape.

Babbage and the computer

Two of the most significant advances in computing were made by Charles Babbage, a UK mathematician born in 1792: his difference engine and his analytical engine. Like other mathematicians of his time, Babbage had to perform all calculations by hand and sometimes he had to laboriously correct errors in published mathematical tables. Living in the age of steam, it was quite natural that Babbage asked himself whether mechanical means could be applied to arithmetic calculations.

The difference engine was a complex array of interconnected gears and linkages that performed addition and subtraction rather like Pascal's mechanical adder. Its purpose was to mechanize the calculation of polynomial functions and automatically print the result. It was a calculator rather than a computer because it could carry out only a set of predetermined operations.

Babbage's difference engine employed finite differences to calculate polynomial functions. Trigonometric functions can be expressed as polynomials in the form a0x^0 + a1x^1 + a2x^2 + ... The difference engine can evaluate such expressions automatically. Table 1.1 demonstrates how you can use the method of finite differences to create a table of squares without having to use multiplication. The first column contains the natural integers 1, 2, 3, . . . The second column contains the squares of these integers (i.e. 1, 4, 9, . . .). Column 3 contains the first difference between successive pairs of numbers in column 2; for example, the first value is 4 - 1 = 3, the second value is 9 - 4 = 5, and so on. The final column is the second difference between successive pairs of first differences. As you can see, the second difference is always 2.

Number   Number squared   First difference   Second difference
  1             1
  2             4                 3
  3             9                 5                  2
  4            16                 7                  2
  5            25                 9                  2
  6            36                11                  2
  7            49                13                  2

Table 1.1 The use of finite differences to calculate squares.

Suppose we want to calculate the value of 8^2 using finite differences. We simply use Table 1.1 in reverse by starting with the second difference and working back to the result. If the second difference is 2, the next first difference (after 7^2) is 13 + 2 = 15. Therefore, the value of 8^2 is the value of 7^2 plus the first difference; that is, 49 + 15 = 64. We have generated 8^2 without using multiplication. This technique can be extended to evaluate many other mathematical functions.
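This reverse use of the table is easy to imitate in software. The C sketch below (my illustration, not the book's) builds the squares up to 8^2 from two running differences, using nothing but addition, just as the difference engine did mechanically.

#include <stdio.h>

/* Generate a table of squares by the method of finite differences.
   For squares the second difference is the constant 2, so every new
   entry costs two additions and no multiplication at all. */
int main(void)
{
    unsigned int square = 1;        /* 1^2                           */
    unsigned int first = 3;         /* next first difference (4 - 1) */
    const unsigned int second = 2;  /* constant second difference    */

    printf("1 squared = 1\n");
    for (unsigned int n = 2; n <= 8; n++) {
        square += first;  /* on the last pass, 49 + 15 = 64 gives 8^2  */
        first += second;  /* advance the first difference: 3, 5, 7, ... */
        printf("%u squared = %u\n", n, square);
    }
    return 0;
}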
Babbage's difference engine project was cancelled in 1842 because of increasing costs. He did design a simpler difference engine using 31-digit numbers to handle seventh-order differences, but no one was interested in financing it. In 1853 George Scheutz in Sweden constructed a working difference engine using 15-digit arithmetic and fourth-order differences. Incidentally, in 1991 a team at the Science Museum in London used modern construction techniques to build Babbage's difference engine. It worked.

Charles Babbage went on to design the analytical engine, which was to be capable of performing any mathematical operation automatically.
This truly remarkable and entirely mechanical device was nothing less than a general-purpose computer that could be programmed. The analytical engine included many of the elements associated with a modern electronic computer—an arithmetic processing unit that carries out all the calculations, a memory that stores data, and input and output devices. Unfortunately, the sheer scale of the analytical engine rendered its construction, at that time, impossible. However, it is not unreasonable to call Babbage the father of the computer because his machine incorporated many of the intellectual concepts at the heart of the computer.

Babbage envisaged that his analytical engine would be controlled by punched cards similar to those used to control the operation of the Jacquard loom. Two types of punched card were required. Operation cards specified the sequence of operations to be carried out by the analytical engine and variable cards specified the locations in the store of inputs and outputs.

One of Babbage's collaborators was Ada Gordon², a mathematician who became interested in the analytical engine when she translated a paper on it from French to English. When Babbage discovered the paper he asked her to expand the paper. She added about 40 pages of notes about the machine and provided examples of how the proposed analytical engine could be used to solve mathematical problems. Gordon worked closely with Babbage and it's been reported that she even suggested the use of the binary system to store data. She noticed that certain groups of operations are carried out over and over again during the course of a calculation and proposed that a conditional instruction be used to force the analytical engine to perform the same sequence of operations many times. This action is the same as the repeat or loop function found in most of today's high-level languages.

Gordon devised algorithms to perform the calculation of Bernoulli numbers, making her one of the founders of numerical computation. Some regard Gordon as the world's first computer programmer, who was constructing algorithms a century before programming became a recognized discipline—and long before any real computers were constructed.

Mechanical computing devices continued to be used in compiling mathematical tables and performing the arithmetic operations used by everyone from engineers to accountants until about the 1960s. The practical high-speed computer had to await the development of the electronics industry.

1.4.3 Enabling technology—the telegraph

Many of the technological developments required to construct a practical computer took place at the end of the nineteenth century. The most important of these events was the invention of the telegraph. We now provide a short history of the development of telecommunications.

One of the first effective communication systems was the optical semaphore, which passed visual signals from tower to tower across Europe. Claude Chappe in France developed a system with two arms, each of which could be in one of seven positions. The Chappe telegraph could send a message across France in about half an hour (good weather permitting). The telegraph was used for commercial purposes, but it also helped Napoleon to control his army.

King Maximilian had seen how the French visual semaphore system had helped Napoleon's military campaigns and in 1809 he asked the Bavarian Academy of Sciences to devise a scheme for high-speed communication over long distances. Samuil T. von Sömmering suggested a crude telegraph using 35 conductors, one for each character. Sömmering's telegraph transmitted electricity from a battery down one of these 35 wires where, at the receiver, the current was passed through a tube of acidified water. Passing a current through the water breaks it down into oxygen and hydrogen. To use the Sömmering telegraph you detected the bubbles that appeared in one of the 35 glass tubes and then wrote down the corresponding character. Sömmering's telegraph was ingenious but too slow to be practical.

In 1819 Hans C. Oersted made one of the greatest discoveries of all time when he found that an electric current creates a magnetic field round a conductor. This breakthrough allowed you to create a magnetic field at will. In 1828 Cooke exploited Oersted's discovery when he invented a telegraph that used the magnetic field round a wire to deflect a compass needle.

The growth of the railway networks in the early nineteenth century spurred the development of the telegraph because you had to warn stations down the line that a train was arriving. By 1840 a 40-mile stretch between Slough and Paddington in London had been linked using the telegraph of Charles Wheatstone and William Cooke. The Wheatstone and Cooke telegraph used five compass needles that normally hung in a vertical position. The needles could be deflected by coils to point to the appropriate letter. You could transmit one of 20 letters (J, C, Q, U, X, and Z were omitted).

The first long-distance data links

We take wires and cables for granted. In the early nineteenth century, plastics hadn't been invented and the only material available for insulation waterproofing was a type of pitch called asphaltum. In 1843 a form of rubber called gutta percha was discovered. The Atlantic Telegraph Company created an insulated cable for underwater use containing a single copper conductor made of seven twisted strands, surrounded by gutta percha insulation and protected by a ring of 18 iron wires coated with hemp and tar.

² Ada Gordon married William King in 1835. King inherited the title Earl of Lovelace and Gordon became Countess of Lovelace. Gordon is often considered the founder of scientific computing.
Submarine cable telegraphy began with a cable crossing the English Channel to France in 1850. The cable failed after only a few messages had been exchanged and a more successful attempt was made the following year. Transatlantic cable laying from Ireland began in 1857 but was abandoned when the strain of the cable descending to the ocean bottom caused it to snap under its own weight. The Atlantic Telegraph Company tried again in 1858. Again, the cable broke after only 3 miles but the two cable-laying ships managed to splice the two ends. The cable eventually reached Newfoundland in August 1858 after suffering several more breaks and storm damage.

It soon became clear that this cable wasn't going to be a commercial success. The receiver used the magnetic field from the current in the cable to deflect a magnetized needle. Unfortunately, after crossing the Atlantic the signal was too weak to be detected reliably. The original voltage used to drive a current down the cable was approximately 600 V. So, they raised the voltage to about 2000 V to drive more current along the cable and improve the detection process. Unfortunately, such a high voltage burned through the primitive insulation, shorted the cable, and destroyed the first transatlantic telegraph link after about 700 messages had been transmitted in 3 months.

In England, the Telegraph Construction and Maintenance Company developed a new 2300-mile-long cable weighing 9000 tons, which was three times the diameter of the failed 1858 cable. Laying this cable required the largest ship in the world, the Great Eastern. After a failed attempt in 1865 a transatlantic link was finally established in 1866. It cost $100 in gold to transmit 20 words across the first transatlantic cable at a time when a laborer earned $20/month.

Telegraph distortion and the theory of transmission lines

The telegraph hadn't been in use for very long before people discovered that it suffered from a problem called telegraph distortion. As the length of cables increased it became apparent that a sharply rising pulse at the transmitter end of a cable was received at the far end as a highly distorted pulse with long rise and fall times. This distortion meant that the 1866 transatlantic telegraph cable could transmit only eight words per minute. The problem was eventually handed to William Thomson at the University of Glasgow.

Thomson, who later became Lord Kelvin, was one of the nineteenth century's greatest scientists. He published more than 600 papers, developed the second law of thermodynamics, and created the absolute temperature scale. In 1855 Thomson presented a paper to the Royal Society analyzing the effect of pulse distortion, which became the cornerstone of what is now called transmission line theory. The transmission line effect reduces the speed at which signals can change state. The cause of the problems investigated by Thomson lies in the physical properties of electrical conductors and insulators. Thomson's theories enabled engineers to construct data links with much lower levels of distortion.

Thomson contributed to computing by providing the theory that describes the flow of pulses in circuits, which enabled the development of the telegraph and telephone networks. In turn, the switching circuits used to route messages through networks were used to construct the first electromechanical computers.

Developments in communications networks

Although the first telegraph systems operated from point to point, the introduction of the telephone led to the development of switching centers. First-generation switching centers employed a telephone operator who manually plugged a subscriber's line into a line connected to the next switching center in the link. By the end of the nineteenth century, the infrastructure of computer networks was already in place.

In 1897 an undertaker called Almon Strowger was annoyed to find that he was not getting the trade he expected because the local telephone operator was connecting prospective clients to Strowger's competitor. So, Strowger cut out the human factor by inventing the automatic telephone exchange that used electromechanical devices to route calls between exchanges. When you dial a number using a rotary dial, a series of pulses is sent down the line to a rotary switch. If you dial, for example, '5', the five pulses move a switch five steps clockwise to connect you to line number five, which routes your call to the next switching center. Consequently, when you phoned someone using Strowger's technology the number you dialed determined the route your call took through the system.

By the time the telegraph was well established, radio was being developed. James Clerk Maxwell predicted radio waves in 1864 following his study of light and electromagnetic waves. Heinrich Hertz demonstrated the existence of radio waves in 1887 and Guglielmo Marconi is credited with being the first to use radio to span the Atlantic in 1901.

The light bulb was invented by Thomas A. Edison in 1879. Investigations into its properties led Ambrose Fleming to discover the diode in 1904. A diode is a light bulb surrounded by a wire mesh that allows electricity to flow only one way between the filament (the cathode) and the mesh (the anode). The flow of electrons from the cathode gave us the term 'cathode ray tube'. In 1906 Lee de Forest modified Fleming's diode by placing a wire mesh between the cathode and anode. By changing the voltage on this mesh, it was possible to change the flow of current between the cathode and anode. This device, called a triode, could amplify signals. Without the vacuum tube to amplify weak signals, modern electronics would have been impossible. The term electronics refers to circuits with amplifying or active devices such as tubes or transistors. The first primitive computers using electromechanical devices did not use vacuum tubes and, therefore, these computers were not electronic computers.
10 Chapter 1 Introduction to computer hardware
devices did not use vacuum tubes and, therefore, these computers were not electronic computers.

The telegraph, telephone, and vacuum tube were all steps on the path to the development of the computer and, later, computer networks. As each of these practical steps was taken, there was a corresponding development in the accompanying theory (in the case of radio, the theory came before the discovery).

Typewriters, punched cards, and tabulators

Another important part of computer history is the humble keyboard, which is still the prime input device of most personal computers. As early as 1711 Henry Mill, an Englishman, described a mechanical means of printing text on paper a character at a time. In 1829 the American William Burt was granted the first US patent for a typewriter, although his machine was not practical. It wasn't until 1867 that three Americans, Christopher Sholes, Carlos Glidden, and Samuel Soule, invented their Type-Writer, the forerunner of the modern typewriter. One of the problems encountered by Sholes was the tendency of his machine to jam when digraphs such as 'th' and 'er' were typed. Hitting the 't' and 'h' keys at almost the same time caused the letters 't' and 'h' to strike the paper simultaneously and jam. His solution was to arrange the letters on the keyboard to avoid the letters of common digraphs being located side by side. This layout has continued until today and is now described by the sequence of the first six letters on the left of the top row—QWERTY. Because the same digraphs do not occur in different languages, the layout of a French keyboard is different from that of an English keyboard. It is reported that Sholes made it easy to type 'Type-Writer' by putting all these characters on the same row.

Another enabling technology that played a key role in the development of the computer was the tabulating machine, a development of the mechanical calculator that processes data on punched cards. One of the largest data processing operations carried out in the USA during the nineteenth century was the US census. A census involves taking the original data, sorting and collating it, and tabulating the results. In 1879 Herman Hollerith became involved in the evaluation of the 1880 US Census data. He devised an electric tabulating system that could process data stored on cards punched by clerks from the raw census data. Hollerith's electric tabulating machine could read cards, process the information on the cards, and then sort them. The tabulator helped lay the foundations of the data processing industry.

Three threads converged to make the computer possible: Babbage's calculating machines, which performed arithmetic calculations; communications technology, which laid the foundations for electronics and even networking; and the tabulator, because it and the punched card media provided a means of controlling machines, inputting data into them, and storing information.

1.4.4 The first electromechanical computers

The forerunner of today's digital computers used electromechanical components called relays, rather than electronic circuits such as vacuum tubes and transistors. A relay is constructed from a coil of wire wound round an iron cylinder. When a current flows through the coil, it generates a magnetic field that causes the iron to act like a magnet. A flat springy strip of iron is located close to the iron cylinder. When the cylinder is magnetized, the iron strip is attracted, which, in turn, opens or closes a switch. Relays can perform any operation that can be carried out by the logic gates making up today's computers. You cannot construct fast computers from relays because they are far too slow, bulky, and unreliable. However, the relay did provide a technology that bridged the gap between the mechanical calculator and the modern electronic digital computer.

One of the first electromechanical computers was built by Konrad Zuse in Germany. Zuse's Z2 and Z3 computers were used in the early 1940s to design aircraft in Germany. The heavy bombing at the end of the Second World War destroyed Zuse's computers and his contribution to the development of the computer was ignored for many years. He is mentioned here to demonstrate that the notion of a practical computer occurred to different people in different places. The Z3 was completed in 1941 and was the world's first functioning programmable electromechanical computer. Zuse's Z4 computer was finished in 1945, was later taken to Switzerland, and was used at the Federal Polytechnical Institute in Zurich until 1955.

As Zuse was working on his computer in Germany, Howard Aiken at Harvard University constructed his Harvard Mark I computer in 1944 with both financial and practical support from IBM. Aiken was familiar with Babbage's work, and his electromechanical computer, which he first envisaged in 1937, operated in a similar way to Babbage's proposed analytical engine. The original name for the Mark I was the Automatic Sequence Controlled Calculator, which, perhaps, better describes its nature.

Aiken's machine was a programmable calculator that was used by the US Navy until the end of the Second World War. Just like Babbage's machine, the Mark I used decimal counter wheels to implement its main memory, consisting of 72 words of 23 digits plus a sign. The program was stored on a paper tape (similar to Babbage's punched cards), although operations and addresses (i.e. data) were stored on the same tape. Input and output operations used punched cards or an electric typewriter. Because the Harvard Mark I treated data and instructions separately, the term Harvard architecture is now applied to any computer with separate paths for data and instructions. The Harvard Mark I didn't support conditional operations and therefore is not strictly a computer.
However, it was later modified to permit multiple paper tape readers with a conditional transfer of control between the readers.

1.4.5 The first mainframes

Relays have moving parts and can't operate at very high speeds. It took the invention of the vacuum tube by John A. Fleming and Lee de Forest to make possible the design of high-speed electronic computers. John V. Atanasoff is now credited with the partial construction of the first completely electronic computer. Atanasoff worked with Clifford Berry at Iowa State College on their computer from 1937 to 1942. Their machine used a 50-bit binary representation of numbers and was called the ABC (Atanasoff–Berry Computer). It was designed to solve linear equations and wasn't a general purpose computer. Atanasoff and Berry abandoned their computer when they were assigned to other duties because of the war.

ENIAC

The first electronic general purpose digital computer was John W. Mauchly's ENIAC (Electronic Numerical Integrator and Calculator), completed in 1945 at the University of Pennsylvania. ENIAC was intended for use at the Army Ordnance Department to create firing tables that relate the range of a field gun to its angle of elevation, wind conditions, and so on. For many years ENIAC was regarded as the first electronic computer, although credit was later given to Atanasoff and Berry because Mauchly had visited Atanasoff and read his report on the ABC machine.

ENIAC used 17 480 vacuum tubes and weighed about 30 tonnes. ENIAC was a decimal machine capable of storing 20 10-digit decimal numbers. IBM card readers and punches implemented input and output operations. ENIAC was programmed by means of a plug board that looked like an old pre-automatic telephone switchboard; that is, a program was set up manually by means of wires. In addition to these wires, the ENIAC operator had to manually set up to 6000 multi-position mechanical switches. Programming ENIAC was very time consuming and tedious.

ENIAC did not support dynamic conditional operations (e.g. IF . . . THEN). An operation could be repeated a fixed number of times by hard wiring the loop counter to an appropriate value. Because the ability to make a decision depending on the value of a data element is vital to the operation of all computers, ENIAC was not a computer in today's sense of the word. It was an electronic calculator.

John von Neumann, EDVAC, and IAS

The first US computer to use the stored program concept was EDVAC (Electronic Discrete Variable Automatic Computer). EDVAC was designed by some of the same team that designed the ENIAC at the Moore School of Engineering at the University of Pennsylvania.

John von Neumann, one of the leading mathematicians of his age, participated in EDVAC's design. He wrote a document entitled 'First draft of a report on the EDVAC', which compiled the results of various design meetings. Before von Neumann, computer programs were stored either mechanically or in separate memories from the data used by the program. Von Neumann introduced the concept of the stored program—an idea so commonplace today that we take it for granted. In a stored program von Neumann machine both the program that specifies what operations are to be carried out and the data used by the program are stored in the same memory. The stored program computer consists of a memory containing instructions coded in binary form. The control part of the computer reads an instruction from memory, carries it out, then reads the next instruction, and so on. Although EDVAC is generally regarded as the first stored program computer, this is not strictly true because data and instructions did not have a common format and were not interchangeable.

EDVAC promoted the design of memory systems. The capacity of EDVAC's mercury delay line memory was 1024 words of 44 bits. A mercury delay line operates by converting data into pulses of ultrasonic sound that continuously recirculate in a long column of mercury in a tube.

EDVAC was not a great commercial success. Its construction was largely completed by April 1949, but it didn't run its first applications program until October 1951. Because of its adoption of the stored program concept, EDVAC became a topic in the first lecture course given on computers. These lectures took place before EDVAC was actually constructed.

Another important early computer was IAS, constructed by von Neumann and his colleagues at the Institute for Advanced Studies in Princeton. IAS is remarkably similar to modern computers. Main memory was 1K words and a magnetic drum was used to provide 16K words of secondary storage. The magnetic drum was the forerunner of today's disk drive. Instead of recording data on the flat platter found in a hard drive, data was stored on the surface of a rotating drum.

In the late 1940s the Whirlwind computer was produced at MIT for the US Air Force. This was the first computer intended for real-time information processing. It employed ferrite-core memory (the standard form of mainframe memory until the semiconductor integrated circuit came along in the late 1960s). A ferrite core is a tiny bead of a magnetic material that can be magnetized clockwise or counterclockwise to store a one or a zero. Ferrite core memory is no longer widely used today, although the term remains in expressions such as core dump, which means a printout of the contents of a region of memory.

One of the most important centers of early computer development in the 1940s was Manchester University in
England. In 1948 Tom Kilburn created a prototype computer called the Manchester Baby. This was a demonstration machine that tested the concept of the stored program computer and the Williams store, which stored data on the surface of a cathode ray tube. Some regard the Manchester Baby as the world's first true stored program computer.

IBM's place in computer history

No history of the computer can neglect the giant of the computer world, IBM, which has had such an impact on the computer industry. Although IBM grew out of the Computing–Tabulating–Recording (C–T–R) Company founded in 1911, its origin dates back to the 1880s. The C–T–R Company was the result of a merger between the International Time Recording (ITR) Company, the Computing Scale Company of America, and Herman Hollerith's Tabulating Machine Company (founded in 1896). In 1914 Thomas J. Watson, Senior, left the National Cash Register Company to join the C–T–R company and soon became President. In 1917 a Canadian unit of the C–T–R company called International Business Machines Co. Ltd was set up. Because this name was so well suited to the C–T–R company's role, they adopted it for the whole organization in 1924. IBM bought Electromatic Typewriters in 1933 and the first IBM electric typewriter was marketed 2 years later.

IBM's first contact with computers was via its relationship with Aiken at Harvard University. In 1948 Watson Senior at IBM gave the order to construct the Selective Sequence Electronic Calculator. Although this was not a stored program computer, it was IBM's first step from the punched card tabulator to the computer.

Thomas J. Watson, Junior, was responsible for building the Type 701 EDPM (Electronic Data Processing Machine) in 1953 to convince his father that computers were not a threat to IBM's conventional business. The 700 series was successful and dominated the mainframe market for a decade. In 1956 IBM launched a successor, the 704, which was the world's first supercomputer. The 704 was largely designed by Gene Amdahl, who later founded his own computer company.

IBM's most important mainframe was the System/360, which was first delivered in 1965. The importance of the 32-bit System/360 is that it was a member of a series of computers, each with the same architecture (i.e. programming model) but with different performance; for example, the System/360 model 91 was 300 times faster than the model 20. IBM developed a common operating system, OS/360, for the series. Other manufacturers built their own computers that were compatible with the System/360 and thereby began the slow process towards standardization in the computer industry.

In 1968 the System/360 model 85 became the first computer to implement cache memory. Cache memory keeps a copy of frequently used data in very high-speed memory to reduce the number of accesses to the slower main store. Cache memory has become one of the most important features of today's high-performance systems.

In August 1981 IBM became the first major manufacturer to market a PC. IBM had been working on a PC since about 1979, when it was becoming obvious that IBM's market would eventually start to come under threat from PC manufacturers such as Apple and Commodore. IBM not only sold mainframes and personal computers—by the end of the 1970s IBM had introduced the floppy disk, computerized supermarket checkouts, and the first automatic teller machines.

1.4.6 The birth of transistors, ICs, and microprocessors

Since the 1940s computer hardware has become smaller and faster. The power-hungry and unreliable vacuum tube was replaced by the smaller, reliable transistor in the 1950s. The transistor plays the same role as a thermionic tube; the only real difference is that a transistor switches a current flowing through a crystal rather than a beam of electrons flowing through a vacuum. The transistor was invented by William Shockley, John Bardeen, and Walter Brattain at AT&T's Bell Labs in 1948.

If you can put one transistor on a slice of silicon, you can put two or more transistors on the same piece of silicon. The idea occurred to Jack St Clair Kilby at Texas Instruments in 1958. Kilby built a working model and filed a patent early in 1959. In January of 1959 Robert Noyce at Fairchild Semiconductor was also thinking of the integrated circuit. He too applied for a patent, and it was granted in 1961. Today both Noyce and Kilby are regarded as the joint inventors of the IC.

The minicomputer era

The microprocessor was not directly derived from the mainframe computer. Between the mainframe and the microprocessor lies the minicomputer, a cut-down version of the mainframe, which appeared in the 1960s. By the 1960s many departments of computer science could afford their own minicomputers, and a whole generation of students learned computer science from PDP-11s and NOVAs in the 1960s and 1970s. Some of these minicomputers were used in real-time applications (i.e. applications in which the computer has to respond to changes in its inputs within a specified time).

One of the first minicomputers was Digital Equipment Corporation's PDP-5, introduced in 1964. This was followed by the PDP-8 in 1966 and the very successful PDP-11 in 1969. Even the PDP-11 would be regarded as a very basic machine by today's standards. Digital Equipment built on their success with the PDP-11 series and introduced their VAX architecture in 1978 with the VAX-11/780, which dominated the minicomputer world in the 1980s. The VAX
range was replaced by the 64-bit Alpha architecture (a high-performance microprocessor) in 1991. The Digital Equipment Corporation, renamed Digital, was taken over by Compaq in 1998.

Microprocessor and the PC

Credit for creating the world's first microprocessor, the 4004, goes to Ted Hoff and Federico Faggin at Intel. Three engineers from Japan worked with Hoff to implement a calculator's digital logic circuits in silicon. Hoff developed a general purpose computer that could be programmed to carry out calculator functions. Towards the end of 1969 the structure of a programmable calculator had emerged. The 4004 used about 2300 transistors and is considered the first general purpose programmable microprocessor, even though it was only a 4-bit device.

The 4004 was rapidly followed by the 8-bit 8008 microprocessor, which was originally intended for a CRT application. By using some of the production techniques developed for the 4004, Intel was able to manufacture the 8008 as early as March 1972. The 8008 was soon replaced by a better version, the first really popular general purpose 8-bit microprocessor, the 8080 (in production in early 1974). Shortly after the 8080 went into production, Motorola created its own competitor, the 8-bit 6800.

Six months after the 8008 was introduced, the first ready-made computer based on the 8008, the Micral, was designed and built in France. The term microcomputer was coined to refer to the Micral, although the Micral was not successful in the USA. In January 1975 Popular Electronics magazine published an article on microcomputer design by Ed Roberts, who had a small company called MITS. Roberts' computer was called the Altair and was constructed from a kit.

Although the Altair was intended for hobbyists, it had a significant impact and sold 2000 kits in its first year. In March 1976 Steve Wozniak and Steve Jobs designed a 6502-based computer, which they called the Apple 1. A year later, in 1977, they created the Apple II with 16 kbytes of ROM, 4 kbytes of RAM, and a color display and keyboard. Although unsophisticated, this was the first practical PC.

As microprocessor technology improved, it became possible to put more and more transistors on larger and larger chips of silicon. Microprocessors of the early 1980s were not only more powerful than their predecessors in terms of the speed at which they could execute instructions, they were also more sophisticated in terms of the facilities they offered. Intel took the core of their 8080 microprocessor and converted it from an 8-bit into a 16-bit machine, the 8086. Motorola did not extend their 8-bit 6800 to create a 16-bit processor. Instead, they started again and did not attempt to achieve either object or source code compatibility with earlier processors. By beginning with a clean slate, Motorola was able to create a microprocessor with a 32-bit architecture and an exceptionally clean design, the 68000, in 1979.

Several PC manufacturers adopted the 68K; Apple used it in the Macintosh and it was incorporated in the Atari and Amiga computers. All three of these computers were regarded as technically competent and had many very enthusiastic followers. The Macintosh was sold as a relatively high-priced black box with the computer, software, and peripherals from a single source. This approach could not compete with the IBM PC, launched in 1981, with an open system architecture that allowed the user to purchase hardware and software from the supplier with the best price. The Atari and Amiga computers suffered because they had the air of the games machine. Although the Commodore Amiga in 1985 had many of the hallmarks of a modern multimedia machine, it was derided as a games machine because few then grasped the importance of advanced graphics and high-quality sound.

The 68K developed into the 68020, 68030, 68040, and 68060. Versions were developed for the embedded processor market and Motorola played no further role in the PC market until Apple adopted Motorola's PowerPC processor. The PowerPC came from IBM and was not a descendant of the 68K family.

Many fell in love with the Apple Mac. It was a sophisticated and powerful PC, but not a great commercial success. Apple's commercial failure demonstrates that those in the semiconductor industry must realize that commercial factors are every bit as important as architectural excellence and performance. Apple failed because their processor, from hardware to operating system, was proprietary. Apple didn't publish detailed hardware specifications or license their BIOS and operating system. IBM adopted open standards and anyone could build a copy of the IBM PC. Hundreds of manufacturers started producing parts of PCs and an entire
industry sprang up. You could buy a basic system from one place, a hard disk from another, and a graphics card from yet another supplier. By publishing standards for the PC's bus, anyone could create a peripheral for the PC. What IBM lost in the form of increased competition, they more than made up for in the rapidly expanding market. IBM's open standard provided an incentive for software writers to generate software for the PC market.

The sheer volume of PCs and their interfaces (plus the software base) pushed PC prices down and down. The Apple was perceived as over-priced. Even though Apple adopted the PowerPC, it was too late and Apple's role in the PC world was marginalized. However, by 2005 cut-throat competition from PC manufacturers was forcing IBM to abandon its PC business, whereas Apple was flourishing in a niche market that rewarded style.

A major change in direction in computer architecture took place in the 1980s when the RISC, or Reduced Instruction Set Computer, first appeared. Some observers expected the RISC to sweep away all CISC processors like the 8086 and 68K families. It was the work carried out by David Patterson at the University of California, Berkeley, in the early 1980s that brought the RISC philosophy to a wider audience. Patterson was also responsible for coining the term 'RISC' in 1980. The Berkeley RISC was constructed at a university (like many of the first mainframes such as EDSAC) and required only a tiny fraction of the resources consumed by those early mainframes. Indeed, the Berkeley RISC was hardly more than an extended graduate project. It took about a year to design and fabricate the RISC I in silicon. By 1983 the Berkeley RISC II had been produced, and that proved to be both a testing ground for RISC ideas and the start of a new industry. Many of the principles of RISC design were later incorporated in Intel's processors.

1.4.7 Mass computing and the rise of the Internet

The Internet and digital multimedia have driven the evolution of the PC. The Internet provides interconnectivity and the digital revolution has extended into sound and vision. The cassette-based personal stereo system has been displaced by the minidisk and MP3 players with solid-state memory. The DVD, with its ability to store an entire movie on a single disk, first became available in 1996, and by 1998 over one million DVD players had been sold in the USA. The digital video camera that once belonged to the world of the professional filmmaker is now available to anyone with a modest income.

All these applications have had a profound effect on the computer world. Digital video requires vast amounts of storage. Within 5 years, low-cost hard disk capacities grew from about 1 Gbyte to 400 Gbytes or more. The DVD uses very sophisticated signal processing techniques that require very high-performance hardware to process the signals in real time. The MP3 player requires a high-speed data link to download music from the Internet.

The demand for increasing realism in video games and real-time image processing has spurred development in special-purpose video subsystems. Video processing requires the ability to render images, which means drawing vast numbers of polygons on the screen and filling them with a uniform color. The more polygons used to compose an image, the more accurate the rendition of the image.

The effect of the multimedia revolution has been the commoditization of the PC, which is now just another commodity like a television or a stereo player. Equally, the growth of multimedia has forced the development of higher speed processors, low-cost high-density memory systems, multimedia-aware operating systems, data communications, and new processor architectures.

The Internet revolution

Just as the computer itself was the result of a number of independent developments (the need for automated calculation, the theoretical development of computer science, the enabling technologies of communications and electronics, the keyboard and data processing industries), the Internet was the fruit of a number of separate developments.

The principal ingredients of the Internet are communications, protocols, and hypertext. Communications systems have been developed throughout human history, as we have already pointed out when discussing the enabling technology behind the computer. The USA's Department of Defense created a scientific organization, ARPA (Advanced Research Projects Agency), in 1958 at the height of the Cold War. ARPA had some of the characteristics of the Manhattan project, which had preceded it during the Second World War. A large group of talented scientists was assembled to work on a project of national importance. From its early days ARPA concentrated on computer technology and communications systems; moreover, ARPA was moved into the academic area, which meant that it had a rather different ethos from that of the commercial world, because academics cooperate and share information.

One of the reasons why ARPA concentrated on networking was the fear that a future war involving nuclear weapons would begin with an attack on communications centers, limiting the capacity to respond in a coordinated manner. By networking computers and ensuring that a message can take many paths through the network to get from its source to its destination, the network can be made robust and able to cope with the loss of some of its links or switching centers.

In 1969 ARPA began to construct a testbed for networking, a system that linked four nodes: University of California at Los Angeles, SRI (in Stanford), University of California at Santa Barbara, and University of Utah. Data was sent in the
form of individual packets, or frames, rather than as complete end-to-end messages. In 1972 ARPA was renamed DARPA (Defense Advanced Research Projects Agency).

In 1973 TCP/IP (transmission control protocol/Internet protocol) was developed at Stanford; this is the set of rules that governs the routing of a packet through a computer network. Another important step on the way to the Internet was Robert Metcalfe's development of Ethernet, which enabled computers to communicate with each other over a local area network based on a low-cost cable. Ethernet made it possible to link the computers in a university together, and the ARPANET allowed the universities to be linked together. Ethernet was, however, based on techniques developed during the construction of the University of Hawaii's radio-based packet-switching ALOHAnet, another ARPA-funded project.

Up to 1983 ARPANET users had to use a numeric IP address to access other users on the Internet. In 1983 the University of Wisconsin created the Domain Name System (DNS), which routed packets to a domain name rather than an IP address.

The world's largest community of physicists is at CERN in Geneva. In 1990 Tim Berners-Lee implemented a hypertext-based system to provide information to other members of the high-energy physics community. This system was released by CERN in 1993 as the World-Wide Web (WWW). In the same year, Marc Andreessen at the University of Illinois developed a graphical user interface to the WWW, a browser called Mosaic. All that the Internet and the WWW had to do now was to grow.

1.5 The digital computer

Before beginning the discussion of computer hardware proper, we need to say what a computer is and to define a few terms. If ever an award were to be given to those guilty of misinformation in the field of computer science, it would go to the creators of HAL in 2001, R2D2 in Star Wars, K9 in Doctor Who, and Data in Star Trek. These fictional machines have generated the popular myth that a computer is a reasonably close approximation to a human brain, which stores an infinite volume of data.

The reality is a little more mundane. A computer is a machine that takes in information from the outside world, processes it according to some predetermined set of operations, and delivers the processed information. This definition of a computer is remarkably unhelpful, because it attempts to define the word computer in terms of the equally complex words information, operation, and process. Perhaps a better approach is to provide examples of what computers do by looking at the role of computers in data processing, numerical computation (popularly called number crunching), workstations, automatic control systems, and electronic systems.

1.5.1 The PC and workstation

The 1980s witnessed two significant changes in computing—the introduction of the PC and the workstation. PCs bring computing power to people in offices and in their own homes. Although primitive PCs have been around since the mid 1970s, the IBM PC and Apple Macintosh transformed the PC from an enthusiast's toy into a useful tool. Software such as word processors, databases, and spreadsheets revolutionized the office environment, just as computer-aided design packages revolutionized the industrial design environment. Today's engineer can design a circuit and simulate its behavior using one software package and then create a layout for a printed circuit board (PCB) with another package. Indeed, the output from the PCB design package may be suitable for feeding directly into the machine that actually makes the PCBs.

In the third edition of this book in 1999 I said

Probably the most important application of the personal computer is in word processing . . . Today's personal computers have immensely sophisticated word processing packages that create a professional-looking result and even include spelling and grammar checkers to remove embarrassing mistakes. When powerful personal computers are coupled to laser printers, anyone can use desktop publishing packages capable of creating manuscripts that were once the province of the professional publisher.

Now, all that's taken for granted. Today's PCs can take video from your camcorder, edit it, add special effects, and then burn it to a DVD that can be played on any home entertainment system.

Although everyone is familiar with the PC, the concept of the workstation is less widely understood. A workstation can best be thought of as a high-performance PC that employs state-of-the-art technology and is normally used in industry. Workstations have been produced by manufacturers such as Apollo, Sun, HP, Digital, Silicon Graphics, and Xerox. They share many of the characteristics of PCs and are used by engineers or designers. When writing the third edition, I stated that the biggest difference between workstations and PCs was in graphics and displays. This difference has all but vanished with the introduction of high-speed graphics cards and large LCD displays into the PC world.

1.5.2 The computer as a data processor

The early years of computing were dominated by the mainframe, which was largely used as a data processor. Figure 1.1 describes a computer designed to deal with the payroll of a large factory. We will call the whole thing a computer, in contrast with those who would say that the CPU (central processing unit) is the computer and all the other devices are peripherals. Inside the computer's immediate access memory is a program, a collection of primitive machine-code
operations, whose purpose is to calculate an employee's pay based on the number of hours worked, the basic rate of pay, and the overtime rate. Of course, this program would also deal with tax and any other deductions.

Figure 1.1 The computer as a data processor. (The figure shows the computer, the central processing unit, connected to its peripherals: displays, a keyboard, disk drives, a tape drive, a printer, a line printer, and a plotter.)

Because the computer's immediate access memory is relatively expensive, only enough is provided to hold the program and the data it is currently processing. The mass of information on the employees is normally held in secondary store as a disk file. Whenever the CPU requires information about a particular employee, the appropriate data is copied from the disk and placed in the immediate access store. The time taken to perform this operation is a small fraction of a second, but is many times slower than reading from the immediate access store. However, the cost of storing information on disk is very low indeed and this compensates for its relative slowness.

The tape transport stores data more cheaply than the disk (tape is called tertiary storage). Data on the disks is copied onto tape periodically and the tapes stored in the basement for security reasons. Every so often the system is said to crash and everything grinds to a halt. The last tape dump can be reloaded and the system assumes the state it was in a short time before the crash. Incidentally, the term crash had the original meaning of a failure resulting from a read/write head in a disk drive crashing into the rotating surface of a disk and physically damaging the magnetic coating on its surface.

The terminals (i.e. keyboard and display) allow operators to enter data directly into the system. This information could be the number of hours an employee has worked in the current week. The terminal can also be used to ask specific questions, such as 'How much tax did Mr XYZ pay in November?' To be a little more precise, the keyboard doesn't actually ask questions but it allows the programmer to execute a program containing the relevant question. The keyboard can be used to modify the program itself so that new facilities may be added as the system grows. Computers found in data processing are often characterized by their large secondary stores and their extensive use of printers and terminals.

1.5.3 The computer as a numeric processor

Numeric processing or number crunching refers to computer applications involving a very large volume of mathematical operations—sometimes billions of operations per job. Computers used in numeric processing applications are frequently characterized by powerful and very expensive CPUs, very high-speed memories, and relatively modest quantities of input/output devices and secondary storage. Some supercomputers are constructed from large arrays of microprocessors operating in parallel.

Most of the applications of numeric processing are best described as scientific. For example, consider the application of computers to the modeling of the processes governing the weather. The atmosphere is a continuous, three-dimensional medium composed of molecules of different gases. The scientist can't easily deal with a continuous medium, but can make the problem more tractable by considering the atmosphere to be composed of a very large number of cubes. Each of these cubes is considered to have a uniform temperature, density, and pressure. That is, the gas making up a cube shows no variation whatsoever in its physical properties. Variations exist only between adjacent cubes. A cube has six faces, and the scientist can create a model of how the cube interacts with each of its six immediate neighbors.
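The cube model translates naturally into code. The sketch below is purely illustrative, not a real forecasting model: the grid size, the choice of temperature as the field, and the simple neighbor-averaging update rule are all assumptions made for the example. It shows the essential pattern of updating every interior cube from its six face neighbors.

    #include <stdio.h>

    #define N 8   /* an 8 x 8 x 8 grid of cubes, chosen only for illustration */

    static double t_old[N][N][N];  /* temperature in each cube at this step */
    static double t_new[N][N][N];  /* temperature at the next time step     */

    int main(void)
    {
        /* Each interior cube interacts with its six face neighbors; here
           the new value is a simple average of the old value and those of
           its neighbors.                                                  */
        for (int x = 1; x < N - 1; x++)
            for (int y = 1; y < N - 1; y++)
                for (int z = 1; z < N - 1; z++)
                    t_new[x][y][z] = (t_old[x][y][z] +
                                      t_old[x-1][y][z] + t_old[x+1][y][z] +
                                      t_old[x][y-1][z] + t_old[x][y+1][z] +
                                      t_old[x][y][z-1] + t_old[x][y][z+1]) / 7.0;

        printf("one step of the %d x %d x %d model computed\n", N, N, N);
        return 0;
    }

A real weather model applies physically derived equations rather than a simple average, and uses millions of cubes, which is why such work demands supercomputer-class machines.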
Today the mechanical devices that display height, speed, engine performance, and the attitude of the aircraft are being replaced by electronic displays controlled by microcomputers. These displays are based on the cathode ray tube or LEDs, hence the expression 'glass cockpit'. Electronic displays are easier to read and more reliable than their mechanical counterparts, but they provide only the information required by the flight crew at any instant.

Figure 1.4 illustrates an aircraft display that combines a radar image of clouds together with navigational information. In this example the pilot can see that the aircraft is routed from radio beacon WCO to BKP to BED and will miss the area of storm activity. Interestingly enough, this type of indicator has been accused of deskilling pilots, because they no longer have to create their own mental image of the position of their aircraft with respect to the world from much cruder instruments.

In the 1970s the USA planned a military navigation system based on satellite technology called GPS (global positioning system), which became fully operational in the 1990s. The civilian use of this military technology turned out to be one of the most important and unexpected growth areas in the late 1990s. GPS provides another interesting application of the computer as a component in an electronic system. The principles governing GPS are very simple. A satellite in medium Earth orbit at 20 200 km contains a very accurate atomic clock and broadcasts both the time and its position. Suppose you pick up the radio signal from one of these Navstar satellites, decode it, and compare the reported time with your watch. You may notice that the time from the satellite is inaccurate. That doesn't mean that the US military has wasted its tax dollars on faulty atomic clocks, but that the signal has been traveling through space before it reaches you. Because the speed of light is 300 000 km/s, you can work out that the satellite must be, say, 20 000 km away. Every point that is 20 000 km from the satellite falls on the surface of a sphere whose center is the satellite.
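The calculation involved is nothing more than distance = speed × time. A minimal C sketch makes the point; the delay of 1/15 s is an invented figure chosen to give a round answer, since a real receiver measures much subtler timing differences:

    #include <stdio.h>

    #define SPEED_OF_LIGHT_KM_S 300000.0   /* approximate speed of light */

    int main(void)
    {
        /* Hypothetical delay between the satellite's timestamp and your
           clock, chosen purely for illustration.                        */
        double delay_s = 1.0 / 15.0;

        /* distance = speed x time: every receiver measuring this delay
           lies on a sphere of this radius centered on the satellite.   */
        double range_km = SPEED_OF_LIGHT_KM_S * delay_s;

        printf("Range to satellite: %.0f km\n", range_km);  /* 20000 km */
        return 0;
    }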
If you perform the same operation with a second satellite, you know that you are also on the surface of a second sphere. These two spheres must intersect. Three-dimensional geometry tells us that the points at which two spheres intersect form a circle. If you receive signals from three satellites, the three spheres intersect at just two points. One of these points is normally located under the surface of the Earth and can be ignored.
1.6 The stored program computer—an overview

Figure 1.7 Structure of the general purpose digital computer. (The central processing unit is connected to the memory by two paths, one carrying the program and one carrying data; input enters and output leaves via the CPU.)

Figure 1.8 The program and data in memory. Throughout this book square brackets denote 'the contents of', so that in this figure [4] is read as the contents of memory location number 4 and is equal to 2.

    Address   Contents
       0      Get [4]
       1      Add it to [5]
       2      Put result in [6]
       3      Stop
       4      2
       5      7
       6      1
       7      (empty)
information as it goes along. When we say information we mean the data and the instructions held inside the computer. Figure 1.7 shows two information-carrying paths connecting the CPU to its memory. The lower path, with the single arrowhead from the memory to the CPU (heavily shaded in Fig. 1.7), indicates the route taken by the computer's program. The CPU reads the sequence of commands that make up a program one by one from its memory.

The upper path (lightly shaded in Fig. 1.7), with arrows at both its ends, transfers data between the CPU and memory. The program controls the flow of information along the data path. This data path is bidirectional, because data can flow in two directions. During a write cycle, data generated by the program flows from the CPU to the memory, where it is stored for later use. During a read cycle, the CPU requests the retrieval of a data item from memory, which is transferred from the memory to the CPU.

Suppose the instruction x = y + z is stored in memory. The CPU must first fetch the instruction from memory and bring it to the CPU. Once the CPU has analyzed or decoded the instruction, it has to get the values of y and z from memory. The CPU adds these values and sends the result, x, back to memory for storage.

Figure 1.8 demonstrates how the instructions making up a program and data coexist in the same memory. In this case the memory has eight locations, numbered from 0 to 7. Memory is normally regarded as an array of storage locations (boxes or pigeonholes). Each of these boxes has a unique location, or address, containing data. For example, in the simple memory of Fig. 1.8, address 5 contains the number 7.

One difference between computers and people is that we number m items from 1 to m, whereas the computer numbers them from 0 to m − 1. This is because the computer regards 0 (zero) as a valid identifier. Unfortunately, people often confuse 0 the identifier with 0 meaning nothing.

Information in a computer's memory is accessed by providing the memory with the address (i.e. location) of the desired data. Only one memory location is addressed at a time. If we wish to search through memory for a particular item because we don't know its address, we have to read the items one at a time until we find the desired item. It appears that the human memory works in a very different way. Information is accessed from our memories by applying a key to all locations within the memory (brain). This key is related to the data being accessed (in some way) and is not related to its location within the brain. Any memory locations containing information that associates with the key respond to the access. In other words, the brain carries out a parallel search of its memory for the information it requires.

Accessing many memory locations in parallel permits more than one location to respond to the access and is therefore very efficient. Suppose someone says 'chip' to you. The word chip is the key that is fed to all parts of your memory for matching. Your brain might produce responses of chip (silicon), chip (potato), chip (on shoulder), and chip (gambling).
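The contrast between access by address and access by content is easy to see in code. In this short C sketch (the array contents are invented for the example), reading memory[5] takes a single step because we supply the address, whereas discovering which location holds a particular value forces us to examine the locations one at a time:

    #include <stdio.h>

    int main(void)
    {
        int memory[8] = {12, 3, 9, 5, 2, 7, 1, 0};   /* invented contents */

        /* Access by address: one step, because the location is known.  */
        printf("[5] = %d\n", memory[5]);

        /* Access by content: scan location by location until a match.  */
        for (int address = 0; address < 8; address++) {
            if (memory[address] == 7) {
                printf("7 found at address %d\n", address);
                break;
            }
        }
        return 0;
    }

Note that the eight locations are indexed from 0 to 7, exactly the 0 to m − 1 numbering convention described above.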
The program in Fig. 1.8 occupies consecutive memory locations 0–3 and the data locations 4–6. The first instruction, get [4], means fetch the contents of memory location number 4 from the memory. We employ square brackets to denote the contents of the address they enclose, so that in this case [4] = 2. The next instruction, at address 1, is add it to [5] and means add the number brought by the previous instruction to the contents of location 5. Thus, the computer adds 2 and 7 to get 9. The third instruction, put result in [6], tells the computer to put the result (i.e. 9) in location 6. The 1 that was in location 6 before this instruction was obeyed is replaced by 9. The final instruction, in location 3, tells the computer to stop.
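This little machine is small enough to mimic in a few lines of C. The sketch below is illustrative only: the instruction encoding (10 × opcode + address) is invented for the example, because Fig. 1.8 describes its instructions in words rather than as binary patterns. Program and data share the same memory array, just as they do in the figure:

    #include <stdio.h>

    enum { GET = 1, ADD = 2, PUT = 3, STOP = 4 };   /* invented opcodes */

    int main(void)
    {
        int memory[8] = {
            10*GET + 4,    /* 0: get [4]           */
            10*ADD + 5,    /* 1: add it to [5]     */
            10*PUT + 6,    /* 2: put result in [6] */
            10*STOP,       /* 3: stop              */
            2, 7, 1, 0     /* 4-7: the data        */
        };
        int acc = 0;       /* holds the intermediate result */

        for (int pc = 0; ; pc++) {           /* the fetch-execute cycle */
            int instruction = memory[pc];    /* fetch                   */
            int op   = instruction / 10;     /* decode                  */
            int addr = instruction % 10;
            if      (op == GET) acc = memory[addr];       /* execute    */
            else if (op == ADD) acc += memory[addr];
            else if (op == PUT) memory[addr] = acc;
            else break;                      /* stop                    */
        }
        printf("[6] = %d\n", memory[6]);     /* prints [6] = 9          */
        return 0;
    }

Running the sketch performs exactly the sequence described above: 2 is fetched from location 4, 7 is added from location 5, and the result, 9, replaces the 1 in location 6.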
We can summarize the operation of a digital computer by means of a little piece of pseudocode (pseudocode is a method of writing down an algorithm in a language that is a cross between a computer language such as C, Pascal, or Java and plain English). We shall meet pseudocode again.
1.6 Do you think that a digital computer could ever be capable of feelings, free will, original thought, and self-awareness in a similar fashion to humans? If not, why not?

1.7 Some of the current high-performance civil aircraft such as the A320 Airbus have fly-by-wire control systems. In a conventional aircraft, the pilot moves a yoke that provides control inputs that are fed to the flying control surfaces and engines by mechanical linkages or hydraulic means. In the A320 the pilot moves the type of joystick normally associated with computer games. The pilot's commands from the joystick (called a sidestick) are fed to a computer and the computer interprets them and carries them out in the fashion it determines is most appropriate. For example, if the pilot tries to increase the speed to a level at which the airframe might be overstressed, the computer will refuse to obey the command. Some pilots and some members of the public are unhappy about this arrangement. Are their fears rational?

1.8 The computer has often been referred to as a high-speed moron. Is this statement fair?

1.9 Computers use binary arithmetic (i.e. all numbers are composed of 1s and 0s) to carry out their operations. Humans normally use decimal arithmetic (0–9) and have symbolic means of representing information (e.g. the Latin alphabet or Chinese characters). Does this imply a fundamental difference between people and computers?

1.10 Shortly after the introduction of the computer, someone said that two computers could undertake all the computing in the world. At that time the best computers were no more powerful than today's pocket calculators. The commentator assumed that computers would be used to solve a few scientific problems and little else. As the cost and size of computers have been reduced, the role of computers has increased. Is there a limit to the applications of computers? Do you anticipate any radically new applications of computers?

1.11 A microprocessor manufacturer, at the release of their new super chip, was asked the question, 'What can your microprocessor do?' He said it was now possible to put it in washing machines so that the user could tell the machine what to do verbally, rather than by adjusting the settings manually. At the same time we live in a world in which many of its inhabitants go short of the very basic necessities of life: water, food, shelter, and elementary health care. Does the computer make a positive contribution to the future well-being of the world's inhabitants? Is the answer the same if we ask about the computer's short-term effects or its long-term effects?

1.12 The workstation makes it possible to design and to test (by simulation) everything from other computers to large mechanical structures. Coupled with computer communications networks and computer-aided manufacturing, it could be argued that many people in technologically advanced societies will be able to work entirely from home. Indeed, all their shopping and banking activities can also be performed from home. Do you think that this step will be advantageous or disadvantageous? What will be the effects on society of a population that can, largely, work from home?

1.13 In a von Neumann machine, programs and data share the same memory. The operation 'get [4]' reads the contents of memory location number 4 and you can then operate on the number you've just read from this location. However, the contents of this location may not be a number. It may be an instruction itself. Consequently, a program in a von Neumann machine can modify itself. Can you think of any implications this statement has for computing?

1.14 When discussing the performance of computers we introduced the benchmark, a synthetic program whose execution time provides a figure of merit for the performance of a computer. If you glance at any popular computer magazine, you'll find computers compared in terms of benchmarks. Furthermore, there are several different benchmarks. A computer that performs better than others when executing one benchmark might not do so well when executing a different benchmark. What are the flaws in benchmarks as a test of performance and why do you think that some benchmarks favor one computer more than another?

1.15 The von Neumann digital computer offers just one computing paradigm. Other paradigms are provided by analog computers and neural networks. What are the differences between these paradigms and are there others?
2 Gates, circuits, and combinational logic

CHAPTER MAP

2 Logic elements and Boolean algebra: Digital computers are constructed from millions of very simple logic elements called gates. In this chapter we introduce the fundamental gates and demonstrate how they can be combined to create circuits that carry out the basic functions required in a computer.

3 Sequential logic: We can classify logic circuits into two groups: the combinational circuit we describe in Chapter 2 and the sequential circuit, which forms the subject of Chapter 3. A sequential circuit includes memory elements and its current behavior is governed by its past inputs. Typical sequential circuits are counters and registers.

4 Computer arithmetic: In Chapter 4 we demonstrate how numbers are represented in binary form and look at binary arithmetic. We also demonstrate how the properties of binary numbers are exploited to create codes that compress data or even detect and correct errors.

5 The instruction set architecture: In Chapter 5 we introduce the computer's instruction set architecture (ISA), which defines the machine-level programmer's view of the computer. The ISA describes the type of operations a computer carries out. We are interested in three aspects of the ISA: the nature of the instructions, the resources used by the instructions (registers and memory), and the ways in which the instructions access data (addressing modes).
INTRODUCTION
We begin our study of the digital computer by investigating the elements from which it is
constructed. These circuit elements are gates and flip-flops and are also known as combinational
and sequential logic elements, respectively. A combinational logic element is a circuit whose
output depends only on its current inputs, whereas the output from a sequential element
depends on its past history (i.e. a sequential element remembers its previous inputs) as well as
its current input. We describe combinational logic in this chapter and devote the next chapter to
sequential logic.
Before we introduce the gate, we highlight the difference between digital and analog systems
and explain why computers are constructed from digital logic circuits. After describing the
properties of several basic gates we demonstrate how a few gates can be connected together to
carry out useful functions in the same way that bricks can be put together to build a house or a
school. We include a Windows-based simulator that lets you construct complex circuits and then
examine their behavior on a PC.
The behavior of digital circuits can be described in terms of a formal notation called Boolean
algebra. We include an introduction to Boolean algebra because it allows you to analyze circuits
containing gates and sometimes enables circuits to be constructed in a simpler form. Boolean
algebra leads on to Karnaugh maps, a graphical technique for the simplification and manipulation
of Boolean equations.
The last circuit element we introduce is the tri-state gate, which allows you to connect lots of
separate digital circuits together by means of a common highway called a bus. A digital computer
is composed of nothing more than digital circuits, buses, and sequential logic elements.
By the end of this chapter, you should be able to design a wide range of circuits that can perform operations as diverse as selecting one of several signals and implementing simple arithmetic operations.
Real circuits can fail. The final part of this chapter takes a brief look at how you test digital
circuits.
hole? Such a system would require extremely precise electronics.

A single binary digit is known as a bit (BInary digiT) and is the smallest unit of information possible; that is, a bit can't be subdivided into smaller units. Ideally, if a computer runs off, say, 3 V, a low level would be represented by 0.0 V and a high level by 3.0 V.

2.2 Fundamental gates

The digital computer consists of nothing more than the interconnection of three types of primitive elements called AND, OR, and NOT gates. Other gates called NAND, NOR, and EOR gates can be derived from these gates. We shall see that all digital circuits may be designed from the appropriate interconnection of NAND (or NOR) gates alone. In other words, the most complex digital computer can be reduced to a mass of NAND gates. This statement doesn't devalue the computer any more than saying that the human brain is just a lot of neurons joined in a particularly complex way devalues the brain.

We don't use gates to build computers because we like them or because Boolean algebra is great fun. We use gates because they provide a way of mass producing cheap and reliable digital computers.

2.2.1 The AND gate

The AND gate is a circuit with two or more inputs and a single output. The output of an AND gate is true if and only if each of its inputs is also in a true state. Conversely, if one or more of the inputs to the AND gate is false, the output will also be false. Figure 2.3 provides the circuit symbol for both a two-input AND gate and a three-input AND gate. Note that the shape of the gate indicates its AND function (this will become clearer when we introduce the OR gate).

An AND gate is visualized in terms of an electric circuit or a highway, as illustrated in Fig. 2.4. Electric current (or traffic) flows along the circuit (road) only if switches (bridges) A and B are closed. The logical symbol for the AND operator is a dot, so that A AND B can be written A · B. As in normal algebra, the dot is often omitted and A · B can be written AB. The logical AND operator behaves like the multiplier operator in conventional algebra; for example, the expression (A + B) · (C + D) = A · C + A · D + B · C + B · D holds in both Boolean and conventional algebra.

Figure 2.3 The AND gate: (a) a two-input AND gate with inputs A and B and output A · B; (b) a three-input AND gate whose output is the AND of all three of its inputs.
WHAT IS A GATE?

The word gate conveys the idea of a two-state device—open or shut. A gate may be thought of as a black box with one or more input terminals and an output terminal. The gate processes the digital signals at its input terminals to produce a digital signal at its output terminal. The particular type of gate determines the actual processing involved. The output C of a gate with two input terminals A and B can be expressed in conventional algebra as C = F(A, B), where A, B, and C are two-valued variables and F is a logical function.

The output of a gate is a function only of its inputs. When we introduce the sequential circuit, we will discover that the sequential circuit's output depends on its previous output as well as its current inputs. We can demonstrate the concept of a gate by means of an example from the analog world. Consider the algebraic expression y = F(x) = 2x² + x + 1. If we think of x as the input to a black box and y as its output, the block diagram demonstrates how y is generated by a sequence of operations on x. The operations performed on the input are those of addition, multiplication, and squaring. Variable x enters the 'squarer' and comes out as x². The output from the squarer enters a multiplier (along with the constant 2) and comes out as 2x², and so on. By applying all the operations to input x, we end up with output 2x² + x + 1. The boxes carrying out these operations are entirely analogous to gates in the digital world—except that gates don't do anything as complicated as addition or multiplication.

(The block diagram shows input x passing through a squarer and then a multiplier, fed with the constant 2, to give 2x²; an adder then adds x to give 2x² + x, and a second adder adds the constant 1 to produce the output y = 2x² + x + 1.)
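In programming terms, each box in such a diagram is simply a function applied to the value flowing through it, and wiring boxes together is function composition. A tiny C rendering of the diagram (illustrative only, with an arbitrary input value) makes the analogy concrete:

    #include <stdio.h>

    /* Each 'box' of the block diagram becomes a function of its input. */
    static int squarer(int x)      { return x * x; }
    static int doubler(int x)      { return 2 * x; }
    static int adder(int a, int b) { return a + b; }

    int main(void)
    {
        int x = 3;                                        /* example input    */
        int y = adder(adder(doubler(squarer(x)), x), 1);  /* 2x^2 + x + 1     */
        printf("F(%d) = %d\n", x, y);                     /* prints F(3) = 22 */
        return 0;
    }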
CIRCUIT CONVENTIONS

Because we write from left to right, many logic circuits are also read from left to right; that is, information flows from left to right, with the inputs of gates on the left and the outputs on the right.

Because a circuit often contains many signal paths, some of these paths may have to cross over each other when the diagram is drawn on two-dimensional paper. We need a means of distinguishing between wires that join and wires that simply cross each other (rather like highways that merge and highways that fly over each other). The standard procedure is to regard two lines that simply cross as not being connected, as the diagram illustrates. The connection of two lines is denoted by a dot at their intersection. (The accompanying diagram shows two lines, P and X, joined by a dot, so they are connected, and a line Y that crosses another line without a dot, so the two are not connected at the crossover point.)

The voltage at any point along a conductor is constant and therefore the logical state is the same everywhere on the line. If a line is connected to the input of several gates, the input to each gate is the same. In this diagram, the value of X and P must be the same because the two lines are connected.

A corollary of the statement that the same logic state exists everywhere on a conductor is that a line must not be connected to the output of more than one circuit—otherwise the state of the line will be undefined if the outputs differ. At the end of this chapter we will introduce gates with special tri-state outputs that can be connected together without causing havoc.
word A
word B
C = A ⋅ B

In this example the result C = A ⋅ B is given by 01000100. Why should anyone want to AND together two words? If you AND bit x with 1, the result is x (because Table 2.2 demonstrates that 1 ⋅ 0 = 0 and 1 ⋅ 1 = 1). If you AND bit x with 0, the result is 0 (because the output of an AND gate is true only if both inputs are true). Consequently, a logical AND is used to mask certain bits in a word by forcing them to zero. For example, if we wish to clear the leftmost four bits of an 8-bit word to zero, ANDing the word with 00001111 will do the trick. The following example demonstrates the effect of an AND operation with a 00001111 mask.

source word
mask
result

2.2.2 The OR gate

The output of an OR gate is true if any one (or more than one) of its inputs is true. Notice the difference between AND and OR operations. The output of an AND is true only if all inputs are true, whereas the output of an OR is true if at least one input is true. The circuit symbols for a two-input and a three-input OR gate are given in Fig. 2.5. The logical symbol for an OR operation is an addition sign, so that the logical operation A OR B is written as A + B. The logical OR operator is the same as the conventional addition symbol because the OR operator behaves like the addition operator in algebra (the reasons for this will become clear when we introduce Boolean algebra). Table 2.3 provides the truth table for a two-input OR gate.

Figure 2.5 The OR gate: (a) the two-input OR gate, C = A + B; (b) the three-input OR gate, D = A + B + C.

The behavior of an OR gate can be represented by the switching circuit of Fig. 2.6. A path exists from input to output if either of the two switches is closed.

Figure 2.6 The representation of an OR gate: the circuit is complete if either switch A or switch B is closed.

The use of the term OR here is rather different from the English usage of or. The Boolean OR means (either A or B) or (both A and B), whereas the English usage often means A or B but not (A and B). For example, consider the contrasting use of the word or in the two phrases: 'Would you like tea or coffee?' and 'Reduced fees are charged to members who are registered students or under 25'. We shall see that the more common English use of the word or corresponds to the Boolean function known as the EXCLUSIVE OR, an important function that is frequently abbreviated to EOR or XOR.

A computer can also perform a logical OR on words as the following example illustrates.

word A
word B
C = A + B

The logical OR operation is used to set one or more bits in a word to a logical 1. The term set means make a logical one, just as clear means reset to a logical zero. For example, the least-significant bit of a word is set by ORing it with 00 . . . 01. By applying both AND and OR operations to a word we can selectively clear or set its bits. Suppose we have an 8-bit binary word and we wish to clear bits 6 and 7 and set bits 4 and 5. If the bits of the word are d0 to d7, we can write:

d7 d6 d5 d4 d3 d2 d1 d0   Source word
0  0  1  1  1  1  1  1    AND mask
0  0  d5 d4 d3 d2 d1 d0   First result
0  0  1  1  0  0  0  0    OR mask
0  0  1  1  d3 d2 d1 d0   Final result
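Masking with AND and setting with OR is exactly what a programmer does with bitwise operators. A minimal sketch (ours, not the book's; the source word is a made-up value) of the clear-and-set sequence above:

```python
# Clear bits 7 and 6, then set bits 5 and 4, of an 8-bit word d7..d0.
word = 0b10110101              # example source word (hypothetical value)

first_result = word & 0b00111111   # AND mask clears bits 7 and 6
final_result = first_result | 0b00110000   # OR mask sets bits 5 and 4

print(f"{word:08b} -> {final_result:08b}")
```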
2.2.3 The NOT gate

The NOT gate is also called an inverter or a complementer and is a two-terminal device with a single input and a single output. If the input of an inverter is X, its output is NOT X, which is written X̄ or X*. Figure 2.7 illustrates the symbol for an inverter and Table 2.4 provides its truth table. Some teachers vocalize X* as 'not X' and others as 'X not'. The inverter is the simplest of gates because the output is the opposite of the input. If the input is 1 the output is 0 and vice versa. By the way, the triangle in Fig. 2.7 doesn't represent an inverter. The small circle at the output of the inverter indicates the inversion operation. We shall see that this circle indicates logical inversion wherever it appears in a circuit.

Figure 2.7 The NOT gate or inverter: the output is the logical complement of the input.

Table 2.4 Truth table for the inverter.

Input A | Output F = A*
0 | 1
1 | 0

We can visualize the operation of the NOT gate in terms of the relay illustrated in Fig. 2.8. A relay is an electromechanical switch (i.e. a device that is partially electronic and partially mechanical) consisting of an iron core around which a coil of wire is wrapped. When a current flows through the coil, it generates a magnetic field that causes the iron core to act as a magnet. Situated close to the iron core is a pair of contacts, the lower of which is mounted on a springy strip of iron. If switch A is open, no current flows through the coil and the iron core remains unmagnetized. The relay's contacts are normally closed so that they form a switch that is closed when switch A is open.

If switch A is closed, a current flows through the coil to generate a magnetic field that magnetizes the iron core. The contact on the iron strip is pulled toward the core, opening the contacts and breaking the circuit. In other words, closing switch A opens the relay's switch and vice versa. The system in Fig. 2.8 behaves like a NOT gate. The relay is used by a computer to control external devices and is described further when we deal with input and output devices.

Figure 2.8 The operation of a relay (a battery, coil, iron strip, and contacts operated by switch A).

Like both the AND and OR operations, the NOT function can also be applied to words:

word A
B = A*

2.2.4 The NAND and NOR gates

The two most widely used gates in real circuits are the NAND and NOR gates. These aren't fundamental gates because the NAND gate is derived from an AND gate followed by an inverter (Not AND) and the NOR gate is derived from an OR gate followed by an inverter (Not OR), respectively. The circuit symbols for the NAND and NOR gates are given in Fig. 2.9. The little circle at the output of a NAND gate represents the symbol for inversion or complementation. It is this circle that converts the AND gate to a NAND gate and an OR gate to a NOR gate. Later, when we introduce the concept of mixed logic, we will discover that this circle can be applied to the inputs of gates as well as to their outputs.

Figure 2.9 Circuit symbols for the NAND and NOR gates: an AND gate followed by an inverter gives C = (A ⋅ B)*, and an OR gate followed by an inverter gives C = (A + B)*.

Table 2.5 gives the truth table for the NAND and the NOR gates. As you can see, the output columns in the NAND and NOR tables are just the complements of the outputs in the corresponding AND and OR tables.

We can get a better feeling for the effect that different gates have on two inputs, A and B, by putting all the gates together in a single table (Table 2.6). We have also included the EXCLUSIVE OR (i.e. EOR) and its complement the EXCLUSIVE NOR (i.e. EXNOR) in Table 2.6 for reference. The EOR gate is derived from AND, OR, and NOT gates and is described in more detail later in this chapter. It should be noted here that A* ⋅ B* is not the same as (A ⋅ B)*, just as A* + B* is not the same as (A + B)*.
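Because each gate is simply a function of its inputs, the gates described so far can be modelled directly as functions. A minimal sketch (ours, not the book's) that builds NAND and NOR from AND, OR, and NOT and prints their truth tables:

```python
def AND(a, b):  return a & b
def OR(a, b):   return a | b
def NOT(a):     return 1 - a
def NAND(a, b): return NOT(AND(a, b))   # an AND gate followed by an inverter
def NOR(a, b):  return NOT(OR(a, b))    # an OR gate followed by an inverter

print("A B | AND OR NAND NOR")
for a in (0, 1):
    for b in (0, 1):
        print(f"{a} {b} |  {AND(a, b)}   {OR(a, b)}   {NAND(a, b)}    {NOR(a, b)}")
```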
2.2.5 Positive, negative, and mixed logic

At this point we introduce the concepts of positive logic, negative logic, and mixed logic. Some readers may find that this section interrupts their progress toward a better understanding of the gate and may therefore skip ahead to the next section.

Up to now we have blurred the distinction between two unconnected concepts. The first concept is the relationship between low/high voltages in a digital circuit, 0 and 1 logical levels, and true/false logic values. The second concept is the logic function; for example, AND, OR, and NOT. So far, we have used positive logic in which a high-level signal represents a logical one state and this state is called true.

Table 2.7 provides three views of the AND function. The leftmost column provides the logical truth table in which the output is true only if all inputs are true (we have used T and F to avoid reference to signal levels). The middle column describes the AND function in positive logic form in which the output is true (i.e. 1) only if all inputs are true (i.e. 1). The right-hand column in Table 2.7 uses negative logic in which 0 is true and 1 is false. The output A ⋅ B is true (i.e. 0) only when both inputs are true (i.e. 0).

Table 2.7 Truth table for the AND gate in positive and negative logic forms.

Logical form     Positive logic    Negative logic
A B | A ⋅ B      A B | A ⋅ B       A B | A ⋅ B
F F | F          0 0 | 0           1 1 | 1
F T | F          0 1 | 0           1 0 | 1
T F | F          1 0 | 0           0 1 | 1
T T | T          1 1 | 1           0 0 | 0

As far as digital circuits are concerned, there's no fundamental difference between logical 1s and 0s and it's as sensible to choose a logical 0 level as the true state as it is to choose a logical 1 state. Indeed, many of the signals in real digital systems are active-low, which means that their function is carried out by a low-level signal.

Suppose we regard the low level as true and use negative logic. Table 2.7 shows that we have an AND gate whose output is low if and only if each input is low. It should also be apparent that an AND gate in negative logic functions as an OR gate in positive logic. Similarly, a negative logic OR gate functions as an AND gate in positive logic. In other words, the same gate is an AND gate in negative logic and an OR gate in positive logic. Figure 2.10 demonstrates the relationship between positive and negative logic gates.

Figure 2.10 Positive and negative logic.

For years engineers used the symbol for a positive logic AND gate in circuits using active-low signals, with the result that the reader was confused.

Table 2.5 Truth table for the NAND and NOR gates.

A B | NAND (A ⋅ B)* | NOR (A + B)*
0 0 | 1 | 1
0 1 | 1 | 0
1 0 | 1 | 0
1 1 | 0 | 0

Table 2.6 The effect of the basic gates on two inputs A and B.

A B | AND A ⋅ B | OR A + B | NAND (A ⋅ B)* | NOR (A + B)* | EOR A ⊕ B | EXNOR (A ⊕ B)*
0 0 | 0 | 0 | 1 | 1 | 0 | 1
0 1 | 0 | 1 | 1 | 0 | 1 | 0
1 0 | 0 | 1 | 1 | 0 | 1 | 0
1 1 | 1 | 1 | 0 | 0 | 0 | 1
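The equivalence between a negative-logic AND gate and a positive-logic OR gate can be checked mechanically. A minimal sketch (ours, not the book's) that models the physical gate by its voltage behavior and reads the same levels in both conventions:

```python
# Physically, this gate's output is high only when both inputs are high.
def physical_gate(a_high, b_high):
    return a_high and b_high     # high = True, low = False

for a in (False, True):
    for b in (False, True):
        out = physical_gate(a, b)
        # Positive logic (high = true): the gate behaves as AND.
        assert (a and b) == out
        # Negative logic (low = true): the same gate behaves as OR.
        assert ((not a) or (not b)) == (not out)
print("positive-logic AND acts as OR when its signals are read as active-low")
```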
Figure 2.12 Using mixed logic: inputs A, B, C, and D; gates G1 to G4; intermediate signals P, Q, and R; the output F is active-low.

There is no physical difference between the circuits of Figs. 2.11(a) and 2.11(b). They are both ways of representing the same thing. However, the meaning of the circuit in Fig. 2.11(b) is clearer.

Consider another example of mixed logic in which we use both negative and positive logic concepts. Suppose a circuit is activated by a low-level signal if input A is low and input B high, or input D is high, or input C is low. Figure 2.12 shows how we might draw such a circuit. For most of this book we will continue to use positive logic.

2.3 Applications of gates

We now look at four simple circuits to demonstrate that a few gates can be connected together in such a way as to create a circuit whose function and importance may readily be appreciated by the reader. Following this informal introduction to circuits we introduce Digital Works, a Windows-based program that lets you construct and simulate circuits containing gates on a PC. We then return to gates and provide a more formal section on the analysis of logic circuits by means of Boolean algebra.

Circuits are constructed by connecting gates together. The output from one gate can be connected (i.e. wired) to the input of one or more other gates. However, two outputs cannot be connected together.

Example 1 Consider the circuit of Fig. 2.13, which uses three two-input AND gates labeled G1, G2, and G3, and a three-input OR gate labeled G4. This circuit has three inputs A, B, and C, and an output F. What does it do?

Figure 2.13 The use of gates—Example 1: A, B, and C are inputs; P, Q, and R are intermediate variables; F is the output.

We can tackle this problem in several ways. One approach is to create a truth table that tabulates the output F for all eight possible combinations of the three inputs A, B, and C. Table 2.8 corresponds to the circuit of Fig. 2.13 and includes columns for the outputs of the three AND gates as well as the output of the OR gate, F.

The three intermediate signals P, Q, and R are defined by P = A ⋅ B, Q = B ⋅ C, and R = A ⋅ C. Figure 2.13 tells us that we can write down the output function, F, as the logical OR of the three intermediate signals P, Q, and R; that is, F = P + Q + R. We can substitute the expressions for P, Q, and R to get F = A ⋅ B + B ⋅ C + A ⋅ C. This is a Boolean equation, but it doesn't help us a lot at this point. However, by visually inspecting the truth table for F we can see that the output is true if two or more of the inputs A, B, and C are true. That is, this circuit implements a majority logic function whose output takes the same value as the majority of its inputs. We have already seen how such a circuit is used in an automatic landing system in an aircraft by choosing the output from three independent computers to be the best (i.e. majority) of three inputs. Using just four basic gates, we've constructed a circuit that does something useful.

Table 2.8 Truth table for Fig. 2.13.

Inputs | Intermediate values | Output
A B C | P = A ⋅ B | Q = B ⋅ C | R = A ⋅ C | F = P + Q + R
0 0 0 | 0 | 0 | 0 | 0
0 0 1 | 0 | 0 | 0 | 0
0 1 0 | 0 | 0 | 0 | 0
0 1 1 | 0 | 1 | 0 | 1
1 0 0 | 0 | 0 | 0 | 0
1 0 1 | 0 | 0 | 1 | 1
1 1 0 | 1 | 0 | 0 | 1
1 1 1 | 1 | 1 | 1 | 1

Example 2 The circuit of Fig. 2.14 has three inputs, one output, and three intermediate values (we've also included a mixed logic version of this circuit on the right-hand side of Fig. 2.14). By inspecting the truth table for this circuit (Table 2.9) we can see that when the input X is 0, the output, F, is equal to Y. Similarly, when X is 1, the output is equal to Z. The circuit of Fig. 2.14 behaves like an electronic switch, connecting the output to one of two inputs, Y or Z, depending on the state of a control input X.

The circuit of Fig. 2.14 is a two-input multiplexer that can be represented by the arrangement of Fig. 2.15. Because the word multiplexer appears so often in electronics, it is frequently abbreviated to MUX.
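Both examples can be cross-checked by evaluating their Boolean equations for every input combination. A minimal sketch (ours, not the book's):

```python
def majority(a, b, c):            # Example 1: F = A.B + B.C + A.C
    return (a & b) | (b & c) | (a & c)

def mux2(x, y, z):                # Example 2: F = Y when X = 0, Z when X = 1
    return ((1 - x) & y) | (x & z)

for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            assert majority(a, b, c) == (1 if a + b + c >= 2 else 0)
            assert mux2(a, b, c) == (b if a == 0 else c)
```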
Figure 2.14 The use of gates—Example 2: a circuit with inputs X, Y, and Z, intermediate signals P, Q, and R, and output F (a mixed logic version is shown on the right).

Table 2.9 Truth table for Fig. 2.14.

Inputs | Intermediate values | Output
X Y Z | P = X* | Q = (P ⋅ Y)* | R = (X ⋅ Z)* | F = (Q ⋅ R)*
0 0 0 | 1 | 1 | 1 | 0
0 0 1 | 1 | 1 | 1 | 0
0 1 0 | 1 | 0 | 1 | 1
0 1 1 | 1 | 0 | 1 | 1
1 0 0 | 0 | 1 | 1 | 0
1 0 1 | 0 | 1 | 0 | 1
1 1 0 | 0 | 1 | 1 | 0
1 1 1 | 0 | 1 | 0 | 1

Figure 2.15 The logical representation of Fig. 2.14: an electronic switch connects input Y or input Z to the output F; the control input X selects Y or Z.

One way of describing a circuit is to write down the set of inputs that cause the output to be true. In Table 2.9 the output is true when

(1) X = 0, Y = 1, Z = 0 (X* ⋅ Y ⋅ Z*)
(2) X = 0, Y = 1, Z = 1 (X* ⋅ Y ⋅ Z)
(3) X = 1, Y = 0, Z = 1 (X ⋅ Y* ⋅ Z)
(4) X = 1, Y = 1, Z = 1 (X ⋅ Y ⋅ Z)

There are four possible combinations of inputs that make the output true. Therefore, the output can be expressed as the logical sum of the four cases (1)–(4) above; that is,

F = X* ⋅ Y ⋅ Z* + X* ⋅ Y ⋅ Z + X ⋅ Y* ⋅ Z + X ⋅ Y ⋅ Z

This function is true if any of the conditions (1)–(4) are true. A function represented in this way is called a sum-of-products (S-of-P) expression because it is the logical OR (i.e. sum) of a group of terms, each composed of several variables ANDed together (i.e. products). A sum-of-products expression represents one of the two standard ways of writing down a Boolean expression.

An alternative way of writing a Boolean equation is called a product-of-sums (P-of-S) expression and consists of several terms ANDed together. The terms are made up of variables ORed together. A typical product-of-sums expression has a form such as F = (X + Y + Z)(X + Y* + Z*). This demonstrates that a given Boolean function can be expressed in more than one way.

The multiplexer of Fig. 2.14 may seem a very long way from computers and programming. However, multiplexers are found somewhere in every computer because computers operate by modifying the flow of data within a system. A multiplexer allows one of two data streams to flow through a switch that is electronically controlled. Let's look at a highly simplified example. The power of a digital computer (or a human brain) lies in its ability to make decisions. Decision taking in a computer corresponds to the conditional branch; for example, IF P = Q THEN select X ELSE select Y. We can't go into the details of how such a construct is implemented here. What we would like to do is to demonstrate that something as simple as a multiplexer can implement something as sophisticated as a conditional branch. Consider the system of Fig. 2.16. Two numbers P and Q are fed to a comparator where they are compared. If they are the same, the output of the comparator is 1 (otherwise it's 0). The same output is used as the control input to a multiplexer that selects between two values X and Y. In practice, such a system would be rather more complex (because P, Q, X, and Y are all multi-bit values), but the basic principles are the same.

Example 3 The circuit of Fig. 2.17 has two inputs, two intermediate values, and one output. Table 2.10 provides its truth table. The circuit of Fig. 2.17 represents one of the most important circuits in digital electronics, the exclusive or (also called EOR or XOR). The exclusive or corresponds to the normal English use of the word or (i.e. one or the other but not both). The output of an EOR gate is true if one of the inputs is true but not if both inputs are true.

An EOR circuit always has two inputs (remember that AND and OR gates can have any number of inputs). Because the EOR function is so widely used, the EOR gate has its own special circuit symbol (Fig. 2.18) and the EOR operator its own special logical symbol '⊕'; for example, we can write

F = A EOR B = A ⊕ B

The EOR is not a fundamental gate because it is constructed from basic gates. Because the EOR gate is so important, we will discuss it a little further. Table 2.10 demonstrates that F is true when A = 0 and B = 1, or when A = 1 and B = 0. Consequently, the output F = A* ⋅ B + A ⋅ B*. From the circuit in Fig. 2.17 we can write F = P ⋅ Q, where P = A + B and Q = (A ⋅ B)*.

Figure 2.17 The use of gates—Example 3.

Table 2.10 Truth table for the circuit of Fig. 2.17.

A B | F = A ⊕ B
0 0 | 0
0 1 | 1
1 0 | 1
1 1 | 0
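The sum-of-products expression can be tested against the multiplexer behavior it was derived from. A minimal sketch (ours, not the book's):

```python
def NOT(a): return 1 - a

def F_sop(x, y, z):
    # F = X*.Y.Z* + X*.Y.Z + X.Y*.Z + X.Y.Z, from Table 2.9
    return (NOT(x) & y & NOT(z)) | (NOT(x) & y & z) | \
           (x & NOT(y) & z) | (x & y & z)

for x in (0, 1):
    for y in (0, 1):
        for z in (0, 1):
            # The multiplexer outputs Y when X = 0 and Z when X = 1.
            assert F_sop(x, y, z) == (y if x == 0 else z)
```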
The EOR is a remarkably versatile logic element that pops up in many places in digital electronics. The output of an EOR is true if its inputs are different and false if they are the same. As we've already stated, unlike the AND, OR, NAND, and NOR gates, the EOR gate can have only two inputs.

Figure 2.18 Circuit symbol for an EOR gate (C = A ⊕ B).

Figure 2.19 An alternative circuit for an EOR gate: gates G1 and G2 generate A ⋅ B* and A* ⋅ B, and gate G3 ORs them to give F = A ⋅ B* + A* ⋅ B.

The EOR gate's ability to detect whether its inputs are the same allows us to build an equality tester that indicates whether or not two words are identical (Fig. 2.21). In Fig. 2.21 two m-bit words (Word 1 and Word 2) are fed to a bank of m EOR gates. Bit i from Word 1 is compared with bit i from Word 2 in the ith EOR gate. If these two bits are the same, the output of this EOR gate is zero.

If the two words in Fig. 2.21 are equal, the outputs of all EORs are zero and we need to detect this condition in order to declare that Word 1 and Word 2 are identical. An AND gate will give a 1 output when all its inputs are 1. However, in this case, we have to detect the situation in which all inputs are 0. We can therefore connect all m outputs from the m EOR gates to an m-input NOR gate (because the output of a NOR gate is 1 if all inputs are 0).

If you look at Fig. 2.21 you can see that the outputs from the EOR gates aren't connected to a NOR gate but to an m-input AND gate with inverting inputs. The little bubbles at the AND gate's inputs indicate inversion and are equivalent to NOT gates. When all inputs to the AND gate are active-low, the AND gate's output will go active-high (exactly what we want). In mixed logic we can regard an AND gate with active-low inputs and an active-high output as a NOR gate.

Remember that we required an equality detector (i.e. comparator) in Fig. 2.16 (Example 2) to control a multiplexer. We've just built one.

Figure 2.21 The equality tester: each of the m EOR gates compares a pair of bits (bit m–1 down to bit 0) from Word 1 and Word 2.
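The bank of EOR gates followed by a NOR is easy to model. A minimal sketch (ours, not the book's; words_equal is an illustrative name, and the 8-bit word length is an assumption):

```python
def words_equal(word1, word2, m=8):
    # One EOR gate per bit position; each output is 0 when the bits match.
    eor_outputs = [((word1 >> i) & 1) ^ ((word2 >> i) & 1) for i in range(m)]
    # The NOR of all EOR outputs is 1 only when every output is 0.
    return 1 if not any(eor_outputs) else 0

assert words_equal(0b10110101, 0b10110101) == 1
assert words_equal(0b10110101, 0b10110100) == 0
```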
Figure 2.22 Example 4—the priority circuit: inputs x0 to x4, gates G1 to G4, and outputs y0 to y4.

If two or more inputs are asserted simultaneously, only the output corresponding to the input with the highest priority is asserted. Computers use this type of circuit to deal with simultaneous requests for service from several peripherals (e.g. disk drives, the keyboard, the mouse, and the modem).

Consider the five-input prioritizer circuit in Fig. 2.22. The prioritizer's five inputs x0 to x4 are connected to the outputs of five devices that can make a request for attention (input x4 has the highest priority). That is, device i can put a logical 1 on input xi to request attention at priority level i. If several inputs are set to 1 at the same time, the prioritizer sets only one of its outputs to 1; all the other outputs remain at 0. For example, if the input is x4,x3,x2,x1,x0 = 00110, the output is y4,y3,y2,y1,y0 = 00100, because the highest level of input is x2. Table 2.11 provides a truth table for this prioritizer.

Table 2.11 Truth table for the priority circuit of Fig. 2.22.

x4 x3 x2 x1 x0 | y4 y3 y2 y1 y0
0 0 0 0 0 | 0 0 0 0 0
0 0 0 0 1 | 0 0 0 0 1
0 0 0 1 0 | 0 0 0 1 0
0 0 0 1 1 | 0 0 0 1 0
0 0 1 0 0 | 0 0 1 0 0
0 0 1 0 1 | 0 0 1 0 0
0 0 1 1 0 | 0 0 1 0 0
0 0 1 1 1 | 0 0 1 0 0
0 1 0 0 0 | 0 1 0 0 0
0 1 0 0 1 | 0 1 0 0 0
0 1 0 1 0 | 0 1 0 0 0
0 1 0 1 1 | 0 1 0 0 0
0 1 1 0 0 | 0 1 0 0 0
0 1 1 0 1 | 0 1 0 0 0
0 1 1 1 0 | 0 1 0 0 0
0 1 1 1 1 | 0 1 0 0 0
1 0 0 0 0 | 1 0 0 0 0
1 0 0 0 1 | 1 0 0 0 0
1 0 0 1 0 | 1 0 0 0 0
1 0 0 1 1 | 1 0 0 0 0
1 0 1 0 0 | 1 0 0 0 0
1 0 1 0 1 | 1 0 0 0 0
1 0 1 1 0 | 1 0 0 0 0
1 0 1 1 1 | 1 0 0 0 0
1 1 0 0 0 | 1 0 0 0 0
1 1 0 0 1 | 1 0 0 0 0
1 1 0 1 0 | 1 0 0 0 0
1 1 0 1 1 | 1 0 0 0 0
1 1 1 0 0 | 1 0 0 0 0
1 1 1 0 1 | 1 0 0 0 0
1 1 1 1 0 | 1 0 0 0 0
1 1 1 1 1 | 1 0 0 0 0

If you examine the circuit of Fig. 2.22, you can see that output y4 is equal to input x4 because there is a direct connection. If x4 is 0, then y4 is 0; and if x4 is 1, then y4 is 1. The value of x4 is fed to the inputs of the AND gates G4, G3, G2, and G1 in the lower priority stages via an inverter. If x4 is 1, the logical level at the inputs of the AND gates is 0, which disables them and forces their outputs to 0. If x4 is 0, the value fed back to the AND gates is 1 and therefore they are not disabled by x4. Similarly, when x3 is 1, gates G3, G2, and G1 are disabled, and so on.
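The gating structure just described translates directly into Boolean equations: each output is its input ANDed with the complements of all higher-priority inputs. A minimal sketch (ours, not the book's; prioritize is an illustrative name):

```python
def prioritize(x4, x3, x2, x1, x0):
    y4 = x4                                               # direct connection
    y3 = x3 & (1 - x4)                                    # disabled when x4 = 1
    y2 = x2 & (1 - x4) & (1 - x3)
    y1 = x1 & (1 - x4) & (1 - x3) & (1 - x2)
    y0 = x0 & (1 - x4) & (1 - x3) & (1 - x2) & (1 - x1)
    return (y4, y3, y2, y1, y0)

assert prioritize(0, 0, 1, 1, 0) == (0, 0, 1, 0, 0)   # input 00110 -> 00100
```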
Example 5 Our final example looks at two different circuits that do the same thing. This is a typical exam question.

(a) Using AND, OR, and NOT gates only, draw circuits to generate P and Q from inputs X, Y, and Z, where P = (X + Y*)(Y ⊕ Z) and Q = Y* ⋅ Z + X ⋅ Y ⋅ Z*.
(b) By means of a truth table, establish a relationship between P and Q.
(c) Compare the circuit diagrams of P and Q in terms of speed and cost of implementation.

(a) The circuit diagram for P = (X + Y*)(Y ⊕ Z) is given by Fig. 2.23 and the circuit diagram for Q = Y* ⋅ Z + X ⋅ Y ⋅ Z* is given by Fig. 2.24.
(b) The truth table for functions P and Q is given in Table 2.12, from which it can be seen that P = Q.
(c) We can compare the two circuits in terms of speed and cost.
Figure 2.23 The circuit for P: gates form X + Y* and Y ⊕ Z = Y* ⋅ Z + Y ⋅ Z*, and an AND gate combines them to give P.

Figure 2.24 The circuit for Q: AND gates form Y* ⋅ Z and X ⋅ Y ⋅ Z*, and an OR gate combines them to give Q.

Propagation delay The maximum delay in the circuit for P is four gates in series in the Y path (i.e. NOT gate, AND gate, OR gate, AND gate). The maximum delay in the circuit for Q is three gates in series in both the Y and Z paths (i.e. NOT gate, AND gate, OR gate). Therefore the circuit for Q is 33% faster than that for P.

Cost The total number of gates needed to implement P is 7, whereas the total needed to implement Q is 5. The total number of gate inputs in the circuit for P is 12, whereas the total in the circuit for Q is 9. Clearly, the circuit for Q is better than that for P both in terms of the number of gates and the number of inputs to the gates.
Table 2.12 Truth table for P and Q.

X Y Z | X + Y* | Y ⊕ Z | P | Y* ⋅ Z | X ⋅ Y ⋅ Z* | Q
0 0 0 | 1 | 0 | 0 | 0 | 0 | 0
0 0 1 | 1 | 1 | 1 | 1 | 0 | 1
0 1 0 | 0 | 1 | 0 | 0 | 0 | 0
0 1 1 | 0 | 0 | 0 | 0 | 0 | 0
1 0 0 | 1 | 0 | 0 | 0 | 0 | 0
1 0 1 | 1 | 1 | 1 | 1 | 0 | 1
1 1 0 | 1 | 1 | 1 | 0 | 1 | 1
1 1 1 | 1 | 0 | 0 | 0 | 0 | 0
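Part (b) amounts to a proof by perfect induction, which a short loop can perform. A minimal sketch (ours, not the book's):

```python
def NOT(a): return 1 - a

for x in (0, 1):
    for y in (0, 1):
        for z in (0, 1):
            p = (x | NOT(y)) & (y ^ z)            # P = (X + Y*)(Y EOR Z)
            q = (NOT(y) & z) | (x & y & NOT(z))   # Q = Y*.Z + X.Y.Z*
            assert p == q
print("P = Q for all eight input combinations")
```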
2.4 Introduction to Digital Works

We now introduce a Windows-based logic simulator called Digital Works that enables you to construct a logic circuit from simple gates (AND, OR, NOT, NAND, NOR, EOR, XNOR) and to analyze the circuit's behavior. Digital Works also supports the tri-state logic gate that enables you to construct systems with buses. In the next chapter we will discover that Digital Works simulates both simple 1-bit storage elements called flip-flops and larger memory components such as ROM and RAM.

After installing Digital Works on your system, you can run it to get the initial screen shown in Fig. 2.25. We have annotated six of the most important icons on the toolbars. A circuit is constructed by using the mouse to place gates on the screen or workspace and a wiring tool to connect the gates
together. The input to your circuit may come from a clock generator (a continuous series of alternating 1s and 0s), a sequence generator (a user-defined sequence of 1s and 0s), or a manual input (from a switch that you can push by means of the mouse). You can observe the output of a gate by connecting it to a display, the LED. You can also send the output of the LED to a window that displays either a waveform or a sequence of binary digits.

Digital Works has been designed to be consistent with the Windows philosophy and has a help function that provides further information about its facilities and commands. The File command in the top toolbar provides the options you would expect (e.g. load, save, save as).

2.4.1 Creating a circuit

We are going to design and test an EOR circuit that has the logic function A ⋅ B* + A* ⋅ B. This function can be implemented with two inverters, two AND gates, and an OR gate. Figure 2.26 shows three of the icons we are going to use to create this circuit. The first icon is the new circuit icon that creates a fresh circuit (which Digital Works calls a macro). The second icon is the pointer tool used to select a gate (or other element) from the toolbars. The third icon is a gate that can be planted in the work area.

Let's start by planting some gates on the work area. The EOR requires two AND gates, an OR gate, and two inverters. First click on the pointer tool on the bottom row of icons. If it hasn't already been selected, it will become depressed when you select it. The pointer tool remains selected until another tool is selected.

You select a gate from the list on the second row of icons by first left clicking on the gate with the pointer tool and then left clicking at a suitable point in the workspace, as Fig. 2.27 demonstrates. If you hold the control key down when placing a gate, you can place multiple copies of the gate in the workspace. The OR gate is shown in broken outline because we've just placed it (i.e. it is currently selected). Once a gate has been placed, you can select it with the mouse by clicking the left button and dragging it wherever you want. You can click the right button to modify the gate's attributes (e.g. the number of inputs).

You can tidy up the circuit by moving the gates within the work area by left clicking a gate and dragging it to where you want it. Figure 2.28 shows the work area after we've moved the gates to create a symmetrical layout. You can even drag gates around the work area after they've been wired up and reposition wires by left clicking and dragging any node (a node is a point on a wire that consists of multiple sections or links).

Digital Works displays a grid to help you position the gates. The grid can be turned on or off, and the spacing of the grid lines can be changed. Objects can be made to snap to the grid. These functions are accessed via the View command in the top line.

Before continuing, we need to save the circuit. Figure 2.29 demonstrates how we use the conventional File function in the toolbar to save a circuit. We have called this circuit OUP_EOR1 and Digital Works inserts the extension .dwm.

The next step is to wire up the gates to create a circuit. First select the wiring tool from the toolbars by left clicking on it (Fig. 2.30). Then position the cursor over the point at which you wish to connect a wire and left click. The cursor changes to a wire when it's over a point that can legally be connected to. Left click to attach a wire and move the cursor to the point you wish to connect. Left click to create a connection. Instead of making a direct connection between two points, you can
click on the workspace to create a node (i.e. the connection is a series of straight lines).

You can make the wiring look neat by clicking on intermediate points to create a signal path made up of a series of straight-line segments. If you select the pointer tool and left click on a wire, you can drag any of its nodes (i.e. the points between segments on a line). If you right click on a wire you can delete it or change its color.

[Fig. 2.30: wiring with the wiring tool. Step 1—click on the output of the gate you wish to connect. Step 2—click on the input you wish to connect the output to.]

Once a wire has been
connected to another wire (or an input or output), the connection point can't be moved. To move a connection you have to delete the wire and connect a new one.

Digital Works permits a wire to be connected only between two legal connections. In Fig. 2.30 the inputs to the two inverters and the circuit's outputs aren't connected anywhere. This is because each wire must be connected between two points—it can't just be left hanging. In order to wire up the inputs and output we need points we can connect the wire to. In this case we are going to use the interactive input device to provide an input signal from a push button and the LED to show the state of the output.

In Fig. 2.31 we've added two interactive inputs and an LED to the circuit (the interactive tool allows you to generate digital inputs). When we run the simulator, we can set the states of the inputs to provide a 0 or a 1 and we can observe the state of the output on the LED.
We can now wire up the inputs and the output and complete the rest of the wiring as shown in Fig. 2.32. At this stage we could run the circuit if we wanted. However, we will use the text tool (indicated by the letter A on the middle toolbar) to give the circuit a title. Click on the A and then click on the place at which you wish to add the text to open the text window. This brings down a text box. Enter the text and click ok to place it on the screen.

We also wish to label the circuit's inputs and outputs. Although you can use the text tool to add text at any point, input and output devices (e.g. clocks, switches, LEDs) can be given names. We will use this latter technique because the names attached to input and output devices are automatically used to label the timing diagrams we will introduce later.

Figure 2.33 shows the circuit with annotation. The label EOR circuit has been added by the text tool, and inputs A and B have been labeled by right clicking on the input devices. In Fig. 2.33 we have right clicked on the LED to bring down a menu and then selected Text to invoke the text box (not shown). You enter the name of the output (in this case Sum) into the text box and click ok. This label is then appended to the LED on the screen. You can change the location of the label by right clicking on its name, selecting Text Style from the menu, and then selecting the required position (Left, Right, Top, Bottom).

2.4.2 Running a simulation

We are now ready to begin simulation. The bottom row of icons is concerned with running the simulation. The leftmost icon (ringed in Fig. 2.34) is left clicked to begin the simulation. The next step is to change the state of the interactive input devices. If you click on the hand tool icon, the cursor changes to a hand when positioned anywhere over the work area. By putting the hand cursor over one of the input devices, you can left click the mouse to change the status of the input (i.e. input 0 or input 1). When the input device is supplying a 1, it becomes red. Figure 2.34 shows the situation input A = 1, B = 0, and Sum = 1 (the output LED becomes red when it is connected to a 1 state). You can change the states of the input devices to generate all the possible input values 0,0, 0,1, 1,0, and 1,1 to verify that the circuit is an EOR (the output LED should display the sequence 0, 1, 1, 0).

Just observing the outputs of the LEDs is not always enough to get a picture of the circuit's behavior. We need a record of the states of the inputs and outputs. Digital Works provides a Logic History function that records and displays inputs and outputs during a simulator run. Any input or output device can be added to Logic History. If you select input A with the pointer tool and then right click, you get a pull-down menu from which you can activate the Add to Logic History function to record the value of input A. When this function is selected (denoted by a tick on the menu), all input is copied to a buffer (i.e. store). As we have two inputs, A and B, we will have to assign them to the Logic History function independently.

To record the output of the LED, you carry out the same procedure you did with the two inputs A and B (i.e. right click on the LED and select Add to Logic History) (see Fig. 2.35).

In order to use the Logic History function, you have to activate it from the Tools function on the toolbar. Selecting Tools pulls down a menu and you have to select the Logic History window. Figure 2.36 shows the logic history window after a simulation run. Note that the inputs and outputs have the labels you gave them (i.e. A, B, and Sum).

We now need to say something about the way the simulator operates. The simulator uses an internal clock, and a record of the state of inputs and outputs is taken at each clock pulse. Figure 2.37 shows how you can change the clock speed from the toolbar by pulling down the Circuit menu and selecting Clock Speed.

We're not interested in clocks at this stage because we are looking at a circuit that doesn't have a clock. However, because the signals are read and recorded at each clock pulse, the entire simulation is over in a second or so. Blink and you miss it. We need to stop the clock to perform a manual simulation. The Logic History window contains a copy of the run, stop, pause, and single-step icons to allow you to step through the simulation. Fig. 2.38 provides details of the Logic History window. The waveform in Fig. 2.38 was created by putting the simulator in the pause mode and executing a single cycle at a time by clicking on the single-step button. Between each cycle we have used the hand tool to change the inputs to the EOR gate. We can use the hand tool to both change the state of the inputs and to single step (you don't have to use the pointer tool to perform a single step).

The logic history can be displayed either as a waveform as in Fig. 2.38 or as a binary sequence as in Fig. 2.39 by clicking on the display mode icon in the Logic History window. You can also select the number of states to be displayed in this window.
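As a cross-check on the simulation, the table the Logic History window should produce can be computed directly from the EOR equation. A minimal sketch (ours, not the book's):

```python
def eor(a, b):
    return (a & (1 - b)) | ((1 - a) & b)   # A.B* + A*.B

print("A B Sum")
for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, eor(a, b))                 # Sum column reads 0, 1, 1, 0
```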
[Figs. 2.38 and 2.39: the Logic History window. The single-step button executes a single clock cycle; adjacent buttons let you specify the number of clock cycles in the run; the toggle display button switches between waveform and binary display modes.]
2.4.3 The clock and sequence generator

Inputting data into a digital circuit by using the hand tool to manipulate push buttons and switches is suitable for simple circuits, but not for more complex systems. Digital Works provides two means of generating signals automatically. One is a simple clock generator, which produces a constant stream of alternating 1s and 0s, and the other is a sequence generator, which produces a user-defined stream of 1s and 0s. The sequence generator is controlled by Digital Works' own clock, and a new 1 or 0 is output at each clock pulse. Figure 2.40 shows the icons for the clock and pulse generator and demonstrates how they appear when placed in the work area.

Figure 2.41 demonstrates how you can define a sequence of pulses you wish to apply to one of the inputs of a circuit (in this example, a single AND gate). One of the inputs to the AND gate comes from the clock generator and the other from the sequence generator. We've added LEDs to the gate's inputs and output to make it easy to observe the state of all signals.

Let's go through the operations required to place and set up a sequence generator (called a bit generator by Digital Works). First left click on the sequencer icon on the toolbar and then move the cursor to the point at which you wish to locate this device in the workspace. Then right click to both place it in the workspace and bring down the menu that controls the bit generator. From the pull-down menu, select Edit Sequence and the window shown in Fig. 2.41 appears. You can enter a sequence either from the computer's keyboard or by using the mouse on the simulated keyboard in the Edit Sequence window. You can enter the sequence in either binary or hexadecimal form (see Chapter 4 for a discussion of hexadecimal numbers); when you run the simulation, the bits of this sequence are fed to the input.

We can run the simple circuit by clicking on the run icon. When the system runs you will see the LEDs turn on and off. The speed of the clock pulses can be altered by clicking on Circuit in the toolbar to pull down a menu that allows you to set the clock speed.

2.4.4 Using Digital Works to create embedded circuits

Up to now, we have used Digital Works to create simple circuits composed from fundamental gates. You could create an entire microprocessor in this manner, but it would rapidly become too complex to use in any meaningful way. Digital Works allows you to convert a simple circuit into a logic element itself. The new logic element can be used as a building block in the construction of more complex circuits. These complex circuits can be converted into new logic elements, and so on. Turning circuits into re-usable black boxes is analogous to the use of subroutines in a high-level language.

Let's take the simple two-input multiplexer described in Fig. 2.42 and convert it into a black box with four terminals: two inputs A and B, a control input C whose state selects one of the inputs, and an output. When we constructed this circuit with Digital Works, we used the macro tag icon to place macro tags at the circuit's inputs and outputs. A macro tag can be wired up to the rest of the circuit exactly like an input or output device. You left-click on the macro tag icon to select it and then move the cursor to the place on the workspace you wish to insert the macro tag (i.e. the input or output port). Then you wire the macro tag to the appropriate input or output point of the circuit. Note that you can't apply a macro tag to the input or output of a gate directly—you have to connect it to an input or output by a wire.

You can also place a macro tag anywhere within the workspace by right clicking the mouse when using the wiring tool. Right clicking terminates the wiring process, inserts a macro tag, and activates a pull-down menu.

We are going to take the circuit of Fig. 2.42 and convert it into a black box with four terminals (i.e. the macro tags). This new circuit is just a new means of representing the old circuit—it is not a different entity. Indeed, this circuit doesn't have a different file name and is saved in the same file as the original circuit.

The first step is to create the macro (i.e. black box) itself. This is a slightly involved and repetitive process because you have to repeat the procedure once for each of the macro tags. Place the cursor over one of the macro tags in Fig. 2.43 and right click to pull down the menu. Select Template Editor from the menu with a left click. A new window called Template Editor appears (Fig. 2.43). You create a black box representation of the circuit in this window. Digital Works allows you to draw a new symbol to represent the circuit (in Fig. 2.43 we've used a special shape for the multiplexer).

Figure 2.42 Converting the two-input multiplexer circuit into a black box.
Figure 2.43 shows the Template Editor window. We have used the simple polyline drawing tool provided by Digital Works to create a suitable shape for the representation of the multiplexer. You just click on this tool in the Template Editor window and draw the symbol by clicking in the workspace at the points you wish to draw a line. You exit the drawing mode by double clicking. You can also add text to the drawing by using the text tool. Figure 2.43 shows the shape we've drawn for the multiplexer and the label we've given it. To add a label or text to the circuit, select the text tool and click on the point you wish to insert the text. This action will pull down the Edit Text box.

Figure 2.43 Drawing a symbol for the new circuit: the polyline drawing tool creates a suitable symbol for your circuit, and the text tool labels it.

The next step is to add pins to the black box in the Template Editor window and associate them with the macro tags in the original circuit of Fig. 2.42. Once this has been done, you can use the black box representation of the multiplexer in other circuits. The pins you have added to the black box are the connections to the circuit at the macro tags.

Click on the pin icon in the Template Editor and then left click in the workspace at the point you wish to locate this pin (see Fig. 2.44). You then right click on this new pin and select Associate with Tag. This operation associates the pin you have just placed with a macro tag in the circuit diagram. Each new pin placed on the circuit in the Template Editor window is automatically numbered in sequence.

Figure 2.44 Creating an interface point in the black box: the pin tool creates an interface point between the black box and the circuit; right click the first pin to associate it with a macro tag in the circuit diagram.

We add additional pins to the black box by closing the Template Editor, going back to the circuit, clicking on one of the unassigned pins, and selecting Associate with Tag again. Remember that Digital Works automatically numbers the pins in the circuit diagram as you associate them with tags. We can finish the process by using the text tool to add labels to the four pins (see Fig. 2.45). We have now created a new element that behaves exactly like the circuit from which it was constructed and which can be used itself as a circuit element. Figure 2.46 shows the original or expanded version of the circuit. Note how the pins have been numbered automatically.

To summarize, you create a black box representation of a circuit by carrying out the following sequence of operations.
● In Digital Works add and connect (i.e. wire up) a macro tag to your circuit.
● Right click the macro tag to enter the template editor.
● Use the Template Editor to add a pin to the circuit representation.
● In the Template Editor, select this pin and right click to associate it with the macro tag in the circuit diagram.
● Close the Template Editor.
● Repeat these operations, once for each macro tag.

When you exit Digital Works, saving your circuit also saves its black box representation. You can regard these two circuits as being bound together—with one representing a short-hand version of the other. Note that the Template Editor also has a save function. Using this save function simply saves the drawing you've created but not the pins, the circuit, or its logic.

Figure 2.45 The completed black box representation. This logic element behaves exactly like the circuit of the multiplexer; it is a multiplexer with all its internal details hidden. A switch icon on the toolbar controls whether an embedded macro displays its pins.

Figure 2.46 The original circuit with the macro tags numbered.

2.4.5 Using a macro

Having created a black box circuit (i.e. a macro), we can now use it as a building block just like any other logic element. We will start a new circuit in Digital Works and begin with an empty work area. The macro for a two-input multiplexer we have just created and saved is used like other circuit elements. You click on the embed macro icon (see Fig. 2.47) and move the pointer to the location in the workspace where you wish to
place the macro. Then you left click and select the appropriate macro from the pull-down menu that appears. The macro is automatically placed at the point you clicked on and can be used exactly like a circuit element placed from one of the circuit icons. Remember that the macro is the same as the circuit—the only difference is its on-screen representation.

In Fig. 2.47 we have placed two of the multiplexers in the workspace prior to wiring them together. Figure 2.48 demonstrates how we can wire these two macros together, add a gate, and provide inputs and LED displays.

Modifying a circuit

Suppose you build a circuit that contains one or more macros (e.g. Fig. 2.48) and wish to modify it. A circuit can be modified in the usual way by opening its file in Digital Works and making any necessary changes. Digital Works even allows you to edit (i.e. modify) a circuit while it's running.

In order to modify a macro itself, you have to return to the macro's expanded form (i.e. the circuit that the macro represents). A macro is expanded by right clicking on the macro's symbol and selecting the Edit Macro function from the pull-down menu that appears. Figure 2.49 shows the system of Fig. 2.48 in which the macro representation of the multiplexer in the upper left-hand side of the workspace has been right clicked on.

Selecting the Edit Macro function converts the black box macro representation into the original circuit, as Fig. 2.50 demonstrates. You can now edit this circuit in the normal way. When editing has been completed, you select the Close Macro icon that appears on the lower toolbar. Closing this window returns you to the normal circuit view, which contains the macro that has now been changed.

There are two macros in the circuit diagram of Fig. 2.48. If we edit one of them, what happens to the other and what happens to the original circuit? Digital Works employs object embedding rather than object linking. When a macro is embedded in a circuit, a copy of the macro is embedded in the circuit. If you modify a macro, only that copy is changed. The original macro is not altered. Moreover, if you have embedded several copies of a macro in a circuit, only the macro that you edit is changed.

Figure 2.51 demonstrates the effect of editing the macro version of a two-input multiplexer. Figure 2.51(a) shows the modified expanded macro. An OR gate has been wired to the A and B inputs on pins 1 and 2 and a macro tag added to the output of the OR gate. By clicking on the macro tag, the Template Editor window is invoked. You can add a pin and assign it to the macro tag. When you exit the Template Editor and close the macro, the final circuit of Fig. 2.51(b) appears (we have added an LED to the output of the new macro).
Figure 2.48 Embedding two macros, wiring them, and creating a new macro.

Figure 2.49 To edit a macro, select the pointer tool and right click on the macro to pull down a menu with the Edit Macro option.

Figure 2.50 The expanded macro can be modified just like any other circuit; once it has been edited, you return to the circuit that embeds the macro by clicking on Close Macro.

Figure 2.51 (a) The modified macro. (b) The circuit with the modified macro.
Table 2.14 Basic axioms of Boolean algebra.

A + B = B + A                The AND and OR operators are commutative, so the order of the
A ⋅ B = B ⋅ A                variables in a sum or product group does not matter.

A ⋅ (B ⋅ C) = (A ⋅ B) ⋅ C    The AND and OR operators are associative, so the order in which
A + (B + C) = (A + B) + C    sub-expressions are evaluated does not matter.

A ⋅ (B + C) = A ⋅ B + A ⋅ C  The AND operator behaves like multiplication and the OR operator like
A + B ⋅ C = (A + B)(A + C)   addition. The first distributive property states that in an expression
                             containing both AND and OR operators the AND operator takes
                             precedence over the OR. The second distributive law,
                             A + B ⋅ C = (A + B)(A + C), is not valid in conventional algebra.

Table 2.15 Boolean operations on a constant and a variable.

0 ⋅ X = 0    0 + X = X
1 ⋅ X = X    1 + X = 1
Table 2.15 shows the relationship between a Boolean operator, a variable, and a literal. We can prove the validity of the equations in Table 2.15 by substituting all the possible values for X (i.e. 0 or 1). For example, consider the axiom 0 ⋅ X = 0. If X = 1 we have 0 ⋅ 1 = 0, which is correct because by definition the output of an AND gate is true if and only if all its inputs are true. Similarly, if X = 0 we have 0 ⋅ 0 = 0, which is also correct. Therefore, the expression 0 ⋅ X = 0 is correct for all possible values of X. A proof in which we test a theorem by examining all possibilities is called proof by perfect induction.

The axioms of Boolean algebra could be used to simplify equations, but it would be too tedious to keep going back to first principles. Instead, we can apply the axioms of Boolean algebra to derive some theorems to help in the simplification of expressions. Once we have proved a theorem by using the basic axioms, we can apply the theorem to equations.
Theorem 1  X + X ⋅ Y = X
Proof      X + X ⋅ Y = X ⋅ 1 + X ⋅ Y                 Using 1 ⋅ X = X and commutativity
                     = X(1 + Y)                      Using distributivity
                     = X(1)                          Because 1 + Y = 1
                     = X

Theorem 2  X + X* ⋅ Y = X + Y
Proof      X + X* ⋅ Y = (X + X ⋅ Y) + X* ⋅ Y         By Theorem 1, X = X + X ⋅ Y
                      = X + X ⋅ Y + X* ⋅ Y
                      = X + Y(X + X*)                Remember that X + X* = 1
                      = X + Y(1)
                      = X + Y

Theorem 3  X ⋅ Y + X* ⋅ Z + Y ⋅ Z = X ⋅ Y + X* ⋅ Z
Proof      X ⋅ Y + X* ⋅ Z + Y ⋅ Z = X ⋅ Y + X* ⋅ Z + Y ⋅ Z(X + X*)              Remember that X + X* = 1
                                  = X ⋅ Y + X* ⋅ Z + X ⋅ Y ⋅ Z + X* ⋅ Y ⋅ Z     Multiply bracketed terms
                                  = X ⋅ Y(1 + Z) + X* ⋅ Z(1 + Y)                Apply the distributive rule
                                  = X ⋅ Y(1) + X* ⋅ Z(1)                        Because 1 + Y = 1 and 1 + Z = 1
                                  = X ⋅ Y + X* ⋅ Z
We can also prove Theorem 3 by the method of perfect induction. To do this, we set up a truth table and demonstrate that the theorem holds for all possible values of X, Y, and Z (Table 2.16). Because the columns labeled X ⋅ Y + X* ⋅ Z and X ⋅ Y + X* ⋅ Z + Y ⋅ Z in Table 2.16 are identical for all possible inputs, these two expressions must be equivalent.

Table 2.16 Proof of Theorem 3 by perfect induction (the last two columns are the same).

X Y Z | X* | X ⋅ Y | X* ⋅ Z | Y ⋅ Z | X ⋅ Y + X* ⋅ Z | X ⋅ Y + X* ⋅ Z + Y ⋅ Z
0 0 0 | 1 | 0 | 0 | 0 | 0 | 0
0 0 1 | 1 | 0 | 1 | 0 | 1 | 1
0 1 0 | 1 | 0 | 0 | 0 | 0 | 0
0 1 1 | 1 | 0 | 1 | 1 | 1 | 1
1 0 0 | 0 | 0 | 0 | 0 | 0 | 0
1 0 1 | 0 | 0 | 0 | 0 | 0 | 0
1 1 0 | 0 | 1 | 0 | 0 | 1 | 1
1 1 1 | 0 | 1 | 0 | 1 | 1 | 1
Theorem 4  X(X + Y) = X
Proof      X(X + Y) = X ⋅ X + X ⋅ Y                  Multiply out the brackets
                    = X + X ⋅ Y                      Because X ⋅ X = X
                    = X                              By Theorem 1

Theorem 5  X*(X + Y) = X* ⋅ Y
Proof      X*(X + Y) = X* ⋅ X + X* ⋅ Y
                     = 0 + X* ⋅ Y                    Because X* ⋅ X = 0
                     = X* ⋅ Y

Theorem 6  (X + Y)(X + Y*) = X
Proof      (X + Y)(X + Y*) = X ⋅ X + X ⋅ Y* + X ⋅ Y + Y ⋅ Y*
                           = X + X ⋅ Y* + X ⋅ Y      Because X ⋅ X = X and Y ⋅ Y* = 0
                           = X(1 + Y* + Y)
                           = X

Theorem 7  (X + Y)(X* + Z) = X ⋅ Z + X* ⋅ Y
Proof      (X + Y)(X* + Z) = X ⋅ X* + X ⋅ Z + X* ⋅ Y + Y ⋅ Z     Multiply brackets
                           = X ⋅ Z + X* ⋅ Y + Y ⋅ Z              Because X ⋅ X* = 0
                           = X ⋅ Z + X* ⋅ Y                      By Theorem 3

Theorem 8  (X + Y)(X* + Z)(Y + Z) = (X + Y)(X* + Z)
Proof      (X + Y)(X* + Z)(Y + Z) = (X ⋅ Z + X* ⋅ Y)(Y + Z)      By Theorem 7
                                  = X ⋅ Y ⋅ Z + X ⋅ Z ⋅ Z + X* ⋅ Y ⋅ Y + X* ⋅ Y ⋅ Z
                                  = X ⋅ Y ⋅ Z + X ⋅ Z + X* ⋅ Y + X* ⋅ Y ⋅ Z     Because Z ⋅ Z = Z and Y ⋅ Y = Y
                                  = X ⋅ Z(Y + 1) + X* ⋅ Y(1 + Z)
                                  = X ⋅ Z + X* ⋅ Y
                                  = (X + Y)(X* + Z)              By Theorem 7

We provide an alternative proof for Theorem 8 when we look at de Morgan's theorem later in this chapter.
Theorem 9  (X ⋅ Y ⋅ Z)* = X* + Y* + Z*
Proof      To prove that (X ⋅ Y ⋅ Z)* = X* + Y* + Z*, we assume that the expression is true and test its consequences. If X* + Y* + Z* is the complement of X ⋅ Y ⋅ Z, then from the basic axioms of Boolean algebra we have

(X* + Y* + Z*) ⋅ (X ⋅ Y ⋅ Z) = 0   and   (X* + Y* + Z*) + (X ⋅ Y ⋅ Z) = 1

Subproof 1  (X* + Y* + Z*) ⋅ X ⋅ Y ⋅ Z = X* ⋅ X ⋅ Y ⋅ Z + Y* ⋅ X ⋅ Y ⋅ Z + Z* ⋅ X ⋅ Y ⋅ Z
                                       = X* ⋅ X ⋅ (Y ⋅ Z) + Y* ⋅ Y ⋅ (X ⋅ Z) + Z* ⋅ Z ⋅ (X ⋅ Y)
                                       = 0                            Because A* ⋅ A = 0

Subproof 2  (X* + Y* + Z*) + X ⋅ Y ⋅ Z = Y ⋅ Z ⋅ X + X* + Y* + Z*     Re-arrange the equation
                                       = Y ⋅ Z + X* + Y* + Z*         Use A* + A ⋅ B = A* + B
                                       = (Y* + Y ⋅ Z) + X* + Z*       Re-arrange the equation
                                       = Y* + Z + Z* + X*             Use A* + A ⋅ B = A* + B again
                                       = Y* + 1 + X*                  Use Z + Z* = 1
                                       = 1

As we have demonstrated that (X* + Y* + Z*) ⋅ X ⋅ Y ⋅ Z = 0 and that (X* + Y* + Z*) + X ⋅ Y ⋅ Z = 1, it follows that X* + Y* + Z* is the complement of X ⋅ Y ⋅ Z.
Theorem 10  (X + Y + Z)* = X* ⋅ Y* ⋅ Z*
Proof       One possible way of proving Theorem 10 is to use the method we used to prove Theorem 9. For the sake of variety, we will prove Theorem 10 by perfect induction (see Table 2.17).

Table 2.17 Proof of Theorem 10 by perfect induction (the (X + Y + Z)* and X* ⋅ Y* ⋅ Z* columns are the same).

X Y Z | X + Y + Z | (X + Y + Z)* | X* | Y* | Z* | X* ⋅ Y* ⋅ Z*
0 0 0 | 0 | 1 | 1 | 1 | 1 | 1
0 0 1 | 1 | 0 | 1 | 1 | 0 | 0
0 1 0 | 1 | 0 | 1 | 0 | 1 | 0
0 1 1 | 1 | 0 | 1 | 0 | 0 | 0
1 0 0 | 1 | 0 | 0 | 1 | 1 | 0
1 0 1 | 1 | 0 | 0 | 1 | 0 | 0
1 1 0 | 1 | 0 | 0 | 0 | 1 | 0
1 1 1 | 1 | 0 | 0 | 0 | 0 | 0
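Proof by perfect induction is mechanical, which makes it a natural job for a loop. A minimal sketch (ours, not the book's) that checks Theorem 3 the same way Table 2.16 does:

```python
def NOT(a): return 1 - a

for x in (0, 1):
    for y in (0, 1):
        for z in (0, 1):
            lhs = (x & y) | (NOT(x) & z) | (y & z)
            rhs = (x & y) | (NOT(x) & z)
            assert lhs == rhs
print("X.Y + X*.Z + Y.Z == X.Y + X*.Z for all inputs")
```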
OBSERVATIONS

When novices first encounter Boolean algebra, it is not uncommon for them to invent new theorems that are incorrect (because they superficially look like existing theorems). We include the following observations because they represent the most frequently encountered misconceptions.

Observation 1  X ⋅ Y + X* ⋅ Y* is not equal to 1; X ⋅ Y + X* ⋅ Y* cannot be simplified.
Observation 2  X ⋅ Y* + X* ⋅ Y is not equal to 1; X ⋅ Y* + X* ⋅ Y cannot be simplified.
Observation 3  (X ⋅ Y)* is not equal to X* ⋅ Y*.
Observation 4  (X + Y)* is not equal to X* + Y*.

The 16 possible Boolean functions of the two variables A and B:

A B | F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 F10 F11 F12 F13 F14 F15
0 0 |  0  1  0  1  0  1  0  1  0  1  0   1   0   1   0   1
0 1 |  0  0  1  1  0  0  1  1  0  0  1   1   0   0   1   1
1 0 |  0  0  0  0  1  1  1  1  0  0  0   0   1   1   1   1
1 1 |  0  0  0  0  0  0  0  0  1  1  1   1   1   1   1   1
Examples of the use of Boolean algebra in simplifying equations

Having presented the basic rules of Boolean algebra, the next step is to show how it's used to simplify Boolean expressions. By simplifying these equations you can sometimes produce a cheaper version of the logic circuit. The following equations are generally random functions chosen to demonstrate the rules of Boolean algebra.

(a) X + Y + X* ⋅ Y* + (X + Y) ⋅ X* ⋅ Y*
(b) X* ⋅ Y ⋅ Z* + X* ⋅ Y ⋅ Z + X ⋅ Y* ⋅ Z + X ⋅ Y ⋅ Z

Solutions

When I simplify Boolean expressions, I try to keep the order of the variables alphabetical, making it easier to pick out logical groupings.

(a) X + Y + X* ⋅ Y* + (X + Y) ⋅ X* ⋅ Y* = X + Y + X* ⋅ Y* + X ⋅ X* ⋅ Y* + X* ⋅ Y ⋅ Y*
                                        = X + Y + X* ⋅ Y*     As A ⋅ A* = 0
                                        = X + Y + Y*          As A + A* ⋅ B = A + B
                                        = 1                   As A + A* = 1

Note: When a Boolean expression can be reduced to the constant 1, the expression is always true and is independent of the variables.

(b) X* ⋅ Y ⋅ Z* + X* ⋅ Y ⋅ Z + X ⋅ Y* ⋅ Z + X ⋅ Y ⋅ Z = X* ⋅ Y ⋅ (Z* + Z) + X ⋅ Z ⋅ (Y* + Y)
                                                      = X* ⋅ Y ⋅ (1) + X ⋅ Z ⋅ (1)
                                                      = X* ⋅ Y + X ⋅ Z

Note: Both expressions in examples (b) and (c) simplify to X* ⋅ Y + X ⋅ Z, demonstrating that these two expressions are equivalent. These equations are those of the multiplexer, with (b) derived from the truth table (Table 2.9) and (c) from the circuit diagram of Fig. 2.14.
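Example (b) can also be verified by perfect induction rather than algebra. A minimal sketch (ours, not the book's):

```python
def NOT(a): return 1 - a

for x in (0, 1):
    for y in (0, 1):
        for z in (0, 1):
            original = (NOT(x) & y & NOT(z)) | (NOT(x) & y & z) | \
                       (x & NOT(y) & z) | (x & y & z)
            simplified = (NOT(x) & y) | (x & z)   # the multiplexer equation
            assert original == simplified
```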
These examples illustrate the art of manipulating Boolean expressions. It's difficult to be sure we have reached an optimal solution. Later we study Karnaugh maps, which provide an approach that gives us confidence that we've reached an optimal solution.

The Design of a 2-bit Multiplier

The following example illustrates how Boolean algebra is applied to a practical problem. A designer wishes to produce a 2-bit by 2-bit binary multiplier. The two 2-bit inputs are X1, X0 and Y1, Y0, and the four-bit product at the output terminals is Z3, Z2, Z1, Z0. We have not yet introduced binary arithmetic (see Chapter 4), but nothing difficult is involved here. We begin by considering the block diagram of the system (Fig. 2.52) and constructing its truth table.

Figure 2.52 The 2-bit by 2-bit multiplier: inputs X (X1, X0) and Y (Y1, Y0); 4-bit product Z (Z3, Z2, Z1, Z0).

The multiplier has four inputs, X1, X0, Y1, Y0 (indicating a 16-line truth table) and four outputs. Table 2.18 provides a truth table for the binary multiplier. Each 4-bit input represents the product of two 2-bit numbers so that, for example, an input of X1, X0, Y1, Y0 = 1011 represents the product 10 × 11 in binary, or 2 × 3. The corresponding output is a 4-bit product, which, in this case, is 6, or 0110 in binary form.

Table 2.18 Truth table for the 2-bit multiplier.

X × Y = Z | X1 X0 | Y1 Y0 | Z3 Z2 Z1 Z0
0 × 0 = 0 | 0 0 | 0 0 | 0 0 0 0
0 × 1 = 0 | 0 0 | 0 1 | 0 0 0 0
0 × 2 = 0 | 0 0 | 1 0 | 0 0 0 0
0 × 3 = 0 | 0 0 | 1 1 | 0 0 0 0
1 × 0 = 0 | 0 1 | 0 0 | 0 0 0 0
1 × 1 = 1 | 0 1 | 0 1 | 0 0 0 1
1 × 2 = 2 | 0 1 | 1 0 | 0 0 1 0
1 × 3 = 3 | 0 1 | 1 1 | 0 0 1 1
2 × 0 = 0 | 1 0 | 0 0 | 0 0 0 0
2 × 1 = 2 | 1 0 | 0 1 | 0 0 1 0
2 × 2 = 4 | 1 0 | 1 0 | 0 1 0 0
2 × 3 = 6 | 1 0 | 1 1 | 0 1 1 0
3 × 0 = 0 | 1 1 | 0 0 | 0 0 0 0
3 × 1 = 3 | 1 1 | 0 1 | 0 0 1 1
3 × 2 = 6 | 1 1 | 1 0 | 0 1 1 0
3 × 3 = 9 | 1 1 | 1 1 | 1 0 0 1

From Table 2.18, we can derive expressions for the four outputs, Z0 to Z3. Whenever a truth table has m output columns, a set of m Boolean equations must be derived, one equation associated with each of the m columns. To derive an expression for Z0, the four minterms in the Z0 column are ORed logically.
Z0 X1 ·X0 ·Y1 ·Y0 X1 ·X0 ·Y1 ·Y0 X1 ·X0 ·Y1 ·Y0 X1 ·X0 ·Y1 ·Y0
X1 ·X0 ·Y0(Y1 Y1) X1 ·X0 ·Y0(Y1 Y1)
X1 ·X0 ·Y0 X1 ·X0 ·Y0
X0 ·Y0(X1 X1)
X0 ·Y0
Z1 X1 ·X0 ·Y1 ·Y0 X1 ·X0 ·Y1 ·Y0 X1 ·X0 ·Y1 ·Y0 X1 ·X0 ·Y1 ·Y0 X1 ·X0 ·Y1 ·Y0 X1 ·X0 ·Y1 ·Y0
X1 ·X0 ·Y1(Y0 Y0) X1 ·X0 ·Y0(Y1 Y1) X1 ·X0 ·Y1 ·Y0 X1 ·X0 ·Y1 ·Y0
X1 ·X0 ·Y1 X1 ·X0 ·Y0 X1 ·X0 ·Y1 ·Y0 X1 ·X0 ·Y1 ·Y0
X0 ·Y1(X1 X1 ·Y0) X1 ·Y0(X0 X0 ·Y1)
X0 ·Y1(X1 Y0) X1 ·Y0(X0 Y1)
X1 ·X0 ·Y1 X0 ·Y1 ·Y0 X1 ·X0 ·Y0 X1 ·Y1 ·Y0
We have now obtained four simplified sum of products itself is symmetric in X and Y (i.e. 3 1 1 3), then the
expressions for Z0 to Z3; that is, result should also demonstrate this symmetry. There are
Z0 X0 ·Y0 many ways of realizing the expressions for Z0 to Z3. The
circuit of Fig. 2.53 illustrates one possible way.
Z1 X1 ·X0 ·Y1 X0 ·Y1 ·Y0 X1 ·X0 ·Y0 X1 ·Y1 ·Y0
Z2 X1 ·X0 ·Y1 X1 ·Y1 ·Y0
Z3 X1 ·X0 ·Y1 ·Y0 2.5.2 De Morgan’s theorem
It’s interesting to note that each of the above expressions is Theorems 9 and 10 provide the designer with a powerful tool
symmetric in X and Y. This is to be expected—if the problem because they enable an AND function to be implemented by
64 Chapter 2 Gates, circuits, and combinational logic
X1 X0 Y1 Y0
X1 X0 Y1 Y0
X0Y0
Z0
X1X0Y1
X0Y1Y0
Z1
X1X0Y0
X1Y1Y0
X1X0Y1
Z2
X1Y1Y0
X1X0Y1Y0 Z3
an OR gate and inverter. Similarly, these theorems enable an literals) are complemented. The following examples illustrate
OR gate to be implemented by an AND gate and inverter. the application of de Morgan’s theorem.
We first demonstrate how de Morgan’s theorem is applied
1. F X·Y X·Z We wish to apply de Morgan’s
to Boolean expressions and then show how circuits can be
theorem to the right-hand side
converted to NAND-only or NOR-only forms. You may
X ·Y·X ·Z The becomes ⋅ and variables
wonder why anyone should wish to implement circuits in
‘X ⋅ Y’ and ‘X ⋅ Z’ complemented
NAND (or NOR) logic only. There are several reasons for
(X Y)(X Z) Variables X·Y and X ·Y are
this, but, in general, NAND gates operate at a higher speed
themselves complemented
than AND gates and NAND gates can be built with fewer
components (at the chip level). Later we shall examine in As you can see, the first step is to replace the OR by an AND
more detail how a circuit can be designed entirely with operator. The compound variables X ⋅ Y and X ⋅ Z are comple-
NAND gates only. mented to get X ·Y and X·Z. The process is continued by
To apply de Morgan’s theorem to a function the ANDs are applying de Morgan to the two complemented groups (i.e.
changed into ORs, ORs and the into ANDs and variables (and X·Y becomes X Y and X·Z becomes X Z).
2.5 An Introduction to Boolean algebra 65
2.5.3 Implementing logic functions in We’ve now converted the OR function into a NAND func-
tion. The three NOT functions that generate A, B, and C can
NAND or NOR two logic only
be implemented in terms of NOT gates, or by means of two-
Some gates are better than others; for example, the NAND input NAND gates with their inputs connected together.
gate is both faster and cheaper than the corresponding AND Figure 2.54 shows how the function F A B C can
gate. Consequently, it’s often necessary to realize a circuit be implemented in NAND logic only. If the inputs of a
using one type of gate only. Engineers sometimes implement NAND gate are A and B, and the output is C, then C A·B.
a digital circuit with one particular type of gate because there But if A B, then C A·A or C A. You can better under-
is not a uniform range of gates available. For obvious eco- stand this by looking at the truth table for the NAND gate,
nomic reasons manufacturers don’t sell a comprehensive and imagining the effect of removing the lines A, B 0, 1 and
range of gates (e.g. two-input AND, three-input AND, . . . , A, B 1, 0.
10-input AND, two-input OR, . . . ). For example, there are It’s important to note that we are not using de Morgan’s
many types of NAND gate, from the quad two-input NAND theorem here to simplify Boolean expressions. We are using
to the 13-input NAND, but there are few types of AND gates. de Morgan’s theorem to convert an expression into a form
NAND logic We first look at the way in which circuits can suitable for realization in terms of NAND (or NOR) gates.
be constructed from nothing but NAND gates and then
demonstrate that we can also fabricate circuits with NOR
gates only. To construct a circuit solely in terms of NAND
gates, de Morgan’s theorem must be invoked to get rid of all A A
OR operators in the expression. For example, suppose we
wish to generate the expression F A B C using NAND
gates only. We begin by applying a double negation to the
expression, as this does not alter the expression’s value but it B
does give us the opportunity to apply de Morgan’s theorem. B
X1 X0 Y1 Y0
X1 X0 Y1 Y0
Z0
Z1
Z2
Z3
only. By way of illustration, the value of Z3 in the 2-bit multi- Fig. 2.57 shows the construction of the two versions of
plier can be converted to NOR logic form in the following way AB CD in Digital Works. We have provided an LED at each
output and manually selectable inputs to enable you to inves-
Z3 X1 ·X0 ·Y1 ·Y0
tigate the circuits.
X1 ·X0 ·Y1 ·Y0
X1 X0 Y1 Y0 2.5.4 Karnaugh maps
Note that negation may be implemented by an inverter or by When you use algebraic techniques to simplify a Boolean
a NOR gate with its inputs connected together. expression you sometimes reach a point at which you
As a final example of NAND logic consider Fig. 2.56. A can’t proceed, because you’re unable to find further
Boolean expression can be expressed in sum-of-products simplifications. The Karnaugh map, or more simply the
form as A ⋅ B C ⋅ D. This expression can be converted to K-map, is a graphical technique for the representation and
NAND logic as simplification of a Boolean expression that shows unambigu-
ously when a Boolean expression has been reduced to its
A·B ·C ·D
most simple form.
Note how the three-gate circuit in Fig. 2.56(a) can be Although the Karnaugh map can simplify Boolean equa-
converted into the three-gate NAND circuit of Fig. 2.56(b). tions with five or six variables, we will use it to solve problems
A AB A AB
B B
C AB+CD C
AB.CD
D CD D CD
EXAMPLE
Show that the exclusive or, EOR, operator is associative, so (A䊝B)䊝C (A·B A·B)䊝C
that A 䊝(B 䊝 C) (A 䊝 B) 䊝 C.
(A·B A·B)C (A·B A·B)C
A 䊝 (B 䊝 C) A 䊝 (B·C B·C) A·B·C A·B·C (A·B·A·B)C
A(B·C B·C) A(B·C B ·C) A·B·C A·B·C (A B)·(A B)C
A(B C)(B C) A·B·C A·B·C A·B·C A·B·C (A·B A·B)C
A(B·C B·C) A·B·C A·B·C Both these expressions are equal and therefore the 䊝
A·B·C A·B·C A·B·C A·B·C operator is associative.
with only three or four variables. Other techniques such as a Boolean expression can immediately be seen from the loca-
the Quine–McCluskey method can be applied to the simplifi- tion of 1s on the map. A system with n variables has 2n lines in
cation of Boolean expressions in more than six variables. its truth table and 2n squares on its Karnaugh map. Each
However, these techniques are beyond the scope of this book. square on the Karnaugh map is associated with a line (i.e.
The Karnaugh map is just a two-dimensional form of the minterm) in the truth table. Figure 2.58 shows Karnaugh
truth table, drawn in such a way that the simplification of maps for one to four variables.
2.5 An Introduction to Boolean algebra 69
A
0 1 0 1
B
A A A AB AB
0
expression onto the map). The second skill is the ability to map. If it isn’t clear how the entries in the table are plotted on
group the 1s you’ve plotted on the map. The third skill is to the Karnaugh map, examine Fig. 2.60 and work out which cell
read the groups of 1s on the map and express each group as a on the map is associated with each line in the table. A square
product term. containing a logical 1 is said to be covered by a 1.
We now use a simple three-variable map to demonstrate At this point it’s worth noting that no two 1s plotted on the
how a truth table is mapped onto a Karnaugh map. One-and Karnaugh map of Fig. 2.60 are adjacent to each other, and
two-variable maps represent trivial cases and aren’t consid- that the function F A ⋅ B ⋅ C A ⋅ B ⋅ C A ⋅ B ⋅ C cannot
ered further. Figure 2.60 shows the truth table for a three- be simplified. To keep the Karnaugh maps as clear and
variable function and the corresponding Karnaugh map. uncluttered as possible, squares that do not contain a 1 are left
Each of the three 1s in the truth table is mapped onto its unmarked even though they must, of course, contain a 0.
appropriate square on the Karnaugh map. Consider Fig. 2.61 in which the function F1 A ⋅ B ⋅ C
A three-variable Karnaugh map has four vertical columns, A ⋅ B ⋅ C is plotted on the left-hand map. The two minterms in
one for each of the four possible values of two out of the three this function are A ⋅ B ⋅ C and A ⋅ B ⋅ C and occupy the cells for
variables. For example, if the three variables are A, B, and C, which A 1, B 1, C 0, and A 1, B 1, C 1,
the four columns represent all the combinations of A and B. respectively. If you still have difficulty plotting minterms, just
The leftmost column is labeled 00 and represents the region for think of them as coordinates of squares; for example, A ⋅ B ⋅ C
which A 0, B 0. The next column is labeled 01, and repre- has the coordinates 1,1,0 and corresponds to the square
sents the region for which A 0, B 1. The next column is ABC 110.
labeled 11 (not 10), and represents the region for which A 1, In the Karnaugh map for F1 two separate adjacent squares
B 1. Remember that adjacent columns differ by only one are covered. Now look at the Karnaugh map for F2 A ⋅ B at
variable at a time. The fourth column, 10, represents the region the right-hand side of Fig. 2.61. In this case a group of two
for which A 1, B 0. In fact, a Karnaugh map is made up of squares is covered, corresponding to the column A 1,
all possible 2n minterms for a system with n variables. B 1. As the function for F2 does not involve the variable C,
The three-variable Karnaugh map in Fig. 2.60 has two a 1 is entered in the squares for which A B 1 and C 0,
horizontal rows, the upper row corresponding to C 0 and and A B 1 and C 1; that is, a 1 is entered for all values
the lower to C 1. Any square on this Karnaugh map repre- of C for which AB 11. When plotting a product term like
sents a unique combination of the three variables, from A ⋅ B A ⋅ B on the Karnaugh map, all you have to do is to locate the
⋅ C to A ⋅ B ⋅ C. region for which AB 11.
Figure 2.60 demonstrates how a function of three variables, It is immediately obvious that both Karnaugh maps in
F A ⋅ B ⋅ C A ⋅ B ⋅ C A ⋅ B ⋅ C is plotted on a Karnaugh Fig. 2.61 are identical, so that F1 F2 and A ⋅ B ⋅ C A ⋅ B ⋅ C
A ⋅ B. From the rules of Boolean algebra A ⋅ B ⋅ C A ⋅ B ⋅ C
A ⋅ B (C C) A ⋅ B(1) A ⋅ B. It should be apparent that
A B C F AB
0 0 0 1 C 00 01 11 10 two adjacent squares in a Karnaugh map can be grouped
0 0 1 0 together to form a single simpler term. It is this property that
0 1 0 0 0 1
the Karnaugh map exploits to simplify expressions.
0 1 1 1
1 0 0 0
1 0 1 1 1 1 1 Simplifying Sum-of-Product expressions with a
1 1 0 0 Karnaugh map
1 1 1 0
The first step in simplifying a Boolean expression by means of
a Karnaugh map is to plot all the 1s (i.e. minterms) in the
Figure 2.60 Relationship between a Karnaugh map and function’s truth table on the Karnaugh map. The next step is
truth table. to combine adjacent 1s into groups of one, two, four, eight, or
AB AB
C 00 01 11 10 C 00 01 11 10
0 1 ABC 0
1 1 ABC 1
16. The groups of minterms should be as large as possible—a three-variable product terms cover 2 squares
single group of four minterms yields a simpler expression four-variable product terms cover 1 square.
than two groups of two minterms. The final stage in simplify- 2. A square covered by a 1 may belong to more than one term
ing an expression is reached when each of the groups of in the sum-of-products expression. For example, in
minterms (i.e. the product terms) are ORed together to form Fig. 2.63 the minterm A ⋅ B ⋅ C ⋅ D belongs to two groups,
the simplified sum-of-products expression. This process is A ⋅ B and C ⋅ D. If a 1 on the Karnaugh map appears in two
best demonstrated by means of examples. In what follows, a groups, it is equivalent to adding the corresponding
four-variable map is chosen to illustrate the examples. minterm to the overall expression for the function plotted
Transferring a truth table to a Karnaugh map is easy on the map twice. Repeating a term in a Boolean expression
because each 1 in the truth table is placed in a unique square does not alter the value of the expression, because one of
on the map. We now have to demonstrate how the product the axioms of Boolean algebra is X X X.
terms of a general Boolean expression are plotted on the map.
Figures 2.62–2.67 present six functions plotted on Karnaugh 3. The Karnaugh map is not a square or a rectangle as it
maps. In these diagrams various sum-of-products expressions appears in these diagrams. A Karnaugh map is a torus or
have been plotted directly from the equations themselves, doughnut shape. That is, the top edge is adjacent to the
rather than from the minterms of the truth table. The follow- bottom edge and, the left-hand edge is adjacent to the
ing notes should help in understanding these diagrams. right-hand edge. For example, in Figure 2.65 the term A ⋅
D covers the two minterms A ⋅ B ⋅ C ⋅ D and A ⋅ B ⋅ C ⋅ D at
1. For a four-variable Karnaugh map the top, and the two minterms A ⋅ B ⋅ C ⋅ D and A ⋅ B ⋅ C ⋅ D
one-variable product term covers 8 squares at the bottom of the map. Similarly, in Fig. 2.66 the term
two-variable product terms cover 4 squares B ⋅ D covers all four corners of the map. Whenever a group
AB 00 01 11 10
CD
00
AB
CD 00 01 11 10
00 1
AB The two-variable term A ⋅ B covers four squares
(the region A 0 and B 0). The two-variable
01 1
CD term C ⋅ D covers four squares (the region C 1
and D 1). The term A ⋅ B ⋅ C⋅ D is common to
11 1 1 1 1 both groups.
10 1
AB
CD 00 01 11 10
00 1 1
AB
10 1 1 A
AB 00 01 11 10
CD
ABCD
00 1 1
AD
10 1 1
AB
00 01 11 10
CD
00 1 1
The four-variable term B·D covers four squares
BD (the region B 0, D 0). In this case the
01 1 adjacent squares are the corner squares. If you
examine any pair of horizontally or vertically
ABCD
adjacent corners, you will find that they differ in
11 one variable only.
ACD
10 1 1 1
–– – – –
Figure 2.66 Plotting F B D ABCD ACD on a Karnaugh map.
of terms extends across the edge of a Karnaugh map, we question, ‘what minterms (squares) are covered by this
have shaded it to emphasize the wraparound nature of term?’ Consider the term A ⋅ D in Fig. 2.62. This term covers
the map. all squares for which A 0 and D 1 (a group of 4).
4. In order either to read a product term from the map, or to Having shown how terms are plotted on the Karnaugh
plot a product term on the map, it is necessary to ask the map, the next step is to apply the map to the simplification of
2.5 An Introduction to Boolean algebra 73
AB
CD 00 01 11 10
00 1 1
ABC
10 1
ACD
AB AB
CD 00 01 11 10 CD 00 01 11 10
00 1 1 00 1 1
01 1 1 01 1 1
11 1 1 1 1
11
10 1 10 1
Figure 2.68 Karnaugh map for Example 1.
AB AB
CD 00 01 11 10 CD 00 01 11 10
00 1 1 00 1 1
01 1 1 1 01 1 1 1
1 1 1 1
11 11
10 1 10 1
Figure 2.69 Karnaugh map for Example 2.
the expressions. Once again, we demonstrate this process by Example 2 F A·C ·D A·B·C A·C·D A·B ·D (Fig.
means of examples. In each case, the original function is plot- 2.69). In this case only one regrouping is possible. The simpli-
ted on the left-hand side of the figure and the regrouped ones fied function is F B ·D A·C ·D A·C·D A·B·C.
(i.e. minterms) are plotted on the right-hand side.
Example 3 F A ·B·C·D A·B ·C·D A·B·C·D
Example 1 Figure 2.68 gives a Karnaugh map for the expres- A·B·C ·D A·B ·C ·D A·B·C·D A·B·C ·D A·B·C ·D
sion F A·B A ·B·C·D A·B·C·D A·B ·C·D. The (Fig. 2.70). This function can be simplified to two product
simplified function is F A·B B·D A·C ·D. terms with F B ·D B·D.
74 Chapter 2 Gates, circuits, and combinational logic
AB AB
CD 00 01 11 10 CD 00 01 11 10
00 1 1 00 1 1
01 1 1 01 1 1
1 1 1 1
11 11
10 1 1 10 1 1
Figure 2.70 Karnaugh map for Example 3.
AB AB AB
C 00 01 11 10 C 00 01 11 10 C 00 01 11 10
0 1 1 1 0 1 1 1 0 1 1 1
1 1 1 1 1 1
1 1 1
(a) Ones placed. (b) Ones grouped. (c) Alternate grouping. Figure 2.71 Karnaugh map for Example 4.
AB AB
00 01 11 10 CD 00 01 11 10
CD
00 1 1 1 1 00
01 1 1 01 0 0
1 1 0 0
11 11
Example 4 F A ·B·C A·B·C A·B·C A·B·C complement of this function. The group of four 0s corre-
A·B·C (Fig. 2.71). We can group the minterms together in two sponds to the expression F B·D.
ways, both of which are equally valid; that is, there are two Example 6 We can use a Karnaugh map to convert of sum-
equally correct simplifications of this expression. We can write of-products expression into a product-of-sums expression.
either F A ·B A·C A·B or F A·B B·C A·B. In Example 5, we used the Karnaugh map to get the comple-
ment of a function in a product-of-sums form. If we then
Applications of Karnaugh maps
complement the complement, we get the function but in a
Karnaugh maps can also be used to convert sum-of-products sum-of-products form (because de Morgan’s theorem allows
expressions to the corresponding product-of-sums form. The us to step between SoP and PoS forms). Let’s convert
first step in this process involves the generation of the com- F A·B·C C·D A·B·D into product of sums form
plement of the sum-of-products expression. (Fig. 2.73).
Example 5 The Karnaugh map in Fig. 2.72 demonstrates how The complement of F is defined by the zeros on the map
we can obtain the complement of a sum-of-products expression. and may be read from the right-hand map as
Consider the expression F C ·D A·B A·B C·D F C·D B ·C A·D
(left-hand side of Fig. 2.72). If the squares on a Karnaugh
F C ·D B ·C A·D
map covered by 1s represent the function F, then the remain-
ing squares covered by 0s must represent F, the complement (C D)(B C)(A D)
of F. In the right-hand side of Fig. 2.72, we have plotted the We now have an expression for F in product-of-sums form.
2.5 An Introduction to Boolean algebra 75
AB AB
CD 00 01 11 10 CD 00 01 11 10
00 00 0 0 0 0
01 1 1 1 1 01
11 1 1 11 0 0
AB AB
CD 00 01 11 10 CD 00 01 11 10
00 1 00 1
01 1 1 1 01 1 1 1
1 1 1 1 1 1 1 1
11 11
10 1 1 1 10 1 1 1
Figure 2.74 Karnaugh map corresponding
(a) Location of the 1s. (b) After grouping the 1s. to Table 2.19.
A
Inputs Number F
B A B C D
C 0 0 0 0 0 0
0 0 0 1 1 0
D 0 0 1 0 2 0
0 0 1 1 3 0
0 1 0 0 4 1 Divisible by 4
0 1 0 1 5 1 Divisible by 5
0 1 1 0 6 1 Divisible by 6
0 1 1 1 7 1 Divisible by 7
1 0 0 0 8 1 Divisible by 4
1 0 0 1 9 0
1 0 1 0 10 1 Divisible by 5
Figure 2.75 NAND-only circuit for fire detector. 1 0 1 1 11 0
1 1 0 0 12 1 Divisible by 6
of Boolean variables, we can write X (Y ⋅ Z) (X ⋅ Y)Z and 1 1 0 1 13 0
hence extending this to our equation we get 1 1 1 0 14 1 Divisible by 7
1 1 1 1 15 0 False by definition
F A·B·A ·C·A ·D·B ·C·B·D·C·D
Figure 2.75 shows how this expression can be implemented Table 2.20 Truth table for example.
in terms of two- and three-input NAND gates.
AB A A
CD 00 01 11 10
00 1 1 1
AB
B
01 1 F
AD AB
1
11 AD
D
D
10 1 1 1
the map and to incorporate them in the smallest number of The input condition C 1, H 1 in Table 2.20 has no real
large groups. meaning, because it’s impossible to be too hot and too cold
The following example demonstrates the concept of simultaneously. Such an input condition could arise only if at
impossible input conditions. An air conditioning system has least one of the thermostats failed. Consider now the example
two temperature control inputs. One input, C, from a cold- of an air conditioning unit with four inputs and four outputs.
sensing thermostat is true if the temperature is below 15C Table 2.22 defines the meaning of the inputs to the controller.
and false otherwise. The other input, H, from a hot-sensing The controller has four outputs P, Q, R, and S. When P 1
thermostat is true if the temperature is above 22C and false a heater is switched on and when Q 1 a cooler is switched
otherwise. Table 2.21 lists the four possible logical conditions on. Similarly, a humidifier is switched on by R 1 and a
for the two inputs. dehumidifier by S 1. In each case a logical 0 switches off the
appropriate device. The relationship
between the inputs and outputs is as
AB AB follows.
CD 00 01 11 10 CD 00 01 11 10
●
If the temperature and humidity are
00 00 both within limits, switch off the
heater and the cooler. The humidifier
and dehumidifier are both switched
01 1 X X 01 1 X X
off unless stated otherwise.
●
If the humidity is within limits,
1 1 1 1
11 11 switch on the heater if the tempera-
ture is too low and switch on the
10 10 cooler if the temperature is too high.
●
If the temperature is within limits,
switch on the heater if the humidity is
(a) The function F = ABD + ABCD. (b) The function F = BD.
too low and the cooler if the humidity
Note that the inputs ABCD and ABCD Minterm ABCD is included to simplify
cannot occur the expression is too high.
●
If the humidity is high and the tem-
Figure 2.79 The effect of don’t care conditions. perature low, switch on the heater. If
the humidity is low and the tempera-
ture high, switch on the cooler.
●
If both the temperature and humidity are high switch on
Inputs Meaning
the cooler and dehumidifier.
C H ●
If both the temperature and humidity are too low switch on
the heater and humidifier.
0 0 Temperature OK
0 1 Too hot The relationship between the inputs and outputs can now be
1 0 Too cold expressed in terms of a truth table (Table 2.23). We can draw
1 1 Impossible condition Karnaugh maps for P to S, plotting a 0 for a zero state, a 1 for a
one state, and an X for an impossible state. Remember that an
X on the Karnaugh map corresponds to a state that cannot
Table 2.21 Truth table for a pair of temperature sensors. exist and therefore its value is known as a don’t care condition.
HC HC
WD 00 01 11 10 WD 00 01 11 10
00 1 X 00 X 1
01 1 1 X 01 X 1
11 X X X X 11 X X X X
10 1 X 10 1 X 1
HC HC
00 01 11 10 00 01 11 10
WD WD
00 X 00 X
01 1 X 01 X
11 X X X X 11 X X X X
10 X 10 X 1
Display
Decoder a a D ·C·B·A C·B ·A
a (D C B A) (C B A)
b f b
c g
D·C D·B D·A C·C C ·B C·A C·B
d
e B·B B·A C ·A B·A A·A
f e c D·C D·B D·A C·B C·A C ·B B
g d
The decoder converts
B·A C ·A B·A
a 4-bit binary numeric
code on D, C, B, A into
D·C D·A C·A B C ·A
the signals that light up
segments a to g of the D·C C·A B C ·A
display
This expression offers no improvement over the first realiza-
Figure 2.82 The seven-segment display. tion of a.
Figure 2.85 provides the Karnaugh map for segment b,
which gives b C B ·A B·A. We can proceed as we did
Figure 2.83 gives the Karnaugh map for segment a. From for segment a and see what happens if we use b. Plotting zeros
the Karnaugh map we can write down the expression for on the Karnaugh map for b we get b C B ·A ·C ·B·A
a D B C·A C ·A. Fig. 2.86. Therefore,
An alternative approach is to obtain a by considering
the zeros on the map to get the complement of a. From b C·B ·A C·B·A
the Karnaugh map in Fig. 2.84 we can write (C B A)(C B A)
a D·C·B·A C·B ·A. Therefore, C B·A B ·A
2.5 An Introduction to Boolean algebra 81
0 0 0 0 1 1 1 1 1 1 0
0 0 0 1 0 1 1 0 0 0 0
0 0 1 0 1 1 0 1 1 0 1
0 0 1 1 1 1 1 1 0 0 1
0 1 0 0 0 1 1 0 0 1 1
0 1 0 1 1 0 1 1 0 1 1
0 1 1 0 1 0 1 1 1 1 1
0 1 1 1 1 1 1 0 0 0 0
1 0 0 0 1 1 1 1 1 1 1
1 0 0 1 1 1 1 0 0 1 1
1 0 1 0 Forbidden code X X X X X X X
1 0 1 1 X X X X X X X
1 1 0 0 X X X X X X X
1 1 0 1 X X X X X X X
1 1 1 0 X X X X X X X
1 1 1 1 X X X X X X X
DC DC
BA 00 01 11 10 BA 00 01 11 10
00 1 X 1 00 1 X 1
01 1 X 1 01 1 X 1
11 1 1 X X 11 1 1 X X
DC DC
BA 00 01 11 10 BA 00 01 11 10
00 0 X 00 0 X
01 0 X 01 0 X
11 X X 11 X X
DC DC
BA 00 01 11 10 BA 00 01 11 10
00 1 1 X 1 00 1 1 X 1
01 1 X 1 01 1 X 1
11 1 1 X X 11 1 1 X X
DC DC
BA 00 01 11 10 BA 00 01 11 10
00 X 00 X
01 0 X 01 0 X
11 X X 11 X X
DC DC DC
BA 00 01 11 10 BA 00 01 11 10 BA 00 01 11 10
00 00 1 1 00 1 1
01 1 01 1 01 1 1
11 1 1 1 11 1 1 11 1 1
10 1 1 10 1 10 1 1
(a) Karnaugh map for n. (b) Karnaugh map for F1. (c) Karnaugh map for F0.
DC DC DC
BA 00 01 11 10 BA 00 01 11 10 BA 00 01 11 10
00 00 1 1 00 1 1
01 01 1 01 1 1
1
11 1 1 1 11 1 1 11 1 1
10 1 1 10 1 10 1 1
(a) Regrouped Karnaugh map for n. (b) Regrouped Karnaugh map for F1. (c) Regrouped Karnaugh map for F0.
pins are spaced by 0.1 inch. Two pins are used for the power
2.6 Special-purpose logic supply (Vcc 5.0 V and ground 0 V). These devices are
elements often called 74-series logic elements because the part number
of each chip begins with 74; for example, a 7400 chip contains
So far, we’ve looked at the primitive logic elements from four NAND gates. Today, the packaging of such gates has
which all digital systems can be constructed. As technology shrunk to the point where the packages are very tiny and are
progressed, more and more components were fabricated on attached to circuit boards by automatic machines.
single chips of silicon to produce increasingly complex cir- It soon became possible to put tens of gates on a chip and
cuits. Today, you can buy chips with tens of millions of gates manufacturers connected gates together to create logic func-
that can be interconnected electronically (i.e. the chip pro- tions such as a 4-bit adder, a multiplexer, and a decoder. Such
vides a digital system whose structure can be modified elec- circuits are called medium-scale integration (MSI). By the
tronically by the user). Indeed, by combining microprocessor 1970s entire systems began to appear on a single silicon chip,
technology, electronically programmable with arrays of of which the microprocessor is the most spectacular example.
gates, we can now construct self-modifying (self-adaptive) The technology used to make such complex systems is called
digital systems. large-scale integration (LSI). In the late l980s LSI gave way to
Let’s briefly review the development of digital circuits. The very-large-scale integration (VLSI), which allowed designers
first digital circuits contained a few basic NAND, NOR, AND to fabricate millions of transistors on a chip. Initially, VLSI
gates, and were called small-scale integration (SSI). Basic SSI technology was applied to the design of memories rather
gates were available in 14-pin dual-in-line (DIL) packages. than microprocessors. Memory systems are much easier to
Dual-in-line simply means that there are two parallel rows of design because they have a regular structure (i.e. a simple
pins (i.e. contacts) forming the interface between the chip memory cell is replicated millions of times).
and the outside world. The rows are 0.3 inches apart and the
84 Chapter 2 Gates, circuits, and combinational logic
A major change in digital technology occurred in the mid Figure 2.89 illustrates the structure of a 1-of-8 data
1990s. From the 1970s to the 1990s, digital logic had largely multiplexer, which has eight data inputs, D0, D1, D2, . . ., D7,
used a power supply of 5 V. As the number of gates per chip an output Y, and three data select inputs, S0, S1, S2. When S0,
approached the low millions, the problem of heat manage- S1, S2 0, 0, 0 the output is Y D0, and when S0, S1, S2 1,
ment created a limit to complexity. It was obvious that more 0, 0 the output Y D1, etc. That is, if the binary value at the
and more transistors couldn’t be added to a chip without data select input is i, the output is given by Y Di.
limit because the power they required would destroy the chip. A typical application of the 1-of-8 multiplexer is in the
Radiators and fans were used to keep chips cool. Improvements selection of one out of eight logical conditions within a digital
in silicon technology in the 1990s provided digital logic ele- system. Figure 2.90 demonstrates how the 1-of-8 multiplexer
ments that could operate at 3 V or less and, therefore, create might be used in conjunction with a computer’s flag register
less heat. A further impetus to the development of low-power to select one of eight logical conditions. We cover registers in
systems was provided by the growth of the laptop computer the next chapter—all we need know at this points that a regis-
market. ter is a storage unit that holds the value of 1 or more bits.
We now look at the characteristics of some of the simple The flag register in Fig. 2.90 stores the value of up to eight
digital circuits that are still widely available—even though so-called flags or marker bits. When a computer performs an
VLSI systems dominate the digital world, designers often operation (such as addition or subtraction) it sets a zero flag
have to use simple gates to interface these complex chips to if the result was zero, a negative flag if the result was negative,
each other. and so on. These flags define the state of the computer. In
Fig. 2.90 the eight flag bits are connected to the eight inputs of
the multiplexer. The 3-bit code on S0 to S2 determines which
2.6.1 The multiplexer flag bit is routed to the multiplexer’s Y output. This code
A particularly common function arising regularly in digital might be derived from the instruction that the computer is
design is the multiplexer, which we met earlier in this chapter. currently executing. That is, the bits of the instruction can be
Figure 2.88 shows the 74157, a quad two-input multiplexer, used to select a particular flag (via the multiplexer) and the
which is available in a 16-pin MSI circuit. The prefix quad state of this flag bit used to determine what happens next.
simply means that there are four multiplexers in one package. Suppose a computer instruction has the form IF x 0
Each of the four Y outputs is connected to the correspond- THEN do something. The computer compares x with 0, which
ing A input pin when SELECT 0 and to the B input when sets the zero flag if x is equal to zero. The bits that encode this
SELECT 1. The multiplexer’s STROBE input forces all Y instruction provide the code on S0 to S2 that routes the Z flag to
outputs into logical 0 states whenever STROBE 1. We have the Y output. Finally, the computer uses the value of the Y output
already described one use of the multiplexer when we looked to ‘do something’ or not to ‘do something’. Later we shall see how
at some simple circuits. alternative courses of action are implemented by a computer.
Figure 2.88 The 74157 quad two-input multiplexer. Figure 2.89 The 1-of-8 multiplexer.
2.6 Special-purpost logic elements 85
Inputs Outputs
A B C Y0 Y1 Y2 Y3 Y4 Y5 Y6 Y7
0 0 0 1 0 0 0 0 0 0 0
0 0 1 0 1 0 0 0 0 0 0
0 1 0 0 0 1 0 0 0 0 0
0 1 1 0 0 0 1 0 0 0 0
1 0 0 0 0 0 0 1 0 0 0
1 0 1 0 0 0 0 0 1 0 0
1 1 0 0 0 0 0 0 0 1 0
1 1 1 0 0 0 0 0 0 0 1
Instruction register
The 3-bit op-code 74138 with the code 101 (representing CBA), output Y5 will
Op-code
in the instruction register be asserted low.
is decoded in to one of
eight actions.
By ORing together the appropriate minterms we can gen-
erate an arbitrary sum of products expression in n variables.
Decoded instruction In other words, any function can be implemented by a
demultiplexer and OR gate.
Add Figure 2.94 demonstrates how a 3-line to 8-line decoder
Subtract can be used to implement a full-adder that adds three bits
to generate a sum and a carry. Chapter 4 discusses binary
Load
arithmetic and adders—all we need say here is that the sum of
Store bits A, B, and Cin is given by the Boolean expression Cin ⋅ A ⋅ B
Cin ⋅ A ⋅ B Cin ⋅ A ⋅ B Cin ⋅ A ⋅ B and the carry by Cin ⋅ B ⋅ A
Branch on zero
Cin ⋅ B ⋅ A Cin ⋅ A ⋅ B Cin ⋅ A ⋅ B.
Branch on not zero Note that the outputs of the 74LS138 are active-low and
therefore it is necessary to employ a NAND gate to generate
Branch unconditonally
the required sum-of-products expression.
Stop Another application of the demultiplexer is in decoding
binary characters. Consider the ISO/ASCII character code (to
be described in Chapter 4) which represents the alpha-
Figure 2.92 Application of a demultiplexer as an instruction
decoder. numeric characters (A–Z, 0–9, and symbols such as !, @, #, $,
% . . .) together with certain non-printing symbols such
74138 3-line to 8-line as the back space and carriage return. The ASCII codes for
demultiplexer some of these non-printing control codes are given in
Table 2.27.
Y0 Suppose we receive an ASCII code from a keyboard
A Y1 and wish to decode its function in hardware. First note
Control that all the codes of interest start with 00001. We can use
B Select Y2
inputs the most-significant five bits to enable a 74LS138 3-line
C Y3 to 8-line decoder and then decode the three least-significant
Eight active-low
Y 4 outputs bits of the word 00001d2d1d0 to distinguish between the
0 E1 Y5 control codes. Figure 2.95 demonstrates how this is
Enable achieved. Each output from the decoder can be fed to a cir-
0 E2 Y6
inputs cuit to perform the appropriate action (e.g. carriage
1 E3 Y7 return).
Medium-scale logic devices like the 74138 make it easy to
design circuits with just a handful of chips. However, many
Figure 2.93 The 74138 3-line to 8-line decoder.
2.7 Tri-state logic 87
74138 3-line to
8-line C inBA
D 0demultiplexer Y0
A A
D1 Y1 C inBA
Control C inBA
B B
D2 Y2
inputs
C in CD 3 Y3 C inBA
D Y4 C inBA
4
0 E1 Y5 C inBA
D5
0 E2 Y6 C inBA
1 E3 Y7 C inBA
2.7.1 Buses the bus to any other device. Buses may be unidirectional (i.e.
data always flows the same way) or bidirectional (i.e. data can
A computer is like a city. Just as roads link homes, shops, and flow in two directions—but not simultaneously).
factories, buses link processing units, storage devices, and A bus is normally represented diagrammatically by a single
interfaces. Figure 2.96 shows a digital system composed of thick line or a wide shaded line as in Fig. 2.96. Real buses are
five functional units, A, B, C, D, and E. These units are linked composed of several individual wires (i.e. electrical connec-
together by means of two data highways (or buses), P and Q, tions). Modern computer buses have 100 or more lines,
permitting data to be moved from one unit to another. Data because a bus has to carry data, addresses, control signals, and
can flow onto the bus from a device connected to it and off even the power supply. Indeed the nature of a bus can be an
important factor in the choice of a computer (consider the
PC with its USB, and PCI buses).
Figure 2.97 demonstrates how a bus is arranged. Logical
C units A and B are connected to an m-bit data bus and can
A transmit data to the bus or receive data from it. We are not
concerned with the nature of the processes A and B here, but
Q bus
simply wish to show how they communicate with each other
P bus via the bus. For clarity, the connections to only one line of the
bus are shown. Similar arrangements exist for bits d1 to dm1.
D
Suppose unit A wishes to send data to unit B. The system in
B unit A puts data on the bus via gate Aout and B receives the
data from the bus via gate Bin. These two gates look like
inverters but they aren’t because they don’t have bubbles at
E their output. Such a gate is called a buffer and it just copies the
Logical unit signal at its input terminal to its output terminal (i.e. the gate
or storage device
doesn’t change the state of the data passing through it). We
will soon see why such a gate is needed.
Such an arrangement is, in fact, unworkable and a glance at
Figure 2.96 Functional units and buses. Fig. 2.98 shows why. In Fig. 2.98(a) the outputs of two AND
gates are connected together. Figure 2.98(b) shows the same
circuit as Fig. 2.98(a) except that we’ve
included the internal organization of
the two gates. Essentially, a gate’s out-
put circuit consists of two electronic
A in switches that can connect the output to
the 5 V power supply or to the 0 V (i.e.
A ground) power supply. These switches
Aout are transistors that are either conduct-
ing or non-conducting. Because only
one switch is closed at a time, the out-
Data path between put of a gate is always connected either
functional unit and to 5 V or to ground.
system bus (only one
bit shown) In Figure 2.98(b) the output from
m-bit data bus gate G1 is in a logical 1 state and is
pulled up towards 5 V by a switch
inside the gate. Similarly, the output
B in
from G2 is a logical 0 state and is pulled
B down towards 0 V. Because the two out-
B out puts are wired together and yet their
states differ, two problems exist. The
d 0 d 1 d m–1 first is philosophical. The logical level at
all points along a conductor is constant,
because the voltage along the conduc-
Figure 2.97 Connecting systems to the bus. tor is constant. Because the two ends of
2.7 Tri-state logic 89
G1
Switch Switch
closed open
Output = 5 V
Bus
G2
Output = 0 V
Switch Switch
open closed
0 V 0 V
(a) Logical arrangement (b) Physical arrangement Figure 2.98 Connecting two
Two outputs connected together Two outputs connected together outputs together.
+5 V +5 V +5 V
0 V 0 V 0 V
(a) Lower switch closed. (b) Upper switch closed. (c) Both switches open. Figure 2.99 The operation of
Output connected to ground Output connected to +5V Output disconnected the tri-state output.
the bus in Fig. 2.98(b) are connected to different voltages, the a bus, but only one of them may be actively connected to the
logical level on the conductor is undefined and breaks one of bus internally. We shouldn’t speak of tri-state logic or tri-
the rules of Boolean algebra. We have stated that in a Boolean state gates, we should speak of (conventional) gates with
system there is no such thing as a valid indeterminate state tri-state outputs.
lying between a logical 1 and a logical 0. Secondly, and more Figure 2.99 illustrates the operation of a gate with a tri-state
practically, a direct physical path exists between the 5 V enable output. In fact, any type of gate can have a tri-state
power supply and ground (0 V). This path represents is a output. All tri-state gates have a special ENABLE input. When
short circuit and the current flowing through the two output ENABLE 1. the gate behaves normally and its output is
circuits could even destroy the gates. either a logical 1 or a logical 0 depending on its input
The tri-state gate lets you connect outputs together. Tri- (Fig. 2.99(a) shows a 0 state and Fig. 2.99(b) a 1 state).
state logic is not, as its name might suggest, an extension of When ENABLE 0, both switches in the output circuit of
Boolean algebra into ternary or three-valued logic. It is a the gate are open and the output is physically disconnected
method of resolving the conflict that arises when two outputs from the gate’s internal circuitry (Fig. 2.99(c)). If I were to ask
are connected as in Fig. 2.98. Tri-state logic disconnects from what state the output is in when ENABLE 0, the answer
the bus all those gates not actively engaged in transmitting should be that the question is meaningless. In fact, because
data. In other words, a lot of tri-state outputs may be wired to the output of an un-enabled tri-state gate is normally
90 Chapter 2 Gates, circuits, and combinational logic
connected to a bus, the logic level at the output terminal is the data, it enables its input buffer by asserting one of EAi, EBi, or
same as that on the bus to which it is connected. For this rea- ECi, as appropriate. For example, if network C wishes to trans-
son, the output of a tri-state gate in its third state is said to be mit data to network A, all that is necessary is for ECO and EAI
floating. It floats up and down with the bus traffic. to be set to a logical 1 simultaneously. All other enable signals
Most practical tri-state gates do, in fact, have active-low remain in a logical 0 state for the duration of the information
enable inputs rather than active-high enable inputs. transfer.
Figure 2.100 provides the circuit symbols for four tri-state Input buffers (Ai, Bi, Ci) are not always necessary. If the
buffers, two of which are inverting buffers (i.e., NOT gates) data flowing from the bus into a network goes only into the
and two of which are non-inverting buffers. Two of these input of one or more gates, a buffer is not needed. If however,
gates have active-low enable inputs and two have active-high the input data is placed on an internal bus (local to the net-
enable inputs. The truth table of an inverter with a tri-state work) on which other gates may put their output, the buffer
output is given in Table 2.28. is necessary to avoid conflict between the various other out-
Figure 2.101 demonstrates how tri-state buffers imple- puts that may drive the local bus.
ment a bused structure. The buffers connect or disconnect The bus in Fig. 2.101 is bidirectional; that is, data can flow
the three networks A, B, and C, to the bus. The outputs of net- onto the bus or off the bus. The pairs of buffers are arranged
works A, B, and C are placed on the bus by three tri-state back to back (e.g. Ai and Ao) so that one buffer reads data
buffers Ao, Bo, and Co, which are enabled by signals EAo, EBo, from the bus and the other puts data on the bus—but not at
and ECo, respectively. If any network wishes to put data on to the same time.
the bus it sets its enable signal (e.g. EBo) to a 1. It is vital that In the description of the bused system in Fig. 2.101 the
no more than one of EAo, EBo, and ECo be at a 1 level at any names of the gates and their control signals have been care-
instant. fully chosen. Ao stands for Aout, and Ai for Ain. This labels the
Each of the networks receives data from the bus via its own gate and the direction in which it transfers data with respect to
input buffers (Ai, Bi, and Ci). If a network wishes to receive the network it is serving. Similarly, EAo stands for enable gate
A out, and EAi for enable gate A in. By
choosing consistent and meaningful
P P P P names, the reading of circuit diagrams
and their associated text is made easier.
Further details of a bused system will
be elaborated on in Chapter 3, and
E E Chapter 7 on the structure of the CPU
(a) Non-inverting buffer. (b) Inverting buffer. makes extensive use of buses in its
Active-high enable Active-high enable description of how the CPU actually
carries out basic computer operations.
P X P P Digital Works supports tri-state
buffers. The device palette provides a
simple non-inverting tristate buffer with
an active-high enable input. Figure 2.102
E E shows a system with a single bus to
(c) Non-inverting buffer. (d) Inverting buffer. which three tri-state buffers are con-
Active-low enable Active-low enable nected. One end of the bus is connected
to an LED to show the state of the bus.
Figure 2.100 Logic symbol for the tri-state buffer. Digital Works requires you to con-
nect a wire between two points so we’ve
added a macro tag to the bottom of the
ENABLE Input Output bus to provide an anchor point (we don’t use the macro tag
for its normal purpose in this example).
0 0 X Output floating
The input of each tri-state gate in Fig. 2.102 is connected to
0 1 X Output floating
the interactive input tool that can be set to a 0 or a 1 by the
1 0 0 Output same as input hand tool. Similarly, the enable input of each gate is con-
1 1 1 Output same as input nected to an interactive input tool.
By clicking on the run icon and then using the hand tool to
Table 2.28 Truth table for the non-inverting tri-state set the input and enable switches, we can investigate the oper-
buffer with an active-high enable input. ation of the tristate buffer. In Fig. 2.102 inputs 1 and 3 are set
2.8 Programmable logic 91
E Ai
A
Ai
Ao
E Ao
E Bi
B
Bi
Bo
E Bo
E Ci
C Ci
Co
d0 d1 dm–1
E Co
Figure 2.101 Interconnecting logic elements with a bus and tri-state buffers.
to 1 and only buffer 3 is enabled. Consequently, the output of scale integration by the major semiconductor manufacturers
buffer 3 is placed on the bus and the bus LED is illuminated. generated a range of basic building blocks from multiplexers
We have stated that you shouldn’t enable two or more of to digital multiplier circuits and allowed the economic design
the tri-state gates at the same time. If you did, that would cre- of more complex systems. We now introduce the next step
ate bus contention as two devices attempted to put data on in the history of digital systems—programmable logic that can
the bus simultaneously. In Fig. 2.103 we have done just that be configured by the user.
and used the hand tool to enable buffer 2 as well as buffer 3.
As you can see, the simulation has stopped (the run button is
in the off state) and an error message has been generated at 2.8.1 The read-only memory as a logic
the buffer we’ve attempted to enable. element
Semiconductor manufactures find it easier to design regular
circuits with repeated circuit elements than special-purpose
2.8 Programmable logic highly complex systems. A typical regular circuit is the read
only memory or ROM. We deal with memory in a later
In this short section we introduce some of the single-chip chapter. All we need say here is that a ROM is a device with n
programmable logic elements that can be configured by the address input lines specifying 2n unique locations within it.
user to perform any function they require. In the earlier days Each location, when accessed, produces an m-bit value on its
of logic design, systems were constructed with lots of basic m output lines. It is called read only because the output corre-
logic elements; for example, the two-input OR gate, the five- sponding to a given input cannot be modified (i.e. written
input NAND gate, and so on. The introduction of medium into) by the user. A ROM is specified by its number of locations
Icon for non-inverting
tri-state buffer.
Tri-state buffer.
x width of each location; for example, a 16 4 ROM has than ROMs containing a regular structure of AND and OR
16 locations each containing 4 bits. gates that can be interconnected by the user to generate the
An alternative approach to the design of digital systems required logical function.
with basic gates or MSI elements is to use ROMs to imple- Figure 2.105 provides a simplified picture of how pro-
ment the required function as a look-up table. Figure 2.104 grammable logic devices operate. The three inputs on the
shows how a 16 4 ROM implements the 4-bit multiplier left-hand side of the diagram are connected to six vertical
we designed earlier in this chapter using AND, OR, and NOT lines (three lines for the inputs and three for their comple-
gates. The binary code, X1, X0, Y1, Y0, at the four address ments). On the right of the diagram are three two-input AND
inputs selects one of the 16 possible locations, each contain- gates whose inputs run horizontally. The key to programma-
ing a 4-bit word corresponding to the desired result. The ble logic is the programmable link between each horizontal
manufacturer or user of the ROM writes the appropriate out- and vertical conductor.
put into each of these 16 locations; for example, the location Fusible links between gates are broken by passing a suffi-
1011, corresponding to 10 11 (i.e. 2 3), has 0110 (i.e. 6) ciently large current through the link to melt it. By leaving a
written into it. link intact or by blowing it, the outputs of the AND gates can
The ROM directly implements not the circuit but the truth be determined by the designer. Modern programmable logic
table. The value of the output is stored for each of the possible devices have electrically programmed links that can be made
inputs. The ROM look-up table doesn’t even require Boolean and un-made many times.
algebra to simplify the sum-of-products expression derived A real programmable device has many more inputs vari-
from the truth table. Not only does a ROM look-up table save ables than in Fig. 2.105 and the AND gates can have an input
a large number of logic elements, but the ROMs themselves for each of the variables and their complements. The digital
can be readily replaced to permit the logic functions to be designer selects the appropriate programmable device from a
modified (to correct errors or to add improved facilities). manufacturer’s catalogue and adapts the Boolean equations
Unfortunately, the ROM look-up table is limited to about 20 to fit the type of gates on the chip. The engineer then plugs
inputs and eight outputs (i.e. 220 8 8 Mbits). The ROM the chip into a special programming machine that intercon-
can be programmed during its manufacture or a PROM (pro- nects the gates in the desired way.
grammable ROM) can be programmed by means of a special Programmable logic elements enable complex systems to be
device. designed and implemented without requiring large numbers
of chips. Without the present generation of programmable
logic elements, many of the low-cost microcomputers would
2.8.2 Programmable logic families be much more bulky, consume more power, and cost consid-
Because ROM requires a very large number of bits to imple- erably more.
ment moderately complex digital circuits, semiconductor Today’s designers have several types of programmable logic
manufacturers have created much simpler logic elements element at their disposal; for example, the PAL (programmable
Z0
Z 1 4-bit product
Data output Z2 Z Figure 2.104 Using a ROM to
Z3 implement a multiplier.
94 Chapter 2 Gates, circuits, and combinational logic
0 1 2 3 4 5
Input 1
0
Output 1
1
Input 2
2
Output 2
3
Input 3
4
Output 3
5
array logic), the PLA (programmable logic array), and the the more complex programmed logic array. The PLA has
PROM (programmable read-only memory). The PROM and both programmable AND and OR arrays, whereas the PAL
the PAL are special cases of the PLA. The difference between has a programmable AND array but a fixed OR array. In
the various types of programmable logic element depends short, the PAL is an AND gate array whose outputs are ORed
on whether one or both of the AND and OR arrays are together in a way determined by the device’s programming.
programmable. Consider a hypothetical PAL with three inputs x0 to x2 and
three outputs y0 to y2. Assume that inputs x0 to x2, generate six
Programmable Logic Array product terms P0 to P5. These product terms are, of course,
The programmable logic array (PLA) was one of the first field user programmable and may include an input variable in a
programmable logic elements to become widely available. It true, complement, or don’t care form. In other words, you
has an AND–OR gate structure with a programmable array of can generate any six product terms you want.
AND gates whose inputs may be variables, their comple- The six product terms are applied to three two-input OR
ments, or don’t care states. The OR gates are also program- gates to generate the outputs y0 to y2 (Fig. 2.107). Each output
mable, which means that you can define each output as the is the logical OR of two product terms. Thus, y0 P0 P1,
sum of any of the product terms. A typical PLA has 48 AND y1 P2 P3, and y2 P4 P5. We have chosen to OR three
gates (i.e. 48 product terms) for 16 input variables, compared pairs of products. We could have chosen three triplets so that
with the 65 536 required by a 16-input PROM. Figure 2.106 y0 P1 P2 P3, y1 P4 P5 P6, etc. In other words,
provides a simple example of a PLA that has been pro- the way in which the product terms are ORed together is a
grammed to generate three outputs (no real PLA is this function of the device and is not programmable by the user.
simple). Because the PLA has a programmable address decoder
implemented by the AND gates, you can create product terms
containing between one and n variables.
2.8.3 Modern programmable logic
Over the years, logic systems have evolved. Once the designer
Programmable array logic was stuck with basic gates and MSI building blocks. The
A more recent programmable logic element is the program- 1980s were the era of the programmable logic element with
mable array logic (PAL), which is not to be confused with the PROMs, PALs, PLAs, and so on. Today’s programmable logic
PLA we discussed earlier. The PAL falls between the simple elements are constructed on a much grander scale. Typical
gate array that contains only programmable AND gates and programmable logic devices extend the principles of the PLA
2.8 Programmable logic 95
Inputs
X0 X0
X0
X1 X1
X1
X2 X2
X2
These OR gates combine
product terms to generate
user-programmable sum
terms
Y0
The inputs are
used to generate
user-programmable
product terms Y1
Y2
Outputs
P0 P1 P2 P3
Inputs
X0 X0
X0
X1 X1
X1
X2 X2
X2
Each input used to
generate a product
term can be open
or closed
Programmable array
Y0
Y2
Fixed array
Y3
Outputs
P0 P1 P2 P3 P4 P5
These OR gates are fixed; that is, Figure 2.107 Structure of
you cannot program them the PAL.
and employ macro cells that implement more complex build- which can be programmed, erased, and reprogrammed.
ing blocks containing storage elements as well as AND, OR, Reprogrammable logic elements represent a considerable
and EOR gates. saving at the design stage. Moreover, they can be used to con-
A more recent innovation in programmable logic is struct systems that can be reconfigured by downloading data
the electrically programmable and erasable logic element from disk.
96 Chapter 2 Gates, circuits, and combinational logic
A complex digital circuit (e.g. a microprocessor chip) would take longer to test exhaustively than the anticipated life of the entire universe. A way out of this dilemma is to perform a test that provides a reasonable level of confidence in its ability to detect a large fraction of possible faults without requiring an excessive amount of time.

The first step in devising such a test is to distinguish between the idea of a defect and a fault. A real system fails because of a defect in its manufacture. For example, a digital system may fail because of a defect at the component level (a crystal defect in a silicon chip), or at the system level (a solder splash joining together two adjacent tracks on a printed circuit board). The observed failure is termed a fault.

Although there are an infinite number of possible defects that might cause a system to fail, their effects (i.e. faults) are relatively few. In simpler terms, an automobile may suffer from many defects, but many of these defects result in a single observable fault—the car doesn't move. That is, a fault is the observable effect due to a defect. A digital system can be described in terms of a fault model (i.e. the list of observable effects of defects). Typical faults are given below.

Stuck-at-one The input or output of a circuit remains in a logical 1 state independently of all other circuit conditions. This is usually written s_a_1.

Stuck-at-zero In this case the input or output is permanently stuck in a 0 state (i.e. s_a_0).

Bridging faults Two inputs or outputs of a circuit are effectively connected together and cannot assume independent logic levels. That is, they must both be 0s or both be 1s.

It is possible to devise a longer list of fault models, but the stuck-at fault model is able to detect a surprisingly large number of defects. In other words, if we test a system by considering all possible stuck-at-1 and stuck-at-0 faults, we are likely to detect almost all of the probable defects.

The sensitive path test

A sensitive path between an input and an output is constructed to make the output a function of the input being tested (i.e. the output is sensitive to a change in the input). Figure 2.108(a) illustrates a circuit with three gates and six inputs A, B, C, F, I, and J. The sensitive path to be tested is between input A and output K.

Figure 2.108(b) demonstrates how we have chosen the sensitive path by ensuring that a change in input A is propagated through the circuit. By setting AND gate 1's B and C inputs high, input A is propagated through this gate to the E input of AND gate 2. The second input of AND gate 2, F, must be set high to propagate E through gate 2. Output G of AND gate 2 is connected to input H of the three-input OR gate 3. In this case, inputs I and J must be set low to propagate input H (i.e. A) through OR gate 3.

By setting inputs B, C, F, I, and J to 1, 1, 1, 0, and 0, the output becomes K = A and, therefore, by setting A to 0 and then to 1, we can test the sensitive path between A and K and determine whether any A stuck-at fault exists.

A fault list can be prepared for the circuit, which, in this case, might consist of A s_a_0, A s_a_1, B s_a_0, B s_a_1, . . . . A convenient notation for the fault list is A/0, A/1, B/0, B/1, . . . etc. The '/' is read as 'stuck at'.

To test for A s_a_0 (i.e. A/0), the other inputs are set to the values necessary to create a sensitive path and A is switched from 0 to 1. If the output changes state, A is not stuck at zero. The same test also detects A/1.
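The sensitive-path test is easy to reproduce in a few lines of code. The Python sketch below is our own illustration, not part of the original text; it assumes that gate 1 is a three-input AND of A, B, and C, as described above. It injects stuck-at faults at input A and shows that the sensitizing vector B = C = F = 1, I = J = 0 exposes both A/0 and A/1.

```python
def circuit(a, b, c, f, i, j, fault=None):
    """The three-gate circuit of Fig. 2.108 (gate connectivity assumed
    from the text). `fault` optionally forces one input to a stuck-at
    value, e.g. ('A', 0) models the fault A/0."""
    inputs = {'A': a, 'B': b, 'C': c, 'F': f, 'I': i, 'J': j}
    if fault:
        inputs[fault[0]] = fault[1]
    e = inputs['A'] & inputs['B'] & inputs['C']   # AND gate 1 (output D = E)
    h = e & inputs['F']                           # AND gate 2 (output G = H)
    return h | inputs['I'] | inputs['J']          # OR gate 3 (output K)

# Sensitive path for A: with B = C = F = 1 and I = J = 0, K = A,
# so toggling A distinguishes the fault-free circuit from A/0 and A/1.
for a in (0, 1):
    k_good = circuit(a, 1, 1, 1, 0, 0)
    k_sa0 = circuit(a, 1, 1, 1, 0, 0, fault=('A', 0))
    k_sa1 = circuit(a, 1, 1, 1, 0, 0, fault=('A', 1))
    print(f"A={a}: K={k_good}, under A/0 K={k_sa0}, under A/1 K={k_sa1}")
```

Because K = A along the sensitive path, any disagreement between the fault-free and faulty outputs betrays the fault.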
Fault tests are designed by engineers (possibly using CAD techniques) and can be implemented either manually or by means of computer-controlled automatic test equipment (ATE). This equipment sets up the appropriate input signals and tests the output against the expected value. We can specify the sensitive path for A in the circuit of Fig. 2.108(b) as B ⋅ C ⋅ F ⋅ \overline{I} ⋅ \overline{J}.

It's not always possible to test digital circuits by this sensitive path analysis because of the topological properties of some digital circuits. For example, a digital signal may take more than one route through a circuit and certain faults may lead to a situation in which an error is cancelled at a particular node. Similarly, it's possible to construct logic circuits that have an undetectable fault; Figure 2.109 provides an example of such a circuit.

Figure 2.108 Using sensitive path analysis to test digital circuits. (a) A simple three-gate digital circuit. (b) Establishing a sensitive path between input A and output K: B and C are set high to propagate A through gate 1, F is set high to propagate E through gate 2, and I and J are set low to propagate H through gate 3.
In order to establish a sensitive path for internal node D to external node H, it is necessary to set inputs G and F to OR gate 5 low. G is set low by setting inputs B and E to NAND gate 3 high. Input E is derived from NOT gate 2 and is set high by setting input A low. Similarly, output F of NAND gate 4 is set low by setting inputs A and C to gate 4 high. Unfortunately, setting both G and F low requires that input A be both 0 and 1 simultaneously. This condition is a contradiction and therefore node D cannot be tested.

Figure 2.109 Circuit with an undetectable fault. (The node to be tested, D, feeds OR gate 5 together with signals G and F; the contradiction arises at input A.)
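A brute-force search makes the contradiction concrete. The sketch below is illustrative only: the gates legible in Fig. 2.109 are modeled directly (E = NOT A, G = NAND(B, E), F = NAND(A, C), H = D + G + F), while the gate that produces internal node D is not legible, so D = NAND(A, B) is a pure assumption. The conclusion does not depend on that assumption: G and F can never be 0 simultaneously, so no input vector can make H sensitive to D.

```python
from itertools import product

def fig_2_109(a, b, c, d_override=None):
    """Evaluate output H of the Fig. 2.109 circuit.
    D = NAND(A, B) is an assumed function for the illegible gate 1;
    d_override forces D to a stuck-at value."""
    d = 1 - (a & b) if d_override is None else d_override
    e = 1 - a                 # NOT gate 2
    g = 1 - (b & e)           # NAND gate 3
    f = 1 - (a & c)           # NAND gate 4
    return d | g | f          # OR gate 5 (output H)

# Search every input vector for one that distinguishes a stuck-at
# fault on node D from the fault-free circuit.
tests = [(a, b, c, v)
         for a, b, c in product((0, 1), repeat=3)
         for v in (0, 1)
         if fig_2_109(a, b, c) != fig_2_109(a, b, c, d_override=v)]
print(tests)   # [] -- no test vector exists; node D is unobservable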
■ SUMMARY
In this chapter we have looked at the basic set of logic elements used to create any digital system—the AND, OR, and NOT gates. We have demonstrated how simple functions can be generated from gates by first converting a problem in words into a truth table and then using either graphical or algebraic methods to convert the truth table into a logical expression and finally into a circuit made up of gates. At the end of this chapter we briefly mentioned the new families of programmable logic elements and their design tools that have revolutionized the creation of today's complex digital systems.

We have introduced Digital Works, a design tool that enables you to create digital circuits and to observe their behavior. We also introduced the tri-state buffer, a device that enables you to connect logic subsystems to each other via a common data highway called a bus.

In the next chapter we look at sequential circuits built from flip-flops. As the term sequential suggests, these circuits involve the time factor, because the logical state of a sequential device is determined by its current inputs and its past history (or behavior). Sequential circuits form the basis of counters and data storage devices. Once we have covered sequential circuits, we will have covered all the basic building blocks necessary to design a digital system of any complexity (e.g. the digital computer).

■ PROBLEMS
2.1 Explain the meaning of the following terms.
(a) Sum-of-products
(b) Product of sums
(c) Minterm
(d) Truth table
(e) Literal
(f) Constant
(g) Variable

Figure 2.110 Circuit for Question 2.2. (Inputs A, B, C, and D; intermediate signals P, Q, R, S, and T; output U.)

2.2 Tabulate the values of the variables P, Q, R, S, T, and U in the circuit of Fig. 2.110 for all possible input variables A, B, C, and D. The truth table for this question should be expressed in the form of Table 2.29.

2.3 For the circuit of Fig. 2.110 in Question 2.2 obtain a Boolean expression for the output U, in terms of the inputs A, B, C, and D. You should obtain an expression for the output U by considering the logic function of each gate.

2.4 For the truth table in Question 2.2 (Table 2.29) obtain a sum-of-minterms expression for U and use Boolean algebra to obtain a simplified sum-of-products expression for U.
Table 2.29 (the format required for Question 2.2)

Inputs        Outputs
A B C D | P = \overline{B} + C | Q = P ⋅ A | R = \overline{C} + D | S = B ⋅ R | T = B ⋅ D | U = Q + S + T
0 0 0 0 |          1           |     0     |          1           |     0     |     0     |       0
0 0 0 1 |          ⋅           |     ⋅     |          ⋅           |     ⋅     |     ⋅     |       ⋅
0 0 1 0 |          ⋅           |     ⋅     |          ⋅           |     ⋅     |     ⋅     |       ⋅
0 0 1 1 |          ⋅           |     ⋅     |          ⋅           |     ⋅     |     ⋅     |       ⋅
0 1 0 0 |          ⋅           |     ⋅     |          ⋅           |     ⋅     |     ⋅     |       ⋅
0 1 0 1 |          ⋅           |     ⋅     |          ⋅           |     ⋅     |     ⋅     |       ⋅
⋅ ⋅ ⋅ ⋅ |          ⋅           |     ⋅     |          ⋅           |     ⋅     |     ⋅     |       ⋅
1 1 1 1 |          1           |     1     |          1           |     1     |     1     |       1
2.12 Convert the following expressions to sum-of-products form.
(a) (A + B)(B + C)(A + C)
(b) (C + D)(A ⋅ B + A ⋅ C)(A ⋅ C + B)
(c) (A + B + C)(A + C ⋅ D)(D + F)

2.13 Simplify
(a) A ⊕ B ⊕ C
(b) A ⋅ B ⋅ (C ⊕ A)

2.14 Convert the following expressions to product-of-sums form.
(a) A ⋅ B + A ⋅ B + B ⋅ C
(b) A ⋅ B + B ⋅ C + B ⋅ C ⋅ D
(c) A ⋅ B + A ⋅ C + B ⋅ C
(d) A ⋅ B ⋅ C + A ⋅ B ⋅ C + A ⋅ B ⋅ C + A ⋅ B ⋅ C

2.15 A circuit has four inputs, P, Q, R, and S, representing the natural binary numbers 0000 = 0 to 1111 = 15. P is the most-significant bit. The circuit has one output, X, which is true if the number represented by the input is divisible by three (regard zero as being indivisible by three). Design a truth table for this circuit and hence obtain an expression for X in terms of P, Q, R, and S. Give the circuit diagram of an arrangement of AND, OR, and NOT gates to implement this circuit. Design a second circuit to implement this function using NAND gates only.

2.16 A device accepts natural binary numbers in the range 0000 to 1111 which represent 0 to 15. The output of the circuit is true if the input to the circuit represents a prime number and is false otherwise. Design a circuit using AND, OR, and NOT gates to carry out this function. A prime number is an integer that is greater than 1 and is divisible only by itself and 1. Zero and 1 are not prime numbers.

2.17 Demonstrate how you would use a 4-line to 16-line demultiplexer to implement the system in Question 2.16.

2.18 A logic circuit accepts a natural binary number DCBA in the range 0 to 15 (the least-significant bit is bit A). The output is the square of the input; for example, if DCBA = 0101₂ = 5₁₀, the output is 00011001₂ = 25₁₀. Design a circuit to implement this function.

2.19 A logic circuit has three inputs C, B, and A, where A is the least-significant bit. The circuit has three outputs R, Q, and P. For any binary code applied to the input terminals (A, B, and C) the output is given by the input plus 1; for example, if C, B, A = 0, 1, 1, the output R, Q, P is 1, 0, 0. Note that 111 + 1 = 000 (i.e. there is no carry out). Design a circuit to implement this system.

2.20 A 4-bit binary number is applied to a circuit on four lines D, C, B, and A. The circuit has a single output, F, which is true if the number is in the range 3 to 12, inclusive. Draw a truth table for this problem and obtain a simplified expression for F in terms of the inputs. Implement the circuit
(a) in terms of NAND gates only
(b) in terms of NOR gates only

2.21 A circuit has four inputs D, C, B, and A encoded in 8421 natural binary form. The inputs in the range 0000₂ = 0 to 1011₂ = 11 represent the months of the year from January (0) to December (11). Inputs in the range 1100 to 1111 (i.e. 12 to 15) cannot occur. The output of the circuit is a logical one if the month represented by the input has 31 days. Otherwise the output is false. The output for inputs in the range 1100 to 1111 is undefined.
(a) Draw a truth table to represent the problem and use it to construct a Karnaugh map.
(b) Use the Karnaugh map to obtain a simplified expression for the function.
(c) Construct a circuit to implement the function using AND, OR, and NOT gates.
(d) Construct a circuit to implement the function using NAND gates only.

2.22 A demultiplexer has eight outputs Y0 to Y7 and a single input X. A further three inputs A, B, and C (A = least-significant bit) determine which output the single input X is connected to. For example, if A, B, C = 0, 1, 1 (i.e. CBA = 110₂ = 6), the output Y6 = X and all other outputs are low. Design a circuit to implement this function.

2.23 What is tri-state logic and why is it used in digital systems?

2.24 Use Digital Works to construct a circuit that realizes the expression

A ⋅ B ⋅ C + A ⋅ B ⋅ C + A ⋅ B ⋅ C + A ⋅ B ⋅ C

Simplify the above expression and use Digital Works to construct a new circuit. Demonstrate that the two circuits are equivalent (by comparing their outputs for all inputs).

2.25 Use Digital Works to construct the system of Question 2.20 and demonstrate that your system works.
3 Sequential Logic
CHAPTER MAP

2 Logic elements and Boolean algebra
We begin our introduction to the computer with the basic building block from which we construct all computers, the gate. A combinational digital circuit such as an adder is composed of gates and its output is a Boolean (logical) function of its inputs only.

3 Sequential logic
The output of a sequential circuit is a function of both its current inputs and its past inputs; that is, a sequential circuit has memory. The building blocks used to construct devices that store data are called flip-flops. In this chapter we look at basic sequential elements and the counters, registers, and shifters that are constructed from flip-flops.

4 Computer arithmetic
Computer arithmetic concerns the representation of numbers in a computer and the arithmetic used by digital computers. We look at how decimal numbers are converted into binary form and the properties of binary numbers and we demonstrate how operations like addition and subtraction are carried out. We also look at how computers deal with negative numbers and fractional numbers.

5 The instruction set architecture
In this chapter we introduce the computer's instruction set architecture (ISA), which describes the low-level programmer's view of the computer. The ISA describes the type of operations a computer carries out. We are interested in three aspects of the ISA: the nature of the instructions, the resources used by the instructions (registers and memory), and the ways in which the instructions access data (addressing modes). The 68K microprocessor is used to illustrate the operation of a real device.
INTRODUCTION
We now introduce a new type of circuit that is constructed from devices that remember their
previous inputs. The logic circuits in Chapter 2 were all built with combinational elements whose
outputs are functions of their inputs only. Given a knowledge of a combinational circuit’s inputs
and its Boolean function, we can always calculate the state of its outputs. The output of a
sequential circuit depends not only on its current inputs, but also on its previous inputs. Even if
we know a sequential circuit’s Boolean equations, we can’t determine its output state without
knowing its past history (i.e. its previous internal states). The basic building blocks of sequential
circuits are the flip-flop, bistable, and latch just as the basic building block of the combinational
circuit is the gate.
It’s not our intention to deal with sequential circuits at anything other than an introductory
level, as their full treatment forms an entire branch of digital engineering. Sequential circuits can’t
be omitted from introductory texts on computer hardware because they are needed to implement
registers, counters, and shifters, all of which are fundamental to the operation of the central
processing unit.
Figure 3.1 describes the conceptual organization of a sequential circuit. An input is applied
to a combinational circuit using AND, OR, and NOT gates to generate an output that is fed to
a memory circuit that holds the value of the output. The information held in this memory is
called the internal state of the circuit. The sequential circuit uses its previous output together
with its current input to generate the next output. This statement contains a very important
implicit concept, the idea of a next state. Sequential circuits have a clock input, which triggers
the transition from the current state to the next state. The counter is a good example of a
sequential machine because it stores the current count that is updated to become the next
count. We ourselves are state machines because our future behavior depends on our past inputs—if you burn yourself getting something out of the oven, you approach the oven with more care next time.

Figure 3.1 The conceptual organization of a sequential circuit. (An input is applied to a combinational logic block to generate an output; a memory block holds the internal state and feeds it back to the combinational logic.)
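The idea of combinational logic plus memory can be captured in a few lines of code. Here is a minimal sketch (our illustration, with invented function names) of a clocked sequential machine—a 3-bit counter—in which next_state plays the role of the combinational logic and state the role of the memory.

```python
def next_state(state):
    """Combinational logic: compute the next count from the current one."""
    return (state + 1) % 8

state = 0                      # the internal state held in memory
for clock_tick in range(10):   # each iteration models one clock pulse
    print(state)
    state = next_state(state)  # the clock triggers the state transition
```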
We begin our discussion of sequential circuits with the bistable or flip-flop. A bistable is so called
because its output can remain in one of two stable states indefinitely, even if the input changes.
For a particular input, the bistable’s output may be high or low, the actual value depending on the
previous inputs. Such a circuit remembers what has happened to it in the past and is therefore
a form of memory element. A more detailed discussion of memory elements is given in
Chapter 8. A bistable is the smallest possible memory cell and stores only a single bit of
information. The term flip-flop, which is synonymous with bistable, gives the impression of the
circuit going flip into one state and then flop into its complement. Bistables were constructed
from electromagnetic relays that really did make a flip-flop sound as they jumped from one
state into another.
The term latch is also used to describe certain types of flip-flop. A latch is a flip-flop that is
unclocked (i.e. its operation isn’t synchronized with a timing signal called a clock). The RS
flip-flop that we describe first can also be called a latch.
Sequential systems can be divided into two classes: synchronous and asynchronous.
Synchronous systems use a master clock to update the state of all flip-flops periodically.
The speed of a synchronous system is determined by its slowest device and all signals must
have settled to steady-state values by the time the system is clocked. In an asynchronous
system a change in an input signal triggers a change in another circuit and this change ripples
through the system (an asynchronous system is rather like a line of closely spaced dominoes
on edge—when one falls it knocks its neighbor over and so on). Reliable asynchronous systems
are harder to design than synchronous systems, although they are faster and consume less
power. We will return to some of these topics later.
We can approach flip-flops in two ways. One is to demonstrate what they do by defining
their characteristics as an abstract model and then show how they are constructed. That is, we
say this is a flip-flop and this is how it behaves—now let’s see what it can do. The other way
of approaching flip-flops is to demonstrate how they can be implemented with just two gates
and then show how their special properties are put to work. We intend to follow the latter
path. Some readers may prefer to skip ahead to the summary of flip-flops at the end of this
section and then return when they have a global picture of the flip-flop.
3.1 The RS flip-flop

We begin our discussion of the flip-flop with the simplest member of the family, the RS flip-flop. Consider the circuit of Fig. 3.2. What differentiates this circuit from the combinational circuits of Chapter 2 is that the gates are cross-coupled and the output of a gate is fed back to its input. Although Fig. 3.2 uses no more than two two-input NOR gates, its operation is not immediately apparent.

The circuit has two inputs, A and B, and two outputs, X and Y. A truth table for the NOR gate is provided alongside Fig. 3.2 for reference. From the Boolean equations governing the NOR gates we can readily write down expressions for outputs X and Y in terms of inputs A and B.

1. X = \overline{A + Y}
2. Y = \overline{B + X}

If we substitute the value for Y from equation (2) in equation (1), we get

3. X = \overline{A + \overline{B + X}}
     = \overline{A} \cdot \overline{\overline{B + X}}     By De Morgan's theorem
     = \overline{A} \cdot (B + X)                          Two negations cancel
     = \overline{A} \cdot B + \overline{A} \cdot X         Expand the expression
Figure 3.2 Two cross-coupled NOR gates, G1 and G2, with inputs A and B and outputs X and Y. The truth table for the NOR gate is:

A B | \overline{A + B}
0 0 | 1
0 1 | 0
1 0 | 0
1 1 | 0
Because Boolean algebra doesn't define the operations of division or subtraction we can't simplify this equation any further and are left with an expression in which the output is a function of the output; that is, the value of X depends on X. Equation (3) is correct but its meaning isn't obvious. We have to look for another way of analyzing the behavior of cross-coupled gates. Perhaps a better approach to understanding this circuit is to assume a value for output X and for the inputs A and B and then see where it leads us.

3.1.1 Analyzing a sequential circuit by assuming initial conditions

Figure 3.3(a) shows the cross-coupled NOR gate circuit with the initial condition X = 1 and A = B = 0 and Fig. 3.3(b) shows the same circuit redrawn to emphasize the way in which data flows between the gates.

Because the inputs to gate G2 are X = 1, B = 0, its output, Y = \overline{X + B}, must be 0. The inputs to gate G1 are Y = 0 and A = 0, so that its output, X, is \overline{Y + A}, which is 1. Note that this situation is self-consistent. The output of gate G1 is X = 1, which is fed back to the input of gate G1 to keep X in a logical 1 state. That is, the output actually maintains itself. It should now be a little clearer why equation (3) has X on both sides (i.e. X = \overline{A} ⋅ B + \overline{A} ⋅ X).

Had we assumed the initial state of X to be 0 and inputs A = B = 0, we could have proceeded as follows. The inputs to G2 are X = 0, B = 0 and therefore its output is Y = \overline{X + B} = \overline{0 + 0} = 1. The inputs to G1 are Y = 1 and A = 0, and its output is X = \overline{Y + A} = \overline{1 + 0} = 0. Once more we can see that the circuit is self-consistent. The output can remain indefinitely in either a 0 or a 1 state for the inputs A = B = 0.

The next step in the analysis of the circuit's behavior is to consider what happens if we change inputs A or B. Assume that the X output is initially in a logical 1 state. If input B to gate G2 goes high while input A remains low, the output of gate G2 (i.e. Y) is unaffected, because the output of a NOR gate is low if either of its inputs is high. As X is already high, the state of B has no effect on the state of Y.

If now input A goes high while B remains low, the output, X, of gate G1 must fall to a logical 0 state. The inputs to gate G2 are now both in logical 0 states and its output Y rises to a logical 1. However, because Y is fed back to the input of gate G1, the output X is maintained at a logical 0 even if A returns to a 0 state.

The effect of setting A to a 1 causes output X to flip over from a 1 to a 0 and to remain in that state when A returns to a 0. We call an RS flip-flop a latch because of its ability to capture a signal. Table 3.1 provides a truth table for the circuit of Fig. 3.2. Two tables are presented—one appropriate to the circuit we have described and one with its inputs and outputs relabeled.

Table 3.1(a) corresponds exactly to the two-NOR gate circuit of Fig. 3.2 and Table 3.1(b) to the idealized form of this circuit that's called an RS flip-flop. There are two differences between Tables 3.1(a) and 3.1(b). Table 3.1(b) uses the conventional labeling of an RS flip-flop with inputs R and S and an output Q. The other difference is in the entry for the case in which A = B = 1 and R = S = 1. The effect of these differences will be dealt with later.

We've already stated that Fig. 3.2 defines its output in terms of itself (i.e. X = \overline{A} ⋅ B + \overline{A} ⋅ X). The truth table gets round this problem by creating a new variable, X′ (or Q′), where X′ is the new output generated by the old output X and the current inputs A and B. We can write X′ = \overline{A} ⋅ B + \overline{A} ⋅ X. The input and output columns of the truth table are now not only separated in space (e.g. input on the left and output on the right) but also in time. The current output X is combined with inputs A and B to generate a new output X′. The value of X that produced X′ no longer exists and belongs only to the past.

Labels R and S in Table 3.1(b) correspond to reset and set, respectively. The word reset means make 0 (clear has the same meaning) and set means make 1. The output of all flip-flops is called Q by a historical convention. Examining the truth table reveals that whenever R = 1, the output Q is reset to 0. Similarly, when S = 1 the output is set to 1.
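This assume-and-iterate analysis can be mechanized. The following Python sketch (our own illustration; the function names are invented) models the two cross-coupled NOR gates and repeatedly recomputes X and Y until they are self-consistent, reproducing the latching behavior just described.

```python
def nor(a, b):
    return 1 - (a | b)

def settle(a, b, x, y):
    """Iterate the cross-coupled NOR gates of Fig. 3.2 until the
    outputs stop changing (i.e. the circuit is self-consistent)."""
    for _ in range(10):                  # more than enough iterations
        new_x, new_y = nor(a, y), nor(b, x)
        if (new_x, new_y) == (x, y):
            return x, y                  # a stable state has been reached
        x, y = new_x, new_y
    raise RuntimeError("circuit is oscillating")

x, y = settle(a=0, b=0, x=1, y=0)        # hold: X remains 1
x, y = settle(a=1, b=0, x=x, y=y)        # A high forces X to 0
x, y = settle(a=0, b=0, x=x, y=y)        # X stays 0 after A returns to 0
print(x, y)                              # prints: 0 1
```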
Figure 3.3 (a) Analyzing the circuit by assuming initial conditions (A and B are initially 0 and X is initially 1). (b) An alternative view of the circuit (note that the gates are cross-coupled, with the output of one gate connected to the input of the other gate).
Table 3.1 (a) Truth table for Fig. 3.2. (b) Truth table for relabeled Fig. 3.2.

(a)                  (b)
A B X X′             R S Q Q′
0 0 0 0              0 0 0 0     No change
0 0 1 1              0 0 1 1     No change
0 1 0 1              0 1 0 1     Set
0 1 1 1              0 1 1 1     Set
1 0 0 0              1 0 0 0     Clear
1 0 1 0              1 0 1 0     Clear
1 1 0 0              1 1 0 ?     Undefined
1 1 1 0              1 1 1 ?     Undefined

(In each table, X and Q are the old output and X′ and Q′ the new output.)

The truth table is interpreted as follows. The output of the circuit is currently X (or Q) and the new inputs to be applied to the input terminals are A, B (or R, S). When these new inputs are applied to the circuit, its output is given by X′ (or Q′). For example, if the current output X is 1 and the new values of A and B are A = 1, B = 0, then the new output, X′, will be 0. This value of X′ then becomes the next value of X when new inputs A and B are applied to the circuit.
When R and S are both 0, the output does not change; that is, Q′ = Q.

If both R and S are simultaneously 1, the output is conceptually undefined (hence the question marks in Table 3.1(b)), because the output can't be set and reset at the same time. In the case of the RS flip-flop implemented by two NOR gates, the output X does, in fact, go low when A = B = 1. In practice, the user of an RS flip-flop should avoid the condition R = S = 1.

The two-NOR gate flip-flop of Fig. 3.2 has two outputs X and Y. An examination of the circuit for all inputs except A = B = 1 reveals that X and Y are complements. Because of the symmetric nature of flip-flops, almost all flip-flops have two outputs, Q and its complement \overline{Q}. The complement of Q may not always be available to the user of the flip-flop because many commercial devices leave \overline{Q} buried on the chip and not brought out to a pin. Figure 3.4 gives the circuit representation of an RS flip-flop.

Figure 3.4 Circuit representation of the RS flip-flop as a black box (inputs R and S, outputs Q and \overline{Q}).

We can draw the truth table of the RS or any other flip-flop in two ways. Up to now we've presented truth tables with two output lines for each possible input, one line for Q = 0 and one for Q = 1. An alternative approach is to employ the algebraic value of Q and is illustrated by Table 3.2.

Table 3.2 An alternative truth table for the RS flip-flop.

Inputs    Output    Description
R S       Q′
0 0       Q         No change
0 1       1         Set output to 1
1 0       0         Reset output to 0
1 1       X         Forbidden

When R = S = 0 the new output Q′ is simply the old output Q. In other words, the output doesn't change state and remains in its previous state as long as R and S are both 0. The inputs R = S = 1 result in the output Q′ = X. The symbol X is used in truth tables to indicate an indeterminate or undefined condition. In Chapter 2 we used the same symbol to indicate a don't care condition. An indeterminate condition is one whose outcome can't be calculated, whereas a don't care condition is one whose outcome does not matter to the designer.

Figure 3.6 RS flip-flop constructed from two cross-coupled NAND gates (the inputs are active-low).

Table 3.3 Truth table for an RS flip-flop constructed from NAND gates.

Inputs    Output    Comment
R S       Q′
0 0       X         Forbidden
0 1       0         Reset output to 0
1 0       1         Set output to 1
1 1       Q         No change

[Timing diagram: a rising edge on input S sets output Q; a rising edge on input R resets it.]

3.1.2 Characteristic equation of an RS flip-flop

We have already demonstrated that you can derive an equation for a flip-flop by analyzing its circuit. Such an equation is called the flip-flop's characteristic equation. Instead of using an actual circuit, we can derive a characteristic equation from the flip-flop's truth table. Figure 3.5 plots Table 3.1(b) on a Karnaugh map. We have indicated the condition R = S = 1 by X because it is a forbidden condition. From this truth table we can write Q′ = S + Q ⋅ \overline{R}.

Note that this equation is slightly different from the one we derived earlier because it treats R = S = 1 as a don't care condition.

Figure 3.5 Karnaugh map for the characteristic equation of an RS flip-flop.

     SR
Q    00  01  11  10
0     0   0   X   1
1     1   0   X   1
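We can check the characteristic equation against the circuit exhaustively. The sketch below (ours, not part of the text) settles the cross-coupled NOR latch for every allowed combination of R, S, and the current Q, and confirms that the new output always equals S + \overline{R} ⋅ Q.

```python
from itertools import product

def nor_latch_next(r, s, q):
    """Settle the cross-coupled NOR latch and return the new Q."""
    x, y = q, 1 - q
    for _ in range(4):                       # enough iterations to settle
        x, y = 1 - (r | y), 1 - (s | x)
    return x

# The characteristic equation Q' = S + not R . Q must agree with the
# circuit for every allowed input (R = S = 1 is forbidden).
for r, s, q in product((0, 1), repeat=3):
    if r == s == 1:
        continue
    assert nor_latch_next(r, s, q) == (s | ((1 - r) & q))
print("characteristic equation verified")
```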
Figure 3.8 An aircraft warning system. (A pressure transducer with a pressure-to-voltage sensing head and an accelerometer measuring g-force feed comparators with thresholds Vne and Vfl; the comparator outputs, gated where necessary by the flap selection switch, drive the S inputs of RS flip-flops FF1 and FF2, which light the overspeed and flap extension warning lights; a master reset drives the R inputs, and the flip-flop outputs are combined with other warning circuits to illuminate a master warning light.)
To keep the example simple, we will consider three possible events that are considered harmful and might endanger the aircraft.

1. Exceeding the maximum permissible speed Vne.
2. Extending the flaps above the flap-limiting speed Vfl. That is, the flaps must not be lowered if the aircraft is going faster than Vfl.
3. Exceeding the maximum acceleration (g-force) Gmax.

If any of the above parameters are exceeded (even for only an instant), a lasting record of the event must be made. Figure 3.8 shows the arrangement of warning lights used to indicate that one of these conditions has been violated. Transducers that convert acceleration or velocity into a voltage measure the acceleration and speed of the aircraft. The voltages from the transducers are compared with the three threshold values (Vne, Vfl, Gmax) in comparators, whose outputs are true if the threshold is exceeded, otherwise false. In order to detect the extension of flaps above the flap-limiting speed, the output of the comparator is ANDed with a signal from the flap actuator circuit that is true when the flaps are down.

The three signals from the comparators are fed, via OR gates, to the S inputs of three RS flip-flops. Initially, on switching on the system, the flip-flops are automatically reset by applying a logical 1 pulse to all R inputs simultaneously. If at any time one of the S inputs becomes true, the output of that flip-flop is set to a logical 1 and triggers an alarm. All outputs are ORed together to illuminate a master warning light. A master alarm signal makes it unnecessary for the pilot to have to scan all the warning lights periodically. An additional feature of the circuit is a test facility. When the warning test button is pushed, all warning lights should be illuminated and remain so until the reset button is pressed. A test facility verifies the correct operation of the flip-flops and the warning lights.

A pulse-train generator

Figure 3.9 gives the circuit of a pulse-train generator that generates a sequence of N pulses each time it is triggered by a positive transition at its START input. The value of N is user supplied and is fed to the circuit by three switches to select the values of Cc, Cb, Ca. This circuit uses the counter that we will meet later in this chapter.

The key to this circuit is the RS flip-flop, G6, used to start and stop the pulse generator. Assume that initially the R and S inputs to the flip-flop are R = 0 and S = 0 and that its output Q is a logical 0. Because one of the inputs to AND gate G1 is low, the pulse train output is also low.

When a logical 1 pulse is applied to the flip-flop's START input, its Q output rises to a logical 1 and enables AND gate G1. A train of clock pulses at the second input of G1 now appears at the output of the AND gate. This gated pulse train is applied to the input of a counter (to be described later), which counts pulses and generates a three-bit output on Qa, Qb, Qc, corresponding to the number of pulses counted in the range 0 to 7. The outputs of the counter are fed to an equality detector composed of three EOR gates, G2 to G4, plus NOR gate G5. A second input to the equality detector is the user-supplied count value Ca, Cb, Cc. The outputs of the EOR gates are combined in NOR gate G5 (notice that it's drawn in negative logic form to emphasize that the output is 1 if all its inputs are 0).
Figure 3.9 Pulse train generator. (The START pulse sets the start/stop RS flip-flop G6, whose Q output lets AND gate G1 pass the clock to the pulse train output; the counter counts the gated pulses on outputs Qc, Qb, Qa; gates G2, G3, G4, and G5 constitute a comparator that compares Qc, Qb, Qa with the user-selected value Cc, Cb, Ca, which determines the length of the pulse train; RESET is asserted when the counter reaches the preselected value, clearing both the counter and the flip-flop.)
Figure 3.10 Timing diagram of the pulse generator. (The traces show the clock, the START pulse, the gated output, the counter output stepping through 0, 1, 2, 3, 4, 0, and the RESET pulse.)
Figure 3.10 gives a timing diagram for the pulse generator. Initially the counter is held in a reset state (Qa = Qb = Qc = 0). When the counter is clocked, its output is incremented by 1 on the falling edge of each clock pulse. The counter counts upward from 0 and the equality detector compares the current count on Qa, Qb, Qc with the user-supplied inputs Ca, Cb, Cc. When the output of the counter is equal to the user-supplied input, the output of gate G5 goes high and resets both the counter and the RS flip-flop. Resetting the counter forces the counter output to 0. Resetting the RS flip-flop disables AND gate G1 and no further clock pulses appear at the output of G1. In this application of the RS flip-flop, its S input is triggered to start an action and its R input is triggered to terminate the action.
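A behavioral model captures the essence of this start/stop mechanism without gate-level timing. The Python sketch below is an illustration under simplified assumptions (one counter update per gated pulse, no propagation delays), not a gate-accurate model of the circuit.

```python
def pulse_train_generator(n, clock_cycles=20):
    """Model of the Fig. 3.9 generator: an RS flip-flop (G6) gates the
    clock into a 3-bit counter; when the count equals the preset value
    n, the comparator (G2-G5) resets both counter and flip-flop."""
    q = 1            # a START pulse has just set the flip-flop
    count = 0        # 3-bit counter, initially reset
    output = []
    for _ in range(clock_cycles):
        pulse = q & 1                   # AND gate G1 gates the clock
        output.append(pulse)
        if pulse:
            count = (count + 1) % 8     # counter advances on each gated pulse
        if count == n:                  # equality detected
            q, count = 0, 0             # RESET stops the train, clears the counter
    return output

print(pulse_train_generator(4))   # four 1s, then the train stops
```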
3.1.5 The clocked RS flip-flop

The RS flip-flop of Fig. 3.2 responds to signals applied to its inputs according to its truth table. There are situations when we want the RS flip-flop to ignore its inputs until a particular time. The circuit of Fig. 3.11 demonstrates how this is accomplished by turning the RS flip-flop into a clocked RS flip-flop.

A normal, unmodified, RS flip-flop lies in the inner box in Fig. 3.11. Its inputs, R′ and S′, are derived from the external inputs R and S by ANDing them with a clock input C—some texts call these two AND gates 'steering gates'. As long as C = 0, the inputs to the RS flip-flop, R′ and S′, are forced to remain at 0, no matter what is happening to the external R and S inputs. The output of the RS flip-flop remains constant as long as these R′ and S′ inputs are both 0. Whenever C = 1, the external R and S inputs to the circuit are transferred to the flip-flop so that R′ = R and S′ = S, and the flip-flop responds accordingly. The clock input may be thought of as an inhibitor, restraining the flip-flop from acting until the right time. Figure 3.12 demonstrates how we can build a clocked RS flip-flop from NAND gates. Clocked flip-flops are dealt with in more detail later in this chapter.
Figure 3.11 The clocked RS flip-flop. (The AND gates ensure that the inputs R′ and S′ to the inner RS flip-flop are low unless C is high.)

Figure 3.12 Building a clocked RS flip-flop with NAND gates.

3.2 The D flip-flop

Like the RS flip-flop, the D flip-flop has two inputs, one called D and the other C. The D input is referred to as the data input and C as the clock input. The D flip-flop is, by its nature, a clocked flip-flop and we will call the act of pulsing the C input high and then low clocking the D flip-flop.

When a D flip-flop is clocked, the value at its D input is transferred to its Q output and the output remains constant until the next time it is clocked. The D flip-flop is a staticizer because it records the state of the D input and holds it constant until it's clocked. Others call it a delay element because, if the D input changes state at time T but the flip-flop is clocked t seconds later, the output Q doesn't change state until t seconds after the input. I think of the D flip-flop as a census taker because it takes a census of the input and remembers it until the next census is taken. The truth table for a D flip-flop is given in Table 3.4.

The circuit of a D flip-flop is provided in Fig. 3.13 and consists of an RS flip-flop plus a few gates. The two AND gates turn the RS flip-flop into a clocked RS flip-flop. As long as the C input to the AND gates is low, the R and S inputs are clamped at 0 and Q cannot change.

Figure 3.13 The circuit of a D flip-flop.
Table 3.4 Truth table for a D flip-flop.

C D Q | Q′
0 0 0 | 0    Q′ ← Q (no change)
0 0 1 | 1    Q′ ← Q (no change)
0 1 0 | 0    Q′ ← Q (no change)
0 1 1 | 1    Q′ ← Q (no change)
1 0 0 | 0    Q′ ← D
1 0 1 | 0    Q′ ← D
1 1 0 | 1    Q′ ← D
1 1 1 | 1    Q′ ← D

In compact form:   C D | Q′
                   0 0 | Q
                   0 1 | Q
                   1 0 | 0
                   1 1 | 1
When C goes high, the S input is connected to D and the R input to \overline{D}. Consequently, (R, S) must either be (0, 1) if D = 1, or (1, 0) if D = 0. Therefore, D = 1 sets the RS flip-flop, and D = 0 clears it.

Figure 3.14 The 74LS74 D flip-flop.
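The relationship between the clocked RS flip-flop and the D flip-flop is easily expressed in code. The sketch below (our illustration; the function names are invented) uses the characteristic equation Q′ = S + \overline{R} ⋅ Q for the clocked RS flip-flop and derives a D flip-flop from it by setting S = D and R = \overline{D}; running it reproduces Table 3.4.

```python
def clocked_rs(c, r, s, q):
    """Clocked RS: the steering gates force R' = S' = 0 unless C = 1."""
    r, s = (r & c), (s & c)
    assert not (r and s), "R = S = 1 is forbidden"
    return s | ((1 - r) & q)       # characteristic equation Q' = S + not R . Q

def d_flip_flop(c, d, q):
    """A D flip-flop is a clocked RS flip-flop with S = D and R = not D."""
    return clocked_rs(c, 1 - d, d, q)

# Reproduce Table 3.4: Q' = Q while C = 0, and Q' = D when C = 1.
for c in (0, 1):
    for d in (0, 1):
        for q in (0, 1):
            print(f"C={c} D={d} Q={q} -> Q'={d_flip_flop(c, d, q)}")
```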
3.2.1 Practical sequential logic elements

Just as semiconductor manufacturers have provided combinational logic elements in single packages, they have done the same with sequential logic elements. Indeed, there are more special-purpose sequential logic elements than combinational logic elements. Practical flip-flops are more complex than those presented hitherto in this chapter. Real circuits have to cater for real-world problems. We have already said that the output of a flip-flop is a function of its current inputs and its previous output. What happens when a flip-flop is first switched on? The answer is quite simple. The Q output takes on a random state, assuming no input is being applied that will force Q into a 0 or 1 state.

Random states may be fine at the gaming tables in Las Vegas; they're less helpful when the control systems of a nuclear reactor are first energized. Many flip-flops are provided with special control inputs that are used to place them in a known state. Figure 3.14 illustrates the 74LS74, a dual positive-edge triggered D flip-flop that has two active-low control inputs called preset and clear (abbreviated PRE and CLR). In normal operation both PRE and CLR remain in logical 1 states. If PRE = 0 the Q output is set to a logical 1 and if CLR = 0 the Q output is cleared to a logical 0. As in the case of the RS flip-flop, the condition PRE = CLR = 0 should not be allowed to occur.

These preset and clear inputs are unconditional in the sense that they override all activity at the other inputs of this flip-flop. For example, asserting PRE sets Q to 1 irrespective of the state of the flip-flop's C and D inputs. When a digital system is made up from many flip-flops that must be set or cleared at the application of power, their PRE or CLR lines are connected to a common RESET line and this line is momentarily asserted active-low by a single pulse shortly after the power is switched on.

3.2.2 Using D flip-flops to create a register

Later we shall discover that a computer is composed of little more than combinational logic elements, buses, and groups of flip-flops called registers that transmit data to and receive data from buses. A typical example of the application of D flip-flops is provided by Fig. 3.15 in which an m-bit wide data bus transfers data from one part of a digital system to another. Data on the bus is constantly changing as different devices use it to transmit their data from one register to another.

The D inputs of a group of m D flip-flops are connected to the m lines of the bus. The clock inputs of all flip-flops are connected together so that every flip-flop is clocked simultaneously.

Figure 3.15 Using D flip-flops to create a register. (m D flip-flops, with outputs Q0 to Qm−1, take their D inputs d0 to dm−1 from an m-bit data bus and share a common clock.)
3.2.4 A typical register chip

You can obtain a single package containing the flip-flops that implement a register. Figure 3.19 illustrates the 74LS373, an octal register composed of D flip-flops that is available in a 20-pin package with eight inputs, eight outputs, two power supply pins, and two control inputs. The clock input, G, is a level-sensitive clock, which, when high, causes the value at Di to be transferred to Qi. All eight clock inputs are connected together internally so that the G input clocks each flip-flop simultaneously.
Figure 3.19 The 74LS373 octal register. (Eight D latches, inputs 1D to 8D and outputs 1Q to 8Q, share a common level-sensitive clock G and an active-low output enable OE.)

In the system of Fig. 3.20, a 2-line to 4-line decoder selects one of the four registers. Because the clock inputs are active-high and the outputs of the decoder are active-low, it's necessary to invert these outputs. Four inverters, IC6, perform this function. When IC5b is enabled, one of its outputs is asserted and the corresponding register clocked. Clocking a register latches data from the data bus.

Suppose the contents of register 1 are to be copied into register 3. The source code at IC5a is set to 01 and the destination code at IC5b is set to 11. This puts the data from register 1 on the bus and latches the data into register 3. We can easily relate the example of Fig. 3.20 to the digital computer. One of the most fundamental operations in computing is the assignment that can be represented in a high-level language as B = A and in a low-level language as MOVE A,B. The action MOVE A,B (i.e. transfer the contents of A to B) is implemented by specifying A as the source and B as the destination. Note that throughout this text we put the destination of a data transfer in bold font to stress the direction of data transfer.

Figure 3.20 Using the 74LS373 octal register in a bused system. (Two 74LS139 2-line to 4-line decoders, IC5a and IC5b, decode the source code S1, S0 and the destination code DE1, DE0; the source decoder enables one register's outputs onto the 8-bit parallel data bus and the destination decoder, via the inverters of IC6, clocks the selected register. OE = output enable, G = clock.)
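The register-transfer idea behind MOVE A,B can be modeled in a few lines. The following sketch is our own illustration (the class and method names are invented); it treats the bus as a temporary value driven by the source register's output enable and latched by the destination register's clock.

```python
class BusedSystem:
    """Behavioral model of Fig. 3.20: four 8-bit registers sharing one bus."""
    def __init__(self):
        self.reg = [0x00, 0xAA, 0x00, 0x00]   # four 8-bit registers

    def move(self, source, destination):
        bus = self.reg[source]        # the source's output enable drives the bus
        self.reg[destination] = bus   # the destination's clock latches the bus

system = BusedSystem()
system.move(source=1, destination=3)  # the MOVE A,B style transfer in the text
print(hex(system.reg[3]))             # 0xaa -- register 1 copied into register 3
```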
at the output of process C and is clocked four units of time after t = 0, the desired data will be latched into the flip-flop and held constant until the next clock pulse. Clocked systems hold digital information constant in flip-flops while the information is operated on by groups of logic elements, analogous to the processes of Fig. 3.21. Between clock pulses, the outputs of the flip-flops are processed by the logic elements and the new data values are presented to the inputs of flip-flops.

After a suitable time delay (longer than the time taken for the slowest process to be completed), the flip-flops are clocked. The outputs of the processes are held constant until the next time the flip-flops are clocked. A clocked system is often called synchronous, as all processes are started simultaneously on each new clock pulse. An asynchronous system is one in which the end of one process signals (i.e. triggers) the start of the next. Obviously, an asynchronous system must be faster than the corresponding synchronous system. Asynchronous systems are more complex and difficult to design than synchronous systems and popular wisdom says that they are best avoided because they are inherently less reliable than synchronous circuits. The 1990s saw a renewed interest in asynchronous systems because of their speed and lower power consumption.

3.3.1 Pipelining

Now consider the effect of placing D flip-flops at the outputs of processes A, B, and C in the system of Fig. 3.23. Figure 3.23 shows the logical state at several points in a system as a function of time. The diagram is read from left to right (the direction of time flow). Signals are represented by parallel lines to demonstrate that the signal values may be 1s or 0s (we don't care). What matters is the time at which signals change. Changes are shown by the parallel lines crossing over. Lines with arrowheads are drawn between points to demonstrate cause and effect; for example, the line from Input A to Output A shows that a change in Input A leads to a change in Output A.
Figure 3.21 Processes and delays. (Inputs A and B feed processes A, B, and C; processes A and C each introduce a two-unit delay and process B a one-unit delay. The timing diagram shows when Input A, Input B, Output A, Output B, and Output C are valid over the interval t = 0 to 6, including the delay before C is valid.)
In this example we assume that each of the processes introduces a single unit of delay and the flip-flops are clocked simultaneously every unit of time. Figure 3.23 gives the timing diagram for this system. Note how a new input can be accepted every unit of time, rather than every two units of time as you might expect. The secret of our increase in throughput is called pipelining because we are operating on different data at different stages in the pipeline. For example, when process A and process B are operating on data i, process C is operating on data i − 1 and the latched output from process C corresponds to data i − 2.

When we introduce the RISC processor we will discover that pipelining is a technique used to speed up the operation of a computer by overlapping consecutive operations.

3.3.2 Ways of clocking flip-flops

A clocked flip-flop captures a digital value and holds it constant. There are, however, three ways of clocking a flip-flop.

1. Whenever the clock is asserted (i.e. a level-sensitive flip-flop).
2. Whenever the clock is changing state (i.e. an edge-sensitive flip-flop).
3. Capture data on one edge of the clock and transfer it to the output on the following edge (i.e. a master–slave flip-flop).

A level-sensitive clock triggers a flip-flop whenever the clock is in a particular logical state (some flip-flops are clocked by a logical 1 and some by a logical 0). The clocked RS flip-flop of Fig. 3.11 is level sensitive because the RS flip-flop responds to its R and S inputs whenever the clock input is high. A level-sensitive clock is unsuitable for certain applications. Consider the system of Fig. 3.24 in which the output of a D flip-flop is fed through a logic network and then back to the flip-flop's D input. If we call the output of the flip-flop the current Q, then the current Q is fed through the logic network to generate a new input D. When the flip-flop is clocked, the value of D is transferred to the output to generate the new Q.

If the clock is level sensitive, the new Q can rush through the logic network and change D and hence the output. This chain of events continues in an oscillatory fashion with the dog chasing its tail. To avoid such unstable or unpredictable behavior, we need an infinitesimally short clock pulse to capture the output and hold it constant. As such a short pulse can't easily be created, the edge-sensitive clock has been introduced to solve the feedback problem. Level-sensitive clocked D flip-flops are often perfectly satisfactory in applications such as registers connected to data buses, because the duration of the clock is usually small compared to the time for which the data is valid.
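A short simulation shows why pipelining raises throughput. In the sketch below (ours, not part of the text) the list latches stands for the D flip-flops between stages; every clock all registers capture at once, so a new item enters the pipeline on every clock even though each item needs three clocks to traverse it.

```python
def pipeline_trace(data, stages=3):
    """Trace items through a pipeline with `stages` register stages.
    While the first stage works on item i, the last holds item i - 2."""
    latches = [None] * stages
    for tick, item in enumerate(data):
        latches = [item] + latches[:-1]      # every register clocks at once
        print(f"clock {tick}: stage contents = {latches}")

pipeline_trace(["i", "i+1", "i+2", "i+3"])
```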
Figure 3.22 Latching the output of a system. (The output of process C is latched by a D flip-flop and held constant. The timing diagram shows successive data items flowing through Inputs A and B, Outputs A, B, and C, the clock, and the latched output Q; the input to the D flip-flop is sampled on the clock edge.)
3.3.3 Edge-triggered flip-flops

An edge-triggered flip-flop is clocked not by the level or state of the clock (i.e. high or low), but by the transition of the clock signal from zero to one, or one to zero. The former case is called a positive or rising-edge sensitive clock and the latter is called a negative or falling-edge sensitive clock. As the rising or falling edge of a pulse may have a duration of less than 1 ns, an edge-triggered clock can be regarded as a level-sensitive clock triggered by a pulse of an infinitesimally short duration. A nanosecond (ns) is a thousand millionth (10⁻⁹) of a second. The feedback problem described by Fig. 3.24 ceases to exist if you use an edge-sensitive flip-flop because there's insufficient time for the new output to race back to the input within the duration of a single rising edge.

There are circumstances when edge-triggered flip-flops are unsatisfactory because of a phenomenon called clock skew. If, in a digital system, several edge-triggered flip-flops are clocked by the same edge of a pulse, the exact times at which the individual flip-flops are clocked vary. Variation in the arrival time of pulses at each clock input is called clock skew and is caused by the different paths by which clock pulses reach each flip-flop. Electrical impulses move through circuits at somewhat less than the speed of light, which is 30 cm/ns. Unless each flip-flop is located at the same distance from the source of the clock pulse and unless any additional delays in each path due to other logic elements are identical, the clock pulse will arrive at the flip-flops at different instants. Moreover, the delay a signal experiences going through a gate changes with temperature and even the age of the gate. Suppose that the output of flip-flop A is connected to the input of flip-flop B and they are clocked together. Ideally, at the moment of clocking, the old output of A is clocked into B. If, by bad design or bad luck, flip-flop A is triggered a few nanoseconds before flip-flop B, B sees the new output from A, not the old (i.e. previous) output—it's as if A were clocked by a separate and earlier clock.

Figure 3.25 gives the circuit diagram of a positive edge-triggered D flip-flop that also has unconditional preset and clear inputs. Edge triggering is implemented by using the active transition of the clock to clock latches 1 and 2 and then feeding the output of latch 2 back to latch 1 to cut off the clock in the NAND gate. That is, once the clock has been detected, the clock input path is removed.
Figure 3.23 Latching the input and output of processes to implement pipelining. (The outputs from processes A and B are captured and latched and held constant as the inputs to process C. The timing diagram shows successive data items, from i − 5 to i + 5, advancing through Input A, Output A, Latched A, Latched B, Output C, and Latched C on each clock pulse.)
Figure 3.28 Comparison of flip-flop clocking modes. (For the same D input and clock waveforms, the diagram contrasts the Q output of a level-sensitive flip-flop, a positive-edge triggered flip-flop, a negative-edge triggered flip-flop, and a master–slave flip-flop.)
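The difference between the clocking modes of Fig. 3.28 can be made concrete in code. This sketch is an idealized zero-delay model of our own: it drives the same D and clock waveforms into a level-sensitive latch, which is transparent while the clock is high, and a positive-edge triggered flip-flop, which samples only on 0 → 1 clock transitions.

```python
def simulate(d_wave, clk_wave):
    """Compare a level-sensitive latch with a positive-edge flip-flop."""
    level, edge, prev_clk = 0, 0, 0
    for d, clk in zip(d_wave, clk_wave):
        if clk:
            level = d                 # transparent while the clock is high
        if clk and not prev_clk:
            edge = d                  # captured on the rising edge only
        prev_clk = clk
        print(f"D={d} CLK={clk}  level-sensitive Q={level}  edge-triggered Q={edge}")

simulate(d_wave=[0, 1, 0, 1, 1, 0], clk_wave=[0, 1, 1, 0, 1, 1])
```

Running the sketch shows the two outputs diverging whenever D changes while the clock is still high—exactly the feedback hazard the edge-sensitive clock was introduced to avoid.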
Figure 3.30 An arbiter circuit. (Request1 and Request2 enter OR gates whose outputs A and B are clocked through pairs of D latches—1a, 1b and 2a, 2b—on alternate phases of CLK; the latch outputs AA and BB generate Grant1 and Grant2, and each grant is cross-coupled back to the other channel's OR gate and active-low preset input, Pre.)
[Timing diagram for the arbiter, showing CLK, \overline{CLK}, Request2, AA, BB, and Grant1.]
Suppose that Request1 and Request2 are asserted almost simultaneously when the clock is in a high state. This results in the outputs of both OR gates (A and B) going low simultaneously. The cross-coupled feedback inputs to the OR gates (Grant1 and Grant2) are currently both low.

On the next rising edge of the clock, the Q output of latch 1a (i.e. AA) and the Q output of latch 2a (i.e. BB) both go low. However, as latch 2a sees a rising edge clock first, its Q output goes low one half a clock cycle before latch 1's output also goes low.

When a latch is clocked at the moment its input is changing, it may enter a metastable¹ state lasting for up to about 75 ns before the output of the latch settles into one state or the other. For this reason a second pair of latches is used to sample the input latches after a period of 80 ns.

One clock cycle after Request2 has been latched and output BB forced low, the output of latch 2b, Grant2, goes low. Its complement, \overline{Grant2}, is fed back to OR gate 1, forcing input A high. After a clock cycle AA also goes high. Because Grant2 is connected to latch 1b's active-low preset input, latch 1b is held in a high state.

At this point, Grant1 is negated and Grant2 asserted, permitting processor 2 to access the bus.

When processor 2 relinquishes the memory, Request2 becomes inactive-high, causing first B, then BB and finally Grant2 to be negated as the change ripples through the arbiter. Once Grant2 is high, \overline{Grant2} goes low, causing the output of OR gate 1 (i.e. A) to go low. This is clocked through latches 1a and 1b to force Grant1 low and therefore permit processor 1 to access the memory. Of course, once Grant1 is asserted, any assertion of Request2 is ignored.

¹ If a latch is clocked at the exact moment its input is changing state, it can enter a metastable state in which its output is undefined and it may even oscillate for a few nanoseconds. You can avoid the effects of metastability by latching a signal, waiting for it to settle, and then capturing it in a second latch.

3.4 The JK flip-flop

The JK flip-flop can be configured, or programmed, to operate in one of two modes. All JK flip-flops are clocked and the majority of them operate on the master–slave principle. The truth table for a JK flip-flop is given in Table 3.5 and Fig. 3.32 gives its logic symbol. A bubble at the clock input to a flip-flop indicates that the flip-flop changes state on the falling edge of a clock pulse.

Table 3.5 demonstrates that for all values of J and K, except J = K = 1, the JK flip-flop behaves exactly like an RS flip-flop with J acting as the set input and K acting as the reset input.
When J and K are both true, the output of the JK flip-flop toggles (i.e. changes state) each time it is clocked—a mode of operation we return to in the next section. Note that the T flip-flop is a JK flip-flop with J = K = 1, which changes state on each clock pulse (we don't deal with T flip-flops further in this text).

We can derive the characteristic equation for a JK flip-flop by plotting Table 3.5 on a Karnaugh map, Fig. 3.33. This gives Q′ = J ⋅ \overline{Q} + \overline{K} ⋅ Q. Figure 3.34 demonstrates how a JK flip-flop can be constructed from NAND gates and Fig. 3.35 describes a master–slave JK flip-flop.

Table 3.5 Truth table for a JK flip-flop.

J K Q | Q′                    In compact form:   J K | Q′
0 0 0 | 0    No change                           0 0 | Q             No change
0 0 1 | 1    No change                           0 1 | 0             Clear
0 1 0 | 0    Reset Q                             1 0 | 1             Set
0 1 1 | 0    Reset Q                             1 1 | \overline{Q}  Toggle
1 0 0 | 1    Set Q
1 0 1 | 1    Set Q
1 1 0 | 1    Q′ ← \overline{Q}
1 1 1 | 0    Q′ ← \overline{Q}

Figure 3.32 Representation of the JK flip-flop: positive-edge triggered (rising-edge clock) and negative-edge triggered (falling-edge clock, indicated by a bubble at the clock input) versions.
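The JK characteristic equation is easy to verify by enumeration. The sketch below (our illustration) evaluates Q′ = J ⋅ \overline{Q} + \overline{K} ⋅ Q for all eight combinations of J, K, and Q, reproducing the hold, clear, set, and toggle rows of Table 3.5.

```python
def jk(j, k, q):
    """Characteristic equation of the JK flip-flop: Q' = J.not Q + not K.Q"""
    return (j & (1 - q)) | ((1 - k) & q)

# Reproduce Table 3.5: hold (J=K=0), clear, set, and toggle (J=K=1).
for j in (0, 1):
    for k in (0, 1):
        for q in (0, 1):
            print(f"J={j} K={k} Q={q} -> Q'={jk(j, k, q)}")
```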
Figure 3.35 Circuit diagram of a master–slave JK flip-flop. (The master stage captures the input and holds it constant; the slave stage copies the previously captured input to the output terminals and holds it constant while the next input is being captured. The invertor ensures that the master stage operates on a rising edge and the slave stage on a falling edge.)
3.5 Summary of flip-flop types

The RS flip-flop When R = 1 and S = 0, the Q output is forced to zero (and remains at zero when R returns to 0). When S = 1 and R = 0, the Q output is forced to one (and remains at one when S returns to 0). The input conditions R = S = 1 produce an indeterminate state and should be avoided. Clocked RS flip-flops behave as we have described, except that their R and S inputs are treated as zero until the flip-flop is clocked. When the RS flip-flop is clocked, its Q output behaves as we have just described.

The JK flip-flop The JK flip-flop always has three inputs, J, K, and a clock input C. As long as a JK flip-flop is not clocked, its output remains in the previous state. When a JK flip-flop is clocked, it behaves like an RS flip-flop (where J = S, K = R) for all input conditions except J = K = 1. If J = K = 0, the output does not change state. If K = 1 and J = 0, the Q output is reset to zero. If J = 1 and K = 0, the Q output is set to 1. If both J and K are 1, the output changes state (or toggles) each time it is clocked.

The T flip-flop The T flip-flop has a single clock input. Each time it is clocked, its output toggles or changes state. A T flip-flop is functionally equivalent to a JK flip-flop with J = K = 1.

3.6 Applications of sequential elements

Just as the logic gate is combined with other gates to form combinational circuits such as adders and multiplexers, flip-flops can be combined together to create a class of circuits called sequential circuits. Here, we are concerned with two particular types of sequential circuit: the shift register, which moves a group of bits left or right, and the counter, which steps through a sequence of values.

3.6.1 Shift register

By slightly modifying the circuit of the register we can build a shift register whose bits can be moved one place right every time the register is clocked. For example, the binary pattern

01110101
becomes 00111010 after the shift register is clocked once,
00011101 after it is clocked twice, and
00001110 after it is clocked three times, and so on.

Note that after the first shift, a 0 has been shifted in from the left-hand end and the 1 at the right-hand end has been lost. We used the expression binary pattern because, as we shall see later, the byte 01110101 can represent many things. However, when the pattern represents a binary number, shifting it one place right has the effect of dividing the number by two (just as shifting a decimal number one place right divides it by 10). Similarly, shifting a number one place left multiplies it by 2. Later we will see that special care has to be taken when shifting signed two's complement binary numbers right (the sign-bit has to be dealt with).

Figure 3.36 demonstrates how a shift register is constructed from D flip-flops. The Q output of each flip-flop is connected to the D input of the flip-flop on its right. All clock inputs are connected together so that each flip-flop is clocked simultaneously. When the ith stage is clocked, its output, Qi, takes on the value from the stage on its left, that is, Qi ← Qi+1. Data presented at the input of the left-hand flip-flop, Din, is shifted into the (m−1)th stage at each clock pulse.

Figure 3.36 describes a right-shift register—we will look at registers that shift the data sequence left shortly.

The flip-flops in a shift register must either be edge-triggered or master–slave flip-flops, otherwise if a level-sensitive flip-flop were used, the value at the input to the left-hand
Figure 3.36 The right-shift register. (On each clock pulse data is copied to the next stage on the right; Din enters at the left-hand stage.)

Figure 3.37 Shift register composed of JK flip-flops.

Figure 3.38 Example of a five-stage shift-right register. (Starting from the state 11010, successive clock pulses take the register through 01101, 00110, 00011, 00001, and 00000; the timing diagram shows outputs Q4 to Q0.)
stage would ripple through all stages as soon as the clock went high. We can construct a shift register from JK flip-flops just as easily as from RS flip-flops as Fig. 3.37 demonstrates.

Figure 3.38 shows a five-stage shift register that contains the initial value 11010. At each clock pulse the bits are shifted right and a 0 enters the most-significant bit stage. This figure also provides a timing diagram for each of the five Q outputs. The output of the right-hand stage, Q0, consists of a series of five sequential pulses, corresponding to the five bits of the word in the shift register (i.e. 11010).
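The five-stage example of Fig. 3.38 takes only a few lines to simulate. In the sketch below (our illustration) each list element stands for one flip-flop's Q output; on each clock every stage copies the stage to its left and a 0 enters at the most-significant end.

```python
def shift_right(bits, d_in=0):
    """One clock of the Fig. 3.36 right-shift register: every stage
    takes the value of the stage on its left; Din enters at the MSB."""
    return [d_in] + bits[:-1]

state = [1, 1, 0, 1, 0]              # the five-stage example of Fig. 3.38
for _ in range(5):
    print(state)
    state = shift_right(state)       # 11010 -> 01101 -> 00110 -> ...
```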
Figure 3.39 Shift registers used as a parallel-to-serial converter and a serial-to-parallel converter. (Only two lines are required to transmit serial data. Note: a real parallel-to-serial register would have a means of loading parallel data into it.)

Figure 3.40 A right-shift register with parallel load. (A two-input multiplexer at each stage switches the flip-flop's D input between the output Qi+1 of the previous stage, for shifting, and the load input Di, for loading; a common load/shift line controls all the multiplexers.)
A shift register can be used to convert a parallel word of m bits into a serial word of m consecutive bits. Such a circuit is called a parallel to serial converter. If the output of an m-bit parallel to serial converter is connected to the Din input of an m-bit shift register, after m clock pulses the information in the parallel to serial converter has been transferred to the second (right-hand) shift register. Such a shift register is called a serial to parallel converter and Fig. 3.39 describes a simplified version. In practice, a means of loading parallel data into the parallel-to-serial converter is necessary (see Fig. 3.40). There is almost no difference between a parallel to serial converter and a serial to parallel converter.

A flaw in our shift register (when operating as a parallel to serial converter) is the lack of any facilities for loading it with m bits of data at one go, rather than by shifting in m bits through Din. Figure 3.40 shows a right-shift register with a parallel load capacity. A two-input multiplexer, composed of two AND gates, an OR gate, and an inverter, switches a flip-flop's D input between the output of the previous stage to the left (shift mode) and the load input (load mode). The control inputs of all multiplexers are connected together to provide the mode control, labeled load/shift. When we label a variable name1/name2, we mean that when the variable is high it carries out action name1 and when it is low it carries out action name2. If load/shift = 0 the operation performed is a shift and if load/shift = 1 the operation performed is a load.

Constructing a left-shift register with JK flip-flops

Although we've considered the right-shift register, a left-shift register is easy to design. The input of the ith stage, Di, is
Figure 3.41 The left-shift register. The input to stage i comes from the register on the right (i.e. stage i−1).
Constructing a left-shift register with JK flip-flops

Although we've considered the right-shift register, a left-shift register is easy to design. The input of the ith stage, Di, is connected to the output of the (i−1)th stage so that, at each clock pulse, Qi ← Qi−1. In terms of the previous example,

    01110101
    becomes 11101010 after one shift left
    and 11010100 after two shifts left

The structure of a left-shift register composed of JK flip-flops is described in Fig. 3.41.

When we introduce the instruction set of a typical computer we'll see that there are several types of shift (logical, arithmetic, circular). These operations all shift bits left or right—the only difference between them concerns what happens to the bit shifted in. So far we've described the logical shift, where a 0 is shifted in and the bit shifted out at the other end is lost. In an arithmetic shift the sign of a 2's complement number is preserved when it is shifted right (this will become clear when we introduce the representation of negative numbers in the next chapter). In a circular shift the bit shifted out of one end becomes the bit shifted in at the other end. Table 3.6 describes what happens when the 8-bit value 11010111 undergoes three types of shift.

    Shift type                          Shift left   Shift right
    Original bit pattern before shift   11010111     11010111
    Logical shift                       10101110     01101011
    Arithmetic shift                    10101110     11101011
    Circular shift                      10101111     11101011

Table 3.6 The effect of logical, arithmetic, and circular shifts.

A typical shift register

Figure 3.42 gives the internal structure of a 74LS95 parallel-access bidirectional shift register chip. You access the shift register through its pins and cannot make connections to the internal parts of its circuit. Indeed, its actual internal implementation may differ from the published circuit. As long as it behaves like its published circuit, the precise implementation of its logic function doesn't matter to the end user. The 74LS95 is a versatile shift register and has the following functions.

Parallel load The four bits of data to be loaded into the shift register are applied to its parallel inputs, the mode control input is set to a logical one, and a clock pulse applied to the clock 2 input. The data is loaded on the falling edge of the clock 2 pulse.

Right-shift A shift right is accomplished by setting the mode control input to a logical zero and applying a pulse to the clock 1 input. The shift takes place on the falling edge of the clock pulse.

Left-shift A shift left is accomplished by setting the mode control input to a logical one and applying a pulse to the clock 2 input. The shift takes place on the falling edge of the clock pulse. A left shift requires that the output of each flip-flop be connected to the parallel input of the previous flip-flop and serial data entered at the D input.

Table 3.7 provides a function table for this shift register (taken from the manufacturer's literature). This table describes the behavior of the shift register for all combinations of its inputs. Note that the table includes don't care values of inputs and the effects of input transitions (indicated by ↓ and ↑).

Designing a versatile shift register—an example

Let's design an 8-bit shift register to perform the following operations (a software sketch of these operations follows the list).

(a) Load each stage from an 8-bit data bus (parallel load)
(b) Logical shift left (0 in, MSB lost)
(c) Logical shift right (0 in, LSB lost)
(d) Arithmetic shift left (same as logical shift left)
(e) Arithmetic shift right (MSB replicated, LSB lost)
(f) Circular shift left (MSB moves to LSB position)
(g) Circular shift right (LSB moves to MSB position)
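Before designing the hardware, it helps to pin down exactly what each shift does to an 8-bit value. The Python sketch below (our own illustration; all function names are invented) implements operations (b) to (g) and reproduces the entries of Table 3.6 for the value 11010111.

    # Software model of the 8-bit shift operations listed above.
    # Values are integers; bit 7 is the MSB.
    MASK = 0xFF

    def logical_left(x):     return (x << 1) & MASK             # 0 in, MSB lost
    def logical_right(x):    return x >> 1                      # 0 in, LSB lost
    def arithmetic_left(x):  return logical_left(x)             # same as logical left
    def arithmetic_right(x): return (x >> 1) | (x & 0x80)       # MSB replicated
    def circular_left(x):    return ((x << 1) & MASK) | (x >> 7)     # MSB -> LSB
    def circular_right(x):   return (x >> 1) | ((x & 1) << 7)        # LSB -> MSB

    x = 0b11010111
    for op in (logical_left, logical_right, arithmetic_right,
               circular_left, circular_right):
        print(op.__name__, format(op(x), '08b'))
    # logical_left 10101110, logical_right 01101011, arithmetic_right 11101011,
    # circular_left 10101111, circular_right 11101011 (cf. Table 3.6)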
Figure 3.42 The internal structure of the 74LS95 parallel-access bidirectional shift register: parallel inputs A to D, a mode control input, a serial input, clock 1 (right shift), clock 2 (left shift), and outputs Qa to Qd.

Table 3.7 The function table of the 74LS95 (inputs: mode control, clock 2 (L), clock 1 (R), serial input, and parallel inputs A to D; outputs: Qa to Qd).
[Figure: one stage of the versatile shift register, showing the shift clock and the parallel inputs D7, ..., Di+1, Di, Di−1, ..., D0.]
Therefore, writing S for load, R for shift right (R̄ for shift left), and L, A, and C for the logical, arithmetic, and circular shift modes,

    J7 = S·D7 + S̄·(R·L·0 + R·A·Q7 + R·C·Q0 + R̄·L·Q6 + R̄·A·Q6 + R̄·C·Q6)
       = S·D7 + S̄·(R·A·Q7 + R·C·Q0 + R̄·Q6·(L + A + C))
       = S·D7 + S̄·(R·(A·Q7 + C·Q0) + R̄·Q6)

3.6.2 Asynchronous counters

A counter is a sequential circuit with a clock input and m outputs. Each time the counter is clocked, one or more of its outputs change state. These outputs form a sequence with N unique values. After the Nth value has been observed at the counter's output terminals, the next clock pulse causes the counter to assume the same output as it had at the start of the sequence; that is, the sequence is cyclic. For example, a counter may display the sequence 01234501234501 . . . or the sequence 9731097310973 . . . .

A counter composed of m flip-flops can generate an arbitrary sequence with a length of not greater than 2^m cycles before the sequence begins to repeat itself.

One of the tools frequently employed to illustrate the operation of sequential circuits is the state diagram. Any system with internal memory and external inputs, such as the flip-flop, can be said to be in a state that is a function of its internal and external inputs. A state diagram shows some (or all) of the possible states of a given system. A labeled circle represents each of the states and the states are linked by unidirectional lines showing the paths by which one state becomes another state.

Figure 3.44 gives the state diagram of a JK flip-flop that has just two states, S0 and S1. S0 represents the state Q = 0 and S1 represents the state Q = 1. The transitions between states S0 and S1 are determined by the values of the JK inputs at the time the flip-flop is clocked. In Fig. 3.44 we have labeled the flip-flop's input states C1 to C4. Table 3.8 defines the four possible input conditions, C1, C2, C3, and C4, in terms of J and K.

    J   K   Condition
    0   0   C1
    0   1   C2
    1   0   C3
    1   1   C4

Table 3.8 Relationship between JK inputs and conditions C1 to C4.

Figure 3.44 The state diagram of a JK flip-flop. States S0 (Q = 0) and S1 (Q = 1) are linked by lines with arrows indicating a change of state: S0 → S1 under condition C3 + C4 and S1 → S0 under C2 + C4. A line from a state back to itself (C1 + C2 for S0, C1 + C3 for S1) indicates that the corresponding condition does not cause a change of state; the Boolean expression on each line is the condition that causes that transition.

From Fig. 3.44 it can be seen that conditions C3 or C4 cause a transition from state S0 to state S1. Similarly, conditions C2 or C4 cause a transition from state S1 to state S0. Condition C4 causes a change of state from S0 to S1 and also from S1 to S0. This is, of course, the condition J = K = 1, which causes the JK flip-flop to toggle its output. Some conditions cause a state to change to itself; that is, there is no overall change. Thus, conditions C1 or C2, when applied to the system in state S0, have the effect of leaving the system in state S0.
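The four conditions are easy to check in software. The short Python fragment below (our own illustration; the function name is invented) applies the JK next-state rule Q+ = J·Q̄ + K̄·Q to both states and reproduces the transitions of Fig. 3.44.

    # Next-state behavior of a JK flip-flop (Table 3.8 / Fig. 3.44):
    # J = K = 0 holds the state, J = K = 1 toggles it.
    def jk_next(q, j, k):
        return (j & (1 - q)) | ((1 - k) & q)

    for j, k, name in ((0, 0, 'C1'), (0, 1, 'C2'), (1, 0, 'C3'), (1, 1, 'C4')):
        print(name, 'S0 ->', 'S1' if jk_next(0, j, k) else 'S0',
                   '  S1 ->', 'S1' if jk_next(1, j, k) else 'S0')
    # C1: S0->S0, S1->S1   C2: S0->S0, S1->S0
    # C3: S0->S1, S1->S1   C4: S0->S1, S1->S0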
The binary up-counter

The state diagram of a simple 3-bit binary up-counter is given in Fig. 3.45 (an up-counter counts upward 0, 1, 2, 3, . . . in contrast with a down-counter, which counts downward . . . , 3, 2, 1, 0). In this state diagram there is only a single path from each state to its next higher neighbor. As the system is clocked, it cycles through the states S0 to S7 representing the natural binary numbers 0 to 7. The actual design of counters in general can be quite involved, although the basic principle is to ask 'What input conditions are required by the flip-flops to cause them to change from state Si to state Si+1?'

The design of an asynchronous natural binary up-counter is rather simpler than the design of a counter for an arbitrary sequence. Figure 3.46 gives the circuit diagram of a 3-bit binary counter composed of JK flip-flops and Fig. 3.47 provides its timing diagram. The J and K inputs to each flip-flop are connected to a logical 1, so that each flip-flop toggles whenever it is clocked.
Figure 3.45 The state diagram of a 3-bit binary up-counter: the states S0 = 000 to S7 = 111 form a single cycle.

Figure 3.47 Timing diagram of the 3-bit binary up-counter: the clock and the outputs Q0, Q1, and Q2 as the counter steps through states S0, S1, . . . , S7, S0. Digital Works initializes flip-flops to Q = 0 at the start of a simulation.
A 4-bit binary counter of this kind counts through the natural binary sequence 0000 to 1111 (i.e. 0 to 15). To create a decade counter the state 10 (1010) must be detected and used to reset the flip-flops. Fig. 3.50 provides a possible circuit. The binary counter counts normally from 0 to 9. On the tenth count Q3 = 1 and Q1 = 1. This condition is detected by the NAND gate, whose output goes low, resetting the flip-flops. The count of 10 exists momentarily, as Fig. 3.51 demonstrates. We could have detected the state 10 with Q3, Q2, Q1, Q0 = 1010. However, that would have required a four-input gate and is not strictly necessary. Although Q3 = 1 and Q1 = 1 corresponds to counts 10, 11, 14, and 15, the counter never gets beyond 10.

The reset pulse must be long enough to reset all flip-flops to zero. If the reset pulse were too short and, say, Q1 was reset before Q3, the output might be reset to 1000. The counting sequence would now be: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, (10), 8, 9, 8, 9, . . . . However, such a problem is unlikely to occur in this case, because the reset pulse is not removed until at least the output of one flip-flop and the NAND gate has changed state. The combined duration of flip-flop reset time plus a gate delay will normally provide sufficient time to ensure that all flip-flops are reset.

It is possible to imagine situations in which the circuit would not function correctly. Suppose that the minimum reset pulse required to guarantee the reset of a flip-flop were 50 ns. Suppose also that the minimum time between the application of a reset pulse and the transition Q ← 0 were 10 ns and that the propagation delay of a NAND gate were 10 ns. It would indeed be possible for the above error to occur. This example demonstrates the dangers of designing asynchronous circuits!
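A behavioral sketch of the decade counter in Python (ours, for illustration; it models the reset as instantaneous rather than as a pulse) shows the transient count of 10 being wiped out.

    # Behavioral model of the decade counter of Fig. 3.50: a 4-bit binary
    # counter whose flip-flops are cleared as soon as Q3 = Q1 = 1 (count 10).
    count = 0
    for pulse in range(12):
        print(format(count, '04b'))
        count = (count + 1) & 0xF
        if (count >> 3) & 1 and (count >> 1) & 1:   # NAND gate detects Q3.Q1
            count = 0                               # transient 1010 is wiped out
    # prints 0000 to 1001 and then wraps back to 0000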
Figure 3.50 A decade counter: a 4-bit ripple counter with outputs Q0 to Q3 whose flip-flops are reset by a NAND gate that detects Q3 = Q1 = 1.

Figure 3.51 Timing diagram of the decade counter. The count runs 0000 to 1001 and returns to 0000; although Q3·Q1 goes high when the counter reaches 10, the NAND gate immediately resets the flip-flops, so the count of 10 exists only momentarily.

The pulse generator revisited

When we introduced the RS flip-flop we used it to start and stop a simple pulse generator that created a train of n pulses. Figure 3.52 shows a pulse generator in Digital Works. This system is essentially the same as that in Fig. 3.9, except that we've built the counter using JK flip-flops and we've added
LEDs to examine the signals produced when the system runs. Note also that the RS flip-flop can be set only when the flip-flop is in the reset mode.

3.6.3 Synchronous counters

Synchronous counters are composed of flip-flops that are all clocked at the same time. The outputs of all stages of a synchronous counter become valid at the same time and the ripple-through effect associated with asynchronous counters is entirely absent. Synchronous counters can easily be designed to count through any arbitrary sequence just as well as the natural sequence 0, 1, 2, 3, . . . .

We design a synchronous counter by means of a state diagram and the excitation table for the appropriate flip-flop (either RS or JK). An excitation table is a version of a flip-flop's truth table arranged to display the input states required to force a given output transition. Table 3.9 illustrates the excitation table of a JK flip-flop. Suppose we wish to force the Q output of a JK flip-flop to make the transition from 0 to 1 the next time it is clocked. Table 3.9 tells us that the J, K input should be 1, d (where d = don't care).
    Count   Qd Qc Qb Qa   Next Qd Qc Qb Qa   Jd Kd   Jc Kc   Jb Kb   Ja Ka
     0      0  0  0  0    0  0  0  1         0  d    0  d    0  d    1  d
     1      0  0  0  1    0  0  1  0         0  d    0  d    1  d    d  1
     2      0  0  1  0    0  0  1  1         0  d    0  d    d  0    1  d
     3      0  0  1  1    0  1  0  0         0  d    1  d    d  1    d  1
     4      0  1  0  0    0  1  0  1         0  d    d  0    0  d    1  d
     5      0  1  0  1    0  1  1  0         0  d    d  0    1  d    d  1
     6      0  1  1  0    0  1  1  1         0  d    d  0    d  0    1  d
     7      0  1  1  1    1  0  0  0         1  d    d  1    d  1    d  1
     8      1  0  0  0    1  0  0  1         d  0    0  d    0  d    1  d
     9      1  0  0  1    0  0  0  0         d  1    0  d    0  d    d  1
    10      1  0  1  0    x  x  x  x         x  x    x  x    x  x    x  x
    11      1  0  1  1    x  x  x  x         x  x    x  x    x  x    x  x
    12      1  1  0  0    x  x  x  x         x  x    x  x    x  x    x  x
    13      1  1  0  1    x  x  x  x         x  x    x  x    x  x    x  x
    14      1  1  1  0    x  x  x  x         x  x    x  x    x  x    x  x
    15      1  1  1  1    x  x  x  x         x  x    x  x    x  x    x  x

The state table of a synchronous BCD counter, together with the JK inputs required to force each transition. The ds in the table correspond to don't care conditions in the excitation table of the JK flip-flop. The x's correspond to don't care conditions due to unused states; for example, the counter never enters states 1010 to 1111. There is, of course, no fundamental difference between x and d. We've chosen different symbols in order to distinguish between the origins of the don't care states.
Figure 3.54 Circuit diagram for a 4-bit synchronous BCD counter built from JK flip-flops FF1 to FF4 (outputs Qa to Qd) and gates G1 to G3. The clock triggers all flip-flops simultaneously.
Figure 3.53 gives the Karnaugh maps for this counter. These maps can be simplified to give

    Jd = Qc·Qb·Qa    Kd = Qa
    Jc = Qb·Qa       Kc = Qb·Qa
    Jb = Q̄d·Qa       Kb = Qa
    Ja = 1           Ka = 1

We can now write down the circuit diagram of the synchronous counter (Fig. 3.54). Remember that d denotes a don't care condition and indicates that the variable marked by a d may be a 0 or a 1 state. The same technique can be employed to construct a counter that will step through any arbitrary sequence. We will revisit this technique when we look at state machines.
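As a quick sanity check, the Python sketch below (ours, not the book's; the helper name is invented) clocks four simulated JK flip-flops through the simplified equations and confirms that they step through 0000 to 1001 and back to 0000.

    # Verify the simplified J/K equations drive the counter through BCD 0 to 9.
    def jk(q, j, k):                    # JK next state: J sets, K clears, J=K=1 toggles
        return j & ~q & 1 | ~k & q & 1

    qd = qc = qb = qa = 0
    for _ in range(11):
        print(qd, qc, qb, qa)
        jd, kd = qc & qb & qa, qa
        jc, kc = qb & qa, qb & qa
        jb, kb = (~qd & 1) & qa, qa
        ja, ka = 1, 1
        qd, qc, qb, qa = jk(qd, jd, kd), jk(qc, jc, kc), jk(qb, jb, kb), jk(qa, ja, ka)
    # prints 0 0 0 0, 0 0 0 1, ..., 1 0 0 1, then returns to 0 0 0 0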
don’t care condition and indicates that the variable marked concepts here.
by a d may be a 0 or a 1 state. The same technique can be It would be impossible to find a text on state machines
employed to construct a counter that will step through any without encountering the general state machines called
arbitrary sequence. We will revisit this technique when we Mealy machines and Moore machines (after G. H. Mealy and
look at state machines. E. Moore). Figure 3.55 illustrates the structure of a Mealy
Figure 3.53 Karnaugh maps for a synchronous BCD counter. Plotting the Jd, Kd, Jc, Kc, Jb, Kb, Ja, and Ka columns of the state table against Qd, Qc, Qb, and Qa and grouping the 1s (with the don't care entries d and x free to be 0 or 1) gives Jd = Qc·Qb·Qa, Kd = Qa, Jc = Qb·Qa, Kc = Qb·Qa, Jb = Q̄d·Qa, Kb = Qa, Ja = 1, and Ka = 1.
Figure 3.55 The structure of a Mealy machine: combinational input logic feeds a memory, and combinational output logic operates on both the inputs and the internal state.

Figure 3.56 The structure of a Moore machine: combinational input logic feeds a memory, and combinational output logic operates on the internal state only.
Figure 3.55 illustrates the structure of a Mealy state machine and Fig. 3.56 the structure of a Moore state machine. Both machines have a combinational network that operates on the machine's inputs and on its internal states to produce a new internal state. The output of the Mealy machine is a function of the current inputs and the internal state of the machine, whereas the output of a Moore machine is a function of the internal state of the machine only.

3.7.1 Example of a state machine

As we have already said, the state machine approach to the design of sequential circuits is by no means trivial. Here, we will design a simple state machine by means of an example. Suppose we require a sequence detector that has a serial input X and an output Y. If a certain sequence of bits appears at the input of the detector, the output goes true. Sequence detectors are widely used in digital systems to split a stream of bits into units or frames by providing special bit patterns between adjacent frames and then using a sequence detector to identify the start of a frame.

In the following example we design a sequence detector that produces a true output Y whenever it detects the sequence 010 at its X input. For example, if the input sequence is 000110011010110001011, the output sequence will be 000000000000100000010 (the output generates a 1 in the state following the detection of the pattern).

Figure 3.57 shows a black box state machine that detects the sequence 010 in a bit stream. We have provided input and output sequences to demonstrate the machine's action.

We solve the problem by constructing a state diagram as illustrated in Fig. 3.58. Each circle represents a particular state of the system and transitions between states are determined by the current input to the system at the next clock pulse. A state is marked name/value, where name is the label we use to describe the state (e.g. states A, B, C, and D in Fig. 3.58) and value is the output corresponding to that state. The transition between states is labeled a/b, where a is the input condition and b the output value after the next clock. For example, the transition from state A to state B is labeled 0/0 and indicates that if the system is in state A and the input is 0, the next clock pulse will force the system into state B and set the output to 0.

Figure 3.59 provides a partial state diagram for this sequence detector with details of the actions that take place during state transitions. State A is the initial state in Fig. 3.59. Suppose we receive an input while in state A. If input X is a 0 we may be on our way to detecting the sequence 010 and therefore we move to state B along the line marked 0/0 (the output is 0 because we have not detected the required sequence yet). If the input is 1, we return to state A because we have not even begun to detect the start of the sequence.

From state B there are two possible transitions. If we detect a 0 we remain in state B because we are still at the start of the desired sequence. If we detect a 1, we move on to state C (we have now detected 01). From state C a further 1 input takes us right back to state A (because we have received 011). However, if we detect a 0 we move to state D and set the output to 1 to indicate that the sequence has been detected. From state D we move back to state A if the next input is a 1 and back to state B if it is a 0. From the state diagram we can construct a state table that defines the output and the next state corresponding to each current state and input. Table 3.11 provides a state table for Fig. 3.58.
Figure 3.57 A black box 010-sequence detector: the input stream 0011010110001011 produces the output stream 0000000100000010.

Figure 3.58 State diagram of the 010 sequence detector: states A/0, B/0, C/0, and D/1, with transitions labeled input/output (e.g. 0/0 and 1/0).

Figure 3.59 Partial state diagram of the sequence detector, showing the transitions out of the initial state A.
    Current state   Output   Next state
                             X = 0   X = 1
    A               0        B       A
    B               0        B       C
    C               0        D       A
    D               1        B       A

Table 3.11 State table for a 010 sequence detector.

    Current state   Flip-flop outputs Q1, Q2   Output   Next state Q1, Q2
                                                        X = 0   X = 1
    A               0, 0                       0        0, 1    0, 0
    B               0, 1                       0        0, 1    1, 0
    C               1, 0                       0        1, 1    0, 0
    D               1, 1                       1        0, 1    0, 0

Table 3.12 Modified state table for a sequence detector.
3.7.2 Constructing a circuit to implement the state table

The next step is to go about constructing the circuit itself. If a system can exist in one of several states, what then defines the current state? In a sequential system flip-flops are used to hold state information—in this example there are four states, which requires two flip-flops.

Table 3.12 expands Table 3.11 to represent internal states A to D by flip-flop outputs Q1, Q2 = 0,0 to 1,1. We next construct Table 3.13 to determine the JK inputs of each JK flip-flop that will force the appropriate state transition, given the next input X. Table 3.13 is derived by using the excitation table of the JK flip-flop (see Table 3.9). The final step is to create a circuit diagram from Table 3.13 (i.e. Fig. 3.60).

    Q1   Q2   X    Q1′  Q2′   J1  K1   J2  K2
    0    0    0    0    1     0   d    1   d
    0    0    1    0    0     0   d    0   d
    0    1    0    0    1     0   d    d   0
    0    1    1    1    0     1   d    d   1
    1    0    0    1    1     d   0    1   d
    1    0    1    0    0     d   1    0   d
    1    1    0    0    1     d   1    d   0
    1    1    1    0    0     d   1    d   1

Table 3.13 The JK inputs required to force each state transition of the sequence detector.
Table 3.13 can be simplified (e.g. by means of Karnaugh maps, with the d entries free to be 0 or 1) to give

    J1 = Q2·X     K1 = Q2 + X
    J2 = X̄        K2 = X
    Output = Q1·Q2

Figure 3.60 Circuit to detect the sequence 010: two JK flip-flops holding Q1 and Q2 share a common clock; gating applies X·Q2 to J1 and X + Q2 to K1, and an AND gate forms the output Q1·Q2 from input X.
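The derived equations can be verified in software before any wiring is done. The Python sketch below (an illustration of ours; the helper names are invented) clocks the two flip-flops of Fig. 3.60 through the equations and reproduces the example input/output sequence given earlier.

    # Clock the two JK flip-flops of Fig. 3.60 through the derived equations.
    def jk(q, j, k):
        return (j & ~q | ~k & q) & 1

    def detect(bits):
        q1 = q2 = 0
        out = ''
        for x in bits:
            out += str(q1 & q2)            # Output = Q1.Q2 (state D)
            j1, k1 = q2 & x, q2 | x        # J1 = Q2.X, K1 = Q2 + X
            j2, k2 = ~x & 1, x             # J2 = NOT X, K2 = X
            q1, q2 = jk(q1, j1, k1), jk(q2, j2, k2)
        return out

    print(detect([int(b) for b in '000110011010110001011']))
    # prints 000000000000100000010, as required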
Figure 3.61 demonstrates the construction of the sequence detector in Digital Works. We've added LEDs to show the state of the flip-flop outputs and control signals and have provided an example of a run. Note the output pulse after the sequence 010. We used the programmable sequence generator to provide a binary pattern for the test.
■ SUMMARY

In this chapter we've looked at the flip-flop, which provides data storage facilities in a computer and which can be used to create counters and shift registers as well as more general forms of state machine. We have introduced the RS, D, and JK flip-flops. All these flip-flops can capture data and the JK flip-flop is able to operate in a toggle mode in which its output changes state each time it is clocked. Any of these flip-flops can be converted into the other two flip-flops by the addition of a few gates.

We have also introduced the idea of clocking or triggering flip-flops. A flip-flop can be triggered by a clock at a given level or by the change in state of a clock. The master–slave flip-flop latches data at its input when the clock is high (or low) and transfers data to the output (slave) when the clock changes state.

We have looked at the counter and shift register. The counter counts through a predetermined sequence such as the natural integers 0, 1, 2, 3, . . . . A shift register holds a word of data and is able to shift the bits one or more places left or right. Shift registers are used to divide and multiply by two and to manipulate data in both arithmetic and logical operations. Counters and shift registers can be combined with the type of combinational logic we introduced in the previous chapter to create a digital computer.

Sequential machines fall into two categories. Asynchronous sequential machines don't have a master clock and the output from one flip-flop triggers the flip-flop it's connected to. In a synchronous sequential machine all the flip-flops are triggered at the same time by means of a common master clock. Synchronous machines are more reliable. In this chapter we have briefly demonstrated how you can construct a synchronous counter and a machine that can detect a specific binary pattern in a stream of serial data.

■ PROBLEMS

3.1 What is a sequential circuit and in what way does it differ from a combinational circuit?

3.2 Explain why it is necessary to employ clocked flip-flops in sequential circuits (as opposed to unclocked flip-flops).

3.3 What are the three basic flip-flop clocking modes and why is it necessary to provide so many clocking modes?

3.4 The behavior of an RS flip-flop is not clearly defined when R = 1 and S = 1. Design an RS flip-flop that does not suffer from this restriction. (Note: what assumptions do you have to make?)

3.5 For the waveforms in Fig. 3.62 draw the Q and Q̄ outputs of an RS flip-flop constructed from two NOR gates (as in Fig. 3.2).

3.6 For the input and clock signals of Fig. 3.63, provide a timing diagram for the Q output of a D flip-flop. Assume that the flip-flop is

(a) level sensitive
(b) positive edge triggered
(c) negative edge triggered
(d) a master–slave flip-flop
3.7 What additional logic is required to convert a JK flip-flop into a D flip-flop?

3.8 Assuming that the initial state of the circuit of Fig. 3.64 is given by C = 1, D = 1, P = 1, and Q = 0, complete the table. This question should be attempted by calculating the effect of the new C and D on the inputs to both cross-coupled pairs of NOR gates and therefore on the outputs P and Q. As P and Q are also inputs to the NOR gates, the change in P and Q should be taken into account when calculating the effect of the next inputs C and D. Remember that the output of a NOR gate is 1 if both its inputs are 0, and is 0 otherwise.

Figure 3.62 The R and S inputs to an RS flip-flop, with the Q and Q̄ outputs to be completed.
Figure 3.63 The D and clock inputs for Question 3.6.

Figure 3.64 Circuit for Question 3.8.

Figure 3.65 Circuit for Question 3.9.
Figure 3.68 Circuit diagram for Question 3.17: three JK flip-flops (outputs Q1, Q2, Q3) with J1 and K1 tied to 1, driven by a common clock.
Figure 3.69 Circuit for Question 3.18.

Figure 3.70 The Johnson (twisted ring) counter of Question 3.25: a shift register of JK flip-flops whose final stage is fed back to the first with the Q and Q̄ connections crossed.

Figure 3.71 The internal organization of the 74162 synchronous decade counter of Question 3.27: parallel data inputs A to D, outputs QA to QD, Load and Clear inputs, enable inputs P and T, and a carry out.

Figure 3.72 Circuit for Question 3.30: flip-flops FF1 and FF2 (outputs Qa and Qb) with input X and gates G1 to G3.

Figure 3.73 State diagram of a sequence processor for Question 3.31: states S0 to S6 with transitions labeled input/output.

…observer) as if they constitute a random series of 1s and 0s. Longer sequences of random numbers are generated by increasing the number of stages in the shift register. The input is the exclusive OR of two or more outputs.

3.17 Use Digital Works to construct the circuit of Fig. 3.68 and then investigate its behavior.

3.18 Investigate the behavior of the circuit in Fig. 3.69.

3.19 Explain the meaning of the terms asynchronous and synchronous in the context of sequential logic systems. What is the significance of these terms?

3.20 Design an asynchronous base 13 counter that counts through the natural binary sequence from 0 (0000) to 12 (1100) and then returns to zero on the next count.

3.21 Design a synchronous binary duodecimal (i.e. base 12) counter that counts through the natural binary sequence from 0 (0000) to 11 (1011) and then returns to zero on the next count. The counter is to be built from four JK flip-flops.

3.22 Design a synchronous modulo 9 counter using
(a) JK flip-flops
(b) RS flip-flops (with a master–slave clock).

3.23 Design a programmable modulo 10/modulo 12 synchronous counter using JK flip-flops. The counter has a control input, TEN/TWELVE, which, when high, causes the counter to count modulo 10. When low, TEN/TWELVE causes the counter to count modulo 12.

3.24 How would you determine the maximum rate at which a synchronous counter could be clocked?

3.25 The circuit in Fig. 3.70 represents a Johnson counter. This is also called a twisted ring counter because feedback from the last (rightmost) stage is fed back to the first stage by crossing over the Q and Q̄ connections. Investigate the operation of this circuit.

3.26 Design a simple digital time of day clock that can display the time from 00:00:00 to 23:59:59. Assume that you have a clock pulse input derived from the public electricity supply of 50 Hz (Europe) or 60 Hz (USA).

3.27 Figure 3.71 gives the internal organization of a 74162 synchronous decade (i.e. modulo 10) counter. Investigate its operation. Explain the function of the various control inputs. Note that the flip-flops are master–slave JKs with asynchronous (i.e. unconditional) clear inputs.

3.28 Design a modulo 8 counter with a clock and a control input UP. When UP = 1, the counter counts up 0, 1, 2, . . . , 7. When UP = 0, the counter counts down 7, 6, 5, . . . , 0. This circuit is a programmable up-/down-counter.

3.29 Design a counter using JK flip-flops to count through the following sequence.

    Q2  Q1  Q0
    0   0   1
    0   1   0
    0   1   1
    1   1   0
    1   1   1
    0   0   1   (sequence repeats)
3.30 Investigate the action of the circuit in Fig. 3.72 when it is presented with the input sequence 111000001011111, where the first bit is the rightmost bit. Assume that all flip-flops are reset to Q = 0 before the first bit is received.

3.31 Design a state machine to implement the state diagram defined in Fig. 3.73.

3.32 Figure 3.74 provides a screen shot of a session using Digital Works. Examine the behavior of the circuit both by constructing it and by analyzing it.
4 Computer arithmetic

CHAPTER MAP

2 Logic elements and Boolean algebra
This chapter introduces the basic component of the digital computer, the gate. We show how a few simple gates can be used to create circuits that perform useful functions. We also demonstrate how Boolean algebra and Karnaugh maps can be used to design and even simplify digital circuits.

3 Sequential logic
Computers use sequential circuits such as counters to step through the instructions of a program. This chapter demonstrates how sequential circuits are designed using the flip-flop.

4 Computer arithmetic
We show how both positive and negative numbers are represented in binary and how simple arithmetic operations are implemented. We also look at other aspects of binary information such as error-detecting codes and data compression. Part of this chapter is devoted to the way in which multiplication and division are carried out.

5 The instruction set architecture
This is the heart of the book and is concerned with the structure and operation of the computer itself. We examine the instruction set of a processor with a sophisticated architecture.
INTRODUCTION
Because of the ease with which binary logic elements are manufactured and because of their
remarkably low price, it was inevitable that the binary number system was chosen to represent
numerical data within a digital computer. This chapter examines how numbers are represented in
digital form, how they are converted from one base to another, and how they are manipulated
within the computer. We begin with an examination of binary codes in general and demonstrate
how patterns of ones and zeros can represent a range of different quantities.
We demonstrate how computers use binary digits to implement codes that detect errors in
stored or transmitted data and how some codes can even correct bits that have been corrupted.
Similarly, we show how codes can be devised that reduce the number of bits used to encode
information (e.g. the type of codes used to zip files).
The main theme of this chapter is the class of binary codes used to represent numbers in digital
computers. We look at how numbers are converted from our familiar decimal (or denary) form to
binary form and vice versa. Binary arithmetic is useless without the hardware needed to
implement it, so we examine some of the circuits of adders and subtractors. We also introduce
error-detecting codes, which enable the computer to determine whether data has been corrupted
(i.e. inadvertently modified). Other topics included here are ways in which we represent and
handle negative as well as positive numbers. We look at the way in which the computer deals with
very large and very small numbers by means of a system called floating point arithmetic. Finally,
we describe how computers carry out multiplication and division—operations that are much
more complex than addition or subtraction.
We should stress that error-detecting codes, data compressing codes, and computer arithmetic
are not special properties of the binary representation of data used by computers. All these
applications are valid for any number base. The significance of binary arithmetic is its elegance and
simplicity.
4.1 Bits, bytes, words, and characters

The smallest quantity of information that can be stored and manipulated inside a computer is the bit, which can take the value 0 or 1. Digital computers store information in the form of groups of bits called words. The number of bits per word varies from computer to computer. A computer with a 4-bit word is not less accurate than a computer with a 64-bit word; the difference is one of performance and economics. Computers with small words are cheaper to construct than computers with long words. Typical word lengths of computers both old and new are

    Cray-1 supercomputer                    64 bits
    ICL 1900 series mainframe               24 bits
    UNIVAC 1100 mainframe                   36 bits
    PDP-11 minicomputer                     16 bits
    VAX minicomputer                        32 bits
    The first microprocessor (4004)          4 bits
    First-generation microprocessors         8 bits
    8086 microprocessor                     16 bits
    Third-generation microprocessors        32 bits
    Fourth-generation microprocessors       64 bits
    Special-purpose graphics processors    128 bits

A group of 8 bits has come to be known as a byte. Today's microprocessors and minicomputers are byte oriented with word lengths that are integer multiples of 8 bits (i.e. their data elements and addresses are 8, 16, 32, or 64 bits). A word is spoken of as being 2, 4, or 8 bytes long, because its bits can be formed into two, four, or eight groups of 8 bits, respectively.¹

An n-bit word can be arranged into 2^n unique bit patterns, as Table 4.1 demonstrates for n = 1, 2, 3, and 4. So, what do the n bits of a word represent? The simple and correct answer is nothing, because there is no intrinsic meaning associated with a pattern of 1s and 0s. The meaning of a particular pattern of bits is the meaning given to it by the programmer. As Humpty Dumpty said to Alice in Through the Looking Glass: 'When I use a word,' Humpty Dumpty said in a rather scornful tone, 'it means just what I choose it to mean—neither more nor less.'

The following are some of the entities that a word may represent.

An instruction An instruction or operation to be performed by the CPU is represented by a binary pattern such as 00111010111111111110000010100011. The relationship between the instruction's bit pattern and what it does is arbitrary and is determined by the computer's designer. A particular sequence of bits that means add A to B on one computer might have an entirely different meaning on another computer. Instructions vary in length from 8 to about 80 bits.

A numeric quantity A word, either alone or as part of a sequence of words, may represent a numerical quantity.

    Bits (n)   Patterns (2^n)   Values
    1          2                0, 1
    2          4                00, 01, 10, 11
    3          8                000, 001, 010, 011, 100, 101, 110, 111
    4          16               0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111,
                                1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111

Table 4.1 The relationship between the number of bits in a word and the number of patterns.

¹ Some early computers grouped bits into sixes and called them bytes. Computer science uses flexible jargon where a term sometimes has different meanings in different contexts; for example, some employ the term word to mean a 16-bit value and longword to mean a 32-bit value. Others use the term word to refer to a 32-bit value and halfword to refer to a 16-bit value. Throughout this text we will use word to mean the basic unit of information operated on by a computer except when we are describing the 68K microprocessor.
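The 2^n patterns of Table 4.1 are easy to enumerate. The short Python fragment below is our own illustration and is not part of the original text.

    # Enumerate the 2^n unique patterns of an n-bit word (cf. Table 4.1).
    n = 3
    patterns = [format(i, '0{}b'.format(n)) for i in range(2 ** n)]
    print(len(patterns), patterns)
    # 8 ['000', '001', '010', '011', '100', '101', '110', '111']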
Numbers can be represented in one of many formats: BCD integer, unsigned binary integer, signed binary integer, BCD floating point, binary floating point, complex integer, complex floating point, double precision integer, etc. The meaning of these terms and the way in which the computer carries out its operations in the number system represented by the term is examined later. Once again we stress that the byte 10001001 may represent the value 119 in one system, 137 in another system, and 89 in yet another system. We can think of a more human analogy. What is GIFT? To a Bulgarian it might be their login password; to an American it might be something to look forward to on their birthday; to a German it is something to avoid because it means poison. Only the context in which GIFT is used determines its meaning.

A character The alphanumeric characters (A to Z, a to z, 0 to 9) and the symbols *, !, ?, etc. are assigned binary patterns so that they can be stored and manipulated within the computer. The ASCII code (American Standard Code for Information Interchange) is widely used throughout the computer industry to encode alphanumeric characters. Table 4.2 defines the relationship between the bits of the ASCII code and the character it represents. This is also called the ISO 7-bit character code.

The ASCII code represents a character by 7 bits, allowing a maximum of 2^7 = 128 different characters. 96 characters are the normal printing characters. The remaining 32 characters are non-printing characters that carry out special functions, such as carriage return, backspace, line feed, etc.

To convert an ASCII character into its 7-bit binary code, you read the upper-order three bits of the code from the column in which the character falls and the lower-order four bits of the code from the row. Table 4.2 numbers the rows and columns in both binary and hexadecimal forms (we'll introduce hexadecimal numbers shortly); for example, the ASCII representation of the letter 'Z' is given by 5A₁₆ or 1011010₂. Because most computers use 8-bit bytes, the ASCII code for 'Z' would be 01011010. If you wish to print the letter 'Z' on a printer, you send the ASCII code for Z, 01011010, to the printer.

The ASCII codes for the decimal digits 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 are 30₁₆, 31₁₆, 32₁₆, 33₁₆, 34₁₆, 35₁₆, 36₁₆, 37₁₆, 38₁₆, and 39₁₆, respectively. For example, the symbol for the number 4 is represented by the ASCII code 00110100₂, whereas the binary value for 4 is represented by 00000100₂. When you hit the key '4' on a keyboard, the computer receives the input 00110100 and not 00000100. Input from a keyboard or output to a display must be converted between the codes for the numbers and the values of the numbers. In a high-level language this translation takes place automatically.

The two left-hand columns of Table 4.2, representing ASCII codes 0000000 to 0011111, don't contain letters, numbers, or symbols. These columns are non-printing codes that are used either to control printers and display devices or to control data transmission links. Data link control characters such as ACK (acknowledge) and SYN (synchronous idle) are associated with communications systems that mix the text being transmitted with the special codes used to regulate the flow of the information.

Table 4.2 The ISO 7-bit character set (ASCII): the columns, numbered 0 to 7 (000 to 111), give the upper-order three bits of each code and the rows give the lower-order four bits.
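The code/value distinction is easy to see in practice. The Python lines below are our own illustration, using the language's built-in ord() and chr() functions.

    # Characters versus values in the ASCII encoding described above.
    print(hex(ord('Z')))              # 0x5a, the ASCII code for 'Z'
    print(format(ord('4'), '08b'))    # 00110100, the code for the character '4' ...
    print(format(4, '08b'))           # 00000100, ... which is not the value 4
    print(chr(0x34))                  # '4', back from code to character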
The 7-bit ASCII code has been extended to the 8-bit ISO 8859-1 Latin code to add accented characters such as Å, ö, and é. Although suited to Europe and the USA, ISO 8859-1 can't deal with many of the World's languages. A 16-bit code, called Unicode, has been designed to represent the characters of most of the World's written languages such as Chinese and Japanese. The first 256 characters of Unicode map onto the ASCII character set, making ASCII to Unicode conversion easy. The programming language Java has adopted Unicode as the standard means of character representation.
A picture element One of the many entities that have to be digitally encoded is the picture or graphical display. Pictures vary widely in their complexity and there are a correspondingly large number of ways of representing pictorial information. For example, pictures can be parameterized and stored as a set of instructions that can be used to recreate the image (i.e. the picture is specified in terms of lines, arcs, and polygons and their positions within the picture). When the picture is to be displayed or printed, it is recreated from its parameters.

A simple way of storing pictorial information is to employ symbols that can be put together to make a picture. Such an approach was popular with the low-cost microprocessor systems associated with computer games, where the symbols were called sprites.

Complex pictures can be stored as a bit-map (an array of pixels or picture elements). By analogy with the bit, a pixel is the smallest unit of information of which a picture is composed. Unlike a bit, the pixel can have attributes such as color. If we wish to store a 10 in × 8 in image at a reasonably high resolution of 300 pixels/in in both the horizontal and vertical axes, we require (10 × 300) × (8 × 300) = 7 200 000 pixels. If the picture is in color and each pixel has one of 256 different colors, the total storage requirement is 8 × 7 200 000 bits, or about 7 Mbytes. Typical high-quality color video displays have a resolution of 1600 by 1200 (i.e. about 2^21 pixels) per frame. These values explain why high-quality computer graphics requires such expensive hardware to store and manipulate images in real time. There are techniques for compressing the amount of storage required by a picture. Some techniques operate by locating areas of a constant color and intensity and storing the shape and location of the area and its color. Other techniques such as JPEG work by performing a mathematical transformation on an image and deleting data that contributes little to the quality of the image.

4.2 Number bases

Our modern number system, which includes a symbol to represent zero, was introduced into Europe from the Hindu–Arabic world in about 1400. This system uses a positional notation to represent decimal numbers. By positional we mean that the value or weight of a digit depends on its location within a number. In our system, when each digit moves one place left, it is multiplied by 10 (the base or radix). Thus, the 9 in 95 is worth 10 times the 9 in 49. Similarly, a digit is divided by 10 when moved one place right (e.g. consider 0.90 and 0.09).

If the concept of positional notation seems obvious and not worthy of mention, consider the Romans. They conquered most of the known world, invented Latin grammar, wrote the screenplays of many Hollywood epics, and yet their mathematics was terribly cumbersome. Because the Roman World did not use a positional system to represent numbers, each new large number had to have its own special symbol. Their number system was one of give and take so that if X = 10 and I = 1, then XI = 11 (i.e. 10 + 1) and IX = 9 (i.e. 10 − 1). The decimal number 1970 is represented in Roman numerals by MCMLXX (i.e. 1000 + (1000 − 100) + (50 + 10 + 10)). The Romans did not have a symbol for zero.
The number base lies at the heart of both conventional and computer arithmetic. Humans use base 10 and computers use base 2. We sometimes use other bases even in our everyday lives; for example, we get base 60 from the Babylonians (60 seconds = 1 minute and 60 minutes = 1 hour). We can express the time 1:2:3 (1 hour 2 minutes 3 seconds) as 1 × 60 × 60 + 2 × 60 + 3 seconds. Similarly, we occasionally use the base 12 (12 = 1 dozen, 12 × 12 = 1 gross). Indeed, the Dozenal Society of America exists to promote the base 12 (also called duodecimal).

We now examine how a number is represented in a general base using positional notation. Integer N, which is made up of n digits, can be written in the form

    a_{n−1} a_{n−2} . . . a_1 a_0

The a_i's that make up the number are called digits and can take one of b values (where b is the base in which the number is expressed). Consider the decimal number 821 686, where the six digits are a_0 = 6, a_1 = 8, a_2 = 6, a_3 = 1, a_4 = 2, and a_5 = 8, and these digits are taken from a set of 10 symbols {0 to 9}.

The same notation can be used to express real values by using a radix point (e.g. decimal point in base 10 arithmetic or binary point in binary arithmetic) to separate the integer and fractional parts of the number. The following real number uses n digits to the left of the radix point and m digits to the right.

    a_{n−1} a_{n−2} . . . a_1 a_0 . a_{−1} a_{−2} . . . a_{−m}

The value of this number, expressed in positional notation in the base b, is defined as

    N = a_{n−1}b^{n−1} + . . . + a_1·b^1 + a_0·b^0 + a_{−1}·b^{−1} + a_{−2}·b^{−2} + . . . + a_{−m}·b^{−m}
      = Σ (from i = −m to n−1) a_i·b^i

The value of a number is equal to the sum of its digits, each of which is multiplied by a weight according to its position in the number. Let's look at some examples of how this formula works. The decimal number 1982 is equal to 1 × 10^3 + 9 × 10^2 + 8 × 10^1 + 2 × 10^0 (i.e. one thousand + nine hundreds + eight tens + two). Similarly, 12.34 is equal to 1 × 10^1 + 2 × 10^0 + 3 × 10^−1 + 4 × 10^−2. The value of the binary number 10110.11 is given by 1 × 2^4 + 0 × 2^3 + 1 × 2^2 + 1 × 2^1 + 0 × 2^0 + 1 × 2^−1 + 1 × 2^−2, or, in decimal, 16 + 4 + 2 + 0.5 + 0.25 = 22.75. Remember that the value of r^0 is 1 (i.e. any number to the power zero is 1).

In base seven arithmetic, the number 123 is equal to the decimal number 1 × 7^2 + 2 × 7^1 + 3 × 7^0 = 49 + 14 + 3 = 66. Because we are talking about different bases in this chapter, we will sometimes use a subscript to indicate the base; for example, 123₇ = 66₁₀.

We should make it clear that we're talking about natural positional numbers with positional weights of 1, 10, 100, 1000, . . . (decimal) or 1, 2, 4, 8, 16, 32, . . . (binary). The weight of a number is the value by which it is multiplied by virtue of its position in the number. It's perfectly possible to have weightings that are not successive powers of an integer; for example, we can choose a binary weighting of 2, 4, 4, 2, which means that the number 1010 is interpreted as 1 × 2 + 0 × 4 + 1 × 4 + 0 × 2 = 6.

We are interested in three bases: decimal, binary, and hexadecimal (the term hexadecimal is often abbreviated to hex). Although some texts use base-8 octal arithmetic, this base is ill-fitted to the representation of 8- or 16-bit binary values. Octal numbers were popular when people used 12-, 24- or 36-bit computers. We do not discuss octal numbers further. Table 4.3 shows the digits used by each of these bases. Because the hexadecimal base has 16 digits, we use the letters A to F to indicate decimal values between 10 and 15.

    Decimal       b = 10   a ∈ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
    Binary        b = 2    a ∈ {0, 1}
    Hexadecimal   b = 16   a ∈ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F}

Table 4.3 Three number bases.

People work in decimal and computers in binary. We use base 10 because we have 10 fingers and thumbs. The hexadecimal system is used by people to handle computer arithmetic. By converting binary numbers to hexadecimal form (a very easy task), the shorter hexadecimal numbers can be more readily remembered. For example, the 8-bit binary number 10001001 is equivalent to the hexadecimal number 89, which is easier to remember than 10001001. Because hexadecimal numbers are more compact than binary numbers (1 hexadecimal digit = 4 binary digits), they are used in computer texts and core-dumps. The latter term refers to a printout of part of the computer's memory, an operation normally performed as a diagnostic aid.

There are occasions where binary numbers offer people advantages over other forms of representation. Suppose a computer-controlled chemical plant has three heaters, three valves, and two pumps, which are designated H1, H2, H3, V1, V2, V3, P1, P2, respectively. An 8-bit word from the computer is fed to an interface unit that converts the binary ones and zeros into electrical signals to switch on (logical one) or switch off (logical zero) the corresponding device. For example, the binary word 01010011 has the effect described in Table 4.4 when presented to the control unit.
RULES OF ARITHMETIC

If there's one point that we would like to emphasize here, it's that the rules of arithmetic are the same in base x as they are in base y. All the rules we learned for base 10 arithmetic can be applied to base 2, base 16, or even base 5 arithmetic. For example, the base 5 numbers 123₅ and 221₅ represent, in decimal, 1 × 5^2 + 2 × 5^1 + 3 × 5^0 = 38₁₀ and 2 × 5^2 + 2 × 5^1 + 1 × 5^0 = 61₁₀, respectively. Let's add both pairs of numbers together using the conventional rules of arithmetic in base 5 and base 10.

    Base 5   Base 10
       123        38
    +  221     +  61
    ------    ------
       344        99

If we add 123₅ to 221₅ we get 344₅, which is equal to the decimal number 3 × 5^2 + 4 × 5^1 + 4 × 5^0 = 99₁₀. Adding the decimal numbers 38₁₀ and 61₁₀ also gives us 99₁₀.
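The positional formula is easy to check in software. The following Python sketch is our own illustration (the function name and its arguments are invented); it evaluates a digit string in any base.

    # Evaluate digits in base b: N is the sum of a_i * b^i (see the formula above).
    def value(digits, base, frac_digits=0):
        """digits: digit values, most significant first;
           frac_digits: how many of them lie to the right of the radix point."""
        return sum(d * base ** (len(digits) - 1 - frac_digits - i)
                   for i, d in enumerate(digits))

    print(value([1, 9, 8, 2], 10))                   # 1982
    print(value([1, 0, 1, 1, 0, 1, 1], 2, 2))        # 10110.11 in binary = 22.75
    print(value([1, 2, 3], 5), value([2, 2, 1], 5))  # 38 61, the base 5 example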
For example, 245₁₀ becomes 11110101₂, and 24E₁₆ becomes 0010 0100 1110₂ (each hexadecimal digit is replaced by its 4-bit binary equivalent).

A more methodical technique is based on a recursive algorithm as follows. Take the leftmost non-zero bit, double it, and add it to the bit on its right. Now take this result, double it, and add it to the next bit on the right. Continue in this way until the least-significant bit has been added in. The recursive procedure may be expressed mathematically as

    a_0 + 2(a_1 + 2(a_2 + · · ·))

For example, consider the conversion of 0.01101₂ into decimal form: 0.01101₂ = 0.25 + 0.125 + 0.03125 = 0.40625₁₀.
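The doubling method is a one-loop algorithm in software. The Python sketch below is our own illustration of the procedure described above.

    # The doubling (recursive) method, applied from the most-significant bit down:
    # double the running total and add the next bit.
    def bin_to_dec(bits):
        total = 0
        for b in bits:               # e.g. bits = '11110101'
            total = total * 2 + int(b)
        return total

    print(bin_to_dec('11110101'))    # 245
    print(bin_to_dec('1011010'))     # 90, the 7-bit ASCII code for 'Z' (5A in hex)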
Consider the following example where we convert 0.123₁₆ into a decimal fraction: 0.123₁₆ = 1 × 16^−1 + 2 × 16^−2 + 3 × 16^−3 = 0.0625 + 0.0078125 + 0.000732421875 ≈ 0.07104₁₀.

Binary to hexadecimal fraction conversion and vice versa

The conversion of binary fractions to hexadecimal bases is as easy as the corresponding integer conversions. The only point worth mentioning is that when binary digits are split into groups of four, we start grouping bits at the binary point and move to the right. Any group of digits remaining on the right containing fewer than 4 bits must be made up to 4 bits by the addition of zeros to the right of the least-significant bit. For example, 0.101011₂ is grouped as 0.1010 1100₂ (the final group, 11, is padded with zeros) and so becomes 0.AC₁₆.

4.4 Special-purpose codes

4.4.1 BCD codes

A common alternative to natural binary arithmetic is called BCD or binary-coded decimal. In theory BCD is a case of having your cake and eating it. We have already stated that computer designers use two-state logic elements on purely economic grounds. This, in turn, leads to the world of binary arithmetic and the consequent problems of converting between binary and decimal representations of numeric quantities. Binary-coded decimal numbers accept the inevitability of two-state logic by coding the individual decimal digits into groups of four bits. Table 4.5 shows how the 10 digits, 0 to 9, are represented in BCD, and how a decimal number is converted to a BCD form.

BCD arithmetic is identical to decimal arithmetic and differs only in the way the 10 digits are represented. The following example demonstrates how a BCD addition is carried out.

[Worked example: a decimal addition performed side by side in decimal and in BCD.]

As you can see, the arithmetic is decimal with the digits 0 to 9 represented by 4-bit codes. When 6 is added to 8 (i.e. 0110 to 1000), the result is not the binary value 1110, but the decimal 6 + 8 = 14 = 0100₂ (i.e. 4) carry 1.
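One common way to mechanize this digit correction is to add 6 whenever a digit sum exceeds 9, which skips the six unused codes. The Python sketch below is our own illustration of that idea (the function name is invented).

    # Digit-by-digit BCD addition: when a digit sum exceeds 9, add 6 (0110)
    # to skip the unused codes 1010 to 1111 and generate a decimal carry.
    def bcd_add_digit(a, b, carry_in=0):
        s = a + b + carry_in
        if s > 9:
            return (s + 6) & 0xF, 1      # corrected digit and carry out
        return s, 0

    print(bcd_add_digit(6, 8))           # (4, 1): 6 + 8 = 14, i.e. digit 4, carry 1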
Although BCD makes decimal to binary conversion easy, it suffers from two disadvantages. The first is that BCD arithmetic is more complex than binary arithmetic, simply because the binary tables (i.e. addition, subtraction, multiplication, and division) can be implemented in hardware by a few gates. The decimal tables involve all combinations of the digits 0 to 9 and are more complex. Today's digital technology makes these disadvantages less evident than in the early days of computer technology, where each gate was an expensive item.

BCD uses storage inefficiently. A BCD digit requires 4 bits of storage but only 10 symbols are mapped onto 10 of the 16 possible binary codes, making the binary codes 1010 to 1111 (10 to 15) redundant and wasting storage. As we demonstrated earlier, natural binary numbers require an average of approximately 3.3 bits per decimal digit. In spite of its disadvantages, BCD arithmetic can be found in applications such as pocket calculators or digital watches. Some microprocessors have special instructions to aid BCD operations.

There are other ways of representing BCD numbers in addition to the BCD code presented above. Each of these codes has desirable properties making it suitable for a particular application (e.g. the representation of negative numbers). These BCD codes are not relevant to this text.

4.4.2 Unweighted codes

The binary codes we've just described are called pure binary, natural binary, or 8421 weighted binary, because the 8, 4, 2, and 1 represent the weightings of each of the columns in the positional code. These are not the only types of code available. Some positional codes don't have a natural binary weighting; other codes are called unweighted because the value of a bit doesn't depend on its position within a number. Each of the many special-purpose codes has properties that make it suitable for a specific application. One such unweighted code is called a unit distance code.

In a unit distance code, the Hamming distance between consecutive code words is equal to one, and no two consecutive code words differ in more than one bit position. Natural binary numbers are not unit distance codes; for example, the sequential values 0111₂ = 7 and 1000₂ = 8 differ by a Hamming distance of four. The most widely encountered unit distance code is the Gray code, the first 16 values of which are given in Table 4.6. Figure 4.1 illustrates the timing diagrams of 4-bit binary and 4-bit Gray counters. As you can see, only one bit makes a change at each new count of the Gray counter.

    Decimal value   Natural binary value   Gray code
                    DCBA                   DCBA
     0              0000                   0000
     1              0001                   0001
     2              0010                   0011
     3              0011                   0010
     4              0100                   0110
     5              0101                   0111
     6              0110                   0101
     7              0111                   0100
     8              1000                   1100
     9              1001                   1101
    10              1010                   1111
    11              1011                   1110
    12              1100                   1010
    13              1101                   1011
    14              1110                   1001
    15              1111                   1000

Table 4.6 The 4-bit Gray code (an unweighted unit distance code).
HAMMING DISTANCE

The Hamming distance between two words is the number of places (i.e. positions) in which their bits differ. Two m-bit words have a zero Hamming distance if they are the same and an m-bit distance if they are logical complements.
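A few example pairs make the definition concrete. The Python fragment below is our own illustration (the function name is invented).

    # Hamming distance: the number of bit positions in which two words differ.
    def hamming(a, b):                    # a, b: equal-length strings of bits
        return sum(x != y for x, y in zip(a, b))

    print(hamming('0111', '1000'))   # 4, the adjacent natural binary values 7 and 8
    print(hamming('0111', '0110'))   # 1, a unit distance apart
    print(hamming('1010', '0101'))   # 4, logical complements: an m-bit distance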
The Gray code is used by the optical encoder, a mechanism for converting the angle of a shaft or spindle into a binary value. An optical encoder allows you to measure the angular position of a shaft electronically without any physical connection between the shaft and the measuring equipment. A typical example of an optical encoder is found in an automated weather reporting system. The direction from which the wind is blowing is measured by one of the World's oldest instruments, the weather vane. The weather vane is mounted on a shaft connected to an optical encoder, which provides the angle of rotation (i.e. wind direction) as a digital signal.

Figure 4.2 shows an optical encoder using a natural binary code and Figure 4.3 shows the same arrangement but with a Gray-encoded disk. A transparent glass or plastic disk is attached to the shaft whose angular position is to be measured. As you can see, the disk is covered with concentric tracks, one for each of the bits in the code representing the position of the shaft. A 4-bit code might be suitable for a wind direction indicator, whereas a 10-bit code may be required to indicate the position of a shaft in a machine. Each of these tracks is divided into sectors that are either opaque or transparent. A light source is located on one side of the disk over each track. A photoelectric sensor is located on the other side, directly opposite each light source. For any position of the disk, a particular combination of the photoelectric cells detects a light beam, depending on whether or not there is a transparent sector between the light source and detector.

A natural binary code can create problems when more than one bit of the output code changes as the shaft rotates from one code to the next. The photoelectric cells may not be perfectly aligned; the light source isn't a point source; and the sectors don't have perfectly straight edges. When the disk rotates from one sector to the next and two or three bits change state, one bit may change slightly before the other. For example, the change from the natural binary code 001 to 010 might be observed as the sequence 001, 000, 010. Because the least-significant bit changes before the middle bit, the spurious code 000 is generated momentarily. In some applications this can be very troublesome. Figure 4.3 demonstrates that a Gray-encoded disk has the property that only one bit at a time changes, solving the problems inherent in the natural binary system. Once the Gray code has been read into a digital system it may be converted into a natural binary code for processing in the normal way. The EOR gate logic of Fig. 4.4 converts between Gray codes and natural binary codes.
Figure 4.2 A natural binary-encoded optical encoder. Light sources on one side of the disk and sensors on the other read bits 0 to 2; the eight sectors cover 45° each and are coded in natural binary: sector 0 (0°–45°) = 000, 1 = 001, 2 = 010, 3 = 011, 4 = 100, 5 = 101, 6 = 110, and 7 (315°–360°) = 111.

Figure 4.3 A Gray-encoded optical encoder. The same arrangement as Fig. 4.2, but the sectors are Gray coded: 0 = 000, 1 = 001, 2 = 011, 3 = 010, 4 = 110, 5 = 111, 6 = 101, and 7 = 100.

Figure 4.4 Converting binary codes to Gray codes and vice versa with EOR gates (Gray bits g4 to g0, binary bits b4 to b0).
4.5 Error-detecting codes

In an ideal world, errors don't occur. In reality a bit that should be a 1 sometimes gets changed into a 0, and a bit that should be a 0 sometimes gets changed into a 1. In any electronic system there are always unwanted random signals, collectively called noise, which may interfere with the correct operation of the system. These random signals arise from a variety of causes, ranging from the thermal motion of electrons in a digital system, to electromagnetic radiation from
nearby lightning strikes and power line transients caused by the switching of inductive loads (e.g. starting motors in vacuum cleaners or elevators). The magnitude of these unwanted signals is generally tiny compared with digital signals inside the computer. The two electrical signal levels representing the zero and one binary states are so well separated that one level is almost never spontaneously converted into the other level inside a digital computer under normal operating conditions.

We can use some of the properties of binary numbers to detect errors, or even to correct errors. Suppose we take the binary pattern 01101011 and ask whether there is an error in it. We can't answer this question, because one binary pattern is just as good as another. Now consider the word 'Jamuary'. You will immediately realize that it should be spelt January, because there is no word 'Jamuary' in the English language. You can correct this spelling error because the closest valid word to 'Jamuary' is January. It's exactly the same with binary codes.

Error-detecting codes (EDCs) can detect that a word has been corrupted (i.e. changed). The subject of error-detecting codes is large enough to fill several textbooks. Here we look at … threes codes. We also introduce the error-correcting code (ECC), which can correct one or more errors in a corrupted word. Of course the ECC is also an EDC, whereas the EDC is not necessarily an ECC.

Before we can discuss EDCs and ECCs we must introduce two terms: source word and code word. A source word is an unencoded string of bits and a code word is a source word that has been encoded. For example, the source code 10110 might be transformed into the code word 111000111111000 by triplicating each bit.

In order to create an error-detecting code, we have to construct a code in such a way that an error always leaves a noticeable trace or marker. An error-correcting code increases the length of the source code by adding one (or more) redundant bits, so called because they carry no new information. Figure 4.5 demonstrates how r redundant bits are added to an m-bit source word to create an (m + r)-bit code word. The redundant bits are also called check bits because they are used to check whether the code word is valid or not. Note that the check bits can be interleaved throughout the word and don't have to be located together as Fig. 4.5 shows.
Code word
r redundant bits m data bits
Source word
dm–1 d1 d0 dm–1 d1 d0
101
100
Table 4.7 Odd and even parity codes.
The black circles
represent valid code
words with even
parity only one bit changes), the codeword changes from one of the
010 011
black circles to one of the white circles one unit length away.
The blue circles
represent invalid Because these code words all have an odd parity, you can
code words (error always detect a single error. Two errors (or any even number
states) with of errors) cannot be detected because you move from one
000 001
odd parity
valid code word to another valid codeword. Fortunately, if
one error is a rare event, two errors are correspondingly rarer
Figure 4.7 A 3-bit error detecting code
(unless the nature of the error-inducing mechanism affects
more than one bit at a time).
If you detect an error, you must ask for the correct data to
4.5.1 Parity EDCs be retransmitted (if the corruption occurred over a data
The simplest error-detecting code is called a parity check code. link). If the data was stored in memory, there’s little you can
We take an m-bit source word and append a parity bit to the do other than to tell the operating system that something has
source word to produce an (m 1)-bit codeword. The parity gone wrong.
bit is chosen to make the total number of 1s in the code word Table 4.7 gives the eight valid code words for a three-bit
even (an even parity bit) or odd (an odd parity bit). We will source word, for both even and odd parities. In each case the
assume an even parity bit here. parity bit is the most-significant bit.
Figure 4.6 shows an 8-bit source word, 01101001, which is As an example of the application of check bits consider a
converted into a 9-bit code with even parity. This binary simple two-digit decimal code with a single decimal check
string has a total of four 1s, so the parity bit must be selected digit. The check digit is calculated by adding up the two
as 0 to keep the total number of 1s even. Assuming that the source digits modulo 10 (modulo 10 simply means that we
parity bit is appended to the left-hand end of the string, the ignore any carry when we add the digits; for example, the
9-bit code word is 001101001. If, when this value is stored in modulo 10 value of 6 7 is 3). If the two source digits are 4
memory or transmitted over a data link, any one of the bits is and 9, the code word is 493 (the check digit is 3). Suppose that
changed, the resulting parity will no longer be even. Imagine during transmission or storage the code word is corrupted
that bit 2 is changed from 0 to 1 and the code word becomes and becomes 463. If we re-evaluate the check digit we get 4
001101101. This word now contains five 1s, which indicates 6 10 0 (modulo 10). As the recorded check digit is 3, we
an error, because the parity is odd. know that an error must have occurred.
Figure 4.7 provides a graphical representation of a 2-bit
source word with an even parity bit. Although 3 bits provide
eight possible binary values, only four of them are valid code
4.5.2 Error-correcting codes
words with an even parity (i.e. the black circles representing We can design error-detecting and-correcting codes to both
codes 000, 101, 110, 011). As you can see, a valid code word is locate and fix errors. Figure 4.8 illustrates the simplest 3-bit
separated from another nearest valid code word by two unit error detecting and correcting code where only code words
lengths. In Fig. 4.7 a unit length corresponds to an edge of the 000 and 111 are valid. The Hamming distance between these
cube. If one of the valid codewords suffers a single error (i.e. two valid code words is 3. Suppose that the valid code word
4.5 Error-detecting codes 159
110 111
Figure 4.8 A 3-bit error-correcting code. Figure 4.9 The principle of the EDC.
A sphere represents a
region that encloses
code words that are
X A B Y closer to the valid
code word (the black
circle) than to all other
valid code words.
111 is stored in memory and later read back as 110. This code possible signals. Should a code word be received that is not
word is clearly invalid, because it is neither 000 nor 111. If you one of these 2m values, an error may be assumed.
examine Fig. 4.8, you can see that the invalid code word 110 If r check bits are added to the m message digits to create an
has a Hamming distance of 1 from the valid code word 111 n-bit code word, there are 2n 2mr possible code words.
and a Hamming distance 2 from the valid code word 000. An The n-dimensional space will contain 2m valid code words, 2n
error correcting code selects the correct code as the nearest possible code words and 2n2m 2m(2nm1) 2m(2r1)
valid code word to the invalid code word. We assume that the error states.
valid code word was 111—we have now both detected an If we read a word from memory or from a communication
error and corrected it. system, we can check its location within the n-dimensional
space. If the word is located at one of the 2m valid points we
How error-detecting codes work assume that it’s error free. If it falls in one of the 2n2m error
The idea behind EDCs is simple, as Fig. 4.9 demonstrates. An states, we can reject it.
incorrect code word has to be made to reveal itself. Assume Error-correcting codes require that all valid code words be
we transmit n-bit messages, where m bits are data bits and separated from each other by a Hamming distance of at least 3.
r nm bits are redundant check bits. Imagine an n-dimen- An error-correcting code tries to correct an error by selecting
sional space in which each point is represented by the value of the nearest valid code to the code word in error. Because valid
an n-bit signal. This n-dimensional space contains 2n possible codes are separated by a minimum of three units from each
elements (i.e. all the possible combinations of n bits). other, a single error moves a code word one unit from its cor-
However, an m-bit source code can convey 2m unique mes- rect value, but it remains two units from any other valid code
sages. In other words, only 2m signals are valid out of the 2n word. Figure 4.10 illustrates this concept.
160 Chapter 4 Computer arithmetic
Block parity error-correcting codes Figure 4.13 demonstrates the action of a block error-
The single parity-bit, error-detecting code can be extended to detecting code in the presence of a single error. A tick marks
create a block EDC (also called a matrix EDC). A block EDC each row or column where the parity is correct and a cross
uses two types of parity check bit: a vertical parity bit and a marks where it is not. In this example, the bit in error is
horizontal (or longitudinal) parity bit. Imagine a block of detected by the intersection of the row and column in which
data composed of a sequence of source words. Each source it creates a parity violation. Thus, although the word 1001 is
word can be written vertically to form a column and the received incorrectly as 1101 it can be corrected. Although the
sequence of source words can be written one after another to block parity code can detect and correct single errors, it can
create a block. Figure 4.11 demonstrates a simple block of six detect (but not correct) certain combinations of multiple
3-bit source words. error. Block EDCs/ECCs are sometimes found in data trans-
The source words are 110, 101, 001, 110, 101, and 010 and mission systems and in the storage of serial data on magnetic
have been written down as a block or matrix. We can generate tape.
a parity bit for each source word (i.e. column) and append it By detecting a parity error in a row, we can detect the posi-
to the bottom of each column to create a new row. Each of tion of the bit in error (i.e. in this case bit D1). By detecting a
these parity bits is called a vertical parity bit. Since a block of parity error in a column, we can detect the word in error
source words is made up of a number of columns, a parity (i.e. in this case word 3). Now we can locate the error, which
word can be formed by calculating the parity across the bits. is bit D1 of word 3. The error can be corrected by inverting
Each code word (i.e. column) in Fig. 4.12 is composed of four this bit.
bits: D0, D1, D2, and D3 (where D3 is the vertical parity bit).
We can now derive a horizontal parity bit by calculating the 4.5.3 Hamming codes
parity across the columns. That is, we create a parity bit across
all the D0s. Horizontal parity bits for D1, D2, and the vertical Hamming codes are the simplest class of error-detecting
parity bits, D3, can be generated in a similar way. Figure 4.12 and-correcting codes that can be applied to a single code
shows how the source words of Fig. 4.11 are transformed into word (in contrast with a block error-correcting code that is
a block error-detecting code. applied to a group of words). A Hamming code takes an
A vertical even parity bit has been appended to each col- m-bit source word and generates r parity check bits to create
umn to create a new row labeled D3. Similarly, a horizontal an n-bit code word. The r parity check bits are selected so that
parity bit has been added to each row to create a new column a single error in the code word can be detected, located, and
labeled word 7. therefore corrected.
D0 0 1 1 0 1 0
D1 1 0 0 1 0 1
D2 1 1 0 1 1 0 Figure 4.11 Six 3-bit words.
D0 0 1 1 0 1 0 1
D1 1 0 0 1 0 1 1
D2 1 1 0 1 1 0 0 Figure 4.12 Creating a block
D3 0 0 1 0 0 1 0 error-detecting code.
Hamming codes are designated Hn, m where, for example, 4.5.4 Hadamard codes
H7,4 represents a Hamming code with a code word of 7 bits
and a source word of 4 bits. The following sequence of bits Computer and communication systems designers employ a
represents a H7,4 code word: wide range of error-correcting codes and each code has its
own particular characteristics. As the mathematics of error-
Bit position 7 6 5 4 3 2 1 correcting codes is not trivial, we will demonstrate the con-
Code bit I4 I3 I2 C3 I1 C2 C1 struction of an error-correcting code that can be appreciated
without any math.
Ii source bit i, Cj check bit j. A Hadamard matrix of order n is written [H]n and has
The information (i.e. source word) bits are numbered some very interesting properties. All elements in a Hadamard
I1, I2, I3, and I4, and the check bits are numbered C1, C2, matrix are either 1 or 1 (this is still a binary or two-state
and C3. Similarly, the bit positions in the code word are system because we can write 1 and 1 instead of 0 and 1 with-
numbered from 1 to 7. The check bits are located in out loss of generality). The simplest Hadamard matrix is
binary positions 2i in the code word (i.e. positions 1, 2, written [H]2 and has the following value:
and 4). Note how the check bits are interleaved with the
source code bits.
The three check bits are generated from the source word
according to the following parity equations.
An interesting property of the Hadamard matrix is that
C3I2⊕I3⊕I4
a 2n x 2n Hadamard matrix [H]2n can be derived from the
C2I1⊕I3⊕I4
n x n Hadamard matrix [H]n by means of the expansion
C1I1⊕I2⊕I4
The new check bits are 1, 0, 0 and the stored check bits are 0, Can you see any pattern in the matrix emerging? Let’s
1, 0. If we take the exclusive OR of the old and new check construct a Hadamard matrix of the order eight, [H]8, by
bits we get 1 ⊕ 0, 0 ⊕ 1, 0 ⊕ 0 1, 1, 0. The binary value 110
taking the value of [H]4 and using the expansion to
expressed in decimal form is 6 and points to bit position 6 in
the code word. It is this bit that is in error. How does a generate [H]8.
Hamming code perform this apparent magic trick? The
answer can be found in the equations for the parity check
bits. The check bits are calculated in such a way that any
single bit error will change the particular combination of
check bits that points to its location.
The Hamming code described above can detect and cor-
rect a single error. By adding a further check bit we can create
a Hamming code that can detect two errors and correct one
error.
162 Chapter 4 Computer arithmetic
A sphere represents a
region that encloses
code words that are
X A B C Y
closer to the valid
code word (the black
circle)than to all other
valid code words.
Invalid states
Valid code word A, B, and C Valid code word
State A is closer to code word X Figure 4.14 Adjacent code
than code word Y words in a 4-unit code.
If you inspect the rows of this Hadamard matrix of the order and 1 1 1 1 1 1 1 1. The first code word has a
eight, you find that each row has a Hamming distance of 4 unit distance of 4 from the second code word. If the first code
from each of the other seven rows. The Hamming distance word were converted into the second code word, it might go
between any row and all other rows of a Hadamard matrix of through the intermediate error states 1 1 1 1
the order n is n/2. 1 1 1 1 , 1 1 1 1 1 1 1 1, 1 1 1
Let’s now use the Hadamard matrix of the order eight to 1 1 1 1 1, and 1 1 1 1 1 1 1 1.
transform a 3-bit source code into an 8-bit code word. The Let’s look at this code in more detail. Figure 4.14 illustrates
matrix for [H]8 has eight rows, so that a 3-bit source code can two adjacent valid codewords, X and Y, generated by a [H]8
be used to select one of the eight possible rows. Table 4.8 is a matrix, which are, of course, separated by a Hamming dis-
simple copy of an [H]8 matrix in which the rows of the matrix tance of 4. The intermediate error states between X and Y (i.e.
are numbered 0 to 7 to represent source codes 000 to 111. the invalid codewords) are labeled A, B, and C. Each state is
Suppose you want to encode the 3-bit source code 011 separated by 1 unit distance from its immediate neighbors. As
using [H]8. The source word is 011, which corresponds to the you can see, error state A is closer to valid code word X than
8-bit code word 1 1 1 1 1 1 1 1 (or 10011001 to the next nearest code word, Y. Similarly, error state C is
expressed in conventional binary form) on row 3 of Table 4.8. closer to valid code word Y than to any other valid code word.
This code word has three information bits and five redundant Error state B is equidistant from the two valid code words and
bits. These five redundant bits enable us to both detect and cannot be used to perform error correction. Two errors in a
correct an error in the code word. code word are therefore detectable but not correctable,
The most important property of the Hadamard matrix of because the resulting error state has a Hamming distance of 2
the order n is that each row differs from all other rows in from the correct state and 2 from at least one other valid state.
exactly n/2 bit positions. The rows of an [H]8 matrix differ Suppose that the code word 011 is transformed into the
from each other in four bit positions; that is, the minimum 8-bit Hadamard code 10011001 and an error occurs in storage
Hamming distance between code words is 4. We have already (or transmission) to give a new incorrect value 10011101 (we
demonstrated that a code with a minimum Hamming dis- have made an error in bit 2). We detect and correct the error
tance of 3 can both detect and correct errors. Consider, for by matching the new code word against all the valid code
example, the valid code words 11 1 1 1 1 1 1 words.
4.6 Data-compressing codes 163
Item Code
Potatoes 00
Onions 01
Beans 10
Avocado pears 11
Target
If there are n transactions, the total storage required to The first (leftmost) bit of the string is 0. From the trellis we
record them is 2n bits. At first sight it would seem that there’s can see that a first bit 0 leads immediately to the terminal
no way the grocer can get away with less than two bits to node 0. Thus, the first code is 0. Similarly, the second code is
encode each transaction. However, after a little thought, the also 0. The third code begins with a 1, which takes us to a
grocer realizes that most customers buy potatoes and junction rather than to a terminal. We must examine another
therefore devises the encoding scheme of Table 4.12. bit to continue. This is also a 1, and yet another bit must be
Table 4.12 uses codes of different lengths. One code has a read. The third bit is a 0 leading to a terminal node 110. This
1-bit length, one has a 2-bit length, and two have 3-bit lengths. process can be continued until the string is broken down into
After a week’s trading, the total storage space occupied will be the sequence: 0 0 110 0 10 111 0 potatoes, potatoes, beans,
the number of transactions for each item multiplied by the potatoes, beans, avocados, potatoes.
length of its code. The average code length will be: Variations of this type code are used in data and program
compression algorithms to reduce the size of files (e.g. the
1 4 2 18 3 16 3 16 1.375
3 1 1
0 Terminal node
0 (Potatoes)
10 Terminal node
Start
0 (Onions)
1
110 Terminal node
(Beans)
0
of these letters in a long message. Such a table can be derived 4 0.0625 4 0.0625 1.875 bits per symbol. If the same
by obtaining the statistics from many messages. The values in five symbols had been coded conventionally, 3 bits would
this table are hypothetical and have been chosen to provide a have been required to represent 000 A to E 100. Huffman
simple example. encoding has reduced the average storage requirement from
Symbol A is the most common symbol (it occurs eight 3 to less than 2 bits per symbol.
times more frequently than symbol D) and we will give it the Now let’s look at a more complex example of Huffman
shortest possible code—a single bit. It doesn’t matter whether encoding. In this case we will use a wider range of symbols
we choose a 1 or a 0. We’ll represent A by 0. If symbol A is and avoid the easy numbers of the previous example (did you
represented by a single-bit code 0, what is represented by a 1? notice that all the probabilities were binary fractions?). In this
The answer is, all the remaining symbols. We therefore have case, we take 16 letters, A to P, and produce a table of relative
to qualify the code 1 by other bits in order to distinguish frequencies, (Table. 4.14). We have not used all 26 letters, to
between the remaining four symbols. keep the example reasonably simple. The relative frequencies
We will represent the next most common symbol B by the are made up.
code 10, leaving the code 11 to be shared among symbols C, Figure 4.19 shows how we can construct a Huffman tree
D, and E. Continuing in this manner, the code for symbol C is for this code. The letters (i.e. symbols) are laid out along the
110, for symbol D is 1110, and for symbol E is 1111. top with the relative frequencies underneath. The task is to
Figure 4.18 provides a trellis to illustrate how the symbols are draw a tree whose branches always fork left (a 1 path) or right
encoded. As you can see, we have now constructed a code in (a 0 path). The two paths are between branches of equal (or as
which symbol A is represented by a single bit, whereas symbol nearly equal as possible) relative frequency. At each node in
E is represented by four bits. the tree, a shaded box shows the cumulative relative fre-
Consider encoding the string BAEAABDA. We begin at the quency of all the symbols above that node. The node at the
point labeled start in Fig. 4.18 and follow the tree until we get bottom of the tree has a relative frequency equal to the sum of
to the terminal symbol (i.e. A, B, C, D, or E); for example, the all the symbols.
bit 0 takes us immediately to the terminal symbol A, whereas Consider the right-hand end of the tree. The symbols G
you have to take the path 1, 1, 1, 0 to reach the terminal and J each have a relative frequency 1 and are joined at a
symbol D. The encoding of the sequence BAEAABDA is node whose combined relative frequency is 2. This node is
therefore B 10, A 0, E 1111, A 0, A 0, B 10, combined with a path to symbol K that has a frequency 2
D 1110, A 0, to give the string 1001111001011100. (i.e. G and J are as relatively frequent as I). You derive the
In this example there are five symbols and the average
length of a message is 1 0.5 2 0.25 3 0.125
Symbol A B C D E F G H I J K L M N O P
Symbol A B C D E Relative frequency 10 3 4 4 12 2 1 3 3 1 2 6 4 5 5 3
Relative frequency 8 4 2 1 1
Table 4.14 The relative probability of symbols in a 16-symbol
Relative probability 0.5 0.25 0.125 0.0625 0.0625 alphabet.
Table 4.13 The relative frequency of symbols in an alphabet.
E A L N O C D M B H I P F K G J
12 10 6 5 5 4 4 4 3 3 3 3 2 2 1 1
A
1 0
0 1 0 1 0 1 0 1 0 1 0 1 0 1 2
B 0
Start 10 8 7 6 5 4
1 0 1 0 1 0 1 0
C
1 16 18 13
9
1 0 Symbol Code 1
0 1 0
D A 0
B 10 34 22
1 C 110 0
0
D 1110
E 1111 46
1 0
1
E
68
Figure 4.18 A Huffman encoding tree. Figure 4.19 A Huffman encoding tree for a 16-symbol code.
4.6 Data-compressing codes 167
Symbol A B C D E F G H I J K L M N O P
Code 1011 0110 10001 10000 11 0010 00001 0101 0100 00000 0001 1010 0111 10011 10010 0011
(a) Block of 8 x 8 pixels (b) Dividing the block into four quadrants
2 3
F
P 0 1
Quadrant
numbering
(c) The top right-hand quadrant of (b) (d) The top right-hand quadrant of (c)
is divided into four quadrants is divided into four quadrants
E E
P F
E
P E F
(e) Bottom right-hand quadrant of (a) (f) Top right-hand quadrant of (e)
is divided into four quadrants is divided into four quadrants
P P F F
E F E F
codes for the letters by starting at the bottom-most node and Fig. 4.20(b) demonstrates (hence the term quadtree). As you
working back to the symbol (see Table. 4.15). can see from Fig. 4.20(b), the four quadrants have different
properties.
In the top left-hand quadrant, all the pixels are black and
4.6.2 Quadtrees are marked ‘F’ for full. In the bottom left-hand quadrant, all
An interesting data compression technique employs a data the pixels are white and are marked ‘E’ for empty. Each of the
structure called the quadtree, which is used to encode two- two right-hand quadrants contains a mixture of black and
dimensional images. Figure 4.20 illustrates an 8 8 pixel white pixels—these quadrants are marked ‘P’ for partially
image. This image can be divided into four quadrants, as occupied.
168 Chapter 4 Computer arithmetic
E, (E, F, (E, E, F, F), (E, F, F, F)), F, (E, (E, F, E, F), E, (E, F, E, F))
F E E F
E E E F
(E, F, E , F)
E F
E F
(E, F, F, F)
E F F
F
(E, E, F, F)
(E, F, (E, E, F, F), (E, F, F, F))
E F
F F
E E
The picture of Fig. 4.20(b) can be represented by its four substitute this in the expression for the image we get E, P, F, (E,
quadrants 0, 1, 2, 3 as E, P, F, P (see figure 4.20 for the (E, F, E, F), E, P). We can do the same thing to quadrant 1 of
quadrant numbering scheme). We can partially regenerate Fig. 4.20(c) to get: E, P, F, (E, (E, F, E, F), E, (E, F, E, F)). Now we
the image because we know that one quadrant is all black and have completely defined quadrant 3 of the original image.
another is all white. However, we don’t know anything about Continuing in this way and expanding quadrant 1 of
the two quadrants marked ‘P’. the original image, we get the expression E, (E, F, (E, E, F, F),
We can, however, subdivide partially filled quadrants 1 and (E, F, F, F)), F, (E, (E, F, E, F), E, (E, F, E, F)). All we have done
3 into further quadrants. Consider the upper right-hand is to divide an image into four quadrants and successively
quadrant of Fig. 4.20(b) (i.e. quadrant 3). This can be divided divided a quadrant into four quadrants until we reach that
into four quadrants as Fig. 4.20(c) demonstrates. We can point at which each quadrant contains only one color.
describe the structure of Fig. 4.20(c) by E, P, E, P. If we substi- Because many areas of an image contain the same color, the
tute this expansion of quadrant 3 in the original expression quadtree structure can compress the image. In the case of
for the image, we get: E, P, F, (E, P, E, P). Fig. 4.20 we have compressed a 64-element block into a string
We haven’t yet completely defined quadrant 3 of the image of 29 elements (the elements may be E, F, left bracket, or right
because there are still subdivisions marked ‘P’. Figure 4.20(d) bracket).
demonstrates how the top right-hand quadrant for Fig. 4.20(c) Figure 4.21 demonstrate the complete quadtree expansion
can be subdivided into the quadrants E, F, E, F. If we now of Fig. 4.20.
4.7 Binary arithmetic 169
The quad tree and the other compression techniques we’ve during compression and the original information can’t be
described are lossless encoding techniques because a file can restored. Lossy compression technology is used to compress
be compressed and restored with no loss of information (i.e. images and sound because humans can lose a lot of detail in
compress and decompress yields the original source). Some an image (or piece of music) without noticing missing it.
compressing techniques are lossy because information is lost Typical lossy compression techniques are MP3 (sound),
JPEG (still images), and MPEG (video).
0 1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9 10
2 3 4 5 6 7 8 9 10 11 4.7 Binary arithmetic
3 4 5 6 7 8 9 10 11 12
Now that we’ve introduced binary numbers and demon-
4 5 6 7 8 9 10 11 12 13
strated how it’s possible to convert between binary and deci-
5 6 7 8 9 10 11 12 13 14
mal formats, the next step is to look at how binary numbers
6 7 8 9 10 11 12 13 14 15
are manipulated. Binary arithmetic follows exactly the same
7 8 9 10 11 12 13 14 15 16 rules as decimal arithmetic and all that we have to do to work
8 9 10 11 12 13 14 15 16 17 with binary numbers is to learn the binary tables. Table 4.16
9 10 11 12 13 14 15 16 17 18 gives the decimal addition tables and Table 4.17 gives the dec-
imal multiplication table. Table 4.18 gives the hexadecimal
Table 4.16 The decimal addition tables. multiplication table. Table 4.19 gives the binary addition,
subtraction, and multiplication tables. As you can see, these
0 1 2 3 4 5 6 7 8 9 are much simpler than their decimal equivalents.
1 1 2 3 4 5 6 7 8 9 A remarkable fact about binary arithmetic revealed by
2 2 4 6 8 10 12 14 16 18 Table 4.19 is that if we didn’t worry about the carry in addition
3 3 6 9 12 15 18 21 24 27 and the borrow in subtraction, then the operations of addition
and subtraction would be identical. Such an arithmetic in which
4 4 8 12 16 20 24 28 32 36
addition and subtraction are equivalent does exist and has some
5 5 10 15 20 25 30 35 40 45
important applications; this is called modulo-2 arithmetic.
6 6 12 18 24 30 36 42 48 54
Table 4.19 tells us how to add two single digits. We need to
7 7 14 21 28 35 42 49 56 63 add longer words. The addition of n-bit numbers is entirely
8 8 16 24 32 40 48 56 64 72 straightforward, except that when adding the two bits in each
9 9 18 27 36 45 54 63 72 81 column, a carry bit from the previous stage must also be
added in. Each carry bit results from a carry-out from the
Table 4.17 The decimal multiplication tables. column on its right. In the following example, we present the
0 01 02 03 04 05 06 07 08 9 0A 0B 0C 0D 0E 0F
1 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
2 02 04 06 08 0A 0C E 10 12 14 16 18 1A 1C 1E
3 03 06 09 0C 0F 12 15 18 1B 1E 21 24 27 2A 2D
4 04 08 0C 10 14 18 1C 20 24 28 2C 30 34 38 3C
5 05 0A 0F 14 19 1E 23 28 2D 32 37 3C 41 46 4B
6 06 0C 12 18 1E 24 2A 30 36 3C 42 48 4E 54 5A
7 07 0E 15 1C 23 2A 31 38 40 46 4D 54 5B 62 69
8 08 10 18 20 28 30 38 40 48 50 58 60 68 70 78
9 09 12 1B 24 2D 36 3F 48 51 5A 63 6C 75 7E 87
A 0A 14 1E 28 32 3C 46 50 5A 64 6E 78 82 8C 96
B 0B 16 21 2C 37 42 4D 58 63 6E 79 84 8F 9A A5
C 0C 18 24 30 3C 48 54 60 6C 78 84 90 9C A8 B4
D 0D 1A 27 34 41 4E 5B 68 75 82 8F 9C A9 B6 C3
E 0E 1C 2A 38 46 54 62 70 7E 8C 9A A8 B6 C4 D2
F 0F 1E 2D 3C 4B 5A 69 78 87 96 A5 B4 C3 D2 E1
carries
A B S (sum) C (carry)
Addition Subtraction Multiplication
0 0 0 0
00 0 00 0 000
0 1 1 0
01 1 01 1 borrow 1 010
1 0 1 0
10 1 10 1 100
1 1 0 1
11 0 carry 1 11 0 111
PERFORMING ADDITION
The following sequence demonstrates binary addition. Each generated in the next column to the right (bold blue). Only
step shows the current carry in (light blue) and the carry out seven stages are shown in order to fit on the page.
4.7 Binary arithmetic 171
A B
A
B
Sum HA
S C out
Carry
Figure 4.23 The circuit representation of a half adder.
B 0 0 0 0 0
Sum 0 1 0 1 0
1 0 0 1 0
1 1 0 0 1
0 0 1 1 0
0 1 1 0 1
1 0 1 0 1
Carry
1 1 1 1 1
Sum When people perform an addition they deal with the carry
automatically, without thinking about it. More specifically they
say, ‘If a carry is generated we add it to the next column, if it is
not we do nothing.’ In human terms doing nothing and adding
B zero are equivalent. As far as the logic necessary to carry out the
Carry addition is concerned, we always add in the carry from the
previous stage, where the carry bit has the value 0 or 1.
The full adder, represented by the circuit symbol of
Figure 4.22 Three ways of implementing a half adder. Fig. 4.25, adds together two bits A and B, plus a carry-in Cin
from the previous stage, to generate a sum S and a carry-out
Cout. In other words, the full adder is a 3-bit adder. Table 4.21
4.7.2 The Full Adder provides the truth table for a full adder.
You can realize the circuit for a full adder by connecting two
Unfortunately, the half adder is of little use as it stands. When
half adders in tandem. Conceptually, a full adder requires that
two n-bit numbers are added together we have to take
the two bits of A and B be added together and then
account of any carry bits. Adding bits ai of A and bi of B
the carry-in is added to the result. Figure 4.26 shows a possible
together must include provision for adding in the carry bit
representation of the full adder in terms of two half adders.
ci⫺1 from the results of the addition in the column to the right
The sum output of the full adder is provided by the sum out-
of ai and bi. This is represented diagrammatically as
put of the second half adder, HA2. The carry-out from the full
adder, Cout, is given by ORing the carries from both half adders.
To demonstrate that the circuit of Fig. 4.26 does indeed per-
form the process of full addition a truth table may be used.
Table 4.22 provides a truth table for the circuit of Fig. 4.26.
As the contents of the S2 and Cout columns are identical to
This row consists of the those of the corresponding columns of the truth table for the
carry bits generated by full adder (Table 4.22), we must conclude that the circuit of
the columns on the right. Fig. 4.26 is indeed that of a full adder. Figure 4.27 demonstrates
172 Chapter 4 Computer arithmetic
Figure 4.24 Using Digital Works to implement and test a half adder circuit.
A B C in
A B Cin
FA
S C out
C1 S1
Figure 4.25 The circuit representation of a full adder.
Cin A B S1 C1 S2 C2 Cout
HA2
0 0 0 0 0 0 0 0
0 0 1 1 0 1 0 0
0 1 0 1 0 1 0 0
0 1 1 0 1 0 0 1
1 0 0 0 0 1 0 0
1 0 1 1 0 0 1 1
1 1 0 1 0 0 1 1
1 1 1 0 1 1 0 1
Cout S2
Table 4.22 Truth table for a full adder implemented by two half
adders. Fig. 4.26 Implementing a full adder using two half adders.
4.7 Binary arithmetic 173
Figure 4.27 Using Digital Works to implement and test a full adder built from two half adders.
the use of Digital Words to construct and simulate a full adder look at ways in which two n-bit numbers can be added
built from two half adders. together. We begin with the serial full adder and then describe
In practice the full adder is not implemented in this way the parallel full adder.
because the propagation path through the two half adders It is perfectly possible to add two n-bit numbers, A and B,
involves six units of delay. An alternative full adder circuit together, serially, a bit at a time by means of the scheme given
may be derived directly from the equations for the sum and in Fig. 4.29. The contents of the shift registers containing the
the carry from the truth table. Let the sum be S, the carry-out n-bit words A and B are shifted into the full adder a bit at a
Co, and the carry-in C. time. The result of each addition is shifted into a result (i.e.
sum) register. A single flip-flop holds the carry bit so that the
old carry-out becomes the next carry-in. After n clock pulses,
and the sum register, S, contains the sum of A and B. Serial adders
aren’t used today because parallel adders are much faster.
Parallel adders
A parallel adder adds the n bits of word A to the n bits of word
B in one simultaneous operation. Figure 4.30 describes a
The carry-out represents a majority logic function that is true parallel adder constructed from n full adders. The carry-out
if two or more of the tree inputs are true. The circuit diagram of from each full adder provides the carry-in to the stage on its
the full adder corresponding to the above equations is given in left. The term parallel implies that all n additions take place at
Fig. 4.28. This circuit contains more gates than the equivalent the same time and it’s tempting to think that the parallel
realization in terms of half adders (12 against 9) but it is faster. adder is n times faster than the corresponding serial adder. In
The maximum propagation delay is three gates in series. practice a real parallel adder is slowed down by the effect
of the carry-bit propagation through the stages of the full
adder.
4.7.3 The addition of words
Several points are worth noting in Fig. 4.30. You might
Even a full adder on its own is not a great deal of help, as we think that a half adder could replace the least-significant bit
normally wish to add two n-bit numbers together. We now stage because this stage doesn’t have a carry-in. However,
174 Chapter 4 Computer arithmetic
C in
A B C in
A
Carry
B
Sum
C in C out
Q D
The carry flip-flop stores the
C carry-out from the previous
Carry addition to generate the carry-in
flip-flop to the next addition.
Shift clock
n pulses per addition Figure 4.29 The serial adder.
........
by using a full adder for this stage, the carry-in may be set to Another feature of this circuit concerns the carry-out from
zero for normal addition, or it may be set to 1 to generate the most-significant bit stage. If two n-bit words are added
A B 1. If input B is set to zero, A 1 is generated and and the result is greater than 111 . . . 1, then a carry-out is
the circuit functions as an incrementer. A facility to add in 1 generated. As the computer cannot store words longer than
to the sum of A plus B will prove very useful when we come to n bits, the sum cannot be stored in the memory as a single
complementary arithmetic. entity. The carry-out of the most-significant stage may be
4.8 Signed numbers 175
latched into a flip-flop (normally forming part of the com- carry look ahead circuits can be used to anticipate a carry over
puter’s condition code register). When addition is performed a group of say four full adders. That is, the carry out to stage
by software as part of a program, it is usual for the program- i 5 is calculated by examining the inputs to stages i 4,
mer to test the carry bit to check whether the result has gone i 3, i 2, and i 1, and the carry in to stage i 1, by
out of range. means of a special high-speed circuit. This anticipated carry
A final point about the parallel adder concerns the mean- is fed to the fifth stage to avoid the delay that would be
ing of the term parallel. The first stage can add a0 to b0 to get incurred if a ripple-through carry were used. The exact
S0 as soon as A and B are presented to the input terminals of nature of these circuits is beyond the scope of this book.
the full adder. However, the second stage must wait for the
first stage’s carry-out to be added in to a1 plus b1 before it can
be sure that its own output is valid. In the worst case inputs of 4.8 Signed numbers
111 . . . 1 1, the carry must ripple through all the stages.
This type of adder is referred to as a ripple-carry adder. Any real computer must be able to deal with negative
The full adder we have described here is parallel in the sense numbers as well as positive numbers. Before we examine how
that all the bits of A are added to all the bits of B in a single the computer handles negative numbers, we should consider
operation without the need for a number of separate clock how we deal with them. I believe that people don’t, in fact,
cycles. Once the values of A and B have been presented to the actually use negative numbers. They use positive numbers
inputs of the full adders, the system must wait until the circuit (the 5 in 5 is the same as in 5), and place a negative sign
has had time to settle down and for all carries to propagate in front of a number to remind them that it must be treated
before the next operation is started. Figure 4.31 shows a ripple in a special way when it takes part in arithmetic operations. In
adder in more detail using the circuits we’ve developed before. other words, we treat all numbers as positive and use a sign
As you can see, the carry in has to ripple through successive (i.e. or ) to determine what we have to do with the num-
stages until it reaches the most-significant bit position. bers. For example, consider the following two operations.
Real full adders in computers are much more complicated
than those we have shown here. The fundamental principles 8 8
+5 and -5
are the same, but the effect of the ripple-through carry from
13 3
first to last stage cannot be tolerated. A mechanism called
Carry-in to
a2 b2 a1 b1 a0 b0 first stage
HA Cin HA Cin HA
HA HA HA
Carry to s2 s1 s0
next stage
In both these examples the numbers are the same, but the Sign and Number Result
operations we performed on them were different; in the magnitude with sign with sign
first case we added them together and in the second case we value bit converted converted
into sign into sign bit
subtracted them. This technique can be extended to
computer arithmetic to give the sign and magnitude represen- 1.
tation of a negative number.
2.
4.8.1 Sign and magnitude representation
An n-bit word can have 2n possible different values from 0 to
3.
2n1; for example, an 8-bit word can represent 0, 1, . . . , 254,
255. One way of indicating a negative number is to take the
most-significant bit and reserve it to indicate the sign of the
number. The usual convention is to choose the sign bit as 0 4.
to represent positive numbers and 1 to represent negative
numbers. We can express the value of a sign and magnitude
number mathematically in the form (1)S M, where S is
the sign bit of the number and M is its magnitude. If S 0, 4.8.2 Complementary arithmetic
(1)0 1 and the number is positive. If S 1,
In complementary arithmetic the negativeness of a number is
(1)1 1 and the number is negative. For example, in
contained within the number itself. Because of this, the
8 bits we can interpret the two numbers 00001101 and
concept of signs ( and ) may, effectively, be dispensed
10001101 as
with. If we add X to Y the operation is that of addition if X is
000011012 = +1310 100011012 = –1310 positive and Y is positive, but if Y is negative the end result is
number number that of subtraction (assuming that Y is represented by its
magnitude magnitude negative form). It is important to point out here that comple-
0 0001101 1 0001101 mentary arithmetic is used to represent and to manipulate
both positive and negative numbers. To demonstrate that
sign bit Negative sign bit Positive
there is nothing magical about complementary arithmetic,
let’s examine decimal complements.
Using a sign bit to represent signed numbers is not widely
used in integer arithmetic. The range of a sign and magnitude Ten’s complement arithmetic
number in n bits is given by (2n1 1) to (2n1 1). The ten’s complement of an n-digit decimal number, N, is
All we’ve done is to take an n bit number, use 1 bit to defined as 10nN. The ten’s complement may also be
represent the sign, and let the remaining n 1 bits represent calculated by subtracting each of the digits of N from 9 and
the number. Thus, an 8-bit number can represent from 127 adding 1 to the result; for example, if n 1, the value of 1 is
(11111111) to 127 (01111111). One of the objections to represented in ten’s complement by 9. Consider the four-digit
this system is that it has two values for zero: decimal number 1234. Its ten’s complement is:
00000000 0 and 10000000 0 (a) 104 1234 8766 or (b) 9999
1234
Another reason for rejecting this system is that it requires 8765 1 8766
separate adders and subtractors. The are other ways of
representing negative numbers that remove the need for Suppose we were to add this complement to another
subtractor circuits. number (say) 8576. We get
Examples of addition and subtraction in sign and
8576
magnitude arithmetic are given below. Remember that the +8766
most-significant bit is a sign bit and does not take part in the 17342
calculation itself. This is in contrast with two’s complement
arithmetic (see later) in which the sign bit forms an integral Now let’s examine the effect of subtracting 1234 from 8576
part of the number when it is used in calculations. In each of by conventional means.
the four examples below, we perform the calculation by first 8576
converting the sign bit to a positive or to a negative sign. Then 1234
we perform the calculation and, finally, convert the sign of 7342
the result into a sign bit.
4.8 Signed numbers 177
Notice that the results of the two operations are similar in examples 3 and 4 give negative results that require a little fur-
the least-significant four digits, but differ in the fifth digit ther explanation. Example 3 calculates 9 6 by adding the
by 104. The reason for this is not hard to find. Consider the two’s complement of 9 to 6 to get 3 expressed in two’s com-
subtraction of Y from X. We wish to calculate Z X Y, plement form. The two’s complement representation of 3 is
which we do by adding the ten’s complement of Y to X. The given by 100000 00011 11101.
ten’s complement of Y is defined as 104 Y. Therefore we get Example 4 evaluates X Y to get a result of 15 but
with the addition of a 2n term. The two’s complement repre-
Z X (104 Y) 104 (X Y).
sentation of 15 is given by 100000 01111 10001. In
In other words, we get the desired result, X Y, together with example 4, where both numbers are negative, we have
an unwanted digit in the leftmost position. This digit may be (2n X) (2n Y) 2n (2n X Y). The first part of
discarded. this expression is the redundant 2n and the second part is the
Complementing a number twice results in the original num- two’s complement representation of X ⫺Y. The two’s com-
ber; for example, 1234 is 104 1234 8876. Complementing plement system works for all possible combinations of posi-
twice, we get ( 1234) 8876 104 8876 1234. tive and negative numbers.
am–1 bm–1 a2 b2 a1 b1 a0 b0
C
The control input
adds A and B when
C = 0 and subtracts
A – B when C = 1.
........
Cout sm–1 s2 s1 s0
adding this 1 forms the two’s complement of B enabling the If we choose a 5-bit representation, we know that the range
subtraction of B from A to take place. of valid signed numbers is 16 to 15. Suppose we first add
5 and 6 and then try 12 and 13.
Case 1 Case 2
Properties of two’s complement numbers
5 00101 12 01100
1. The two’s complement system is a true complement system in 6 00110 13 01101
that X (X) 0. For example, in 5 bits 1310 11 01011 1110 25 11001 710 (as a
0110110 and 132 100112. The sum of 13 and 13 is two's complement number)
01101
10011 In case 1 we get the expected answer of 1110, but in case 2
100000 0 we get a negative result because the sign bit is ‘1’. If the answer
2. There is one unique zero 00 . . . 0. were regarded as an unsigned binary number it would
be 25, which is, of course, the correct answer. However,
3. If the number is positive the most-significant bit is 0,
once the two’s complement system has been chosen to
and if it is negative the most-significant bit is 1. Thus, the
most-significant bit is a sign bit. represent signed numbers, all answers must be interpreted in
this light.
4. The range of two’s complement numbers in n bits is
Similarly, if we add together two negative numbers whose
from 2n1 to 2n 1 1. For n 5, this range is
from 16 to 15. Note that the total number of different total is less than 16, we also go out of range. For example, if
numbers is 32 (16 negative, zero and 15 positive). What this we add 9 101112 and 12 101002, we get
demonstrates is that a 5-bit number can uniquely describe 9 10111
32 items, and it is up to us whether we choose to call these
12 10100
items the natural binary integers 0 to 31, or the signed two’s
27 101011 gives a positive result 0101112 1110
complement numbers 16 to 15.
5. The complement of the complement of X is X (i.e. Both these cases represent an out-of-range condition
-----
(X) X). In 5 bits 12 01100 and 12 10011 called arithmetic overflow. Arithmetic overflow occurs during
1 10100. If we form the two’s complement of 12 (i.e. a two’s complement addition if the result of adding two
10100) in the usual fashion by inverting the bits and adding 1, positive numbers yields a negative result, or if adding two
-----
we get 10100 1 01011 1 01100, which is the same as negative numbers yields a positive result.2 If the sign bits of
the number we started with. A and B are the same but the sign bit of the result is different,
Let’s now see what happens if we violate the range of two’s arithmetic overflow has occurred. If an1 is the sign bit of A,
complement numbers. That is, we will carry out an operation 2
Some define overflow more generally as ‘A condition that occurs
whose result falls outside the range of values that can be when the result of an operation does not fit the number representation
represents by two’s complement numbers. in use’.
4.8 Signed numbers 179
bn1 is the sign bit of B, and sn1 is the sign bit of the sum of Considering both cases, overflow occurs if
A and B, then overflow is defined by cn ·cn1 cn ·cn1 1.
V an1 ·bn1 ·sn1 an1 ·bn1 ·sn1 Alternative view of two’s complement numbers
Arithmetic overflow is a consequence of two’s complement We have seen that a binary integer, N, lying in the range
arithmetic and shouldn’t be confused with carry-out, which 0
N 2n 1, is represented in a negative form in n bits by
is the carry bit generated by the addition of the two most- the expression 2n N. We have also seen that this expression
significant bits of the numbers. can be readily evaluated by inverting the bits of N and adding
In practice, real systems detect overflow from Cin Cout to 1 to the result.
the last stage. That is, we detect overflow from Another way of looking at a two’s complement number is
to regard it as a conventional binary number represented
V cn ·cn1 cn · cn1 in the positional notation but with the sign of the most-
We now demonstrate that this expression is correct. This significant bit negative. That is,
proof has been included to improve your understanding of
N dn12n1 dn22n2 . . . d020
the nature of two’s complement arithmetic.
Figure 4.33 illustrates the most-significant stage of a paral- where dn1, dn2, . . . d0 are the bits of the two’s complement
lel adder that adds together bits an1, bn1, and cn1 to gener- number D. Consider the binary representation of 1410 and
ate a sum bit, sn1, and a carry-out, cn. There are four possible the two’s complement form of 14, in 5 bits.
combinations of A and B that can be added together 1410 011102
A B 14 2n N 25 14 32 14 18 10010
A + B or 14 0 1 1 1 0 1 10001 1 100102.
A B We can regard the two’s complement representation
A B of 14 (i.e. 10010) as
1 24 0 23 0 22 1 21 0 20 (this is a con-
As adding two numbers of differing sign cannot result in
ventional 8421-coded binary number with a negative
arithmetic overflow, we need consider only the cases where
weight for the most-significant bit)
A and B are both positive, or both negative.
Case 1 A and B positive an1 0, bn1 0 16 (0 0 2 0)
The final stage adds an1 bn1 cn1 to get cn-1, because 16 2 14
an1 and bn1 are both 0 (by definition if the numbers are pos- We can demonstrate that a two’s complement number is
itive). That is, the carry-out, cn, is 0 and sn1 cn1. indeed represented in this way. In what follows N represents a
We know overflow occurs if sn1 1, therefore overflow positive integer, and D the two’s complement form of N.
occurs if the sum is negative and cn ·cn1 1. We wish to prove that N D.
Case 2 A and B negative an1 1, bn1 1. n2
n2 n2
Sn–1 Cn
2n1 兺 2 兺 N2 1
i0
i
i0
i
i
n2
Binary code Natural binary Sign and magnitude One's complement Two's complement Biased form
00000 0 0 0 0 15
00001 1 1 1 1 14
00010 2 2 2 2 13
00011 3 3 3 3 12
00100 4 4 4 4 11
00101 5 5 5 5 10
00110 6 6 6 6 9
00111 7 7 7 7 8
01000 8 8 8 8 7
01001 9 9 9 9 6
01010 10 10 10 10 5
01011 11 11 11 11 4
01100 12 12 12 12 3
01101 13 13 13 13 2
01110 14 14 14 14 1
01111 15 15 15 15 0
10000 16 0 15 16 1
10001 17 1 14 15 2
10010 18 2 13 14 3
10011 19 3 12 13 4
10100 20 4 11 12 5
10101 21 5 10 11 6
10110 22 6 9 10 7
10111 23 7 8 9 8
11000 24 8 7 8 9
11001 25 9 6 7 10
11010 26 10 5 6 11
11011 27 11 4 5 12
11100 28 12 3 4 13
11101 29 13 2 3 14
11110 30 14 1 2 15
11111 31 15 0 1 16
is 15 greater than the actual number; for example, 7 is repre- Case 1 Integer arithmetic Case 2 Fixed point arithmetic
sented by 7 15 22 101102.
numbers were in integer form. This arrangement is called the integer part of the number and 12 bytes for the fractional
fixed point arithmetic, because the binary point is assumed to part; that is, they would need a 26-byte (208 bit) number. A
remain in the same position. That is, there is always the same clue to a way out of our dilemma is to note that both figures
number of digits before and after the binary point. The contain a large number of zeros but few significant digits.
advantage of the fixed point representation of numbers is
that no complex software or hardware is needed to imple-
ment it. 4.9.1 Representation of floating point
A simple example should make the idea of fixed point arith- numbers
metic clearer. Consider an 8-bit fixed point number with the We can express a decimal number such as 1234.56 in the form
four most-significant bits representing the integer part and 0.123456104 which is called the floating point format or
the four least-significant bits representing the fractional part. scientific notation. The computer handles large and small
Let’s see what happens if we wish to add the two numbers binary values in a similar way; for example, 1101101.1101101
3.625 and 6.5 and print the result. An input program first may be represented internally as 0.11011011101101 27
converts these numbers to binary form. (the 7 is, of course, also stored in a binary format). Before
looking at floating point numbers in more detail we should to
(in 8 bits)
consider the ideas of range, precision, and accuracy, which are
(in 8 bits) closely related to the way numbers are represented in floating
point format.
The computer now regards these numbers as 00111010
and 01101000, respectively. Remember that the binary point Range A number’s range tells us how big or how small
is only imaginary. These numbers are added in the normal it can be; for example, the astrophysicist was dealing with
way to give numbers as large as 2 1033 and those as small as
9 1028, representing a range of approximately 1061, or 61
decades. The range of numbers capable of representation by
a computer must be sufficient for the calculations that are
This result would be equal to 16210 if we were to regard it as likely to be performed. If the computer is employed in a
an unsigned natural binary integer. But it isn’t. We must dedicated application where the range of data to be handled
regard it as a fixed point value. The output program now is known to be quite small, then the range of valid numbers
takes the result and splits it into an integer part 1010 and a may be restricted, simplifying the hardware/software
fractional part 0010. The integer part is equal to 1010 and the requirements.
fractional part is 0.12510. The result would be printed as Precision The precision of a number is a measure of its
10.125. exactness and corresponds to the number of significant
In practice, a fixed point number may be spread over several figures used to represent it. For example, the constant
words to achieve a greater range of values than allowed by a may be written as 3.142 or 3.141592. The latter value is
single word. The fixed point representation of fractional num- more precise than the former because it represents to one
bers is very useful in some circumstances, particularly for part in 107 whereas the former value represents to one part
financial calculations. For example, the smallest fractional in 104.
part may be (say) 0.1 of a cent or 0.001 of a dollar. The largest Accuracy Accuracy has been included here largely to con-
integer part may be $1 000 000. To represent such a quantity in trast it with precision, a term often incorrectly thought to
BCD a total of 6 4 3 4 36 bits are required. A byte- mean the same as accuracy. Accuracy is the measure of the
oriented computer would require 5 bytes for each number. correctness of a quantity. For example, we can say 3.141
Fixed point numbers have their limitations. Consider the or 3.241592. The first value is a low-precision number
astrophysicist who is examining the sun’s behavior. They are but is more accurate than the higher precision value, which
confronted with numbers ranging from the mass of the sun has an error in the second digit. In an ideal world accuracy and
(1990000000000000000000000000000000 grams) to the mass precision would go hand in hand. It’s the job of the computer
of an electron (0.000000000000000000000000000910956 programmer to design algorithms that preserve the accuracy
grams). that the available precision allows. One of the potential
If astrophysicists were to resort to fixed point arithmetic, hazards of computation is calculations that take the form
they would require an extravagantly large number of bits to
represent the range of numbers used in their trade. A single
byte represents numbers in the range 0 to 255. If the physicist
wanted to work with astronomically large and microscop-
ically small numbers, roughly 14 bytes would be required for
4.9 Floating point numbers 183
e a
This floating point number
Exponent Mantissa represents a x 2e.
Fig. 4.35 Storing a floating-
Floating point number point number.
When the denominator is evaluated we are left with Today, the IEEE standard for floating-point numbers
0.0009, a number with only one decimal place of precision. dominates the computer industry. Accordingly, we concen-
Although the result might show eight figures of precision, it trate on this standard.
may be very inaccurate indeed.
A floating point number can be represented in the form
a re where a is the mantissa (also called the argument), r is 4.9.2 Normalization of floating
the radix or base, and e is the exponent or characteristic. The point numbers
computer stores a floating point number by splitting the By convention a floating point mantissa is always normalized
binary sequence representing the number into the two fields unless it is equal to zero and is expressed in the form 1.F
illustrated in Fig. 4.35. The radix r is not stored explicitly by where F is the fractional part.3 Because a normalized IEEE
the computer. floating pint mantissa always begins with a 1, this is called ‘the
Throughout the remainder of this section the value of the leading 1’. A normalized mantissa is therefore in the range
radix in all floating point numbers is assumed to be two. 1.00 . . . 00 to 1.11 . . . 11; that is
Before the IEEE format became popular, some computers
used an octal or hexadecimal exponent, so that the mantissa 2 x
1, or x 0, or 1
x 2.
is multiplied by 8e or 16e, respectively. For example, if a float- If the result of a calculation were to yield 11.010 . . . 2e,
ing-point number has a mantissa 0.101011 and an octal the result would be normalized to give 1.1010 . . . 2e 1.
exponent of 4 (i.e. 0100 in 4 bits), the number is equal to Similarly, the result 0.1001 . . . 2e would be normalized to
0.101011 84 or 0.101011 212, which is 1010110000002. 1.001 . . . 2e1.
It’s not necessary for a floating point number to occupy a By normalizing a mantissa, the greatest possible advantage
single storage location. Indeed with an 8-bit word, such a rep- is taken of the available precision. For example, the unnor-
resentation would be useless. Several words are grouped to malized 8-bit mantissa 0.00001010 has only four significant
form a floating point number (the number of words required bits, whereas the normalized 8-bit mantissa 1.0100011 has
is bits-in-floating-point-representation/computer-word- eight significant bits. It is worth noting here that there is a
length). The split between exponent and mantissa need not slight difference between normalized decimal numbers as
fall at a word boundary. That is, a mantissa might occupy 3 used by engineers and scientists, and normalized binary
bytes and the exponent 1 byte of a two 16-bit word floating- numbers. By convention, a decimal floating point number is
point number. normalized so that its mantissa lies in the range 1.00 . . . 0 to
When constructing a floating point representation for 9.99 . . . 9.
numbers, the programmer must select the following. A special exception has to be made in the case of zero, as
1. The total number of bits. this number cannot, of course, be normalized.
2. The representation of the mantissa (two’s complement etc.). Because the IEEE floating-point format uses a sign and
magnitude format, a sign-bit indicates the sign of a mantissa.
3. The representation of the exponent (biased etc.).
A negative floating point mantissa is stored in the form
4. The number of bits allocated to the mantissa and exponent.
5. The location of the mantissa (exponent first or mantissa first). x 1.11 . . . 1 to 1.00 . . . 0
Point 4 is worthy of elaboration. Once you’ve decided on the A floating point number is limited to one of the three
total number of bits in the floating point representation, the ranges 2 x
1, or x 0, or 1
x 2 described by
number must be split into a mantissa and exponent. Fig. 4.36.
Dedicating a large number of bits to the exponent lets you rep-
Biased exponents
resent numbers with a large range. Gaining exponent bits at the
expense of the mantissa reduces the precision of the floating A floating-point representation of numbers must make
point number. Conversely, increasing the bits available for the provision for both positive and negative numbers, and
mantissa improves the precision at the expense of the range. 3
Before the advent of the IEEE standard, floating point numbers were
Once, almost no two machines used the same format. often normalized in the form 0.1 . . . x 2e and constrained to the range
Things improved with the introduction of microprocessors. 1
⁄2
x 1 or 1⁄2 x 1.
184 Chapter 4 Computer arithmetic
–1.1111 ... –1.000 ... 1.000 ... 1.1111 ... to 2m1 1 by subtracting a constant value
0.000 ... (or bias) of B 2m1 from each of the num-
bers. We get a continuous natural binary series
Valid negative Valid positive from 0 to N representing exponents from B
mantissas mantissas
_ + to N B.
If we use a 3-bit decimal biased exponent
with B 4, the biased exponents are 0, 1, 2, 3,
–2 –1 0 1 2 4, 5, 6, 7 and represent the actual expo-
nents 4, 3, 2, 1, 0, 1, 2, 3. We’ve
Figure 4.36 Range of valid normalized two’s complement mantissas. invented a way of representing negative num-
bers by adding a constant to the most negative
number to make it equal to zero. In this exam-
Binary value True exponent Biased form ple, we’ve added 4 to each true number so that 4 is repre-
sented by the biased values 0, and 3 by 1, etc.
0000 8 0
We create a biased exponent by adding a constant to the
0001 7 1 true exponent so that the biased exponent is given by
0010 6 2 b ⬘ ⫽ b B, where b⬘ is the biased exponent, b the true expo-
0011 5 3 nent, and B a weighting. The weighting B is frequently either
0100 4 4 2m1 or 2m1 1. Consider what happens for the case where
0101 3 5 m 4 and B 23 8. (See Table 4.24).
0110 2 6 The true exponent ranges from 8 to 7, allowing us to
0111 1 7 represent powers of 2 from 28 to 27, while the biased expo-
1000 0 8 nent ranges from 0 to 15. The advantage of the biased rep-
1001 1 9 resentation of exponents is that the most negative exponent is
represented by zero. Conveniently, the floating-point value of
1010 2 10
zero is represented by 0.0 . . . 0 2most negative exponent (see
1011 3 11
Figure 4.37). By choosing the biased exponent system we
1100 4 12
arrange that zero is represented by a zero mantissa and a zero
1101 5 13 exponent as Figure 4.36 demonstrates.
1110 6 14 The biased exponent representation of exponents is also
1111 7 15 called excess n, where n is typically 2m1. For example, a 6-bit
For example, if n 1010.1111, we normalize it to 1.0101111 23. The exponent is called excess 32 because the stored exponent
true exponent is 3, which is stored as a biased exponent of 3 8, which is exceeds the true exponent by 32. In this case, the smallest
1110 or 1011 in binary form. true exponent that can be represented is 32 and is stored as
an excess 32 value of 0. The maximum true exponent that can
Table 4.24 Relationship between true and biased exponents.
be represented is 31 and this is stored as 63.
A second advantage of the biased exponent representation
Exponent Mantissa is that the stored (i.e. biased) exponents form a natural binary
S
representing zero by a sequence. This sequence is monotonic so that increasing the
0 0 0 0 ........0 0 0 0 .....................0 zero exponent and mantissa exponent by 1 involves adding 1 to the binary exponent, and
decreasing the exponent by 1 involves subtracting one from
Fig. 4.37 Representing zero in floating point arithmetic. the binary exponent. In both cases the binary biased expo-
nent can be considered as behaving like an unsigned binary
positive and negative exponents. The following example in number. Consequently, you can use relatively simple logic to
decimal notation demonstrates this concept. compare two exponents. Remember that in 4-bit signed arith-
0.123 10 , 0.756 10 , 0.176 10 ,
12 9 3 metic the number 0110 is larger than 1110 because the
0.459 10 7 second number is negative. If these were biased exponents,
1110 would be larger than 0110.
The mantissa of an IEEE format floating point number is
represented in sign and magnitude form. The exponent, how- IEEE floating point format
ever, is represented in a biased form. An m-bit exponent pro- The Institute of Electronics and Electrical Engineers (IEEE)
vides 2m unsigned integer exponents from 00 . . . 0 to has defined a standard floating point format for arithmetic
11 . . . 1. Suppose that we relabel these 2m values from 2m1 operations called ANSI/IEEE standard 754-1985. To cater for
4.9 Floating point numbers 185
different applications, the standard specifies three basic memory. If we know that a 1 must be located to the left of the
formats, called single, double, and quad. Table 4.25 defines the fractional mantissa, there is no need to store it. In this way 1
principal features of these three floating point formats. bit of storage is saved, permitting the precision of the man-
An IEEE format floating point number X is formally tissa to be extended by 1 bit. The format of the number when
defined as stored in memory is given in Fig. 4.38.
X 1S 2EB 1.F, As an example of the use of the IEEE 32-bit format, con-
where S sign bit, 0 positive mantissa, 1 negative sider the representation of the decimal number 2345.125.
mantissa, E exponent biased by B, F fractional mantissa 2345.12510 100100101001.0012 (as an equivalent
(note that the mantissa is 1. F and has an implicit leading 1). binary number)
A single-format 32-bit floating-point number has a bias of 1.00100101001001 211 (as a nor-
127 and a 23-bit fractional mantissa. A sign and magnitude malized binary number)
representation has been adopted for the mantissa; if S 1 The mantissa is negative so the sign bit S is 1. The biased
the mantissa is negative and if S 0 it is positive. exponent is given by 11 127 138 100010102. The
The mantissa is always normalized and lies in the range fractional part of the mantissa is
1.000 . . . 00 to 1.111 . . . 11. If the mantissa is always normal- .00100101001001000000000 (in 23 bits). Therefore, the IEEE
ized, it follows that the leading 1, the integer part, is redun- single format representation of 2345.125 is:
dant when the IEEE format floating point number is stored in 11000101000100101001001000000000
In order to minimize storage space in computers where the
memory width is less than that of the floating point number,
floating point numbers are packed so that the sign bit, expo-
Single Double Quad
precision precision precision nent and mantissa share part of two or more machine words.
When floating point operations are carried out, the numbers
Field width in bits
are first unpacked and the mantissa separated from the expo-
S sign 1 1 1 nent. For example, the basic single precision format specifies
E exponent 8 11 15 a 23-bit fractional mantissa, giving a 24-bit mantissa when
L leading bit 1 1 1 unpacked and the leading 1 reinserted. If the processor on
F fraction 23 52 111 which the floating point numbers are being processed has a
Total width 32 64 128 16-bit word length, the unpacked mantissa will occupy 24
Exponent bits out of the 32 bits taken up by two words.
Maximum E 255 2047 32 767 If, when a number is unpacked, the number of bits in its
Minimum E 0 0 0 exponent and mantissa is allowed to increase to fill the avail-
Bias 127 1023 16 383 able space, the format is said to be extended. By extending the
format in this way, the range and precision of the floating
Notes
S sign bit (0 for a negative number, 1 for a positive number).
L leading bit (always 1 in a normalized, non-zero mantissa). Imaginary leading bit
F fractional part of the mantissa. 1 bit 8 bits
The range of exponents is from Min E 1 to Max E1 23 bits
The number is represented by 1S x 2E exponent L • F. 31 30 23 22 0
A signed zero is represented by the minimum exponent, L 0, and
F 0, for all three formats. S Biased exponent 1. Fractional mantissa
The maximum exponent has a special function that represents signed
infinity for all three formats. 32-bit floating point number
Table 4.25 Basic IEEE floating point formats. Figure 4.38 Format of the IEEE 32-bit floating point format.
point number are considerably increased. For example, a The exponent of B is smaller than that of A which results
single format number is stored as a 32-bit quantity. When it is an increase of 2 in B’s exponent and a corresponding division
unpacked the 23-bit fractional mantissa is increased to 24 bits of B’s mantissa by 102 to give 0.0056789 105. We can now
by including the leading 1 and then the mantissa is extended add A to the denormalized B.
to 32 bits (either as a single 32-bit word or as two 16-bit
A 0.1234500 105
words). All calculations are then performed using the 32-bit
B 0.0056789 105
extended precision mantissa. This is particularly helpful
0.1291289 105
when trigonometric functions (e.g., sin x, cos x) are evalu-
ated. After a sequence of floating operations have been car- The result is already in a normalized form and doesn’t need
ried out in the extended format, the floating point number is post-normalizing. Note that the answer is expressed to a pre-
repacked and stored in memory in its basic form. cision of seven significant figures whereas A and B are each
In 32-bit single IEEE format, the maximum exponent Emax expressed to a precision of five significant figures. If the result
is 127 and the minimum exponent Emin is 126 rather were stored in a computer, its mantissa would have to be
than 128 to 127 as we might expect. The special value reduced to five figures after the decimal point (because we
Emin1 (i.e. 127) is used to encode zero and Emax 1 is were working with five-digit mantissas).
used to encode plus or minus infinity or a NaN. A NaN is a When people do arithmetic they often resort to what may
special entity catered for in the IEEE format and is not a best be called floating precision. If they want greater preci-
number. The use of NaNs is covered by the IEEE standard and sion they simply use more digits. Computers use a fixed rep-
they permit the manipulation of formats outside the IEEE resentation for floating point numbers so that the precision
standard. may not increase as a result of calculation. Consider the
following binary example of floating point addition.
Addition can’t take place as long as the exponents are differ- 1. Because the exponent shares part of a word with the
mantissa, it’s necessary to separate them before the process
ent. To perform a floating-point addition (or subtraction)
of addition can begin. As we pointed out before, this is called
the following steps must be carried out.
unpacking.
1. Identify the number with the smaller exponent. 2. If the two exponents differ by more than p 1, where p is the
2. Make the smaller exponent equal to the larger exponent by number of significant bits in the mantissa, the smaller num-
dividing the mantissa of the smaller number by the same ber is too small to affect the larger and the result is effectively
factor by which its exponent was increased. equal to the larger number. For example, there’s no point in
3. Add (or subtract) the mantissas. adding 0.1234 1020 to 0.4567 102, because adding
0.4567 102 to 0.1234 1020 has no effect on a four-digit
4. If necessary, normalize the result (post-normalization).
mantissa.
In the above example we have A 0.12345 105 and 3. During the post-normalization phase the exponent is
B 0.56789 103. checked to see if it is less than its minimum possible value or
4.9 Floating point numbers 187
Unpack the
numbers
A = a × 2e1
B = b × 2e2
Mantissas a and b are expressed in p bits
e1 – e2 >p +1
Yes No
STOP or
e2 – e1 >p +1
e1 = e2
Add a to b
Shift mantissa right Over range Test resulting Under range Shift mantissa left
e1 = e1 + 1 mantissa e1 = e1– 1
No Is Is No
e 1 > maximum? e 1 < minimum?
Yes Yes
ERROR ERROR
Exponent overflow END Exponent underflow
greater than its maximum possible value. This corresponds to 4.9.4 Examples of floating point
testing for exponent underflow and overflow, respectively. Each
of these cases represents conditions in which the number is calculations
outside the range of numbers that the computer can handle. Because handling floating point numbers can be tricky, we
Exponent underflow would generally lead to the number provide several examples. An IEEE standard 32-bit floating-
being made equal to zero, whereas exponent overflow would
point number has the format N 1S 1.F 2E127,
result in an error condition and may require the intervention
where S is the sign bit, F is the fractional mantissa, and E the
of the operating system.
biased exponent.
Floating point multiplication is easier than addition or
subtraction because we simply multiply mantissas and add
exponents. For example if x s1 2e1 and y s2 2e2 then EXAMPLE 1
x • y s1 • s2 2(e1e2). The multiplication can be done with Convert the decimal numbers 123.5 and 100.25 into the IEEE
an integer multiplier (we don’t even have to worry about format for floating point numbers. Then carry out the subtrac-
signed numbers). Of course, multiplication of two p-bit tion of 123.5⫺100.25 and express the result as a normalized
numbers yields a 2p-bit product and we therefore have to 32-bit floating point value.
round down the result of the multiplication. When we add
123.510 1111011.1
the two exponents, we have to remember to subtract the bias
because each exponent is Etrue b. 1.1110111 26
The mantissa is positive, so S 0. The exponent is 6, which
Rounding and truncation
is stored in biased form as 6 127 13310 100001012. The
We have seen that some of the operations involved in floating mantissa is 1.1110111, which is stored as 23 bits with the lead-
point arithmetic lead to an increase in the number of bits in ing 1 suppressed. The number is stored in IEEE format as
the mantissa and that some technique must be invoked to 010000101 10010001000000000000000.
keep the number of bits in the mantissa constant. The sim- We can immediately write down the IEEE value for 100.25
plest technique is called truncation and involves nothing because it is so close to the 123.5 we have just calculated; that
more than dropping unwanted bits. For example, if we trun- is, 0 10000101 10010001000000000000000.
cate 0.1101101 to four significant bits we get 0.1101. The two IEEE-format floating point numbers taking
Truncating a number creates an error called an induced error part in the operation are first unpacked. The sign, the expo-
(i.e. an error has been induced in the calculation by an oper- nent, and the mantissa (with the leading 1 restored) must be
ation on the number). Truncating a number causes a biased reconstituted.
error because the number after truncation is always smaller The two exponents are compared. If they are the same, the
than the number that was truncated. mantissas are added. If they are not, the number with the
A much better technique for reducing the number of bits smaller exponent is denormalized by shifting its mantissa
in a word is rounding. If the value of the lost digits is greater right (i.e. dividing by 2) and incrementing its exponent (i.e.
than half the least-significant bit of the retained digits, 1 is multiplying by 2) until the two exponents are equal. Then the
added to the least-significant bit of the remaining digits. We numbers are added.
have been doing this with decimal numbers for years—the If the mantissa of the result is out of range (i.e. greater than
decimal number 12.234 is rounded to 12.23, whereas 13.146 1.11111 . . . 1 or less than 1.0000 . . . 0) it must be re-normal-
is rounded to 13.15. Consider rounding to four significant ized. If the exponent goes out of range (bigger than its largest
bits the following numbers. value or smaller than its smallest value) exponent overflow
0.1101011 → 0.1101 The three bits removed are 011, so do occurs and an error is flagged. The result is then repacked and
nothing the leading 1 in front of the normalized mantissa removed.
0.1101101 → 0.1101 1 0.1110 The three bits removed IEEE number
are 101, so add 1 123.510 001000101111011100000 00000000000
Rounding is always preferred to truncation partially IEEE number
because it is more accurate and partially because it gives 100.2510 0 0100010110010001000000000000000
rise to an unbiased error. Truncation always undervalues These floating-point numbers have the same exponent, so
the result leading to a systematic error whereas rounding we can subtract their mantissas (after inserting the leading 1).
sometimes reduces the result and sometimes increases it.
The major disadvantage of rounding is that it requires 1.11101110000000000000000
a further arithmetic operation to be performed on the 1.10010001000000000000000
result. 0.01011101000000000000000
4.10 Multiplication and division 189
The result is not normalized and must be shifted left twice to This number is equal to 25 1.0101010011
get 1.01110100000000000000000. The exponent must be 101010.10011 42.59375.
decreased by 2 to get 01000011. The result expressed in floating
point format is EXAMPLE 3
0 0100011 01110100000000000000000
Let’s perform a floating point multiplication. We’ll use two dec-
EXAMPLE 2 imal floating point numbers that can be converted into binary
form without a calculator. Assume we wish to calculate
Carry out the operation 42.6875 ⫺ 0.09375 by first converting X ⫽ 256.5 x 4.652.
these numbers to the IEEE 32-bit format. Use these floating We can immediately write 256.5 100000000.12
point numbers to perform the subtraction and then calculate 1.000000001 28 and 4.625 100.1012 1.00101 22.
the new floating point value. In IEEE 32-bit format, these two numbers are represented by
42.687510 = 101010.10112
0 10000111 00000000100000000000000 and
= 1.010101011 25
0 10000001 00101000000000000000000
This number is positive and S 0. The true exponent is 5
and, therefore, the biased exponent is 5 127 (i.e. actual To multiply the numbers, we unpack the fractional
exponent bias) 132 100001002 in 8 bits. The frac- mantissas, insert the leading 1s, and multiply them. Then we
tional exponent is 010101011(00000000000000). Therefore add the two biased exponents and subtract one bias. The new
42.6875 is represented as an IEEE floating point value by mantissa is
01000010001010101100000000000000. 1.000000001 1.00101 1.001010001001012
Similarly, 0.0937510 0.000112 1.1 24. The If we add the biased mantissas and subtract one bias we get
sign-bit S 1 because the number is negative and the biased 10000111 10000001 01111111 100010012. The final
exponent E 4 127 123 011110112. The frac- IEEE format result is
tional mantissa is F 10000000000000000000000. The
0 10001001 00101000100101000000000 44944A016.
representation of 0.09375 is therefore 1011110111
0000000000000000000000. These two numbers are stored as The decimal result is 1.00101000100101000000000
2100010011111111 1.00101000100101 210
01000010001010101100000000000000 and
10010100010.01012 1186.312510.
10111101110000000000000000000000, respectively.
In order to perform the addition we have to unpack these
numbers to sign biased exponent mantissa. 4.10 Multiplication and division
First number 0 10000100 01010101100000000000000
We’ve looked at addition and subtraction—now we consider
Second number 1 01111011 10000000000000000000000
multiplication and division. Other mathematical functions
We must insert the leading 1 into the fractional mantissa to can be derived from multiplication. Division itself will later
get the true mantissa. be defined as an iterative process involving multiplication.
First number 0 10000100 101010101100000000000000
Second number 1 01111011 110000000000000000000000
4.10.1 Multiplication
In order to add or subtract the numbers, the exponents must Binary multiplication is no more complex than decimal
be the same (we can work with biased exponents). The second multiplication. In many ways it’s easier as the whole binary
number’s exponent is smaller by 10000100 multiplication table can be reduced to
0111011 000010012 910. We increase the second exponent
by 9 and shift the mantissa right 9 times to get 000
First number 0 10000100 101010101100000000000000 010
Second number 1 10000100 000000000110000000000000000000000 100
111
We can now subtract mantissas to get
10101010011000000000000. The result is positive with a The multiplication of two bits is identical to their logical
biased exponent of 10000100 and a mantissa of AND. When we consider the multiplication of strings of bits,
1.0101010011000000000000. This number would be stored as things become more complex and the way in which multipli-
0 10000100 0101010011000000000000 (we’ve dropped the cation is carried out, or mechanized, varies from machine to
leading 1 mantissa). machine. The faster and more expensive the computer, the
190 Chapter 4 Computer arithmetic
more complex the hardware used to implement multiplica- the algorithm of Table 4.26. The mechanization of the prod-
tion. Some high-speed computers perform multiplication in uct of 1101 1010 is presented in Table 4.27.
a single operation by means of a very large logic array involv-
ing hundreds of gates. Signed multiplication
The multiplication algorithm we’ve just discussed is valid
Unsigned binary multiplication only for unsigned integers or unsigned fixed point numbers.
The so-called pencil and paper algorithm used by people to cal- As computers represent signed numbers by means of two’s
culate the product of two multidigit numbers, involves the complement notation, it is necessary to find some way of
multiplication of an n-digit number by a single digit followed forming the product of two’s complement numbers. It is, of
by shifting and adding. We can apply the same approach to course, possible to convert negative numbers into a modulus-
unsigned binary numbers in the following way. The multiplier only form, calculate the product, and then convert it into a
bits are examined, one at a time, starting with the least-signif- two’s complement form if it is negative. That approach wastes
icant bit. If the current multiplier bit is one the multiplicand is time.
written down, if it is zero then n zeros are written down We first demonstrate that the two’s complement represen-
instead. Then the next bit of the multiplier is examined, but tation of negative numbers can’t be used with the basic shift-
this time we write the multiplicand (or zero) one place to the ing and adding algorithm. That is, two’s complement
left of the last digits we wrote down. Each of these groups of n arithmetic works for addition and subtraction, but not for
digits is called a partial product. When all partial products have multiplication or division (without using special algo-
been formed, they are added up to give the result of the multi- rithms). Consider the product of X and Y. The two’s com-
plication. An example should make this clear. plement representation of Y is 2nY.
If we use two’s complement arithmetic, the product X(Y)
10 13 Multiplier 11012
is given by X(2nY) 2nXXY.
Multiplicand 10102
1010
1101
1010 Step 1 first multiplier bit = 1, write down multiplicand
0000 Step 2 second multiplier bit = 0, write down zeros shifted left
1010 Step 3 third multiplier bit = 1, write down multiplicand shifted left
1010 Step 4 fourth multiplier bit = 1, write down multiplicand shifted left
10000010 Step 5 add together four partial products
The result, 100000102 13010, is 8 bits long. The multiplica- The expected result, XY, is represented in two’s
tion of two n-bit numbers yields a 2n-bit product. complement form by 22nXY. The most-significant bit is 22n
Digital computers don’t implement the pencil and paper (rather than 2n) because multiplication automatically yields a
algorithm in the above way, as this would require the storing
of n partial products, followed by the simultaneous addition Multiplier 11012 Multiplicand 10102
of n words. A better technique is to add up the partial prod-
Step Counter Multiplier Partial product Cycle
ucts as they are formed. An algorithm for the multiplication
of two n-bit unsigned binary numbers is given in Table 4.26. a and b 4 1101 00000000
c 4 1101 10100000 1
We will consider the previous example of 1101 1010 using
d and e 4 0110 01010000 1
f 3 0110 01010000 1
(a) Set a counter to n. c 3 0110 01010000 2
(b) Clear the 2n-bit partial product register. d and e 3 0011 00101000 2
(c) Examine the rightmost bit of the multiplier (initially the least- f 2 0011 00101000 2
significant bit). If it is one add the multiplicand to the n most- c 2 0011 11001000 3
significant bits of the partial product. d and e 2 0001 01100100 3
(d) Shift the partial product one place to the right. f 1 0001 01100100 3
(e) Shift the multiplier one place to the right (the rightmost bit is, of c 1 0001 100000100 4
course, lost).
d and e 1 0000 10000010 4
(f) Decrement the counter. If the result is not zero repeat from step c. If
the result is zero read the product from the partial product register. f 0 0000 10000010 4
Table 4.26 An algorithm for multiplication. Table 4.27 Mechanizing unsigned multiplication.
4.10 Multiplication and division 191
Corrected result
double-length product. In order to get the correct two’s com- 3. If the current multiplier bit is the same as the next lower
plement result we have to add a correction factor of order multiplier bit, do nothing.
22n 2nX 2n(2n X) Note 1. When adding in the multiplicand to the partial
product, discard any carry bit generated by the
This correction factor is the two’s complement of X scaled addition.
by 2n. As a further illustration consider the product of X 15 Note 2. When the partial product is shifted, an arithmetic
and Y 13 in 5 bits. shift is used and the sign bit propagated.
The final result in 10 bits, 11001111012 19510, is cor- Note 3. Initially, when the current bit of the multiplier is its
rect. Similarly, when X is negative and Y is positive, a correc- least-significant bit, the next lower-order bit of the
tion factor of 2n(2nY) must be added to the result. multiplier is assumed to be zero.
When both multiplier and multiplicand are negative the
following situation exists. The flowchart for Booth’s algorithm is given in Fig. 4.40. In
order to illustrate the operation of Booth’s algorithm,
(2n X)(2n Y) 22n 2nX 2nY XY consider the three products 13 15, 13 15, and
In this case correction factors of 2nX and 2nY must be 13 (15). Table 4.28 demonstrates how Booth’s
added to the result. The 22n term represents a carry-out bit algorithm mechanizes these three multiplications.
from the most-significant position and can be neglected.
High-speed multiplication
We don’t intend to delve deeply into the subject of high-speed
Booth’s algorithm multiplication as large portions of advanced textbooks are
One approach to the multiplication of signed numbers in devoted to this topic alone. Here two alternative ways of
two’s complement form is provided by Booth’s algorithm. forming products to the method of shifting and adding are
This algorithm works for two positive numbers, one negative explained.
and one positive, or both negative. Booth’s algorithm is We have seen in Chapter 2 that you can construct a 2-bit by
broadly similar to conventional unsigned multiplication but 2-bit multiplier by means of logic gates. This process can be
with the following differences. In Booth’s algorithm two bits extended to larger numbers of bits. Figure 4.41 illustrates the
of the multiplier are examined together, to determine which type of logic array used to directly multiply two numbers.
of three courses of action is to take place next. The algorithm An alternative approach is to use a look-up table in which
is defined below. all the possible results of the product of two numbers are
stored in a ROM read-only-memory. Table 4.29 shows how
1. If the current multiplier bit is 1 and the next lower order
two four-bit numbers may be multiplied by storing all
multiplier bit is 0, subtract the multiplicand from the
28 256 possible results in a ROM.
partial product.
The 4-bit multiplier and 4-bit multiplicand together form
2. If the current multiplier bit is 0 and the next lower order an 8-bit address that selects one of 256 locations within the
multiplier bit is 1, add the multiplicand to the partial ROM. In each of these locations the product of the multiplier
product. (most-significant four address bits) and the multiplicand
192 Chapter 4 Computer arithmetic
Unpack the
numbers
Yes No
Stop Result zero?
Table 4.30 Relationship between multiplier size and array size. 11001冄1000111111
11001
The 5 bits of the divisor do not go into the first 5 bits of the
4.10.2 Division dividend, so a zero is entered into the quotient and the divi-
sor is compared with the first 6 bits of the dividend.
Division is the inverse of multiplication and is performed by
01
repeatedly subtracting the divisor from the dividend until the
11001冄1000111111
result is either zero or less than the divisor. The number of times
11001
the divisor is subtracted is called the quotient, and the number
001010
left after the final subtraction is the remainder. That is
The divisor goes into the first 6 bits of the dividend once, to
dividend/divisor quotient remainder/ divisor leave a partial dividend 001010(1111).
Alternatively, we can write The next bit of the dividend is brought down to give
dividend quotient divisor remainder
010
Before we consider binary division let’s examine decimal 11001冄1000111111
division using the traditional pencil and paper technique. 11001
The following example illustrates the division of 575 by 25. 010101
11001
quotient
divisor冄dividend 25冄575 The partial dividend is less than the divisor, and a zero is
The first step is to compare the two digits of the divisor entered into the next bit of the quotient. The process contin-
with the most-significant two digits of the dividend and ask ues as follows.
how many times the divisor goes into these two digits. The 010111
answer is 2 (i.e. 2 25 50), and 2 25 is subtracted 11001冄1000111111
from 57. The number 2 is entered as the most-significant 11001
digit of the quotient to produce the situation below. 00101011
11001
2 000100101
25冄575 11001
50 000011001
7 11001
The next digit of the dividend is brought down, and the 0000000000
divisor is compared with 75. As 75 is an exact multiple of 25, In this case the partial quotient is zero, so that the final
a three can be entered in the next position of the quotient to result is 10111, remainder 0.
give the following result.
Restoring division
23
25冄575 The traditional pencil and paper algorithm we’ve just dis-
50 cussed can be implemented in digital form with little modifi-
75 cation. The only real change is to the way in which the divisor
75
is compared with the partial dividend. People do the compar-
00
ison mentally whereas computers must perform a subtrac-
As we have examined the least-significant bit of the tion and test the sign of the result. If the subtraction yields a
dividend and the divisor was an exact multiple of 75, the positive result, a one is entered into the quotient, but if the
4.10 Multiplication and division 195
Non-restoring division
Au Bu Au Bl Al Bu Al Bl It’s possible to modify the restoring division algo-
4-bit 4-bit 4-bit 4-bit 4-bit 4-bit 4-bit 4-bit rithm of Fig. 4.44 to achieve a reduction in the
multiplier multiplier multiplier multiplier time taken to execute the division process. The
non-restoring division algorithm is almost identi-
cal to the restoring algorithm. The only difference
is that the so-called restoring operation is elimi-
A uB u A uB l A lB u A lB l nated. From the flowchart for restoring division,
it can be seen that after a partial dividend has
Partial product adder A lB l been restored by adding back the divisor, one-half
the divisor is subtracted in the next cycle. This is
A lB u because each cycle includes a shift-divisor-right
operation, which is equivalent to dividing the
A uB l
divisor by two. The restore divisor operation in the
A uB u current cycle followed by the subtract half the
divisor in the following cycle is equivalent to a
256A uB u + 16A uB l + 16A lB u + A lB l
single operation of add half the divisor to the par-
tial dividend. That is, D D/2 D/2, where D
is the divisor.
Figure 4.46 gives the flowchart for non-restoring
division. After the divisor has been subtracted
from the partial dividend, the new partial divi-
16-bit product
dend is tested. If it is negative, zero is shifted into
Fig. 4.43 High-speed multiplication. the least-significant position of the quotient and
half the divisor is added back to the partial divi-
result is negative a zero is entered in the quotient and the divi- dend. If it is positive, one is shifted into the least-significant
sor added back to the partial dividend to restore it to its pre- position of the quotient and half the divisor is subtracted
vious value. from the partial dividend. Figure 4.47 repeats the example of
A suitable algorithm for restoring division is as follows. Fig. 4.4 using non-restoring division.
1. Align the divisor with the most-significant bit of the Division by multiplication
dividend.
Because both computers and microprocessors perform
2. Subtract the divisor from the partial dividend. division less frequently than multiplication, some processors
3. If the resulting partial dividend is negative, place a zero implement multiplication but not division. It is, however, possi-
in the quotient and add back the divisor to restore the ble to perform division by means of multiplication, addition,
partial dividend. and shifting.
196 Chapter 4 Computer arithmetic
Start
Align most-significant
bits of divisor and dividend
No Yes
Is result positive?
Is
Yes divisor aligned No
with LSB of
dividend?
End
Suppose we wish to divide a dividend N by a divisor D to If we now repeat the process with K (1 Z2), we get
obtain a quotient Q, so that Q N/D. The first step is to scale
N(1 Z) 1 Z2 N(1 Z)(1 Z2)
D so that it lies in the range Q ·
1 Z2 1 Z2 1 Z4
12
D 1 This process may be repeated n times with the result that
This operation is carried out by shifting D left or right and
N N(1 Z)(1 Z )(1 Z ) (1 Z2 )
n1
2 4 ...
recording the number of shifts—rather like normalization in
Q
floating point arithmetic. We define a new number, Z, in D 1 Z2
n1
Start
Is
Yes divisor aligned No
with LSB of
dividend?
Restore final divisor to
get remainder
End
calculated from the above formula must be scaled by the multiplication because the two’s complement system cannot be
same factor to produce the desired result. used for signs and unsigned multiplication.
■ SUMMARY ■ PROBLEMS
In this chapter we have looked at how numerical information is 4.1 Convert the following decimal integers to their natural
represented inside a digital computer. We have concentrated on binary equivalents.
the binary representation of numbers, because digital (a) 12 (d) 4090
computers handle binary information efficiently. Because both (b) 42 (e) 40900
positive and negative numbers must be stored and manipulated (c) 255 (f) 65530
by a computer, we have looked at some of the ways in which
digital computers represent negative numbers. The two’s 4.2 Convert the following natural binary integers to their
complement system is used to represent negative integers, decimal equivalents.
whereas a biased representation is used to represent negative (a) 110 (c) 110111
exponents in floating point arithmetic and a floating point (b) 1110110 (d) 11111110111
mantissa uses a sign and magnitude representation.
4.3 Complete the table below.
Because digital computers sometimes have to work with
very large and very small numbers, we have covered some of Decimal Binary Hexadecimal Base 7
the ways in which the so-called scientific notation is used 37
to encode both large and small numbers. These numbers 99
are stored in the form of a mantissa and a magnitude 10101010
(i.e. the number of zeros before/after the binary point) and are 11011011101
called floating point numbers. Until recently, almost every 256
computer used its own representation of floating point numbers. CAB
Today, the IEEE standard for the format of floating point numbers 12
666
has replaced most of these ad hoc floating point formats.
At the end of this chapter we have briefly introduced the
operations of multiplication and division and have 4.4 Convert the following base 5 numbers into base 9
demonstrated how they are mechanized in digital computers. equivalents. For example, 235 149.
Special hardware has to be used to implement signed (a) 14 (b) 144 (c) 444 (d) 431
4.10 Multiplication and division 199
4.5 Convert the following decimal numbers to their binary 4.11 Convert the following decimal numbers into BCD form.
equivalents. Calculate the answer to five binary places and (a) 1237 (b) 4632 (c) 9417
round the result up or down as necessary.
4.12 Perform the following additions on the BCD numbers
(a) 1.5 (d) 1024.0625
using BCD arithmetic.
(b) 1.1 (e) 3.141592
(c) 1/3 (f) 1/兹2 (a) 0010100010010001 (b) 1001100101111000
0110100001100100 1001100110000010
4.6 Convert the following binary numbers to their decimal
equivalents. 4.13 The 16-bit hexadecimal value C12316 can represent many
things. What does this number represent, assuming that it is the
(a) 1.1 (d) 11011.111010
following:
(b) 0.0001 (e) 111.111111
(c) 111.101 (f) 10.1111101 (a) an unsigned binary integer
(b) a signed two’s complement binary integer
4.7 Complete the following table. Calculate all values to four (c) a sign and magnitude binary integer
places after the radix point. (d) an unsigned binary fraction
4.14 Convert the following 8-bit natural binary values into their
Decimal Binary Hexadecimal Base 7 Gray code equivalents.
0.25 (a) 10101010
0.35 (b) 11110000
11011.0111 (c) 00111011
111.1011 4.15 Convert the following 8-bit Gray code values into their
2.08 natural binary equivalents.
AB.C (a) 01010000
1.2 (b) 11110101
66.6 (c) 01001010
4.16 What are the Hamming distances between the following
4.8 Calculate the error (both absolute and as a percentage) if pairs of binary values?
the following decimal fractions are converted to binary
(a) 00101111 (b) 11100111
fractions, correct to 5 binary places. Convert the decimal
01011101 01110101
number to six binary digits and then round up the fifth bit if the
(c) 01010011 (d) 11111111
sixth bit is a 1.
00011011 00000111
(a) 0.675 (e) 11011101 (f) 0011111
(b) 0.42 11011110 0000110
(c) 0.1975
4.17 Decode the Huffman code below, assuming that the valid
4.9 An electronics engineer has invented a new logic device codes are P 0, Q 10, R 110, and S 111. How many bits
that has three states: 1, 0, and 1. These states are would be required if P, Q, R, and S had been encoded as 00, 01,
–
represented by 1, 0, and 1, respectively. This arrangement may 10, and 11, respectively?
be used to form a balanced ternary system with a radix 3, but
where the trits represent -1, 0, 1 instead of 0, 1, 2. The 000001110111000000101111111101010001111100010
following examples illustrate how this system works. 4.18 The hexadecimal dump from part of a microcomputer’s
memory is as follows.
Ternary Balanced ternary Decimal
11 11 4 (3 1) 0000 4265 6769 6EFA 47FE BB87 0086 3253 7A29
––
12 111 5 (9 3 1) 0010 698F E000
–
22 101 8 (9 1) The dump is made up of a series of strings of characters, each
––
1012 1111 32 (27 9 3 1) string being composed of nine groups of four hexadecimal
characters. The first four characters in each string provide the
Write down the first 15 decimal numbers in the balanced starting address of the following 16 bytes. For example, the first
ternary base. byte in the second string (i.e. $C9) is at address $0010 and the
4.10 The results of an experiment fall in the range second byte (i.e. $8F) is at address $0011.
4 to 9. A scientist reads the results into a computer and The 20 bytes of data in the two strings represent the
then processes them. The scientist decides to use a 4-bit following sequence of items (starting at location 0000):
binary code to represent each of the possible inputs. Devise a (a) five consecutive ASCII-encoded characters
4-bit code capable of representing numbers in the range (b) one unsigned 16-bit integer
4 to 9. (c) one two’s complement 16-bit integer
200 Chapter 4 Computer arithmetic
(d) one unsigned 16-bit fraction Show that no two valid code words differ by less than 3 bits.
(e) one six-digit natural BCD integer Demonstrate that an error in any single bit can be used to locate
(f) one 16-bit unsigned fixed-point number with a 12-bit the position of the error and, therefore, to correct it.
integer part and a 4-bit fraction
4.24 Examine the following H7,4 Hamming code words and
(g) One 4-byte floating point number with a sign bit and true
determine whether the word is a valid code word. If it isn’t valid,
fraction plus an exponent biased by 64
what should the correct code word have been (assuming that
S E 64 Mantissa only 1 error is present)?
(a) 0000000
8 bits 24 bits (b) 1100101
(c) 0010111
Decode the hexadecimal data, assuming that it is interpreted as
4.25 Convert the following image into a quadtree.
above.
4.19 A message can be coded to protect it from unauthorized F = full (all elements1)
readers by EORing it with a binary sequence of the same length E = empty (all elements0)
to produce an encoded message. The encoded message is P = partially filled
decoded by EORing it with the same sequence that was used to
decode it. If the ASCII-encoded message used to generate the 2 3
code is ALANCLEMENTS, what does the following encoded 0 1
message (expressed in hexadecimal form) mean?
Quadrant
09 09 0D 02 0C 6C 12 02 17 02 10 73 numbering
4.20 A single-bit error-detecting code appends a parity bit to a
4.26 Convert the following image into a quadtree.
source word to produce a code word. An even parity bit is
chosen to make the total number of ones in the code word even
(this includes the parity bit itself). For example the source words
0110111 and 1100110 would be coded as 01101111 and
1100110, respectively. In these cases the parity bit has been
located in the LSB position. Indicate which of the following
hexadecimal numbers have parity errors.
4.31 Using 8-bit arithmetic throughout, express the following 4.38 Write down the largest base 5 positive integer in n digits
decimal numbers in two’s complement binary form: and the largest base 7 number in m digits. It is necessary to
(a) 4 (d) 25 (g) 127 represent n-digit base 5 numbers in base 7. What is the
(b) 5 (e) 42 (h) 111 minimum number m of digits needed to represent all possible
(c) 0 (f) 128 n-digit base 5 numbers? Hint—the largest m-digit base-7
number should be greater than, or equal to, the largest n-digit
4.32 Perform the following decimal subtractions in 8-bit two’s base 5 number.
complement arithmetic. Note that some of the answers will
result in arithmetic overflow. Indicate where overflow has 4.39 A 4-bit binary adder adds together two 4-bit numbers, A
occurred. and B, to produce a 4-bit sum, S, and a single-bit carry-out C.
What is the range of outputs (i.e. largest and smallest values)
(a) 20 (b) 127 (c) 127 (d) 5
that the adder is capable of producing? Give your answer in
5 126 128 20
both binary and decimal forms.
(e) 69 (f) 20 (g) 127 (h) 42 An adder is designed to add together two binary coded
42 111 2 69 decimal (BCD) digits to produce a single digit sum and a 1-bit
120 carry-out. What is the range of valid outputs that this circuit
may produce?
4.33 Using two’s complement binary arithmetic with a 12-bit The designer of the BCD adder decides to use a pure binary
word, write down the range of numbers capable of being adder to add together two BCD digits as if they were pure 4-bit
represented (both in decimal and binary formats) by giving the binary numbers. Under what circumstances does the binary
smallest and largest numbers. What happens when the smallest adder give the correct BCD result? Under what circumstances is
and largest numbers are the result incorrect (i.e. the 4-bit binary result differs from the
(a) incremented? (b) decremented? required BCD result)?
What algorithm must the designer apply to the 4-bit output
4.34 Distinguish between overflow and carry when these terms of the binary adder to convert it to a BCD adder?
are applied to two’s complement arithmetic on n-bit words.
4.40 Design a 4-bit parallel adder to add together two 4-bit
4.35 Write down an algebraic expression giving the value of the natural binary-encoded integers. Assume that the propagation
n-bit integer N an1, an2, . . . ,a1, a0 for the case where N delay of a signal through a gate is t ns. For your adder, calculate
represents a two’s complement number. the time taken to add
Hence prove that (in two’s complement notation) the (a) 0000 to 0001
representation of a signed binary number in n 1 bits may be (b) 0001 to 0001
derived from its representation in n bits by repeating the (c) 0001 to 1111
leftmost bit. For example, if n 12 10100 in 5 bits,
n 12 110100 in 6 bits. 4.41 Design a full subtractor circuit that will subtract bit X
together with a borrow-in bit Bi from bit Y to produce a
4.36 Perform the additions below on 4-bit binary numbers. difference bit D Y X Bi, and a borrow-out Bo.
(a) 0011 (b) 1111 (c) 0110 (d) 1100
4.42 In the negabinary system an i-bit binary integer, N, is
1100 0001 0111 1010
expressed using positional notation as
In each case, regard the numbers as being (i) unsigned integer,
(ii) two’s complement integer, and (iii) sign and magnitude N a0 10 20 a1 11 21 . . .
integer. Calculate the answer and comment on it where ai1 1i1 2i1
necessary.
This is the same as conventional natural 8421 binary
4.37 Add together the following pairs of numbers. Each number weighted numbers, except that alternate positions have the
is represented in a 6-bit sign-and-magnitude format. Your additional weighting 1 and 1.
answer should also be in sign-and-magnitude format. Convert
each pair of numbers (and result) into decimal form in order to For example, 1101 (1 1 8) (1 1 4)
check your answer. (1 0 2) (1 1 1) 8 4 1 3
(a) 000111 (b) 100111 The following 4-bit numbers are represented in
010101 010101 negabinary form. Convert them into their decimal
equivalents.
(c) 010101 (d) 111111
000111 000001 (a) 0000
(b) 0101
(e) 110111 (f) 011111 (c) 1010
110111 000110 (d) 1111
202 Chapter 4 Computer arithmetic
4.43 Perform the following additions on 4-bit negabinary two packed floating point numbers and end with the packed
numbers. The result is a 6-bit negabinary value. You must sum.
construct your own algorithm.
(c) Perform the subtraction of 1000.708 100.25 using the
(a) 0000 (b) 1010 (c) 1101 (d) 1111 two IEEE-format binary floating point numbers you
0001 0101 1011 1111 obtained for 1000.708 and 100.25 in part (a) of this
question. You should begin the calculation with the packed
4.44 Convert the following signed decimal numbers into their
floating-point representations of these numbers and end
6-bit negabinary counterparts.
with the packed result.
(a) 4 (b) 7 (c) 7 (d) 10
4.49 Convert the 32-bit IEEE format number C33BD00016 into
4.45 What is the range of values that can be expressed as an its decimal representation.
n-bit negabinary value? That is, what is the largest positive
4.50 Explain why floating point numbers have normalized
decimal number and what is the largest negative decimal
mantissas.
number that can be converted into an n-bit negabinary form?
4.51 What is the difference between a truncation error and a
4.46 A computer has a 24-bit word length, which, for the
rounding error?
purpose of floating point operations, is divided into an 8-bit
biased exponent and a 16-bit two’s complement mantissa. 4.52 The following numbers are to be represented by three
Write down the range of numbers capable of being represented significant digits in the base stated. In each case perform the
in this format and their precision. operation by both truncation and rounding and state the
relative error created by the operation.
4.47 Explain the meaning of the following terms (in the context
of floating point arithmetic): (a) 0.11001002 (b) 0.1A3416
(c) b. 0.00110112 (d) d. 0.12AA16
(a) biased exponent
(b) fractional part 4.53 We can perform division by multiplication to calculate
(c) packed Q N/D. The iterative expression for Q is given by
(d) unpacked
Q N(1 Z)(1 Z2)(1 Z4)…(1 Z2 )
n1
(e) range
(f) precision where Z 1 D.
(g) normalization
(h) exponent overflow/underflow If N 5010 and D 0.7410, calculate the value of Q. Evaluate
Q using 1, 2, 3, and 4 terms in the expression.
4.48 An IEEE standard 32-bit floating point number has the
format N 1S 1.F 2E127, where S is the sign bit, F is the 4.54 For each of the following calculations (using 4-bit
fractional mantissa, and E the biased exponent. arithmetic) calculate the value of the Z (zero), C (carry), N
(negative), and V (overflow) flag bits at the end of the operation.
(a) (i) Convert the decimal number 1000.708 into the IEEE
format for floating point numbers. (a) 1010 1010
(b) 1111 0001
(ii) Convert the decimal number 100.125 into the IEEE (c) 1111 0001
format for floating point numbers. (d) 0110 0110
(e) 1010 1110
(b) Describe the steps that take place when two IEEE floating
(f) 1110 1010
point numbers are added together. You should start with the
The instruction set architecture 5
CHAPTER MAP
2 Logic elements and 5 The instruction set 6 Assembly language 7 Structure of the CPU
Boolean algebra architecture programming Having described what a
The basic building blocks, gates, In this chapter we introduce the Having introduced the basic computer does, the next step is
from which we construct the computer’s instruction set operations that a computer can to show how it operates. Here we
computer. architecture (ISA), which carry, the next step is to show examine the internal organization
describes the low-level how they are used to construct of a computer and demonstrate
programmer’s view of the entire programs. We look at how how it reads instructions from
3 Sequential logic computer. The ISA defines the the 68K processor uses machine- memory, decodes them, and
The building blocks, flip-flops, type of operations a computer level instructions to implement executes them.
used to construct devices that carries out. We are interested in some simple algorithms.
store data and counters. three aspects of the ISA: the
nature of the instructions, the
8 Other processors
resources used by the We have used the 68K to
4 Computer arithmetic instructions (registers and introduce the CPU and assembly
The representation of numbers in memory), and the ways in which language programming. Here we
a computer and the arithmetic the instructions access data provide a brief overview of two
used by digital computers. (addressing modes). The 68K other processors: a simple 8-bit
microprocessor is used to microcontroller and a 32-bit RISC
illustrate the operation of a real processor.
device.
INTRODUCTION
There are two ways of introducing the processor. One is to explain how a computer works at the
level of its internal information flow by describing the way in which information is transmitted
between registers and internal units and showing how an instruction is decoded and interpreted
(i.e. executed). The other approach is to introduce the native language, or machine code, of a
computer and show what computer instructions can do. In practice no-one writes programs in
machine code; instead they use assembly language which is a human-readable representation of
machine code (see the box ‘The assembler’).
Both approaches to the description of a computer are valid. Beginning with how a computer
works by examining its internal operation is intuitive. Once you understand how information flows
from place to place through adders and subtractors, you can see how instructions
are constructed and then you can examine how sequences of instructions implement
programs.
Unfortunately, beginning with the hardware and looking at very primitive operations hides the
big picture. You don’t immediately see where you are going or understand why we need the
primitive operations in the first place. This bottom-up approach is rather like studying cellular
biochemistry as the first step in a course on sociology. Knowing how a brain cell works doesn’t tell
you anything about human personality.
204 Chapter 5 The instruction set architecture
Beginning a course with the computer’s instruction set gives you a better idea of what a
computer does in terms of its capabilities. Once you know what a computer does, you can look
inside it and explain how it implements its machine code operations.
In the previous edition of Principles of Computer Hardware I began with the internal
organization of a computer and explained the steps involved in the execution of an instruction.
Later we looked at the nature of instructions. In this edition I’ve reversed the order and we begin
with the instruction set and leave the internal organization of the computer until later. This
sequence enables students to take lab classes early in the semester and build up practical
experience by writing assembly language programs.
We begin this chapter by introducing the notion of computer architecture, the instruction
set, and the structure of a computer. We describe a real processor, the Motorola 68K.
This processor is a contemporary of the Intel 8086 but has a more sophisticated
architecture and its instruction set is easier for students to understand. This processor
has evolved like the corresponding Intel family and its variants are now called the
ColdFire family.
THE ASSEMBLER
An assembly language program starts off as a text file written by a programmer (or created by
a compiler). An assembler takes the text file together with any library functions required by
the program and generates the binary code that the target executes.
The addresses of code and data generated by the assembler are not absolute (i.e. actual) addresses, but are expressed relative to the start of the program. Another program called a linker
takes one or more code modules generated by the assembler, puts them together, and creates
the actual addresses of data in memory. The output of the linker is the binary that can be exe-
cuted by the actual computer. This mechanism allows you to write a program in small chunks
and to put them together without having to worry about addresses in the different chunks.
pilot's crew sees them as a colleague with whom they relate at the personal level. The pilot's doctor sees a complex biological mechanism. It's exactly the same with computers—you can view them in different ways.

Suppose you run a spreadsheet on a computer. As far as you're concerned, the machine is a spreadsheet machine that behaves exactly as if it were an electronic spreadsheet doing nothing other than spreadsheet calculations. You could construct an electronic device to directly handle spreadsheets, but no one does. Instead they construct a computer and run a program to simulate a spreadsheet.

ARCHITECTURE AND ORGANIZATION
Architecture describes the functionality of a system, whereas organization describes how it achieves that functionality. Consider the automobile as a good example of the distinction between architecture and organization. The architecture of an automobile covers its steering, acceleration, and braking. An automobile's gearbox is part of its organization rather than its architecture. Why? Because the gearbox is a device that facilitates the operation of an automobile—it is there only because we can't create engines that drive wheels directly.

Figure 5.1 illustrates how a computer can be viewed in different ways. The outer level is the applications layer that the end user sees. This level provides a virtual spreadsheet or any other user application because, to all intents and purposes, the machine looks like a spreadsheet machine that does nothing else other than implement spreadsheets.

A spreadsheet, a word processor, or a game is invariably implemented by expressing its behavior in a high-level language such as C or Java. You can view a computer as a machine that directly executes the instructions of a high-level language. In Fig. 5.1 the layer below the application level is the high-level language layer.

It's difficult to construct a computer that executes a high-level language like C. Computers execute machine code, a primitive language consisting of simple operations such as addition and subtraction, Boolean operations, and data movement. The statements and constructs of a high-level language are translated into sequences of machine code instructions by a compiler. The machine code layer in Fig. 5.1 is responsible for executing machine code; it's this layer that defines the computer's architecture.

Figure 5.1 The layers of a computer system, from the outside in: application (e.g. word processor, database, game), high-level language (e.g. C, Java, LISP), operating system (e.g. Windows, Unix), assembly language, machine level, microprogram, and digital logic. The hardware/software interface lies between the machine level and the microprogram layer.

Figure 5.1 shows two layers between the machine level and high-level language levels. The assembly language level sits on
top of the machine level and represents the human-readable form of the machine code; for example, the binary string 00000010100000010001000000000011 might be the machine code instruction represented in assembly language as MOVE D2,D1 (move the number in register D2 to the register D1).¹

To say that assembly language is just a human-readable version of machine code is a little simplistic. An assembly language contains facilities that make it easier for a human to write a program. Moreover, an assembly language allows you to determine where data and code are loaded into memory. We will not use sophisticated assembly language mechanisms and it is reasonably true to say that assembly language instructions are human-readable versions of the strings of 1s and 0s that represent the machine-code binary instructions. The conventions we will adopt in the structure and layout of assembly language programs are, normally, those of the Motorola assembler.

In Fig. 5.1 there is an additional layer between the assembly language layer and the high-level language layer called the operating system level. Strictly speaking, this layer isn't like the other layers. The operating system runs on top of the machine code and assembly language layers and provides facilities required by higher-level layers (e.g. memory management and the control of peripherals such as the display and disk drives).

Below the machine-level layer is the microprogram layer. A heavy line separates the machine level and microprogram layers because you can access all the layers above this line. The two innermost layers (microprogram and digital logic) are not accessible to the programmer.

The microprogram layer is concerned with the primitive operations that take place inside the computer during the execution of a machine code operation. For example, a MOVE D2,D1 machine-level instruction might be interpreted by executing a sequence of micro-operations inside the computer. These micro-operations transfer information between functional units such as registers and buses. The sequences of micro-operations that interpret each machine-level instruction are stored in firmware within the computer. Firmware is the term for read-only memory containing programs or other data that control the processor's operation. Firmware cannot normally be modified, although modern systems can update their firmware from time to time.

Some modern computers don't have a microprogram layer. If an instruction set is very regular and all instructions involve a simple, single-step operation, there is no need for a microprogram to translate instructions into primitive operations. Where there's a simple relationship between the binary code of an instruction and what it does, a machine-level instruction can be translated directly into the control signals required to implement it.

The innermost level of the computer is the digital logic level, which consists of the gates, flip-flops, and buses. At this level the individual logic elements are hardwired to each other by fixed connections. You can't program or modify the behavior of components at this level. This statement isn't strictly true. Programmable logic elements whose functionality can be modified do exist; for example, it is possible to reconfigure internal connections using the same technology found in flash memory. In the future we may incorporate such components in processors to enable manufacturers to update a processor's instruction set or to fix hardware bugs.

You could, in fact, go even deeper into the hierarchy of Fig. 5.1 because there is a physical layer below the digital logic layer. This physical layer is concerned with the individual transistors and components of the computer that are used to fabricate gates, registers, and buses. Below the physical layer exist the individual atoms of the transistors themselves. We're not interested in the physical layer and the atomic layer, because that's the province of the semiconductor engineer and physicist. In this chapter we are concerned with the machine-level and microprogram layers.

¹ Throughout this chapter we adopt the convention used by the 68K microprocessor that the rightmost register in an instruction is the destination operand (i.e. where the result goes). To help you remember this, we will use a bold face to indicate the destination operand.

5.2 Introduction to the CPU

Before we look at what a CPU does or how it works, it is important to understand the relationship between the CPU, the memory, and the program. Let's take a simple program to calculate the area of a circle and see how the computer deals with it. In what follows the computer is a hypothetical machine devoid of all the complications associated with reality. Throughout this section we assume that we are operating at the machine level.

The area of a circle, A, can be calculated from the formula A = πr². When people evaluate the area of a circle, they automatically perform many of the calculations at a subconscious level. However, when they come to write programs, they must tell the computer exactly what it must do, step by step. To illustrate this point, take a look at the expression r². We write r², but we mean a number, which we have given the symbol r, multiplied by itself. We never confuse the symbol r with the value that we give to r when we evaluate the expression. This may seem an obvious point, but students sometimes have great difficulty when they encounter the concepts of an address and data in assembly language. Although people never confuse the symbol for the radius (i.e. r) and its value, say 4 cm, you must remember that an address (i.e. the place where the value of r is stored) and data (i.e. the value of r) are both binary quantities inside the computer.
Figure 5.2 The relationship between the memory, processor, and program. The memory holds the program (get r; square r; multiply r² by π; output the result), constants (π), and variables (A and r). The processor (CPU) reads instructions and constants from memory, reads and writes variables, and communicates with the external system through its input and output.
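To make the program's four steps concrete, here is a sketch of the calculation in the 68K-style assembly language used later in this chapter. It assumes integer arithmetic with π approximated by 314/100, values small enough that each product fits in 16 bits, and the labels R and AREA for the memory locations holding r and A; all of these are illustrative assumptions rather than part of the hypothetical machine above.

      MOVE.W  R,D0        ;get r from memory
      MULU    D0,D0       ;square r (16-bit x 16-bit gives a 32-bit product)
      MULU    #314,D0     ;multiply r squared by 314 (pi scaled by 100)
      DIVU    #100,D0     ;divide by 100 to remove the scaling
      MOVE.W  D0,AREA     ;output the result to the location AREA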
Figure 5.3 The random access memory system. The processor's address port is connected to the memory by the address bus, which selects a location, and its data port is connected by the data bus, which carries data between processor and memory; the control bus determines the direction of information transfer. In this illustration the memory cell at address 4 contains the value 3.
flow in two directions. During a write cycle data generated by the program flows from the CPU to the memory where it is stored for later use. During a read cycle the CPU requests the retrieval of a data item that is transferred from the memory to the CPU.

Suppose the instruction ADD X,Y,Z corresponding to the operation X + Y → Z is stored in memory.³ The CPU must first fetch this instruction from memory and bring it to the CPU. Once the CPU has analyzed or decoded the instruction, the CPU has to get the values of X and Y from memory. The actual values of X and Y are read from the memory and sent to the CPU. The CPU adds these values and sends the result, Z, back to memory for storage. Remember that X, Y, and Z are symbolic names for the locations of data in memory.

Few computers are constructed with two independent information paths between the CPU and its memory as Fig. 5.4 suggests. Most computers have only one path along which information flows between the CPU and its memory—data and instructions have to take turns flowing along this path. Two paths are shown in Fig. 5.4 simply to emphasize that there are two types of information stored in the memory (i.e. the instructions that make up a program and the data used by the program). Indeed, forcing data and instructions to share the same path sometimes creates congestion on the data bus between the CPU and memory that slows the computer down. This effect is called the von Neumann bottleneck.

5.3 The 68K family

A hypothetical machine can be made simple to teach, whereas a real machine is harder to learn, but it does illustrate the real-world constraints faced by its designer. There's no perfect solution to this dilemma. We've chosen a real machine, the 68K, to introduce an assembly language and machine-level instructions. The 68K is a classic CISC processor and is easier to understand than the Pentium family because the 68K has a more regular instruction set. Another reason for using the 68K processor to illustrate the ISA is its interesting architectural features.

The architecture of a processor is defined by its register set, its instruction set, and its addressing modes (the way in which the location of data in memory is specified). Figure 5.5 describes the 68K's register set. There are eight data registers used to hold temporary variables, eight address registers used to hold pointers to data, a status register, and a program counter, which determines the next instruction to be executed. Data registers are 32 bits wide but can be treated as if they were 8 or 16 bits wide. Address registers always hold 32-bit values and are always treated as 32-bit registers that hold two's complement values. However, you can perform an operation on the low-order 16 bits of an address register and the result will be sign-extended to 32 bits automatically.

5.3.1 The instruction

We now look at the instructions executed by the 68K processor. There has been remarkably little progress in instruction set design over the last few decades and computers do today almost exactly what they did in the early days.⁴
Figure 5.5 The 68K's register set (data registers). The eight data registers D0 to D7 hold scratchpad information and are used by data processing instructions. You can treat data registers as 8-bit, 16-bit, or 32-bit entities.
Much of the progress over the last six decades has been in computer technology, organization, and implementation rather than in computer architecture.

Computer instructions are executed sequentially, one by one in turn, unless a special instruction deliberately changes the flow of control or unless an event called an exception (interrupt) takes place.

The structure of instructions varies from machine to machine. The format of an instruction running on a Pentium is different to the format of an instruction running on a 68K (even though both instructions might do the same thing). Instructions are classified by type (what they do) and by the number of operands they take. The three basic instruction types are data movement, which copies data from one location to another, data processing, which operates on data, and flow control, which modifies the order in which instructions are executed. Instruction formats can take zero, one, two, or three operands. Consider the following examples of instructions with zero to three operands. In these examples operands P, Q, and R may be memory locations or registers. The two-address instruction is the format used by the 68K.

Operands   Instruction   Effect
Three      ADD P,Q,R     Add P to Q and put the result in R
Two        ADD P,Q       Add P to Q and put the result in Q
One        ADD P         Add P to an accumulator
Zero       ADD           Add the top two items on the stack

Let's begin with three operands because it's intuitively easy to understand. A three-address computer instruction can be written

operation source1,source2,destination

where operation defines the nature of the instruction, source1 is the location of the first operand, source2 is the location of the second operand, and destination is the location of the result. The instruction ADD P,Q,R adds P and Q to get R (remember that we really mean that the instruction adds the contents of location P to the contents of location Q and puts the sum in location R). Having reminded you that when we mention a variable we mean the contents of the memory location or register specified by that variable, we will not emphasize it again.

Modern microprocessors don't implement three-address instructions exactly like this. It's not the fault of the instruction designer, but it's a limitation imposed by the practicalities of computer technology. Suppose that a computer has a 32-bit address that allows a total of 2³² bytes of memory to be accessed. The three address fields, P, Q, and R, would each be 32 bits, requiring 3 × 32 = 96 bits to specify operands. Assuming a 16-bit operation code (allowing up to 2¹⁶ = 65 536 instructions), the total instruction size would be
Figure 5.6 Instruction formats: (a) a hypothetical three-address instruction with a 16-bit op-code and three 32-bit address fields, 112 bits in total; (b) a three-register instruction that fits in 32 bits. Figure 5.7 The operation of an instruction with three register addresses (registers such as R2, R3, and R4).
96 + 16 = 112 bits, or 14 bytes. Figure 5.6(a) illustrates a hypothetical three-address instruction.

Computer technology developed when memory was very expensive indeed. Implementing a 14-byte instruction was not cost effective in the 1970s. Even if memory had been cheap, it would have been too expensive to implement 112-bit-wide data buses to move instructions from point to point in the computer. Finally, main memory is intrinsically slower than on-chip registers.

The modern RISC processor allows you to specify three addresses in an instruction by providing three 5-bit operand address fields. This restriction lets you select from one of only 32 different operands that are located in registers within the CPU itself.⁵ By using on-chip registers to hold operands, the time taken to access data is minimized because no other storage mechanism can be accessed as rapidly as a register. An instruction with three 32-bit operands requires 3 × 5 = 15 bits to specify the operands, which allows a 32-bit instruction to use the remaining 32 − 15 = 17 bits to specify the instruction, as Fig. 5.6(b) demonstrates. Figure 5.7 illustrates the operation of an instruction with three register addresses.

We'll use the ADD instruction to add together four values in registers R2, R3, R4, and R5. In the following fragment of code, the semicolon indicates the start of a comment field, which is not part of the executable code. This code is typical of RISC processors like the ARM.

ADD R1,R2,R3   ;R1 = R2 + R3
ADD R1,R1,R4   ;R1 = R1 + R4
ADD R1,R1,R5   ;R1 = R1 + R5 = R2 + R3 + R4 + R5

⁵ I will use RISC very loosely to indicate the class of computers that have a register-to-register architecture such as the ARM, MIPS, PowerPC, and SPARC. The Motorola 68K and the Intel Pentium are not members of this group.
REGISTER-TO-REGISTER ARCHITECTURES
Computers act on data in registers or memory locations. Many data processing operations operate on two operands; for example, X + Y, X − Y, X·Y, or X ⊕ Y. These operations are said to be dyadic because they require two operands. The result of such a dyadic operation generates a third operand, called the destination operand; for example, Z = A + B.

First-generation microprocessors of the 1970s and 1980s allowed one source operand to be in memory and one source operand to be in a register in the CPU. A separate destination address was not permitted, forcing you to use one of the source operands as a destination. This restriction means that one of the source operands is destroyed by the instruction. A typical two-address instruction is ADD D0,P. This adds the contents of memory location P to the contents of register D0 and deposits the result in location P. The original contents of P are destroyed.

Register-to-register architectures permit operations only on the contents of on-chip registers such as ADD R1,R2,R3. The source or destination of an operand is never a memory location. Consequently, registers must first be loaded from memory and the results of an operation transferred to memory.
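For example, evaluating Z = A + B on a register-to-register machine takes a load–compute–store sequence. The following sketch uses the generic RISC-style notation of this chapter; LOAD and STORE are illustrative mnemonics rather than the instructions of any particular processor.

LOAD  R1,A      ;copy the contents of memory location A into R1
LOAD  R2,B      ;copy the contents of memory location B into R2
ADD   R3,R1,R2  ;R3 = R1 + R2
STORE R3,Z      ;copy the result from R3 to memory location Z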
Two-address machines
A CISC machine like the 68K has a two-address instruction format. Clearly, you can't execute P = Q + R with two operands. You can execute Q ← P + Q. One operand appears twice, first as a source and then as a destination. The operation ADD P,Q performs the operation [Q] ← [P] + [Q]. The price of a two-operand instruction format is the destruction, by overwriting, of one of the source operands.

Most computer instructions can't directly access two memory locations. Typically, the operands are either two registers or one register and a memory location; for example, the 68K ADD instruction can be written

Instruction   RTL definition       Mode
ADD D0,D1     [D1] ← [D1] + [D0]   Register to register
ADD P,D2      [D2] ← [D2] + [P]    Memory to register
ADD D7,P      [P] ← [P] + [D7]     Register to memory

The 68K has eight general-purpose data registers, D0 to D7; there are no restrictions on the way in which you use these registers; that is, if you can use Di you can also use Dj for any i or j from 0 to 7.

One-address machines
A one-address machine specifies one operand in the instruction. The second operand is a fixed register called an accumulator, which doesn't have to be specified. For example, the one-address instruction ADD P means [A] ← [A] + [P]. The notation [A] indicates the contents of the accumulator. A simple operation R = P + Q can be implemented by the following fragment of 8-bit code (from a 6800 processor).
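A sketch of such a fragment, assuming P, Q, and R label memory locations; the 6800's accumulator A is implicit in each instruction.

LDAA  P    ;load accumulator A with the contents of P
ADDA  Q    ;add the contents of Q to the accumulator
STAA  R    ;store the sum in memory location R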
Eight-bit machines of the Intel 8080 and Motorola 6800 eras have one-address architectures. As you can imagine, 8-bit code is verbose because you have to load data into the accumulator, process it, and then store it to avoid it being overwritten by the next data processing instruction. One-address machines are still widely used in embedded controllers in low-cost, low-performance systems such as toys. We look at an 8-bit processor in Chapter 9.

Zero-address machines
A zero-address machine doesn't specify the location of an operand because the operand's location is fixed. A zero-address machine uses a stack, which is a data structure in the form of a queue where all items are added and removed from the same end. An ADD instruction would pop the top two items off the stack, add them together, and push the result on the stack. Although stack machines have been implemented to execute languages like FORTH, processors with stack-based architectures have been largely confined to the research lab. There is one exception. The language Java is portable because it is compiled into bytecode, which runs on a stack machine, which is simulated on the real target machine. We will return to the stack later.

68K instruction format
We will look at 68K instructions in detail when we've covered more of the basics. The 68K has a two-address instruction format. An operand may be a register or a memory location. The following are valid 68K instructions.

Instruction   RTL definition   Mode
MOVE D0,D1    [D1] ← [D0]      Register to register
MOVE P,D1     [D1] ← [P]       Memory to register
MOVE D1,P     [P] ← [D1]       Register to memory
MOVE P,Q      [Q] ← [P]        The only memory to memory operation
CLR P         [P] ← 0          Only one operand required for the clear instruction
Consider the 68K's ADD instruction ADD $00345678,D2. This instruction performs the operation [D2] ← [D2] + [345678₁₆]. The two source operands provide the addresses: one address is a memory location and the other a data register. This instruction format is sometimes called 'one-and-a-half address' because you can specify only a handful of registers.

CISC processors use variable-length instructions. The minimum 68K instruction size is 16 bits and instructions can be constructed by chaining together successive 16-bit values in memory. For example, the 68K is one of the few processors to provide a memory-to-memory MOVE instruction that supports absolute 32-bit addressing. You can write MOVE $12345678,$ABCDDCBA, which takes 10 consecutive bytes in memory and moves the contents of one memory location to another.

Subword operations
First-generation microprocessors had 8-bit data wordlengths and operations acted on 8-bit values to produce 8-bit results. When 16-bit processors appeared, operations were applied to 16-bit values to create 16-bit results. However, the byte did not go away because some types of data such as ASCII-encoded characters map naturally on to 8-bit data elements. If you wish to access bytes in a 16- or 32-bit processor, you need special instructions. The Motorola 68K family deals with 8-bit, 16-bit, and 32-bit data by permitting most data processing instructions to act on an 8-bit or a 16-bit slice of a register as well as the full 32 bits. RISC processors do not (generally) support 8- or 16-bit operations on 32-bit registers, but they do support 8-bit and 16-bit memory accesses.

Suppose a processor supports operations that act on a subsection of a register. This raises the interesting question, 'What happens to the bits that do not take part in the operation?' Figure 5.8 demonstrates how we can handle operations shorter than 32 bits. Assume that a register is partitioned as Fig. 5.8(a) demonstrates. In this example, we are going to operate on data in the least-significant byte. We can do three things, as Fig. 5.8(b), (c), and (d) demonstrate. In (b) the bits not acted on remain unchanged—this is the option implemented by the 68K when it operates on data registers. In (c) the bits that do not take part in the operation are cleared to zero. In (d) the bits that do not take part in the operation are set to the value of the most-significant bit (the sign bit) of the bits being operated on. This option preserves the sign of two's complement values. Most processors implement options (c) or (d).

RISC processors like the ARM do not allow general data processing operations on fewer than 32 bits. However, they do support 8-bit and 16-bit load instructions with a zero or sign extension.

The 68K calls 32-bit values longwords, 16-bit values words, and 8-bit values bytes. Motorola's terminology is not universal. Others use the term word to mean 32 bits and halfword to mean 16 bits. The 68K is an unusual processor because it allows variable size operations on most of its data processing instructions. By appending .B after an instruction, you perform an operation on a byte. Appending .W performs the operation on a 16-bit word and appending .L performs the operation on a 32-bit longword. Omitting a size suffix selects a 16-bit default. Consider the following.
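A sketch of the size suffixes in use (the choice of registers is arbitrary):

ADD.B  D1,D0   ;add the low-order byte of D1 to the low-order byte of D0
ADD.W  D1,D0   ;add the low-order 16 bits of D1 to the low-order 16 bits of D0
ADD.L  D1,D0   ;add all 32 bits of D1 to D0
ADD    D1,D0   ;no suffix: a 16-bit operation by default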
5.3.2 Overview of addressing modes

A key concept in computing in both high- and low-level languages is the addressing mode. Computers perform operations on data and you have to specify where the data comes from. The various ways of specifying the source or destination of an operand are called addressing modes. We can't discuss instructions, the ISA, or low-level programming until we have introduced three fundamental concepts in addressing:

● Absolute addressing (the operand specifies the location of the data)
● Immediate addressing (the operand provides the operand itself)
● Indirect addressing (the operand provides a pointer to the location of the data).

In absolute addressing you specify an operand by providing its location in memory or in a register. For example, ADD P,D1 uses absolute addressing because the location of the operand P is specified as a memory location. Another example of absolute addressing is the instruction CLR 1234, which means set the contents of memory location 1234 to zero. When you specify a data register as an operand, that is also an example of absolute addressing, although some call it register direct addressing.

In immediate addressing the operand is an actual value rather than a reference to a memory location. The 68K assembler indicates immediate addressing by prefixing the operand with the '#' symbol; for example, ADD #4,D0 means add the value 4 to the contents of register D0 and put the result in register D0. Immediate addressing lets you specify a constant, rather than a variable. This addressing mode is called immediate because the constant is part of the instruction and is immediately available to the computer—you don't have to fetch it from memory or a register. By contrast, when you specify the absolute address of a source operand, the computer has to get the address from the instruction and then read the data at that location.

Indirect addressing specifies a pointer to the actual operand, which is invariably in a register. For example, the instruction MOVE (A0),D1 first reads the contents of register A0 to obtain a pointer that gives you the address of the operand. Then it reads the memory location specified by the pointer in A0 to get the actual data. This addressing mode requires three accesses: the first is to read the instruction to identify the register containing the pointer, the second is to read the contents of the register to get the pointer, and the third is to get the desired operand at the location specified by the pointer.

You can easily see why this addressing mode is called indirect: the address register specifies the operand indirectly by telling you where it is, rather than what it is. Motorola calls this mode address register indirect addressing, because the pointer to the actual operand is in an address register. Figure 5.9 illustrates the effect of executing the operation MOVE (A0),D0.

In Fig. 5.9 address register A0 points to a memory location; that is, the value it contains is the address of an operand in memory. In this case A0 contains 1234 and is, therefore, pointing at memory location 1234. When the instruction MOVE (A0),D0 is executed, the contents of the memory location pointed at by A0 (i.e. location 1234) are copied into data register D0. In this example, D0 will be loaded with 3254.
Figure 5.9 Address register indirect addressing: executing a MOVE (A0),D0 instruction. Address register A0 is a pointer. It contains the value 1234 and, therefore, points to address location 1234. If you use A0 to access memory, you will access location 1234; here that location contains 3254, which is copied into D0. The effect of MOVE (A0),D0 is [D0] ← [[A0]].

Why do we implement this addressing mode? Consider the following two operations.
MOVE (A0),D0   ;copy the item pointed at by A0 into D0
ADD.L #2,A0    ;increment A0 to point to the next item

The first operation loads D0 with the 16-bit element pointed at by address register A0. The second instruction increments A0 by 2 to point to the next element. The increment is 2 because the elements are 2 bytes (i.e. 16 bits) wide and successive elements are 2 bytes apart in memory.

Address register indirect addressing allows you to step through an array or table of values accessing consecutive elements. Suppose we have a table of 20 consecutive bytes that we have to add together. We can write
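A sketch of the code follows; the labels Table and Next and the register assignments match the discussion below, but the particular instruction forms chosen (for example, ADD.L #1,A0 to advance the pointer) are illustrative.

      MOVE.L  #Table,A0   ;A0 points to the first byte of the table
      CLR.B   D0          ;clear the running total
      MOVE.B  #20,D1      ;D1 counts the 20 elements
Next  ADD.B   (A0),D0     ;add the byte pointed at by A0 to the total
      ADD.L   #1,A0       ;point to the next byte in the table
      SUB.B   #1,D1       ;decrement the element counter
      BNE     Next        ;repeat until all 20 elements have been added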
The first three instructions set up the initial values. We load A0 with the address of the numbers. The location has the symbolic name 'Table'. The # symbol precedes 'Table' because A0 is being loaded with the address Table and not the contents of that address. Data register D0 is used to hold the sum of the numbers and is cleared prior to its first use. Finally, we put the number 20 into D1 to count the elements as we add them.

The body of the code is the loop. Its first instruction fetches the byte pointed at by A0 and adds it to the running total in D0, and the second instruction points to the next byte element in the list. Note that when we increment the pointer we use a longword operation because all pointers are 32 bits.

The last part of the program decrements the element count by one and then branches back to 'Next' if we haven't reached zero. We look at the branching operations in more detail later.

The three addressing modes form a natural progression. Consider their definitions in RTL.

Addressing mode        Assembly form    RTL              Memory accesses
Immediate addressing   MOVE #4,D1       [D1] ← 4         1
Absolute addressing    MOVE P,D1        [D1] ← [P]       2
Indirect addressing    MOVE (A1),D1     [D1] ← [[A1]]    3

IMPORTANT POINTS
The fragment of code to add the 20 numbers is, in principle, very straightforward. However, it contains aspects that many beginners find confusing. Indeed, I would say that probably 90% of the errors made by beginners are illustrated by this fragment of code. Consider the following points.

1. Data register D0 is used to hold the running total. At the machine level, registers and memory locations are not set to zero before they are used. Therefore, the programmer must initialize their contents either by clearing them or by loading a value into them.

2. We are working with byte-wide data elements throughout. Therefore all operations on data in this problem have a .B suffix. All operations on pointers have an .L suffix. Do not confuse operations on a pointer with operations on the data elements at which they point!

3. Understand the meaning of the # symbol, which indicates a literal value. MOVE 1234,D0 puts the contents of memory location 1234 in register D0. MOVE #1234,D0 puts the number 1234 in D0. This is the single most common mistake my students make.

4. An address register used to hold a pointer has to be loaded with the value of the pointer. This value is a memory location where the data lies. If the symbolic name for the address of a table is PQR, then you point to PQR with MOVE.L #PQR,A0. You are putting an actual address in A0 and not the contents of a memory location.
Figure 5.10 The three addressing modes. Literal (immediate) addressing, MOVE #4,D1: the operand, 4, is part of the instruction. Absolute addressing, MOVE 1234,D1: the operand is a register or memory location, here memory location 1234. Indirect addressing, MOVE (A0),D1: the operand is specified by a pointer, in this case the pointer in A0.
Figure 5.10 illustrates these three addressing modes graphically.

5.4 Overview of the 68K's instructions

We now look at the type of operations that the 68K and similar processors carry out on data. Here we are interested in general principles. In the next chapter we demonstrate how the instructions can be used. A typical two-operand memory-to-register instruction has the format

ADD P,D0

and is interpreted as [D0] ← [D0] + [P]. The source operand appears first (reading left to right), then the destination operand. Instructions can be divided into various categories. For our current purposes, we will consider the following broad categories.

Data movement These instructions copy data from one place to another; for example, from memory to a register or from one register to another.

Arithmetic Arithmetic instructions perform operations on data in numeric form. In this chapter we assume data is either a signed or an unsigned integer.

Logical A logical operation treats data as a string of bits and performs a Boolean operation on these bits; for example, 11000111 AND 10101010 yields 10000010.

Shift A shift instruction moves the bits in a register one or more places left or right; for example, shifting 00000111 one place left yields 00001110.

Bit A bit instruction acts on an individual bit in a register, rather than the entire contents of a register. Bit instructions allow you to test a single bit in a word (for 1 or 0), to set a bit, to clear a bit, or to flip a bit into its complementary state.

Compare These instructions compare two operands and set the processor's status flags accordingly; for example, a compare operation allows you to carry out the test (x == y).

Control Control instructions modify the flow of control; that is, they change the normal sequential execution of instructions and permit instructions to be executed out of order.

5.4.1 Status flags

Before we continue we have to introduce the notion of the processor status register because its contents can be modified by the execution of most instructions. The processor status register records the outcome of an instruction and it can be used to implement conditional behavior by selecting one of two courses of action. Some processors call this register a condition code register.
Conditional behavior is the feature of computer languages that lets us implement high-level language operations such as if…then constructs or repeat…until loops.

A processor status register contains at least four bits, Z, N, C, and V, whose values are set or cleared after an instruction has been executed. These four flags, or status bits, and their interpretations are as follows.

Z-bit Set if the result of the operation is zero.
N-bit Set if the result is negative in a two's complement sense; that is, the leftmost bit is one.
C-bit Set if the result yields a carry-out.
V-bit Set if the result is out of range in a two's complement sense.

Typical CISC processors update these flags after each operation (see box for more details). Consider the following example using 8-bit arithmetic. Suppose D0 contains 00110101₂ and D1 contains 01100011₂. The effect of adding these two values together with ADD D0,D1 would result in

  00110101
+ 01100011
  10011000

The result is 10011000₂, which is deposited in D1. If we interpret these numbers as two's complement values, we have added two positive values and got a negative result. Consequently, the V-bit is set to indicate arithmetic overflow. The result is not zero, so the Z-bit is cleared. The carry-out is 0. The most-significant bit is 1, so the N-bit is set. Consequently, after this operation C = 0, Z = 0, N = 1, V = 1.

5.4.2 Data movement instructions

The most frequently executed computer operation is data movement. The data movement instruction is incorrectly named because the one thing it does not do is move data. Data movement instructions copy data; for example, the instruction MOVE Y,X copies the contents of Y to X but does not modify the value of Y. You could say that a data movement instruction is a data propagate or data copy instruction.

Some processors have a load instruction, a store instruction, and a move instruction. A load copies data from memory to a register, a store copies data from a register to memory, and a move instruction copies data from one register to another. As we already know, the 68K has a single MOVE instruction, which copies data from anywhere to anywhere. There are other types of move operation; for example, the 68K has an exchange instruction that swaps the contents of two registers:

EXG D1,A2 has the effect [A2] ← [D1]; [D1] ← [A2]

The semicolon in the above RTL indicates that the two operations happen simultaneously.

5.4.3 Arithmetic instructions

Arithmetic operations are those that act on numeric data (i.e. signed and unsigned integers). Table 5.1 lists the 68K's arithmetic instructions. Let's look at these in detail.

Add The basic ADD instruction adds the contents of two operands and deposits the result in the destination operand. One operand may be in memory. There's nothing to stop you using the same source operand twice and writing ADD D0,D0 to load D0 with the value of 2 × [D0]. All addition and subtraction instructions update the contents of the condition code register unless the destination operand is an address register.

Add with carry The add with carry instruction, ADC, is almost the same as the ADD instruction. The only difference is that ADC adds the contents of two registers together with the carry bit; that is, ADC D0,D1 performs [D1] ← [D1] + [D0] + C, where C is the carry bit generated by a previous operation.
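The ADD and ADC pair let you carry out extended-precision arithmetic. The following sketch adds two 64-bit numbers using 32-bit operations; the labels XLO, XHI, YLO, YHI, ZLO, and ZHI (the lower and upper 32 bits of X, Y, and the result Z) are illustrative assumptions.

MOVE.L  XLO,D0   ;get the low-order 32 bits of X
MOVE.L  YLO,D1   ;get the low-order 32 bits of Y
ADD.L   D0,D1    ;add the low-order words; any carry-out goes into the C-bit
MOVE.L  XHI,D2   ;get the high-order 32 bits of X
MOVE.L  YHI,D3   ;get the high-order 32 bits of Y
ADC.L   D2,D3    ;add the high-order words together with the carry bit
MOVE.L  D1,ZLO   ;store the low-order 32 bits of the result
MOVE.L  D3,ZHI   ;store the high-order 32 bits of the result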
In this example we use ADD.L to add the two low-order 32-bit words. Remember that the .L suffix indicates a 32-bit operation. An addition records any carry bit generated by the addition and moves it to the C-bit. The following instruction ADC adds the high-order longwords together with any carry that was generated by adding the low-order longwords. Figure 5.11 illustrates the addition Z = X + Y where X, Y, and Z are 64-bit values and the addition is to be performed with 32-bit arithmetic. Each of the operands is divided into an upper and lower 32-bit word.

Subtract The subtract instruction subtracts the source operand from the destination operand and puts the result in the destination register; for example, SUB D0,D1 performs [D1] ← [D1] − [D0]. A special subtraction operation that facilitates multiple length subtraction, SBC D0,D1, performs the action [D1] ← [D1] − [D0] − C (the carry bit is also subtracted from the result).

Division The 68K's unsigned divide instruction, DIVU, divides a 32-bit value by a 16-bit value to produce a 16-bit quotient and a 16-bit remainder. In order to avoid using an instruction with three operands, the quotient and remainder are packed in the same register. For example, DIVU D0,D1 divides the 32-bit contents of D1 by the low-order 16-bit word in D0 and puts the 16-bit quotient in the low-order word of D1 and the 16-bit remainder in the high-order word of D1. We can express this as

[D1(0:15)] ← [D1(0:31)] / [D0(0:15)]
[D1(16:31)] ← the remainder of [D1(0:31)] / [D0(0:15)]

If D0 contains 4 and D1 contains 12345₁₆, the operation DIVU D0,D1 results in D1 = 000148D1₁₆. Consider the following fragment of code where we divide P by Q and put the quotient in D2 and the remainder in D3.
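A sketch of the fragment, following the steps described in the commentary below; P and Q are memory locations and the exact instruction choices are illustrative.

MOVE.L  P,D1    ;get the 32-bit dividend P into D1
DIVU    Q,D1    ;divide D1 by the 16-bit value Q: quotient in the low word, remainder in the high word
CLR.L   D2      ;set all 32 bits of D2 to zero
MOVE.W  D1,D2   ;copy the 16-bit quotient into D2
SWAP    D1      ;swap the remainder into the low-order word of D1
MOVE.W  D1,D3   ;copy the 16-bit remainder into D3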
This code is more complex than you would think and demonstrates the pitfalls of assembly language. First we have to remember that P is a 32-bit value and that Q is a 16-bit value. The divide instruction divides a 32-bit value by a 16-bit value. Because we get the quotient and remainder in D1, we have to split them and copy them to D2 and D3 respectively. A MOVE instruction always operates on the low-order word in a register, which means that we don't have to worry about the remainder bits in bits 16 to 31 of D1 when we copy the quotient. However, because D2 is a 32-bit register, we should ensure that the upper order bits are zero before we do the transfer. We use CLR.L to set all the bits of D2 to zero before transferring the 16-bit quotient. We can use the SWAP instruction, which exchanges the upper and lower order words of a register, to get the remainder in the low-order 16 bits of D1 before we transfer the remainder to D3.

When writing 68K instructions, you always have to ask yourself 'How many bits are we operating on?' and 'What are we going to do about the bits not taking part in the operation?'. Some processors take away that choice; for example, the ARM and similar RISC processors require that all operations be applied to all bits of a register.
5.4.4 Compare instructions

High-level languages provide conditional constructs of the form if (condition) {action}. We examine how these constructs are implemented later. At this stage we are interested in the comparison part of the above construct, (x == y), which tests two variables for equality. We can also test for greater than or less than. The operation that performs the test is called comparison.

The 68K provides a compare instruction CMP source,destination, which evaluates [destination] − [source] and updates the bits in the condition code register accordingly. A compare instruction is inevitably followed by a branch instruction that chooses one of two courses of action depending only on the outcome of the comparison. Here we simply demonstrate a (compare, branch) pair because we will soon look at branch instructions in greater detail.

Consider the high-level construct if (x == 5) {x = x + 10}. We can write the following fragment of code:
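A sketch, assuming the variable x is held in data register D0 (the register assignment and the label Exit are illustrative):

      CMP   #5,D0    ;compare x with the literal 5
      BNE   Exit     ;branch to Exit if x is not equal to 5
      ADD   #10,D0   ;x = x + 10
Exit  ...            ;execution continues here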
In this example the branch instruction BNE Exit forces a branch (jump) to the line labeled by Exit if the outcome of the compare operation yields 'not zero'.

5.4.5 Logical instructions

Logical operations allow you to directly manipulate the individual bits of a word. When a logical operation is applied to two 32-bit values, the logical operation is applied (in parallel) to each of the 32 pairs of bits; for example, a logical AND between A and B would perform ci = ai·bi for all values of i. Table 5.2 illustrates the 68K's logical operations using an 8-bit example.

The AND operation is dyadic and is applied to two source operands. Bit i of the source is ANDed with bit i of the destination and the result is stored in bit i of the destination. If [D1] = 11001010₂, the operation

AND #%11110000,D1

results in [D1] = 11000000₂. Remember that the symbol # indicates a literal or actual operand and the symbol % indicates a binary value. We can represent this operation more conventionally as 11001010 · 11110000 = 11000000.

The AND operation is used to mask the bits of a word. If you AND bit x with bit y, the result is 0 if y = 0, and x if y = 1. A typical application of the AND instruction is to strip the parity bit off an ASCII-encoded character; that is, AND #%01111111,D1 clears the most-significant bit of the byte in D1, leaving the 7-bit character code.

The EOR operation is used to toggle (i.e. invert) one or more bits of a word. EORing a bit with 0 has no effect and EORing it with 1 inverts it. For example, if [D1] = 11001010₂, the operation

EOR #%11110000,D1

results in [D1] = 00111010₂.

By using the NOT, AND, OR, and EOR instructions, you can perform any logical operations on a word. Suppose you wish to clear bits 0, 1, and 2, set bits 3, 4, and 5, and toggle bits 6 and 7 of the byte in D0. You could write
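One possible sequence, with the masks chosen to match the result quoted below:

AND  #%11111000,D0   ;clear bits 0, 1, and 2
OR   #%00111000,D0   ;set bits 3, 4, and 5
EOR  #%11000000,D0   ;toggle bits 6 and 7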
If [D0] initially contains 01010101₂, its final contents will be 10111000₂. We will look at a more practical application of bit manipulation after we have covered branch operations in a little more detail.

5.4.6 Bit instructions

The 68K provides bit instructions that operate on the individual bits of a word. Bit instructions are not strictly necessary, because you can use logical operations to do the same thing. The 68K's bit instructions can be used to set, clear, or toggle (complement) a single bit in a word. Moreover, the bit instructions also test the state of the bit they act on and set or clear the Z-bit of the condition code register accordingly. Consider the following.
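A sketch of the bit instructions in use (the bit numbers are arbitrary choices):

BSET  #0,D0   ;set bit 0 of D0
BCLR  #7,D0   ;clear bit 7 of D0
BCHG  #1,D0   ;toggle (complement) bit 1 of D0
BTST  #4,D0   ;test bit 4 of D0 and set the Z-bit if that bit is zero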
5.4.7 Shift instructions

A shift operation moves a group of bits one or more places left or right. Although there are only two shift directions, left and right, there are several variations on the basic shift operation. These variations depend on whether we are treating the value being shifted as an unsigned integer or a signed value and whether we include the carry bit in the shifting. Shift operations are used to multiply or divide by a power of 2, to rearrange the bits of a word, and to access bits in a specific location of a word.

Figure 5.12 illustrates the various types of shift operation. Suppose the 8-bit value 11001010₂ is shifted one place right. What is the new value? A logical shift right operation, LSR, introduces a zero into the leftmost bit position vacated by the shift and the new value is 01100101₂.

Arithmetic shifts treat the data shifted as a signed two's complement value. Therefore, the sign bit is propagated by an arithmetic shift right. In this case, the number 11001010₂ = −54 is negative and, after an arithmetic shift right, ASR, the new result is 11100101₂ (i.e. −27).

When a word is shifted right, you might expect the old least-significant bit to be shifted out and 'lost'. Figure 5.12 shows that this bit isn't lost because it's copied into the carry flag bit. An arithmetic shift left is equivalent to multiplication by 2 and an arithmetic shift right is equivalent to division by 2.

Some computers allow you to shift one bit position at a time. Others let you shift any number of bits. The number of bits to be shifted can be a constant; that is, it is defined in the program and the shift instruction always executes the same number of shifts. Some computers let you specify the number of bits to be shifted as the contents of a register. This allows you to implement dynamic shifts because you can change the contents of the register that specifies the number of shifts. The 68K lets you write LSL #4,D0 to shift the contents of data register D0 left by four places or LSL D1,D0 to shift the contents of D0 left by the number in D1.
Figure 5.12 The shift and rotate operations.
LSL (logical shift left): a zero enters the least-significant bit and the most-significant bit is copied to the carry flag.
LSR (logical shift right): a zero enters the most-significant bit and the least-significant bit is copied to the carry flag.
ASL (arithmetic shift left): a zero enters the least-significant bit and the most-significant bit is copied to the carry flag.
ROL (rotate left): the most-significant bit is copied into the least-significant bit and the carry flag.
ROR (rotate right): the least-significant bit is copied into the most-significant bit and the carry flag.
ROLC (rotate left through carry): the most-significant bit is copied into the carry flag and the old C-bit is copied into the least-significant bit.
RORC (rotate right through carry): the least-significant bit is copied into the C flag and the old C-bit is copied into the most-significant bit.
Figure 5.12 also describes circular shifts or rotates. A circular shift operation treats the data being shifted as a ring with the most-significant bit adjacent to the least-significant bit. Circular shifts result in the most-significant bit being shifted into the least-significant bit position (left shift), or vice versa for a right shift. No data is lost during a circular shift. Consider the following examples.

Shift type          Before circular shift   After circular shift
Rotate left, ROL    11001110                10011101
Rotate right, ROR   11001110                01100111

The last pair of shift operations in Fig. 5.12 are called rotate through carry. These operations treat the carry bit as part of the shift operation. The shift is performed with the old carry bit being shifted into the register and the bit shifted out of the register being shifted into the carry bit. Suppose that the carry bit is currently 1 and that the 8-bit value 11110000₂ is to be shifted one place right through carry.
The final result is 11111000₂ and the carry bit is 0. A circular shift is a non-destructive shift because no information is lost (bits don't fall off the end).

The 68K's shift instructions are as follows.

LSL The operand is shifted left by 0 to 31 places. The vacated bits at the least-significant end of the operand are filled with zeros.

LSR The operand is shifted right 0 to 31 places. The vacated bits at the most-significant end of the operand are filled with zeros.

ASL The arithmetic shift left is identical to the logical shift left.

ASR The operand is shifted right 0 to 31 places. The vacated bits at the most-significant end of the operand are filled with zeros if the original operand was positive, or with 1s if it was negative (i.e. the sign-bit is replicated). This divides a number by 2 for each place shifted.

ROL The operand is rotated by 0 to 31 places left. The bit shifted out of the most-significant end is copied into the least-significant end of the operand. This shift preserves all bits; no bit is lost by the shifting.

ROR The operand is rotated by 0 to 31 places right. The bit shifted out of the least-significant end is copied into the most-significant end of the operand. This shift preserves all bits; no bit is lost by the shifting.

ROXL The operand is rotated by 0 to 31 places left. The bit shifted out of the most-significant end of the operand is shifted into the C-bit. The old value of the C-bit is copied into the least-significant end of the operand; that is, shifting takes place over 33 bits (i.e. the operand plus the C-bit).

ROXR The operand is rotated by 0 to 31 places right. The bit shifted out of the least-significant end of the operand is shifted into the C-bit. The old value of the C-bit is copied into the most-significant end of the operand; that is, shifting takes place over 33 bits (i.e. the operand plus the C-bit).

Shift operations can be used to multiply or divide a number by a power of two. They can be used for several other purposes such as re-arranging binary patterns; for example, suppose register D2 contains the bit pattern 0aaaaxxxbbbb₂ and we wish to extract the xxx field (we're using 12-bit arithmetic for simplicity). We could write

LSR #4,D2   ;shift D2 four places right to drop the bbbb field

If we want to ensure that we just have the xxx field, we can use a logical AND to clear the other bits by

AND #%000000000111,D2   ;clear all bits except the three xxx bits

5.4.8 Branch instructions

A branch instruction modifies the flow of control and causes the program to continue execution at the target address specified by the branch. The simplest branch instruction is the unconditional branch instruction, BRA target, which always forces a jump to the instruction at the target address. In the following fragment of code, the BRA Here instruction forces the 68K to execute next the instruction on the line which is labeled by Here.
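A sketch of such a fragment (the instructions surrounding the branch are arbitrary placeholders):

      MOVE  #1,D0    ;an instruction that is executed
      BRA   Here     ;unconditionally jump to the line labeled Here
      MOVE  #2,D0    ;this instruction is skipped
Here  MOVE  #3,D1    ;execution continues here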
In the following memory map the branch instruction BRA 2000 at address 101C forces the instruction at address 2000 to be executed next, and the branch BRA 1040 at address 1034 forces a jump to the instruction at address 1040.

1000  Instruction 1
1004  Instruction 2
1008  Instruction 3
100C  Instruction 4
1010  Instruction 5
1014  Instruction 6
1018  Instruction 7
101C  Instruction 8    BRA 2000
1020  Instruction 9
1024  Instruction 10
1028  Instruction 11
102C  Instruction 12
1030  Instruction 13
1034  Instruction 14   BRA 1040
1038  Instruction 15
103C  Instruction 16
1040  Instruction 17
1044  Instruction 18

2000  Instruction N
2004  Instruction N + 1
2008  Instruction N + 2
200C  Instruction N + 3
2010  Instruction N + 4
2014  Instruction N + 5
■ SUMMARY

We have introduced the CPU and its native language, the assembly language, which is a human-readable representation of machine code. Unfortunately, assembly languages are not portable; each family of microprocessors has its own unique assembly language that is incompatible with any other processor. You can run a C program on most computers with a C compiler. A program written in Pentium assembly language will run only on machines with a Pentium at their core.

We introduced the concept of an architecture, the assembly language programmer's view of the computer in terms of its functionality rather than performance or implementation. To illustrate the characteristics of an architecture we selected the elegant 68K processor, which is, paradoxically, simpler than many of its contemporaries while, at the same time, incorporating a number of sophisticated facilities such as the ability to shift an operand as part of a data processing instruction and the ability to execute an instruction only if certain conditions are met (predication).

An architecture consists of a set of instructions, a set of resources (the registers), and the addressing modes used to access data.

In this chapter we have laid the foundations for the next chapter where we look at how programs can be constructed to run on the instruction set architecture we introduced here.
■ PROBLEMS

5.1 What's the difference between an assembly language and machine code? In order to answer this question fully, you should use the Internet to find out more about assemblers.

5.2 Books and articles on the computer make a clear distinction between architecture and organization. Do you think that this is a useful distinction? Can you think of other areas (i.e. non-computer examples) where such a distinction would be appropriate?

5.3 What are the advantages and disadvantages of dividing a computer's registers into data and address registers like the 68K?

5.4 What are the relative advantages and disadvantages of one-address, two-address, and three-address instruction formats?

5.8 The 68K has an exchange register pair instruction, EXG. Why would you want such an instruction?

5.9 The SWAP Di instruction swaps the upper- and lower-order words of data register Di. Why would you want such an instruction? If the 68K's instruction set lacked a SWAP, how would you swap the two halves of a data register?

5.10 Why are so many variations on a shift operation provided by the 68K and many other processors?

5.11 What is the largest memory space (i.e. program) that can be addressed by processors with the following number of address bits?
(a) 12 bits
(b) 16 bits
(c) 24 bits
(d) 32 bits
(e) 48 bits
(f) 64 bits

5.12 The von Neumann stored program computer locates program and data in the same memory. What are the advantages and disadvantages of a system with a combined program and data memory?

5.13 The gear lever is part of an automobile's organization rather than its architecture. Are the brakes part of a car's architecture or organization?

5.14 What does the RTL expression [100] ← [50] + 2 mean?

5.15 What does the RTL expression [100] ← [50 + 2] + 2 mean?

5.16 What is an operand?

5.17 In the context of an instruction register, what is a field?

5.18 What is a literal operand?

5.19 What is the effect on the C-, V-, Z-, and N-bits when the following 8-bit operations are carried out?
5.20 Some microprocessors have one general-purpose data register, some two, some eight, and so on. What determines the number of such general-purpose data registers in any given computer?

5.21 What is the difference between a dedicated and a general-purpose computer?

5.22 What is a subroutine and how is it used?

5.23 What is the so-called von Neumann bottleneck?

5.24 For the following memory map explain the meaning of the following RTL expressions in plain English.
(a)
(b)
(c)
(d)
(e)
(f)

5.25 Suppose a problem in a high-school algebra text says 'Let x = 5'. What exactly is x? Answer this question from the point of view of a computer scientist.

5.26 In the context of a CPU, what is the difference between a data path and an address path?

5.27 Why is the program counter a pointer and not a counter?

5.28 What's the difference between a memory location and a data register?

5.29 Does a computer need data registers?

5.30 Some machines have a one-address format, some a two-address format, and some a three-address format; for example, ADD P, ADD P,Q, and ADD P,Q,R respectively.

5.31 What is the difference between the C, Z, V, and N flags in a computer's status register (or condition code register)?

5.32 What is the difference between machine code and assembly language?

5.33 What is the advantage of a computer with many registers over one with few registers?

5.34 Translate the following algorithm into assembly language.
IF X > 12 THEN X = 2*X + 4 ELSE X = X + Y

5.35 For the memory map below, evaluate the following expressions, where [N] means the contents of the memory location whose address is N. All addresses and their contents are decimal values.

Address   Contents
00        12
01        17
02        7
03        4
04        8
05        4
06        4
07        6
08        0
09        5
10        12
11        7
12        6
13        3
14        2

(a) [7]
(b) [[4]]
(c) [[[0]]]
(d) [2 + 10]
(e) [[9] + 2]
(f) [[9] + [2]]
(g) [[5] + [13] + 2 * [14]]
(h) [0] * 3 + [1] * 4
(i) [9] * [10]
5 The instruction set architecture. Chapter 5 introduces the computer's instruction set architecture, which defines the low-level programmer's view of the computer and describes the type of operations a computer carries out. We are interested in three aspects of the ISA: the nature of the instructions, the resources used by the instructions (registers and memory), and the way in which the instructions access data (addressing modes).

6 Assembly language programming. Having introduced the basic operations that a computer can carry out, the next step is to show how instructions are used to construct entire programs. We introduce the 68K's programming environment via a simulator that runs on a PC and demonstrate how to implement some basic algorithms.

7 Structure of the CPU. Now we know what a computer does, the next step is to show how it operates. In Chapter 7 we examine the internal organization of a computer and demonstrate how it reads instructions from memory, decodes them, and executes them.

8 Other processors. We have used the 68K to introduce the CPU and assembly language programming. Here we provide a brief overview of some of the features of other processors.
INTRODUCTION
We introduced the processor and its machine-level language via the 68K CISC processor in the previous
chapter. Now we demonstrate how 68K assembly language programs are written and debugged.
Because assembly language programming is a practical activity, we provide a 68K cross-assembler
and simulator with this book. Previous editions of this book used the DOS-based Teesside simulator.
In this edition we use a more modern Windows-based system called EASy68K. We provide a copy of
the EASy68K simulator on the CD accompanying this book, as well as a copy of the Teesside
simulator and its documentation for those who wish to maintain compatibility with earlier editions.
Both simulators run on a PC and allow you to execute 68K programs. You can execute a program
instruction by instruction and observe the effect of each instruction on memory and registers as it
is executed.
Figure 6.2 shows the structure of a typical 68K instruction. Instructions with two operands are always written in the form source,destination, where source is where the operand comes from and destination is where the result goes to.

We've already encountered the first three instructions. The last instruction, STOP #$2700, terminates the program by halting further instruction execution. This instruction also loads the 68K's status register with the value 2700₁₆, a special code that initializes the 68K. We use this STOP instruction to terminate programs running on the simulator.
6.1.1 Assembler directives

Assembly language statements are divided into executable instructions and assembler directives. An executable instruction is translated into the machine code of the target microprocessor and executed when the program is loaded into memory. In the example in Fig. 6.1, the executable instructions are

        MOVE    P,D0            Copy contents of P to D0
        ADD     Q,D0            Add contents of Q to D0
        MOVE    D0,R            Store contents of D0 in memory location R
        STOP    #$2700          Stop executing instructions¹

An assembler directive tells the assembler something it needs to know about the program; for example, the assembler directive ORG means origin and tells the assembler where instructions or data are to be loaded in memory. The expression ORG $1000 tells the assembler to load instructions in memory

¹ Remember that in the instruction STOP #$2700 the operand is #$2700. The '#' indicates a literal operand and the '$' indicates hexadecimal. The literal operand 0010011100000000₂ is loaded into the 68K's status register after it stops. The 68K remains stopped until it receives an interrupt.
starting at address 1000₁₆. We've used the value 1000₁₆ because the 68K reserves memory locations 0 to 3FF₁₆ for a special purpose and 1000 is an easy number to remember.

The second origin assembler directive, ORG $2000, is located after the code and defines the starting point of the data area. We don't need this assembler directive; without it data would immediately follow the code. We've used it because it's easy to remember that the data starts at memory location $2000.

An important role of assembler directives is in reserving memory space for variables, presetting variables to initial values, and binding variables to symbolic names. Languages like C call these operations declaring variables. We will be performing assembly level actions similar to the following C declarations. In this fragment of code the operation int z3 = 42; reserves a 16-bit memory location for the variable called z3 and then stores the binary equivalent of 42₁₀ in that location. Whenever you use the variable z3 in the program, the compiler will automatically select its appropriate address in memory. All this is invisible to the programmer. The following demonstrates the relationship between 68K assembler directives and the equivalent C code.

Figure 6.3 demonstrates what's happening when the 68K program in Fig. 6.1 is assembled by looking at the output produced by EASy68K. This listing has seven columns. The first column is a 32-bit value expressed in hexadecimal form, which contains the current memory address in which instructions or data will be loaded. The next two columns are the hexadecimal values of instructions or data loaded into the current memory location. These are the values produced by the assembler from instructions, addresses, and data in the assembly language program. The fourth column contains the line number that makes it easy to locate a particular line in the program. The remaining right-hand columns in Fig. 6.3 are the instructions or assembler directives themselves, followed by any comment field.

As you can see, the instruction MOVE D0,R is located on line 12 and is stored in memory location 100C₁₆. This instruction is translated into the machine code 33C000002004₁₆, where the operation code is 33C0₁₆ and the address of operand R is 00002004₁₆.

The symbol table below the program relates symbolic names to their values. This information is useful when you are debugging a program; for example, you can see that variable P has the address 2000.

The assembler maintains a variable called the location counter, which keeps track of where the next instruction or data element is to be located in memory. When you write an ORG directive, you preset the value of the location counter to the value specified; for example, ORG $1234 means load the following instruction or data into memory at location 1234₁₆. Let's look at some of the other assembler directives in this program.
The define constant assembler directive DC loads a constant into memory before the program is executed; that is, it provides a means of presetting memory locations with data before a program runs. This directive is written DC.B to store a byte, DC.W to store a word, and DC.L to store a longword. In the program of Fig. 6.3, the assembler directive P DC.W 2 places the value 2 in memory and labels this location 'P'. Because this directive is located immediately after the ORG $2000 assembler directive, the integer 2 is located at memory location 2000₁₆. This memory location (i.e. 2000₁₆) can be referred to as P. When you wish to read the value of P (i.e. the contents of memory location 2000₁₆), you use P as a source operand; for example, MOVE P,D0. Because the size of the operand is a word, the value 0000000000000010₂ is stored in location 2000₁₆. Figure 6.4 demonstrates the effect of this assembler directive.

The next assembler directive, Q DC.W 4, loads the constant 4 into the next available location, 2002₁₆. Why 2002₁₆ and not 2001₁₆? Because the operands are word sized (i.e. 16 bits) and the 68K's memory is byte addressed. Each word occupies two bytes; P takes up 2000₁₆ and 2001₁₆.

The define storage directive (DS) tells the assembler to reserve memory space and also takes a .B, .W, or .L qualifier. For example, R DS.W 1 tells the assembler to reserve a word in memory and to equate the name of the word with 'R'. The difference between DC.B N and DS.B N is that the former stores the 8-bit value N in memory, whereas the latter reserves N bytes of memory by advancing the location counter by N.

The final assembler directive, END $1000, tells the assembler that the end of the program has been reached and that there's nothing else left to assemble. The parameter taken by the END directive is the address of the first instruction of the program to be executed. In this case, execution begins with the instruction at address 1000₁₆.

The assembler directive EQU equates a symbolic name to a numeric value. If you write Tuesday EQU 3, you can use the symbolic name 'Tuesday' instead of its actual value, 3. For example, ADD #Tuesday,D0 is identical to ADD #3,D0.
We now provide another example of the use of assembler directives. Half the fun in writing assembly language programs is running and debugging them. In this chapter we will be using the EASy68K simulator, which allows you to cross-assemble a 68K program on a PC and then execute it on a PC. The PC simulates the behavior of a 68K processor and the basic operating system functions required to perform simple input and output activities such as reading from the keyboard and writing to the screen. Figure 6.5 gives a screen dump of a session with the simulator.

Figure 6.4 The effect of a define constant assembly directive: (a) ORG $2000 sets the location counter to 2000; (b) P DC.W 2 puts $0002 in the current location and moves the location counter to the next free location; (c) location $2000 now has the symbolic value P, so using 'P' in the program is the same as using $2000.

6.1.2 Using the cross-assembler

The following 68K assembly language program illustrates what an assembler does. This program is designed only to demonstrate the use of assembler directives; it does not perform any useful function.
The first column provides the line number. The second column defines the location in memory into which data and instructions go. The third column contains the instructions and the constants generated by the assembler. The remainder is the original assembly language program.

Consider line 8, where a symbol is used as a label and refers to the location in memory of the code on this line. This address is $0408. Therefore, the constant to be stored is 6 + $0408 = $040E. You can see that this really is the value stored in column 3 on line 8. Note that line 10 has the location $0418 and not $0417 because all word values must fall on even addresses.

The simulator system requires an ORG statement at the beginning of the program to define the point at which code is loaded into the simulated memory. Because the END address assembler directive terminates the assembly process, no instructions beyond the END point are assembled.
You can halt a 68K by executing the STOP #data instruction, which stops the 68K and loads the 16-bit value data into its status register. By convention we use the constant $2700 (this puts the processor in the supervisor mode, turns off interrupt requests, and clears the condition code flags).

Operations on address registers always yield longword results because an address register holds a 32-bit pointer. A .W operation is permitted on an address register, but the result is treated as a two's complement value and sign-extended to 32 bits.
BEGINNER’S ERRORS
1. Embedded data 3. Subroutine call
You should not locate data in the middle of a section of code. You call a subroutine with a BSR or a JSR instruction. This is
The microprocessor executes instructions sequentially and will the only way you call a subroutine. You cannot call a
regard embedded data as instructions. Put data between the subroutine with a conditional branch (e.g. BEQ, BNE, BCC,
end of the executable instructions of a program and the END etc.).
assembler directive as the following demonstrates.
The only way that you can locate data in the middle of a pro-
gram is by jumping past it like this:
Although this code is legal, it is not good practice (for the 4. Misplaced END directives
beginner) to mix code and data. The END directive indicates the end of the program. No
instruction or assembler directive may be placed after the END
2. Initialization directive. The END must be followed by the address of the first
I saw a program beginning with the operation instruction to be executed.
MOVE.B (A0),D0 which loads D0 with the byte pointed at
by address register A0. It failed because the student had not
defined an initial value of A0. You have to set up A0 before you
can use it or any other variable by, for example,
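The code fragments set in the original box are not reproduced above; a minimal sketch illustrating the first two points (hypothetical labels, EASy68K-style syntax) is:

        BRA     SKIP            ; jump past the embedded data
TABLE   DC.B    1,2,3           ; data located in the middle of the code
SKIP    LEA     TABLE(PC),A0    ; initialize A0 before it is used as a pointer
        MOVE.B  (A0),D0         ; only now is the byte pointed at by A0 safe to read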
6.2 The 68K's registers

The 68K has a byte-addressable architecture. Successive bytes are stored at consecutive byte addresses 0, 1, 2, 3, . . . , successive words are stored at consecutive even addresses 0, 2, 4, 6, . . . , and successive 32-bit longwords are stored at addresses 0, 4, 8, . . . . Figure 6.6 illustrates how the 68K's memory space is organized. In Fig. 6.6 we've labeled the individual bytes of the 16-bit and 32-bit memory space in blue to demonstrate that the most-significant byte of a word or longword is at the low address.

Figure 6.6 poses an interesting question. If you store a 32-bit longword at, say, memory location $1000, where do the 4 bytes go? For example, if the longword is $12345678, does byte $12 go into address $1000 or does byte $78 go into address $1000?

The 68K stores the most-significant byte of an operand at the lowest address (in this case $12 is stored at $1000). This storage order is called Big Endian (because the 'big end' of a number goes in first). The term Big Endian has been borrowed from Gulliver's Travels. Intel processors are Little Endian and store bytes in the reverse order to the 68K family.

The 68K stores the most-significant byte of a word in bits d08 to d15 at an even address and the least-significant byte in bits d00 to d07 at an odd address. Executing MOVE.W D0,1234 stores bits d00 to d07 of D0 at byte address 1235 and bits d08 to d15 of D0 at byte address 1234. To avoid confusion between registers and bits, we use 'D' to indicate a register and 'd' to indicate a bit. We introduced the 68K's registers in the previous chapter; now we examine some of their features.

6.2.1 Data registers

The 68K has eight general-purpose data registers, numbered D0 to D7. Any operation that can be applied to data register Di can also be applied to Dj. No special-purpose data registers are reserved for certain types of instruction. Some microprocessors do not permit all instructions to be applied to each of their registers; in such cases, learning assembly language is rather like learning to conjugate irregular foreign verbs.

The 68K's data registers are written D0 to D7. To refer to the sequence of consecutive bits i to j in register Dn we write Dn(i:j). For example, we indicate bits 8 to 31, inclusive, of D4 by D4(8:31). This notation is an extension of RTL and is not part of the 68K's assembly language.

When a byte operation is applied to the contents of a data register, only bits d00 to d07 of the register are affected. Similarly, a word operation affects bits d00 to d15 of the register. Only the lower-order byte (word) of a register is affected by a byte (word) operation. For example, applying a byte operation to data register D1 affects only bits 0 to 7 and leaves bits 8 to 31 unchanged. CLR.B D1 forces the contents of D1 to XXXXXXXXXXXXXXXXXXXXXXXX00000000, where the Xs represent the old bits of D1 before the CLR.B D1 was executed. If [D1] = $12345678 before the CLR.B D1, then [D1] = $12345600 after it.

Further examples should clarify the action of byte, word, and longword operations. In each case we give the 68K form of the instruction and its definition in RTL. We use slice notation to indicate a range of bits. If the initial contents of D0 and D1 are $12345678 and $ABCDEF98, respectively, the ADD operation has the following effects on the contents of D1 and the carry bit, C.
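The original table of effects is not reproduced above, but the three cases can be worked out by hand; with [D0] = $12345678 and [D1] = $ABCDEF98 initially:

ADD.B D0,D1    [D1(0:7)] ← [D0(0:7)] + [D1(0:7)]       [D1] = $ABCDEF10, C = 1
ADD.W D0,D1    [D1(0:15)] ← [D0(0:15)] + [D1(0:15)]    [D1] = $ABCD4610, C = 1
ADD.L D0,D1    [D1] ← [D0] + [D1]                      [D1] = $BE024610, C = 0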
The state of the carry bit and other bits of the CCR are determined only by the result of operations on bits 0–7 for a byte operation, by the result of operations on bits 0–15 for a word operation, and by the result of operations on bits 0–31 for a longword operation.

One of the most common errors made by 68K programmers is using inconsistent size operations on a data register, as the following example demonstrates. This example implements the operation IF ([XYZ] − 5) = 12 THEN . . . . But note that the operand XYZ is created as a byte value and yet it is compared with a word value. This fragment of code might work correctly sometimes, if the contents of bits 8 to 15 of D0 are zero. However, if these bits are not zero, this code will not operate correctly.

6.2.2 Address registers

An address register holds the location of a variable. Registers A0–A6 are identical in that whatever we can do to Ai, we can also do to Aj. Address register A7 is also used as a stack pointer to keep track of subroutine return addresses. We describe the use of the stack pointer in detail later in this chapter.

Address registers sometimes behave like data registers. For example, we can move data to or from address registers and we can add data to them. There are important differences between address and data registers; operations on address registers don't affect the status of the condition code register. If you are in the process of adding up a series of numbers, you shouldn't have to worry about modifying the CCR every time you use an address register to calculate the location of the next number in the series.

Because the contents of an address register are considered to be a pointer to an item in memory, the concept of separate independent fields within an address register is quite meaningless. All operations on address registers yield longword values. You can apply a .L operation to an address register but not a .B operation. No instruction may operate on the low-order byte of an address register. However, word operations are permitted on the contents of address registers because the 16-bit result of a .W operation is automatically sign-extended to 32 bits. For example, the operation MOVEA.W #$8022,A3 has the effect [A3] ← $FFFF8022; the 16-bit value $8022 is sign extended to $FFFF8022. Similarly, MOVEA.W #$7022,A3 loads A3 with $00007022, because the most-significant bit of $7022 is zero.

The concept of a negative address may seem strange. If you think of a positive address as meaning forward and a negative address as meaning backward, everything becomes clear. Suppose address register A1 contains the value 1280. If address register A2 contains the value −40 (stored as the appropriate two's complement value), adding the contents of A1 to the contents of A2 by ADDA.L A1,A2 to create a composite address results in the value 1240, which is 40 locations back from the address pointed at by A1.

We conclude with an example of the use of address registers. Address register A0 points to the beginning of a data structure made up of 50 items numbered from 0 to 49. Each of these 50 items is composed of 12 bytes, and data register D0 contains the number of the item we wish to access. Figure 6.7 illustrates this data structure. Suppose we need to put the address of this item in A1. In what follows we use the operation MULU #n,D0, which multiplies the 16-bit low-order word in D0 by n and puts the 32-bit product in D0. We need to find where the required item falls within the data structure. In order to do this we multiply the contents of D0 by 12 (because each item takes up 12 bytes). Then we add this offset to the contents of A0 and deposit the result in A1.
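The instruction sequence in the original is not reproduced here; a minimal sketch of the calculation just described is:

        MULU    #12,D0          ; [D0] ← item number × 12, the offset of the item
        MOVEA.L A0,A1           ; copy the base address of the structure to A1
        ADDA.L  D0,A1           ; A1 now contains the address of the required item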
Combined with the data processing operations, such instructions let you perform any data manipulation you require. However, the 68K provides some special-purpose data movement instructions to generate more compact and efficient code. The following three instructions provide enhanced data movement capabilities.

Name                    Assembly form   RTL definition
Exchange                EXG Ri,Rj       [Ri] ↔ [Rj]
Swap                    SWAP Dn         [Dn(0:15)] ↔ [Dn(16:31)]
Load effective address  LEA <ea>,An     [An] ← <ea>

The EXG instruction is intrinsically a longword operation that exchanges the contents of two registers (see Fig. 6.8(a)). EXG may be used to transfer the contents of an address register to a data register, and vice versa.
We are going to use the simulator to run this program and observe the contents of the simulated 68K's registers as the instructions are executed one by one. Figure 6.9 displays the contents of the simulated computer's registers immediately after the program is loaded.² Note that the 68K has two A7 address registers, labeled SS and US in all simulator output. SS is the supervisor state stack pointer A7 and US is the user state stack pointer A7. When the 68K is first powered up, the supervisor stack pointer is selected (we will discuss the difference between these later).

² This is the output from the Teesside simulator, which is a text-based simulator unlike EASy68K, which is Windows based. This text uses both simulators. EASy68K is better for running programs in a debug mode; the Teesside simulator is better for creating files that can be used in a book.

Throughout this chapter, all references to the stack pointer refer to the supervisor stack pointer,
SP (i.e. A7). In Fig. 6.9, PC defines the current value of the pro-
gram counter, SR the status register containing the 68K’s CCR,
and X, N, Z, V, and C are the CCR’s flag bits.
The last line of the block of data in Fig. 6.9 is the
mnemonic of the next instruction to be executed. Because the
simulator doesn’t use symbolic names, all addresses, data val-
ues, and labels are printed as hexadecimal values. In Fig. 6.9, the program counter is pointing at location 400₁₆ and the instruction at this address is MOVE.L #$12345678,D0.
We now execute this program, instruction by instruction.
The purpose of this exercise is to demonstrate the use of the
simulator and to show how each instruction affects the 68K’s
internal registers as it is executed. To help you appreciate what
is happening, registers that have changed are depicted in blue.
³ The DIVU D2,D1 instruction divides the 32-bit value in D1 by the 16-bit value in D2. The 16-bit quotient is placed in the low-order word of D1 and the 16-bit remainder is placed in the upper-order word of D1.
The first six digits 000500 give the first memory location on the line, and the following 16 pairs of digits give the contents of 16 consecutive bytes starting at the first location. Location 500₁₆ contains 32₁₆ = 50, and location 502₁₆ contains 0C₁₆ = 12. These values were set up by the two DC.W (define constant) assembler directives. The state of the system prior to the execution of the first instruction is shown below.

The MOVE.W D1,$0504 stores the low-order 16-bit result in D1 in memory location 504₁₆ (i.e. Z). We've used a wordlength operation and have discarded the remainder in the upper-order word of D1. Now we look at the contents of memory location 500 onward.

000500 00 32 00 0C 00 45 00 00 00 00 00 00 00 00 00 00

The first instruction LSR.B #3,D0 shifts xxyyyzzz right to get 000xxyyy in D0. We remove the xs by means of the logical operation AND.B #%00000111,D0 to get 00000yyy in D0.

By using the NOT, AND, OR, and EOR instructions, you can perform any logical operation on a word. Suppose you wish to clear bits 0, 1, and 2, set bits 3, 4, and 5, and toggle bits 6 and 7 of the byte in D0. You could write
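The listing itself is not reproduced above; one straightforward sequence that performs these three operations is:

        AND.B   #%11111000,D0   ; clear bits 0, 1, and 2
        OR.B    #%00111000,D0   ; set bits 3, 4, and 5
        EOR.B   #%11000000,D0   ; toggle bits 6 and 7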
Consider the following example of an addition followed by a branch on negative (minus).
The operation SUB D1,D2 subtracts the contents of D1 from D2, deposits the result in D2, and updates the condition code register accordingly. When the BMI instruction is executed, the branch is taken (the THEN part) if the N-bit of the CCR is set because the operation gave a negative result. The branch target is the line labeled ERROR, and the intervening code between BMI ERROR and ERROR . . . is not executed. If the branch is not taken because the result of SUB D1,D2 was positive, the code immediately following the BMI ERROR is executed. This code corresponds to the ELSE part of the IF THEN ELSE construction.

Unfortunately, there's an error in this example. Suppose that the subtraction yields a positive result and the ELSE part is executed. Once the ELSE code has been executed, we fall through to the THEN part and execute that too, which is not what we want to do. After the ELSE part has been executed, it's necessary to skip round the THEN part by means of a BRA instruction. The unconditional branch instruction BRA EXIT forces the computer to execute the next instruction at EXIT and skips past the 'ERROR' clause. Figure 6.10 demonstrates the flow of control for this program.

Remember we said earlier that not all the 68K's instructions affect the CCR. Consider the following two examples. Both these fragments of code have the same effect as far as the BMI ERROR is concerned. However, the second case might prove confusing to the reader of the program, who may well imagine that the state of the CCR prior to the BMI ERROR is determined by the EXG D3,D4 instruction.
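A sketch of the corrected flow of control (labels as described above, actions elided):

        SUB     D1,D2           ; evaluate the condition and set the CCR
        BMI     ERROR           ; IF negative THEN branch to the error clause
        ...                     ; ELSE part
        BRA     EXIT            ; skip round the THEN part
ERROR   ...                     ; THEN part (the 'ERROR' clause)
EXIT    ...                     ; execution continues here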
Example 1 Suppose you want to write a subroutine to convert a 4-bit hexadecimal value into its ASCII equivalent.

Table 6.1 (its columns are ASCII character, hexadecimal value, and binary value) illustrates the relationship between the binary value of a number (expressed in hexadecimal form) and its ASCII equivalent (also expressed in hexadecimal form). For example, if the internal binary value in a register is 00001010, its hexadecimal equivalent is A₁₆. In order to print the letter 'A' on a terminal, you have to transmit the ASCII code for the letter 'A' (i.e. $41) to it. Once again, please note that there is a difference between the internal binary representation of a number within a computer and the code used to represent the symbol for that number. The number six is expressed in 8 bits by the binary pattern 00000110 and is stored in the computer's memory in this form. On the other hand, the symbol for a six (i.e. '6') is represented by the binary pattern 00110110 in the ASCII code. If we want a printer to make a mark on paper corresponding to '6', we must send the binary number 00110110 to it. Consequently, numbers held in the computer must be converted to their ASCII forms before they can be printed.

From Table 6.1 we can derive an algorithm to convert a 4-bit internal value into its ASCII form. A hexadecimal value in the range 0 to 9 is converted into ASCII form by adding hexadecimal $30 to the number. A hexadecimal value in the range $A to $F is converted to ASCII by adding hexadecimal $37. If we represent the number to be converted by HEX and the converted result by ASCII, we can write down a suitable algorithm and translate it into low-level language.
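The translation itself is not reproduced above; a sketch of the conversion as a subroutine (assuming the 4-bit value arrives, and the ASCII code is returned, in D0) is:

HEX2ASC ADD.B   #$30,D0         ; values 0 to 9 become '0' to '9'
        CMP.B   #$39,D0         ; did the result pass '9'?
        BLS     DONE            ; no: the conversion is complete
        ADD.B   #$07,D0         ; yes: add a further 7 to reach 'A' to 'F'
DONE    RTS                     ; return with the ASCII code in D0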
The CMP source,destination instruction subtracts the source operand from the destination operand and sets the flag bits of the CCR accordingly; that is, a CMP is the same as a SUB except that the result is not recorded.

Example 2 Consider the following algorithm. We perform two tests after the comparison CMP.B D0,D1: one is a BNE and the other a BGE. We can carry out the two tests in succession because there isn't an intervening instruction that modifies the state of the CCR. Although conditional tests performed by high-level languages can be complex, the conditional test at the assembly language level is rather more basic, as this example demonstrates.

Templates for control structures

We now represent some of the control structures of high-level languages as templates in assembly language. A template is a pattern or example that can be modified to suit the actual circumstances. In each of the following examples, the high-level construct is provided as a comment to the assembly language template by means of asterisks in the first column. The condition tested is [D0] < [D1] and the actions to be carried out are Action1 or Action2. The templates can be used by providing the appropriate test instead of CMP D0,D1 and providing the appropriate sequence of assembly language statements instead of Action1 or Action2.
POINTS TO REMEMBER

The assembly language symbol % indicates that the following number is interpreted as a binary value and the symbol $ means that the following number is interpreted as a hexadecimal value. AND.B #%11000000,D0 tells you much more than the hexadecimal and decimal forms of the operand, AND.B #$C0,D0 and AND.B #192,D0, respectively.

The symbol # informs the assembler that the following value is not the address of a memory location containing the operand, but the actual operand itself. AND.B #%11000000,D0 means calculate the logical AND between the binary value 11000000 and the contents of D0. If we had made a mistake in the program and had written AND.B %11000000,D0 (rather than AND.B #%11000000,D0), the instruction would have ANDed D0 with the contents of memory location %11000000 (i.e. location 192).
The only difference between CMP and SUB in this template is that the former performs the subtraction and throws away the result, whereas the latter evaluates [Dj] − [Di] and puts the result in Dj.

The label FALSE is a dummy label and is not in any way used by the assembly program. It merely serves as a reminder to the programmer of the action to be taken as a result of the test being false. At the end of this sequence is an instruction BRA EXIT. A BRA (branch) is equivalent to a GOTO in a high-level language and causes a branch round the action taken if the result of the test is true.

We now continue with assembly language programming and introduce a wealth of variations on address register indirect addressing.

6.4 Addressing modes

6.4.1 Immediate addressing

Application of immediate addressing

As an arithmetic constant
With immediate addressing the operand forms part of the instruction itself, so only one memory reference is required to read the instruction during the fetch phase. When the instruction MOVE #5,D0 is read from memory in a fetch cycle, the operand, 5, is available immediately, without a further memory access to location 5 to read the actual operand.

The symbol # is not part of the instruction. It is a message to the assembler telling it to select the code for MOVE that uses the immediate addressing mode. Don't confuse the symbol # with the symbols $ or %. The $ indicates only that the following number is hexadecimal and the % indicates that the following number is binary.
A constant can be added to a memory location by means of the special add immediate instruction ADDI. For example, ADDI #22,NUM adds the constant value 22 to the contents of the location called NUM.

In a comparison with a constant
Consider the test on a variable, NUM, to determine whether it lies in the range 7 ≤ NUM ≤ 25.
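The test appeared as a listing in the original; a sketch of one way of writing it (assuming NUM holds a small positive byte value and hypothetical labels) is:

        MOVE.B  NUM,D0          ; read the variable
        CMP.B   #7,D0           ; compare NUM with the lower bound
        BLT     OUTSIDE         ; NUM < 7 lies outside the range
        CMP.B   #25,D0          ; compare NUM with the upper bound
        BGT     OUTSIDE         ; NUM > 25 lies outside the range
INSIDE  ...                     ; here 7 ≤ NUM ≤ 25
OUTSIDE ...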
At the end of the loop, the counter is incremented by ADD #1,D0. The counter, D0, is then compared with its terminal value by CMP #N+1,D0. If you write an expression such as N+1, the assembler evaluates it and replaces it by the calculated value; in this example #N+1 is replaced by 11. We use the comparison with N+1 because the counter is incremented before it is tested. On the last time round the loop, the variable I becomes N+1 after incrementing and the branch to NEXT is not taken, allowing the loop to be exited. This loop construct can be written in a more elegant fashion, but at this point we're interested only in the application of immediate addressing as a means of setting up counters.

We can add 100 numbers by means of address register indirect addressing in the following way. This isn't efficient code; we'll write a better version later.
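A sketch of the kind of loop meant here (labels assumed; the numbers are 16-bit words starting at TABLE):

        LEA     TABLE,A0        ; A0 points at the first of the 100 numbers
        MOVE.W  #100,D1         ; D1 counts the numbers still to be added
        CLR.W   D0              ; clear the running total
NEXT    ADD.W   (A0),D0         ; add the word pointed at by A0 to the total
        ADDA.L  #2,A0           ; move the pointer to the next word
        SUB.W   #1,D1           ; decrement the loop counter
        BNE     NEXT            ; repeat until all 100 numbers are added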
MOVE D1,(A2)     [[A2]] ← [D1]           Move the contents of D1 to the location pointed at by A2
ADD (A1),D2      [D2] ← [D2] + [[A1]]    Add the contents of the location pointed at by A1 to the contents of D2
MOVE (A1),(A2)   [[A2]] ← [[A1]]         Move the contents of the location pointed at by A1 to the location pointed at by A2
The pointer in address register A0 is incremented by 2 on each pass round the loop; the increment of 2 is required because the memory elements are words and each word occupies 2 bytes in memory. We can express this program in a more compact way.
Figure 6.12 Using address register indirect addressing: the effect of ADD.B (A0),D0 when A0 contains 1000, D0 contains 12, and memory location 1000 contains 25. The final value of D0 is 12 + 25 = 37.

Address register indirect addressing with displacement is written d16(Ai), where d16 is a 16-bit constant and Ai an address register. The effective address of an operand is calculated by adding the contents of the address register specified by the instruction to the signed two's complement constant that forms part of the instruction. Figure 6.13 illustrates how the effective address is calculated for the instruction MOVE.B 4(A0),D0: the location accessed is 4 bytes on from that pointed at by A0. Some 68K simulators permit you to write either MOVE.B 4(A0),D0 or MOVE.B (4,A0),D0.

We can define MOVE d16(A0),D0 in RTL as [D0] ← [d16 + [A0]], where d16 is a 16-bit two's complement value in the range −32K to +32K. This constant is called a displacement or offset because it indicates how far the operand is located from the location pointed at by A0. The displacement can be negative; for example, MOVE.B -4(A0),D0 specifies an operand 4 bytes back from the location pointed at by A0.

Figure 6.13 An illustration of address register indirect addressing with displacement: executing MOVE.B 4(A0),D0 with a displacement of +4.

Why would you wish to use this addressing mode? Consider the data structure of Fig. 6.14, where three variables P, Q, and R have consecutive locations in memory. If we load address register A0 with the address of the first variable, P, we can access each variable via the pointer in A0. In this fragment of code we define the displacements P, Q, and R as 0, 1, and 2, respectively.

Figure 6.14 Using address register indirect addressing with displacement: P, Q, and R lie at offsets 0, 1, and 2 with respect to A0.
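The fragment itself is not reproduced above; a sketch consistent with the discussion that follows (the base address Block is assumed to be defined elsewhere in the program) is:

P       EQU     0               ; displacement of P
Q       EQU     1               ; displacement of Q
R       EQU     2               ; displacement of R
        LEA     Block,A0        ; A0 points at the block holding the variables
        MOVE.B  P(A0),D0        ; read P
        ADD.B   Q(A0),D0        ; add Q to it
        MOVE.B  D0,R(A0)        ; store the sum in R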
This code adds two numbers and stores their sum in memory. But where in memory? The locations of the three numbers are Block + P, Block + Q, and Block + R, respectively. Because the value of Block can be changed by the programmer, we can locate the variables P, Q, and R in any three consecutive locations anywhere in memory. Why would we want to do that? If we access variables by specifying their location with respect to a pointer, we can move the program about in memory without having to recalculate all addresses.

The instruction MULU <ea>,Di multiplies the 16-bit word at the effective address specified by <ea> by the lower-order word in Di. The 32-bit longword product is loaded into Di(0:31). MULU operates on unsigned values and uses two 16-bit source operands to yield a 32-bit destination operand. As the 68K lacks a clear address register instruction, we have to use either MOVEA.L #0,A0 or the faster SUBA.L A0,A0 to clear A0.

Note the instruction CMPA.L #2*N,A0 containing the expression 2*N, which is automatically evaluated by the assembler. The assembler looks up the value of N (equated to $10) and multiplies it by 2 to get $20. Consequently, the assembler treats CMPA.L #2*N,A0 as CMPA.L #$20,A0.

Using address register indirect addressing with displacement
Let’s look at an example of this addressing mode that involves
vectors. A vector is composed of a sequence of components; Variations on a theme
for example, the vector X might be composed of four elements
The 68K supports two variations on address register indirect
x0, x1, x2, x3. One of the most common of all mathematical
addressing. One is called address register indirect addressing
calculations (because it crops up in many different areas—
with predecrementing and the other is called address register
particularly graphics) is the evaluation of the inner or scalar
indirect addressing with postincrementing. The former
product of two vectors. Suppose A and B are two n-compo-
addressing mode is written in assembly language as (Ai)
nent vectors; the inner product S, of A and B, is given by
and the latter (Ai). Both these addressing modes use
兺
S ai · bi a0 · b0 a1 · b1 … an1 · bn1 address register indirect addressing to access an operand
exactly as we’ve described. However, the postincrementing
If A (1, 3, 6) and B (2, 3, 5), the inner product S is mode automatically increments the address register after it’s
given by 1 ⋅ 2 3 ⋅ 3 6 ⋅ 5 41. Consider the case in which been used, whereas the predecrementing mode automatically
the components of vectors A and B are 16-bit integers. decrements the address register before it’s used. Figure 6.15
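A sketch of the inner-product calculation under these assumptions (vectors of N 16-bit words at the hypothetical labels VEC_A and VEC_B; the 32-bit sum accumulates in D0):

        LEA     VEC_A,A0        ; A0 points at vector A
        LEA     VEC_B,A1        ; A1 points at vector B
        MOVE.W  #N,D3           ; D3 counts the components
        CLR.L   D0              ; clear the sum
LOOP    MOVE.W  (A0),D1         ; get the next component of A
        MULU    (A1),D1         ; multiply it by the matching component of B
        ADD.L   D1,D0           ; add the 32-bit product to the sum
        ADDA.L  #2,A0           ; step to the next word of A
        ADDA.L  #2,A1           ; step to the next word of B
        SUB.W   #1,D3           ; one component fewer to process
        BNE     LOOP            ; repeat for all components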
Variations on a theme

The 68K supports two variations on address register indirect addressing. One is called address register indirect addressing with predecrementing and the other is called address register indirect addressing with postincrementing. The former addressing mode is written in assembly language as -(Ai) and the latter as (Ai)+. Both these addressing modes use address register indirect addressing to access an operand exactly as we've described. However, the postincrementing mode automatically increments the address register after it's been used, whereas the predecrementing mode automatically decrements the address register before it's used.

Figure 6.15 demonstrates the operation ADD.B (A0)+,D0. This instruction adds the contents of the location pointed at by A0 (i.e. P) to the contents of data register D0. After A0 has been used to access P, the value of A0 is incremented to point at the next element, Q.

Address register indirect addressing is used to access tables. If we access an item by MOVE.B (A0),D0, the next item (i.e. byte) in the table can be accessed by first updating the address pointer, A0, by ADDA #1,A0 and then repeating the MOVE.B (A0),D0. The 68K's automatic postincrementing mode increments an address register after it has been used to access an operand. This addressing mode is indicated by (Ai)+. Consider the following examples of address register indirect addressing with postincrementing.

Figure 6.15 Address register indirect addressing with postincrementing: (a) initially, address register A0 points at element P in memory, which is accessed and loaded into D0; (b) after accessing element P, A0 is incremented to point at the next element, Q.
The pointer register is automatically incremented by 1 for byte operands, 2 for word operands, and 4 for longword operands. Consider the following examples. An item can be pulled off a stack by an instruction such as MOVE.W (A2)+,D5; postincrementing leaves A2 pointing to the new top item on the stack.

This pseudocode uses the notation number_i to indicate the ith element in a sequence, and can be expressed directly in 68K assembly language.
Addressing modes and strings

A string is a sequence of consecutive characters. We will assume the characters are 8-bit ASCII-encoded values. It's necessary to indicate a string's size in order to process it. You could store a string as n, char_1, char_2, . . . , char_n, where n is the length of the string. For example, the ASCII-encoded string 'ABC' might be stored in memory as the sequence $03, $41, $42, $43.

You can also use a special terminator or marker to indicate the end of a string. Of course, the terminator must not occur naturally in the string. If the terminator is the null byte, the string 'ABC' would be stored as the sequence $41, $42, $43, $00. Some strings use the terminator $0D because this is the ASCII code for a carriage return.

The address of a string in memory is usually that of the first character in the string. Figure 6.16 shows a 10-character string located at location 1000₁₆ in memory and terminated by a null byte (i.e. 0).

Most microprocessors don't permit direct operations on strings (e.g. you can't compare two strings using a single instruction). You have to process a string by using byte operations to access individual characters, one by one. The characters of a string can be accessed by means of address register indirect addressing. In Fig. 6.16, address register A0 contains the value 1000₁₆, which is the address or location of the first character in the string.

The operation MOVE.B (A0),D0 copies the byte pointed at by the contents of address register A0 into data register D0. Applying this instruction to Fig. 6.16 would copy the character 'T' into data register D0 (the actual data loaded into D0 is, of course, the ASCII code for a letter 'T'). If we use MOVE.B (A0)+,D0, the contents of address register A0 are incremented to 1001₁₆ after the character located at 1000 has been accessed, and we are ready to access the next character in the string. Consider the following example.

Figure 6.16 Example of a string: the string 'The string' begins in memory location 1000 and extends to location 100A, where it is terminated by a null byte.

Counting characters Suppose we want to count the number of characters in the string pointed at by address register A0 and return the string length in D3. The string is terminated by the null character, which is included in the character count.
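The loop itself is not reproduced above; a sketch of it, exploiting the fact (noted below) that MOVE sets the CCR, is:

        CLR.W   D3              ; the character count starts at zero
LOOP    ADD.W   #1,D3           ; count this character (the null is included)
        MOVE.B  (A0)+,D0        ; read a character and advance the pointer
        BNE     LOOP            ; a null character terminates the string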
At the end of this code, address register A0 will be pointing at the next location immediately following the string. We can rewrite this fragment of code using the TST instruction. The new instruction, TST, tests an operand by comparing it with zero and setting the flag bits in the CCR accordingly.

Counting A's Suppose we want to count the number of times 'A' occurs in a string that starts at address Find_A. The instruction CMP.B #'A',D0 compares the contents of D0 (i.e. the last character read from the string) with the source operand, #'A'. The # symbol means the actual value, and 'A' means the number whose value is the ASCII code for the letter A. If you omit the # symbol, the processor will read the contents of memory location 41₁₆ (because 'A' = 41₁₆). Because the MOVE instruction sets the CCR, we can test for the terminator as soon as we pick up a character.

Figure 6.17 Comparing two strings: one string begins in memory location 1000 and extends to location 100A; an identical string begins at location 2000 and extends to location 200A.

Comparing strings Suppose we wish to test whether two strings are identical. Figure 6.17 shows two strings in memory. One is located at 1000₁₆ and the other at 2000₁₆. In this case both strings are identical. In order to compare the strings we have to read a character at a time from each string. If, at any point, the two characters do not match, the strings are not identical. If we reach two null characters, the strings are the same. A0 points at one string and A1 points at the other. We will set D7 to zero if the strings are not the same, and to one if they are the same.
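A sketch of the comparison under the conventions just stated (labels hypothetical):

        MOVEQ   #1,D7           ; assume the strings are the same
LOOP    MOVE.B  (A0)+,D0        ; read a character from the first string
        CMP.B   (A1)+,D0        ; compare it with the second string's character
        BNE     FAIL            ; any mismatch: the strings differ
        TST.B   D0              ; was the matching pair the null terminator?
        BNE     LOOP            ; no: compare the next pair of characters
        BRA     DONE            ; yes: the strings are identical, D7 = 1
FAIL    CLR.B   D7              ; flag the strings as different, D7 = 0
DONE    ...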
Removing spaces A common string manipulation problem is the removal of multiple spaces in text. If you enter a command into a computer like delete X, Y, Z, the various component parts (i.e. fields) of the command are first analyzed. A command line processor might remove multiple spaces before processing the command. Figure 6.18 shows how we might go about dealing with this problem. On the left, the string has three spaces. On the right, the same string has been rewritten with only one space.

Because the final string will be the same size or shorter than the original string, we can simply move up characters when we find a multiple space. We can use two pointers, one to point at the original string and one to point at the final string. We read characters from the string and copy them to their destination until a space is encountered. The first space encountered is copied across. We continue to read characters from the source string but do not copy them across if they are further spaces. This algorithm requires some care. If we are searching for multiple spaces, we will move one character beyond the space because of the autoincrementing addressing mode. Therefore, we have to adjust the pointer before continuing. Figure 6.19 demonstrates the operation of this algorithm.

By the way, there is a flaw in this program. What happens if the end of the string is a space followed by a null? How can you fix the problem?

Figure 6.18 Removing multiple spaces: (a) register A0 points to the source string, which contains three spaces between 'The' and 'test'; (b) register A1 points to the destination string, in which only the first space in a group is moved.

Figure 6.19 The operation of the space-removal algorithm.

Figure 6.20 Indexed addressing: executing MOVE.B Offset(A0,D0),D1. The offset selects an item within a data block.
6.4.3 Relative addressing

Before we introduce this addressing mode, we'll pose a problem. Consider the operation MOVE $1234,D0, which specifies the absolute address $1234 as a source operand location. If you were to take the program containing this instruction and its data and locate it in a different region of memory, would it work? No. Why not? Because the data accessed by the instruction is no longer in location $1234. The only way to run this program is to change all operand addresses to their new locations. Relative addressing provides a means of relocating programs without changing addresses.

Relative addressing is similar to address register indirect addressing because the effective address of an operand is given by the contents of a register plus a displacement. However, relative addressing uses the program counter to calculate the effective address rather than an address register; that is, the location of the operand is specified relative to the current instruction. The syntax of a 68K relative address is d16(PC).

Relative addressing makes it possible to move (i.e. relocate) position-independent code (PIC) programs in memory without modifying them. MOVE 36(PC),D0 means load data register D0 with the contents of the memory location 36 locations on from this instruction. It doesn't matter where the operation MOVE 36(PC),D0 lies in memory, because the data associated with it will always be stored in the 36th location following the instruction.

Calculating the displacement required by an instruction using program counter relative addressing is difficult. Fortunately, you never have to perform this calculation; the assembler does it for you. Consider the following example. Let's assemble this code and see what happens.
The following fragment of code demonstrates how the LEA instruction can be used to support position-independent code. When the instruction LEA Value1(PC),A0 is assembled, the assembler takes the value of Value1 and subtracts the current value of the program counter from it to evaluate the offset required by the instruction.

We now look at one of the most important applications of program counter relative addressing, relative branching. Because 2 is automatically added to the PC at the start of an instruction, relative branching with an 8-bit offset is possible within the range −126 to +129 bytes from the start of the current instruction (i.e. the branch). The 68K also supports a long branch with a 16-bit offset that provides a range of −32K to +32K bytes.

Figure 6.22 also illustrates the importance of relative branching in the production of position-independent code. The program containing the instruction BRA XYZ can be relocated merely by moving it in memory, whereas the program containing JMP XYZ must be modified if it is relocated.

The following program moves a block of data from one region of memory to another and provides examples of both relative branching and relative addressing. The first location of the block to be moved is FROM and the first location of its destination is TO. The number of words to be moved is given by SIZE.
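A sketch of the program (SUBQ is assumed for the decrement, which is consistent with the instruction addresses discussed below):

        LEA     FROM(PC),A0     ; A0 points at the source block
        LEA     TO(PC),A1       ; A1 points at the destination block
        MOVE.W  #SIZE,D0        ; D0 counts the words to be moved
REPEAT  MOVE    (A0)+,(A1)+     ; move a word and advance both pointers
        SUBQ.W  #1,D0           ; decrement the word counter
        BNE     REPEAT          ; repeat until the counter reaches zero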
In Fig. 6.23 the instruction BNE REPEAT causes a branch backwards to the instruction MOVE (A0)+,(A1)+ in the event of the zero bit in the CCR not being set. From the memory map of Fig. 6.23, we see that the address of the branch operation is $00 0410 and the address of the operation MOVE (A0)+,(A1)+ is $00 040C. We therefore have to branch back four locations from the start of the BNE, or six locations from the end of the BNE. As the CPU always increments the PC by 2 at the start of a branch, the stored offset is −6. In two's complement form this is $FA (the machine code is $66FA).

Note how we use relative addressing to load the addresses of the source and destination blocks into address registers A0 and A1, respectively. This program can be assembled to give the following.
Figure 6.25(a) shows an ADD instruction about to be executed with four data elements on the stack. When the ADD is executed, the element at the top of the stack is pulled (Fig. 6.25(b)) and sent to the adder. The next element (i.e. C, the old NOS) is now the new TOS. In Fig. 6.25(c) the element at the top of the stack is pulled and sent to the adder. Finally, the output of the adder, D + C, is pushed onto the stack to create a new TOS.

Note how this ADD instruction doesn't have an operand, unlike all the instructions we've described so far. A stack-based computer has so-called addressless instructions because they act on elements at the top of the stack.

Figure 6.25 Executing an ADD operation on a stack machine: (b) first element pulled off the stack; (c) second element pulled off the stack; (d) result pushed on the stack.

The following example illustrates the evaluation of the expression (A + B)(C − D) on a hypothetical stack-based computer. We assume that the instruction PUSH pushes the contents of D0 onto the stack, ADD, SUB, and MULU all act on the top two items on the stack, and PULL places the top item on the stack in D0.
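A sketch of the sequence (assuming SUB computes NOS − TOS, i.e. C − D here; the comments show the stack, top item first, after each step):

        MOVE    A,D0
        PUSH                    ; A
        MOVE    B,D0
        PUSH                    ; B, A
        ADD                     ; A+B
        MOVE    C,D0
        PUSH                    ; C, A+B
        MOVE    D,D0
        PUSH                    ; D, C, A+B
        SUB                     ; C-D, A+B
        MULU                    ; (A+B)(C-D)
        PULL                    ; result transferred to D0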
Because some machines operate on the stack in the way we've just described (e.g. ADD, SUB, MULU), special-purpose microprocessors have been designed to support stack-based languages. The 68K implements instructions enabling it to access a stack, although it's not a stack machine. Pure stack machines do exist, although they have never been developed to the same extent as the two-address machines like the 68K and Pentium.

Figure 6.27 Executing the program of Fig. 6.26 on a machine with a stack pointer.
Microprocessors don’t implement a stack in this way and simultaneously, because all its address registers can be used as
the items already on the stack don’t move as new items stack pointers.
are pushed and old ones pulled. The stack is located in a In what follows, we use the 68K’s stack pointer to illustrate
region of the main store and a stack pointer points to the the operation of a stack. You might expect the assembly lan-
top of the stack. This stack pointer points at the top of guage instruction that pushes the contents of D0 on the stack
stack as the stack grows and contracts. In some micro- to be PUSH D0, and the corresponding instruction to pull an
processors, the stack pointer points to the next free location item from the stack and put it in D0 to be PULL D0. Explicit
on the stack, whereas in others, it points to the current top PUSH and PULL instructions are not provided by the 68K. You
of stack. can use address register indirect with predecrementing
Figure 6.27 demonstrates how the program illustrated in addressing mode to push, and address register indirect with
Fig. 6.26 is executed by a computer with a stack in memory postincrementing addressing mode to pull.
and a stack pointer, SP. Figure 6.28 illustrates the effect of a PUSH D0 instruction,
The 68K doesn’t have a special system stack pointer—it which is implemented by MOVE.W D0,(SP), and PULL
uses address register A7. We call A7 the system stack pointer D0, which is implemented by MOVE.W (SP),D0. The 68K’s
because the stack pointed at by A7 stores return addresses stack grows towards lower addresses as data is pushed on it;
during subroutine calls. Assemblers let you write either A7 or for example, if the stack pointer contains $80014C and a
SP; for example, MOVE.W D0,(A7) and MOVE.W D0,(SP) word is pushed onto the stack, the new value of the stack
are equivalent. The 68K can maintain up to eight stacks pointer will be $80014A.
The 68K's push operation MOVE.W D0,-(SP) is defined in RTL as

[SP] ← [SP] − 2; [[SP]] ← [D0]

Push and pull operations use word or longword operands. A longword operand automatically causes the SP to be decremented or incremented by 4. Address registers A0 to A6 may be used to push or pull byte, .B, operands, but not the system stack pointer, A7. The reason for this restriction is that A7 must always point at a word boundary on an even address (this is an operational restriction imposed by the 68K's hardware).
The 68K’s stack pointer is decre-
mented before a push and incremented
N–4 N–4 after a pull. Consequently, the stack
A7 A7 N – 2 Top of stack pointer always points at the item at the
N N–2 N top of the stack; for example,
Stack pointer N+2 MOVE (SP),D3 pulls the top item off
N+4 Stack the stack and deposits it in D3. Note
N+6 that MOVE (SP),D3 copies the TOS
N+8 N+8 into D3 without modifying the stack
pointer.
When the stack shrinks after a
(a) Snapshot of the 68K's stack. (b) State of the stack after pushing a
word by MOVE.W D0,–(A7). MOVE.W (SP), D0 operation, items
on the stack are not physically deleted;
Figure 6.28 The 68K’s stack. they are still there in the memory until
overwritten by, for example, a
MOVE.W D0,(SP) operation.
The stack can be used as a temporary data store. Executing a MOVE.W D0,-(SP) saves the contents of D0 on the stack, and executing a MOVE.W (SP)+,D0 returns the contents to D0. The application of the stack as a temporary storage location avoids storing data in explicitly named memory locations. More importantly, if further data is stored on the stack, it does not overwrite the old data.

The 68K has a special instruction called move multiple registers (MOVEM), which saves or retrieves an entire group of registers. For example, MOVEM.L D0-D7/A0-A7,-(A7) pushes all the registers on the stack pointed at by A7.

Figure 6.29 The 68K's stack: (a) initial state of the stack; (b) state of the stack after MOVEM.L D0-D5/A2-A5,-(A7), which dumps registers D0 to D5 and A2 to A5 on the stack.

The register list used by MOVEM is written in the form Di-Dj/Ap-Aq and
specifies data registers Di to Dj inclusive and address registers Ap to Aq inclusive. Groups of registers are pulled off the stack by, for example, MOVEM.L (A7)+,D0-D2/D4/A4-A6.

The most important applications of the stack are in the implementation of subroutines (discussed in the following section) and in the handling of interrupts. When autodecrementing is used, registers are stored in the order A7 to A0 then D7 to D0, with the highest-numbered address register being stored at the lowest address. Figure 6.29 illustrates the effect of MOVEM.L D0-D5/A2-A5,-(A7).

6.5.2 The stack and subroutines

A subroutine is called by the instruction BSR <label> or JSR <label>, where BSR means branch to subroutine and JSR means jump to subroutine. The difference between BSR and JSR is that BSR uses a relative address and JSR an absolute address. Remember that the programmer simply supplies the label of the subroutine and the assembler automatically calculates the appropriate relative or absolute address. To call a subroutine ABC, all we have to do is write either BSR ABC or JSR ABC. BSR is preferred to JSR because it permits the use of position-independent code. The range of branching with BSR is -32 kbytes to +32 kbytes from the present instruction. JSR uses an absolute address and cannot therefore be used to generate position-independent code. JSR may use an address register indirect address; for example, JSR (A0) calls the subroutine whose address is in A0.

Using subroutines—an example

We now look at an example of how subroutines are used. The following program inputs text from the keyboard and stores successive characters in a buffer in memory until an @ symbol is typed. When an @ is encountered, the text is displayed on the screen. In this simple example, we don't test for buffer overflow.

In this example we use the character input and output mechanisms built into both the EASy68K and the Teesside 68K simulators. All I/O is performed by means of a TRAP #15 instruction, which is a call to the operating system. We haven't yet covered the 68K's TRAP instructions, but all we need say here is that a TRAP calls a function that forms part of the computer's operating system. Before the TRAP is executed, you have to tell the O/S what operation you want by putting a parameter in data register D0. A '5' indicates character input and a '6' indicates character output. When a character is input, it is deposited in D1. Similarly, the character in D1 is displayed by the output routine.

We can express the algorithm in pseudocode as follows.
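In outline: read characters into the buffer until an '@' is seen, then write the buffered characters out. The same loop in C, with hypothetical get_char and put_char standing in for the two TRAP #15 operations (tasks 5 and 6), might look like this:

    #include <stdio.h>

    /* Hypothetical stand-ins for the TRAP #15 character I/O calls
       (task 5 = input a character, task 6 = display a character). */
    int  get_char(void)  { return getchar(); }
    void put_char(int c) { putchar(c); }

    int main(void)
    {
        char buffer[256];                /* no overflow test, as in the text */
        int  i = 0, c;

        while ((c = get_char()) != '@')  /* read until an @ is typed  */
            buffer[i++] = (char)c;

        for (int j = 0; j < i; j++)      /* then display the buffered text */
            put_char(buffer[j]);
        return 0;
    }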
In the following program, the BUFFER is a region of memory When an RTS instruction is encountered at the end of a
reserved for the data to be stored. subroutine, the longword address on the top of the stack is
The instruction CMP.B #’@’,D1 compares the contents pulled and placed in the program counter in order to force a
of the low-order byte of data register D1 with the byte
whose ASCII code corresponds to the symbol @. The instruc- by assembling this program. We will need this output
tion LEA BUFFER(PC),A0 generates position-independent when we trace the program (in particular the addresses
code because it calculates the address of the buffer relative to of the subroutines and the return addresses of subroutine
the program counter. Had we written LEA BUFFER,A0, the calls).
code would not have been position independent.
We’ve set A0 to point to the buffer for input data. The next
instruction calls the subroutine to input a character. Note the
change in the PC to $428.
Having got the input (in this case Z) in D1, we return from the subroutine. Watch the program counter again. It is currently $42E and will be replaced by $408 (i.e. the address of the instruction after the subroutine call).
Because D1 contains the ASCII code for ‘@’, the test for equal-
ity will yield true and we will not take the branch back to $0404.
We call the operating system with the TRAP. Note that the
contents of D1 will be printed as the ASCII character Z. Then
we return to the body of the program.
And so on . . .
You can't use registers to transfer large quantities of data to and from subroutines, due to the limited number of registers. You can pass parameters to a subroutine by means of a mailbox in memory. Consider the following. Such a solution is poor, because the subroutine can't be interrupted or called by another program. Any data stored in explicitly named locations could be corrupted by the interrupting program (see the box on interrupts). Let's look at how data is transferred between a subroutine and its calling program by many high-level languages.

Passing parameters on the stack

An ideal way of passing information between the subroutine and calling program is via the stack. Suppose two 16-bit parameters, P1 and P2, are needed by the subroutine ABC(P1,P2). The parameters are pushed on the stack immediately before the subroutine call by the following code.

The state of the stack prior to the subroutine call and immediately after it is given in Fig. 6.30. Note that the return address is a longword and takes up two words on the stack.

On entering the subroutine, you can retrieve the parameters from the stack in several ways. However, you must never change the stack pointer in such a way that you move it down the stack. Consider Fig. 6.30(c), where the stack pointer is pointing at the return address. If you add 4 to the stack pointer, it will point to parameter P2 on the stack. You can now get P2 with, say, MOVE.W (A7),D0. However, the return address is no longer on the stack (it's still there in memory above the top of the stack). If an interrupt occurs or you call a subroutine, the new return address will be pushed on the top of the stack, overwriting the old return address. Never move the stack pointer below the top of stack.

You can avoid using the stack pointer by copying it to another address register with LEA (A7),A0. Now you can use A0 to get the parameters; for example, P1 can be loaded into D1 by MOVE.W 6(A0),D1. The offset 6 is required because the parameter P1 is buried under the return address (4 bytes) and P2 (2 bytes). Similarly, P2 can be loaded into D2 by MOVE.W 4(A0),D2.

After returning from the subroutine with RTS, the contents of the stack pointer are [A7] - 4, where A7 is the value of the stack pointer before P1 and P2 were pushed on the stack. The stack pointer can be restored to its original value, or cleaned up, by executing LEA 4(A7),A7 to move the stack pointer down by two words. Note that LEA 4(A7),A7 is the same as ADD.L #4,A7. P1 and P2 are, of course, still in the same locations in memory, but they will be overwritten as new data is pushed on the stack.

By using the stack to pass parameters to a subroutine, the subroutine may be interrupted and then used by the interrupting program without the parameters being corrupted. As the data is stored on the stack, it is not overwritten when the subroutine is interrupted, because new data is added at the top of the stack and then removed after the interrupt has been serviced.

Let's look at another example of parameter passing in detail. In the following program two numbers are loaded into D0 and D1, and then the contents of these registers are pushed on the stack. A subroutine, AddUp, is called to add these two numbers together. In this case the result is pushed on the stack. We've used blue to highlight code that performs the parameter passing.
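To see the offsets concretely, here is a small C model of the stack discipline just described; the memory array, the initial stack pointer, and the parameter values are illustrative assumptions, not the book's program:

    #include <stdio.h>
    #include <stdint.h>

    /* A toy model of the 68K stack: byte-addressed, big-endian,
       growing toward lower addresses. */
    uint8_t  mem[0x10000];
    uint32_t a7 = 0x9000;                /* illustrative initial A7 */

    void push16(uint16_t v) { a7 -= 2; mem[a7] = (uint8_t)(v >> 8); mem[a7+1] = (uint8_t)v; }
    void push32(uint32_t v) { push16((uint16_t)v); push16((uint16_t)(v >> 16)); }
    uint16_t read16(uint32_t addr) { return (uint16_t)((mem[addr] << 8) | mem[addr+1]); }

    int main(void)
    {
        push16(0x1111);          /* MOVE.W P1,-(A7)                     */
        push16(0x2222);          /* MOVE.W P2,-(A7)                     */
        push32(0x00000408);      /* BSR pushes the longword return address */

        /* Inside the subroutine: P2 is 4 bytes up, P1 is 6 bytes up --
           the offsets used by MOVE.W 6(A0),D1 and MOVE.W 4(A0),D2. */
        printf("P1 = %04X\n", read16(a7 + 6));
        printf("P2 = %04X\n", read16(a7 + 4));
        return 0;
    }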
Figure 6.30 Passing parameters on the stack (all values on the stack are words or longwords).
THE INTERRUPT

An interrupt is a method of diverting the processor from its intended course of action, and is employed to deal with errors and external events that must be attended to as soon as they occur. Whenever a processor receives an interrupt request from a device, the processor finishes its current instruction and then jumps to the program that deals with the cause of the interrupt. After the interrupt has been serviced, a return is made to the point immediately following the last instruction before the interrupt was dealt with. The return mechanism of the interrupt is almost identical with that of the subroutine—the return address is saved on the stack.

Suppose a subroutine is interrupted during the course of its execution. If the interrupt-handling routine also wishes to use the same subroutine (yes, that's possible), any data stored in explicitly named memory locations will be overwritten and corrupted by the re-use of the subroutine. If the data had been stored in registers and the contents of the registers pushed on the stack by the interrupt-handling routine, no data in the subroutine would have been lost by its re-use. After the subroutine has been re-used by the interrupt-handling routine, the contents of the registers stored on the stack are restored and a return from interrupt is made with the state of the registers exactly the same as at the instant the interrupt was serviced.

Interrupts may originate in hardware or software. A hardware interrupt may occur when you move the mouse. A software interrupt may occur when you perform an illegal operation, or even when you generate one with a TRAP #15 instruction.
Note the five new entries to the right of the register display.
These lines display the five longwords at the top of the stack.
Each line contains the stack address, the longword at that address, and the address relative to the current stack pointer.
This program calls a subroutine to swap two numbers, A and B, which are first pushed on the stack in the main program. In subroutine SWAP the two parameters are retrieved from their locations on the stack and swapped over. Once a return from subroutine is made and the stack cleaned up, the parameters on the stack are lost. Parameters A and B in the main program were never swapped.

You can pass a parameter to a subroutine by reference by passing its address on the stack. That is, you don't say 'Here's a parameter'. Instead you say, 'Here's where the parameter is located'. In this case, there is only one copy of the parameter. We repeat the example in which we added two numbers together and, this time, pass the parameters to the subroutine by reference.

The following program introduces a new instruction, push effective address (PEA), which pushes an address on the stack; for example, the operation PEA PQR pushes the address PQR on the stack. The instruction PEA PQR is equivalent to MOVE.L #PQR,-(A7). The following is the assembled version of this program, and Fig. 6.32 provides snapshots of memory and registers during the execution of the code.
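C makes exactly the same distinction: passing a value gives the subroutine its own copy, while passing a pointer (the analog of PEA pushing an address) lets it modify the caller's data. A minimal sketch:

    #include <stdio.h>

    /* Pass by value: the subroutine swaps its own copies only. */
    void swap_by_value(int a, int b)       { int t = a; a = b; b = t; }

    /* Pass by reference: the caller's data is swapped. */
    void swap_by_reference(int *a, int *b) { int t = *a; *a = *b; *b = t; }

    int main(void)
    {
        int x = 1, y = 2;
        swap_by_value(x, y);
        printf("by value:     x=%d y=%d\n", x, y);   /* still 1 and 2 */
        swap_by_reference(&x, &y);
        printf("by reference: x=%d y=%d\n", x, y);   /* now 2 and 1   */
        return 0;
    }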
We can now run this program line by line. Note how the addresses of the variables are pushed on the stack and then loaded into address registers in the subroutine. We will use the simulator command MD 500 to view the data area. Initially it contains the two 16-bit constants 1 and 2.
If we look at memory again, we will find that the sum of X and Y has been stored in location Z. We have passed the parameters by reference. In practice, a programmer would pass parameters that aren't changed in the subroutine by value, and only pass parameters that are to be changed by reference.

6.6 Examples of 68K programs

We now put together some of the things we've learned about the 68K's instruction set and write a simple program to implement a text-matching algorithm that determines whether a string contains a certain substring. The problem can be solved by sliding the substring along the string until each character of the substring matches with the corresponding character of the string, as illustrated in Fig. 6.33.

[Figure 6.33: matching the string 'THIS THAT THEN THE OTHER' against the substring 'THEN THE'. The substring is slid along the string one position at a time (13 steps) and the number of matching characters is recorded at each step; the full match occurs at the step where all eight characters agree.]

The string starts at address $002000 and is terminated by a carriage return (ASCII code $0D). The substring is stored at location $002100 onwards and is also terminated by a carriage return. In what follows, the string of characters is referred to as STRING, and the substring as TEXT.

We will construct a main program that calls a subroutine, MATCH, to scan the string for the first occurrence of the substring. Because STRING and TEXT are both strings of consecutive characters, we will pass them to MATCH by reference. The subroutine should return the address of the first character in the string matching the first character of the substring. This address is to be returned on the stack. If the match is unsuccessful, the null address, $00000000, is pushed on the stack.
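The sliding-match algorithm is easy to prototype in a high-level language before coding it in assembler; here is a C sketch that follows the description above (it is not the MATCH listing itself), using '\r' as the $0D terminator:

    #include <stdio.h>

    /* Slide TEXT along STRING; return a pointer to the first full
       match, or NULL (the 68K version pushes $00000000 on the stack). */
    const char *match(const char *string, const char *text)
    {
        for (const char *s = string; *s != '\r'; s++) {
            const char *p = s, *q = text;
            while (*q != '\r' && *p != '\r' && *p == *q) { p++; q++; }
            if (*q == '\r') return s;        /* whole substring matched */
        }
        return NULL;                          /* no match               */
    }

    int main(void)
    {
        const char string[] = "THIS THAT THEN THE OTHER\r";
        const char *r = match(string, "THEN THE\r");
        if (r) printf("substring found at offset %ld\n", (long)(r - string));
        else   printf("substring not found\n");
        return 0;
    }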
[Figure: the 1024-byte circular buffer. The buffer occupies addresses $010000 (Start) to $0103FF (End); the variables IN_ptr, OUT_ptr, and Count live at $010400, $010404, and $010408 respectively. Both pointers move in the same direction and wrap from End back to Start.]
The pseudocode is now fairly detailed. Both the module selection and the initialization routines are complete. We still have to work on the input and output routines because of the difficulty in dealing with the effects of overflow and underflow in a circular buffer.

We can determine the state of the buffer by means of a variable, Count, which indicates the number of characters in the buffer. If Count is greater than zero and less than its maximum value, a new character can be added or one removed without any difficulty. If Count is zero, the buffer is empty and we can add a character but not remove one. If Count is equal to its maximum value and therefore the buffer is full, each new character must overwrite the oldest character, as specified by the program requirements. This last step is tricky because the next character to be output (the oldest character in the buffer) is overwritten by the latest character. Therefore, the next character to be output will now be the oldest surviving character, and the pointer to the output must be moved to reflect this.

Sometimes it is helpful to draw a simplified picture of the system to enable you to walk through the design. Figure 6.35 shows a buffer with four locations. Initially, in state (a), the buffer is empty and both pointers point to the same location. At state (b), a character is entered, the counter incremented, and the input pointer moved to the next free position. States (c) to (e) show the buffer successively filling up to its maximum count of 4. If another character is now input, as in state (f), the oldest character in the buffer is overwritten.

It is not necessary to rewrite the entire module in pseudocode. We will concentrate on the input and output routines and then begin assembly language coding. Because the logical buffer is circular while the physical buffer is not, we must wrap the physical buffer round. That is, when the last location in the physical buffer is filled, we must move back to the start of the buffer.
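The input and output routines translate almost line for line into C; the following sketch models the same design with an 8-byte buffer (names are illustrative, not the book's 68K code):

    /* A circular buffer: when full, a new character overwrites the
       oldest one and the output pointer is advanced past it. */
    #define SIZE 8

    char buffer[SIZE];
    int  in_ptr = 0, out_ptr = 0, count = 0;

    void put(char c)
    {
        buffer[in_ptr] = c;
        in_ptr = (in_ptr + 1) % SIZE;        /* wrap the physical buffer round */
        if (count == SIZE)
            out_ptr = (out_ptr + 1) % SIZE;  /* oldest character overwritten   */
        else
            count++;
    }

    int get(char *c)
    {
        if (count == 0) return 0;            /* buffer empty: nothing to remove */
        *c = buffer[out_ptr];
        out_ptr = (out_ptr + 1) % SIZE;
        count--;
        return 1;
    }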
Now that we've designed and coded the buffer, the next step is to test it. The following code is the assembled circular buffer program with the necessary driver routines. The program inputs two characters at a time, and implements an 8-byte buffer. The first character of a pair is the control character (i.e. 0 = initialize, 1 = input, and 2 = output). For example, to initialize the buffer you type 0X, where X is any character. If you type 1Y, the character Y is stored in the next free place in the buffer. If you type 2Z, the next character to be output is displayed. After each operation, the contents of the buffer are printed and the current value of the variable count displayed.
This example concludes our overview of 68K assembly language programming. We have only touched the surface of this topic. Real assemblers are far more complex and include facilities for designing large programs, such as the separate compilation of modules. However, our intention was never to create an assembly language programmer; it was to give you some insight into how machine-level instructions are used to achieve high-level actions. The following code was produced by a 68K cross-assembler from the above source code.

6.6 By means of a memory map explain the effect of the following sequence of 68K assembly language directives.

Translate the assembly language syntax of each of the following 68K instructions into the RTL notation that defines the action of the instruction.
(a) MOVE 3000,4000    (g) MOVE (A0),D3
(b) MOVE D0,D4        (h) MOVE #12,(A0)
(c) MOVE 3000,D0      (i) MOVE (A1),(A2)
(d) MOVE D0,3000      (j) ADD D2,D1
(e) MOVE #4000,D4     (k) ADD #13,D4
(f) MOVE #4000,5000   (l) ADD (A3),1234

(c) What is the effect of the assembler directive DC.W 1234?
(d) What is the effect of the assembler directive DC.W $1234?
(e) What is the effect of the '+' in the effective address (A0)+?
(f) What is the effect of the '-' in the effective address -(A0)?
(g) Why 'ADDA.L #4,A0' but 'ADD.L #4,D0'?
6.14 Explain what the following 68K program does. Use the 68K
simulator to test your observations.
6.18 Examine the following fragment of pseudocode and its translation into 68K assembly language. Work through this code and ensure that you understand it. Is the program correct? Can you improve it?

6.22 Write a subroutine to carry out the operation X*(Y+Z), where X, Y, and Z are all wordlength (i.e. 16-bit) values. The three parameters, X, Y, and Z, are to be passed on the stack to the procedure. The subroutine is to return the result of the calculation via the stack. Remember that the 68K instruction MULU D0,D1 multiplies the 16-bit unsigned integer in D0 by the 16-bit unsigned integer in D1 and puts the 32-bit product in D1. Write a subroutine, call it, and pass parameters X, Y, and Z on the stack. Test your program by using the 68K simulator's debugging facilities. This is not an easy or trivial problem. You will need to draw a map of the stack at every stage and take very great care not to confuse pointers (addresses) and actual parameters.

6.24 Suppose you wish to pre-load memory with the value 1234 before executing a program. Which of the following operations is correct?
(a) DC.B #1234
(b) DC.W 1234
(c) DC.W #1234
(d) DS.B $1234
(e) MOVE.W #1234,Location

6.25 Which of the following defines MOVE.B (A2)+,D3?
(a) D3 ← [[A2]]; [A2] ← [A2] + 1
(b) [D3] ← [[A2]]; [A2] ← [A2] + 1
(c) D3] ← [[A2]]; [A2] ← [A2] + 1
(d) [A2] ← [A2] + 1; [D3] ← [A2]

6.26 Which of the following statements is true when a parameter is passed to a subroutine by reference (i.e. not by value)?
(a) The parameter can be put in an address register.
(b) The address of the parameter can be put in an address register.
(c) The address of the parameter can be pushed on the stack.
(d) The parameter can be pushed on the stack.
(e) Parts (a) and (d) are correct.
(f) Parts (b) and (c) are correct.

6.27 Consider the following code:

    MOVE.W X,-(A7)    Push X
    MOVE.L Y,-(A7)    Push Y
    BSR    PQR        Call PQR
    Clean_up          Clean up the stack

(a) Why do you have to clean up the stack after returning from the subroutine?
(b) What code would you use to clean up the stack?
(c) Draw a memory map of the stack immediately before executing the RTS in the subroutine PQR.

6.28 Write an assembly language program to reverse the bits of a byte.

6.29 Explain why the following assembly language and RTL constructs are incorrect.
(a) MOVE D4,#$64
(b) MOVE (D3),D2
(c) [D3] ← A0 + 3
(d) [D3] ← #3

6.30 The 68K has both signed and unsigned conditional branches. What does this statement mean?

6.31 You cannot (should not?) exit a subroutine by jumping out of it by means of a branch instruction. You must exit it with an RTS instruction. Why?

6.32 Assume that a string of ASCII characters is located in memory starting at location $2000. The string ends with the character 'Z'. Design and write a 68K assembly language program to count the number of 'E's, if any, in the string.

6.33 Express the following sequence of 68K assembly language instructions in register transfer language and explain in plain English what each instruction does.
(a) LEA 4(A2),A1
(b) MOVEA.L A3,A2
(c) MOVE.B (A1),D3
(d) MOVE.B #5,(A1)
(e) BCS ABC
(f) MOVE.B (A1)+,-(A3)

6.34 The following fragment of 68K assembly language has several serious errors. Explain what the errors are, and explain how you would correct them.
6.35 Suppose you are given an algorithm and asked to design and test a program written in 68K assembly language. How would you carry out this activity? Your answer should include considerations of program design and testing, and the necessary software tools.

6.36 Suppose that D0 contains $F12C4689 and D1 contains $211D0FF1. What is the result of
(a) ADD.B D0,D1
(b) ADD.W D0,D1
(c) ADD.L D0,D1
In each case, give the contents of D1 after the operation and the values of the C, Z, N, and V flags.

6.37 Suppose that A0 contains $F12CE600. What is the result of
(a) ADDA.L #$1234,A0
(b) ADDA.W #$1234,A0
(c) ADDA.W #$4321,A0

6.38 What is the effect of the following code?

        CLR    D0
        MOVE.B D0,D1
        MOVE.B #10,D2
    XXX ADD.B  D2,D0
        ADD.B  #1,D1
        ADD.B  D1,D0
        SUB.B  #1,D2
        BNE    XXX
        STOP   #$2700
7 Structure of the CPU
INTRODUCTION
In Chapters 2 and 3 we introduced combinational and sequential logic elements and
demonstrated how to build functional circuits. In Chapters 5 and 6 we introduced the instruction
set architecture and low-level programming. This chapter bridges the gap between digital circuits
and the computer by demonstrating how we can construct a computer from simple circuits; that
is, we show how a computer instruction is interpreted (i.e. executed).
We begin by describing the structure of a simple generic CPU. Once we see how a computer
operates in principle, we can look at how it may be implemented. We describe the operation of a very simple one-and-a-half address machine whose instructions have two operands: one in memory and one in a register. Instructions are written in the form ADD A,B, which adds A to B and puts the result in B. Either A or B must be a register.
Some readers will read this introduction to the CPU before the previous two chapters on
assembly language programming. Consequently, some topics will be re-introduced.
Instead of introducing the computer all at once, we will keep things simple and build up a CPU
step by step. This approach helps demonstrate how an instruction is executed because the
development of the computer broadly follows the sequence of events taking place during the
execution of an instruction. In the next chapter we will find that this computer is highly simplified;
real computers don’t execute an instruction from start to finish. Today’s computers overlap the
execution of instructions. As soon as one instruction is fetched from memory, the next instruction
is fetched before the previous instruction has completed its execution. This mechanism is called
pipelining and we examine it more closely in the next chapter.
A HYPOTHETICAL COMPUTER
Anyone describing the internal operation of a computer must select an architecture and an organization for their target machine. We have two choices: register to memory or register to register. The register-to-memory model fits the architecture of processors like the Pentium and 68K, whereas the register-to-register model corresponds to processors like the ARM, which we introduce later. When describing the internal structure of a computer, we could describe either a system that executes an instruction to completion before beginning the next instruction, or a computer that overlaps or pipelines the execution of instructions. I have decided to begin this chapter with the description of a register-to-memory, non-pipelined processor. A non-pipelined organization is easier to describe than one that overlaps the execution of instructions.
[Figure 7.1: fetching an instruction. The program counter (PC) holds the address of the next instruction to be executed and its contents are incremented each time it is used. The address passes to the memory address register (MAR), which addresses the main memory (the immediate access store holding both instructions and data). The first stage in the execution of any instruction is to fetch it from memory: the instruction read from memory passes through the memory buffer register (MBR) into the instruction register (IR), which is divided into op-code and address fields.]
data, and control. Data comprises the instructions, constants, and variables that are stored in memory and registers. Control paths comprise the signals that trigger events, provide clocks, and control the flow of data and addresses throughout the computer.

All instructions are executed in a two-phase operation called a fetch–execute cycle. During the fetch phase, the instruction is read from memory and decoded by the control unit. The fetch phase is followed by an execute phase in which the control unit generates all the signals necessary to execute the instruction. Table 7.1 describes the sequence of operations taking place in a fetch phase. In Table 7.1 FETCH is a label that serves to indicate a particular line in the sequence of operations. The notation IR(op-code) means the operation-code field of the instruction register.

The MAR holds the address of the location to which data is being written in a write cycle, or from which data is being read in a read cycle. At this stage, the MAR contains a copy of the (previous) contents of the PC. When a memory read cycle is performed, the contents of the memory location specified by the MAR are read from the memory and transferred to the memory buffer register (MBR). We can represent this read operation in RTL terms as [MBR] ← [[MAR]].

7.1.3 The CPU's data paths

Now that we've sorted out the fetch phase, let's see what else we need to actually execute instructions. Figure 7.2 adds new data paths to the simplified CPU of Fig. 7.1, together with an address path from the address field of the instruction register to the memory address register. Other modifications to Fig. 7.1 included in Fig. 7.2 are the addition of a data register, D0, and an arithmetic and logic unit, ALU.

The data register called D0 holds temporary results during a calculation. You need a data register in a one-address machine because dyadic operations with two operands, such as ADD X,D0, take place on one operand specified by the instruction and the contents of the data register. This instruction adds the contents of memory location X to the contents of the data register and deposits the result of the addition in the data register, destroying one of the original operands. The arrangement of Fig. 7.2 has only one general-purpose data register, which we've called D0 for compatibility with the 68K we used in the previous chapters. Some first-generation 8-bit microprocessors had only one general-purpose data register, which was called the accumulator. We can represent an ADD X,D0 instruction² by the RTL expression [D0] ← [D0] + [X].

The arithmetic and logic unit (ALU) is the workhorse of the CPU because it carries out all the calculations. Arithmetic and logical operations are applied either to the contents of the data register or MBR alone, or to the contents of the data register and the contents of the MBR. The output of the ALU is fed back to the data register or to the MBR.

Two types of operation are carried out by the ALU—arithmetic and logical. The fundamental difference between arithmetic and logical operations is that logical operations don't generate a carry when bit ai of word A and bit bi of word B are operated upon. Table 7.2 provides examples of typical arithmetic and logical operations. A logical shift treats an operand as a string of bits that are moved left or right. An arithmetic shift treats a number as a signed two's complement value and propagates the sign bit during a right shift. Most of these operations are implemented by computers like the 68K, Pentium, and ARM.

Having developed our computer a little further, we can now execute an elementary program. Consider the high-level language operation P = Q + R. Here the plus symbol means arithmetic addition. The assembly language program required to carry out this operation is given below. Remember that P, Q, and R are symbolic names that refer to the locations of the variables in memory.

    LOAD  Q,D0    Load data register D0 with the contents of memory location Q³
    ADD   R,D0    Add the contents of memory location R to data register D0
    STORE D0,P    Store the contents of data register D0 in memory location P

² A machine with a single accumulator does not need to specify it explicitly; for example, an 8-bit microprocessor may use ADD P to indicate add the contents of P to the accumulator. We write ADD P,D0 and make D0 explicit to be consistent with the notation we used for 68K instructions.
³ We have defined explicit LOAD and STORE operations to be consistent with the CPU we construct later in this chapter. The 68K uses a single MOVE operation to indicate LOAD and STORE.
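As a quick illustration, the state changes performed by these three instructions can be modeled in a few lines of C (the memory array and the addresses of P, Q, and R are illustrative assumptions):

    #include <stdint.h>

    uint8_t mem[256];       /* main store: instructions and data */
    uint8_t d0;             /* the single data register          */

    void load (uint8_t addr) { d0 = mem[addr]; }               /* [D0] <- [addr]        */
    void add  (uint8_t addr) { d0 = (uint8_t)(d0 + mem[addr]); } /* [D0] <- [D0]+[addr] */
    void store(uint8_t addr) { mem[addr] = d0; }               /* [addr] <- [D0]        */

    enum { P = 0x10, Q = 0x11, R = 0x12 };   /* illustrative addresses */
    void example(void) { load(Q); add(R); store(P); }   /* P = Q + R  */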
[Figure 7.2: the CPU of Fig. 7.1 extended for execution. The MAR holds the address of the next location to be accessed; data read from memory, or to be written to memory, passes through the MBR. A clocked control unit issues the control signals. A data register D0 and an ALU have been added: data is transmitted to the ALU, which computes f(A,B), and the output from the ALU goes to D0 or the MBR.]
Table 7.3 Expressing the FETCH/EXECUTE cycle for an ADD instruction in RTL. (Operations sharing the same line are executed simultaneously; ALU ← [MBR] and ALU ← [D0] are executed simultaneously.)
MICROPROGRAMMING
The terms microprogram, microprogramming, microcode, microinstruction, and micro-operation have nothing to do with microprocessor or microcomputer. A micro-operation is the smallest event that can take place within a computer; typical micro-operations are the clocking of data into a register, a memory read operation, putting data on a bus, or adding two numbers together. A microprogram consists of a sequence of microinstructions that, when executed, implement a machine-level instruction. For example, the machine-level or macro-level operation ADD P,D0 can be implemented by executing a sequence of micro-operations. The instructions that comprise a microprogram are called microcode.
gives the sequence of operations carried out during the fetch and execute phases of an ADD R,D0 instruction. These operations tell us what is actually going on inside the computer. During the fetch phase the op-code is fed to the control unit by CU ← [IR(op-code)] and used to generate all the internal signals required to perform the addition—this includes programming the ALU to do addition by adding together the data at its two input terminals to produce a sum at its output terminals.

Operations of the form [PC] ← [MAR] or [D0] ← [D0] + [MBR] are often referred to as microinstructions. Each assembly-level instruction (e.g. LOAD, ADD) is executed as a series of microinstructions. Microinstructions and microprogramming are the province of the computer designer. In the 1970s some machines were user-microprogrammable; that is, you could define your own instruction set. We take a further look at microinstructions later in this chapter.

So far the computer operates in a purely sequential mode; that is, the computer can execute only a stream of instructions, one by one in strict order. We covered conditional behavior in the previous chapter, and we require a means of implementing instructions such as BEQ Target (branch on zero flag set to Target).

The computer in Fig. 7.2 lacks a mechanism for making choices or repeating a group of instructions. To do this, the CPU must be able to execute conditional branches or jumps such as BEQ XYZ. We've already met the branch that forces the CPU to execute an instruction out of the normal sequence. The block diagram of Fig. 7.3 shows the new address and data paths required by the CPU to execute conditional branches. Three items have been added to our computer in Fig. 7.3:
● a condition code register, CCR
● a path between the CCR and the control unit
● a path between the address field of the instruction register and the program counter.
[Figure 7.3: the CPU extended to support conditional branches. The next instruction may be the next instruction in sequence or the instruction at the branch target address; the IR's operand field provides the branch target address to the program counter. A condition code register (CCR) records the state of the ALU, and the control unit uses the CCR output to decide whether to execute the next instruction or to force a branch.]
The condition code register or processor status register takes a snapshot of the state of the ALU after each instruction has been executed and records the state of the carry, negative, zero, and overflow flag bits. A conditional branch instruction interrogates the CCR's current state. The control unit then either forces the CPU to execute the next instruction in series or to branch to another instruction somewhere in the program. Let's look at the details of the conditional branch.

The CPU updates the bits of its condition code register after it carries out an arithmetic or a logical operation to reflect the nature of the result. The following is a reminder of the operations of the condition code bits.

C (carry): set if a carry was generated in the last operation. The C-bit is, of course, the same as the carry bit in the carry flip-flop.
Z (zero): set if the last operation generated a zero result.
N (negative): set if the last operation generated a negative result.
V (overflow): set if the last operation resulted in an arithmetic overflow; that is, an operation on one or two two's complement values gave a result that was outside its allowable range (an arithmetic overflow occurs during addition if the sign bit of the result is different from the sign bit of both operands).
The condition code register is connected to the control unit, enabling instructions to interrogate the CCR. For example, some instructions test whether the last operation performed by the central processor yielded a positive result, or whether the carry bit was set, or whether arithmetic overflow occurred.

There's no point in carrying out an interrogation unless the results are acted upon. We need a mechanism that does one thing if the result of the test is true and does another thing if the result of the test is false. The final modification included in Fig. 7.3 is the addition of a path between the operand field (i.e. target address) of the instruction register and the program counter. It's this feature that enables the computer to respond to the result of its interrogation of the CCR.

A conditional branch instruction tests a bit of the CCR and, if the bit tested is clear, the next instruction is obtained from memory in the normal way. But if the bit tested is set, the next instruction is obtained from the location whose target address is in the instruction register. In the above description we said that a branch is made if a certain bit of the CCR is set; equally, a branch can be made if the bit is clear (branches can also be made on the state of several CCR bits). The precise way in which conditional branches are actually implemented inside the computer is discussed later, when we deal with the design of the control unit. Branch operations can be expressed in register transfer language in the form

1. Branch on carry clear (jump to the target address if the carry bit in the CCR is 0)
   BCC target: IF [C] = 0 THEN [PC] ← [IR(address)]
2. Branch on equal (jump to the target address if the Z-bit in the CCR is 1)
   BEQ target: IF [Z] = 1 THEN [PC] ← [IR(address)]

An example of a conditional branch is as follows: subtract X from the contents of D0; if the result was zero then branch to Last, otherwise continue.

So far, an instruction's operand has supplied the address of data in memory. Sometimes we wish to use instructions such as ADD #12,D0, where the source operand supplies the actual value of the data being referred to by the op-code part of the instruction. Although the symbol '#' appears as part of the operand when this instruction is written in mnemonic form, the assembler uses a different op-code for ADD #literal,D0 than it does for ADD address,D0. The instruction ADD.B #12,D0 is defined in RTL as [D0] ← [D0] + 12.

Figure 7.4 shows that an additional data path is required between the operand field of the IR and the data register and ALU to deal with literal operands. In fact, the architecture of Fig. 7.4 can execute any computer program. Any further modifications to this structure improve the CPU's performance without adding any fundamentally new feature.

Figure 7.5 completes the design of the computer. We have added a second general-purpose data register D1 and a pointer register A0. In principle, there is nothing stopping us adding any number of registers. As you can see, three buses, A, B, and C, are used to transfer data between the registers and the ALU.

The structure of Fig. 7.5 can implement instructions with more complex addressing modes than the simple direct (absolute) addressing we have used so far; for example, MOVE (A0),D1 can be implemented by the following sequence of micro-operations. This sequence has been simplified because, as you will see from Fig. 7.5, there is no direct path between register A0 and the MBR. You would have to put the contents of A0 onto bus A, pass the contents of bus A through the ALU to bus C, and then copy bus C to the MAR. We will return to this theme when we look at the detailed design of computers.

We have now demonstrated the flow of information that takes place during the execution of a single-address computer instruction. In the next section we reinforce some of the things we have covered by showing how you can simulate a computer architecture in C.
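When we come to simulate such a CPU in C (as the next section does), these branch definitions reduce to simple comparisons; a sketch with illustrative names:

    #include <stdint.h>
    #include <stdbool.h>

    /* The CCR bits decide whether the PC is loaded from the IR's
       address field or continues in sequence. */
    enum opcode { BCC, BEQ };

    uint16_t branch(enum opcode op, bool c, bool z,
                    uint16_t ir_address, uint16_t next_pc)
    {
        switch (op) {
        case BCC: return !c ? ir_address : next_pc; /* IF [C]=0 THEN [PC]<-[IR(address)] */
        case BEQ: return  z ? ir_address : next_pc; /* IF [Z]=1 THEN [PC]<-[IR(address)] */
        }
        return next_pc;
    }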
[Figure 7.4: literal data paths. The operand field of the IR is fed to the data register or to the ALU to provide a literal operand: the literal from the instruction register can be loaded into the data register or supplied directly to the ALU.]
have avoided all but C's most basic elements. All data types are 8 bits and the only C constructs we use are the while, the if...else, and the switch constructs, which select one of several courses of action.

We are going to construct two simulators—the first is a very primitive CPU with an 8-bit instruction that simply demonstrates the fetch/execute cycle, and the second is not too dissimilar to typical first-generation 8-bit microprocessors.

branch instructions. Only the store instruction performs a write to memory. Choosing an instruction set requires many compromises; for example, if the number of bits in an instruction is fixed, increasing the number of different instructions reduces the number of bits left for other functions such as addressing modes or register selection. We can define an instruction set for our primitive 8-bit machine as follows.
[Figure 7.5: the final structure of the simple CPU. The op-code and address fields of the IR feed the control unit, and a path between the operand field of the IR and bus A carries literals. The registers (PC, MAR, MBR, data register D0, and address register A0) communicate with the ALU over buses A and B, the output from the ALU is fed via bus C back to the registers, and a CCR records the ALU's status.]
We have provided only five instructions because these are illustrative of all instructions. This computer has an 8-bit instruction format that includes both the op-code and the operand. If we choose a 3-bit op-code (eight instructions) and a 4-bit operand (a 16-location memory), the remaining bit can be used to specify the addressing mode (absolute or literal). Real 8-bit microprocessors solve the problem of instruction set design by using 1 byte to provide an operation code and then 0, 1, or 2 succeeding bytes to provide an operand.

Figure 7.6 defines the structure of an 8-bit instruction for our simulated machine. The first step in constructing a simulator is to describe the action of the computer in pseudocode.
[Figure 7.6: format of an 8-bit instruction. Bits 7–5 hold the op-code (000 = LDA, 001 = STA, 010 = ADD, 011 = BRA, 100 = BEQ), bit 4 holds the addressing mode (0 = absolute, 1 = literal/immediate), and bits 3–0 hold the operand.]

C allows you to operate on individual bits of a byte; for example, the operator >> performs a right shift (IR >> n shifts IR right by n bits). The op-code is obtained from the three most-significant bits of the IR by shifting right five times. A bitwise logical AND can be performed between a variable and a hexadecimal value; for example, IR & 0x0F ANDs the IR with binary 00001111 to extract the operand bits in the four least-significant bit positions. Once we've extracted the addressing mode (bit 4 of the instruction register) with amode = (IR & 0x10) >> 4, we can calculate the source operand for the load and add instructions by selecting either the operand field itself (literal) or the contents of the addressed location (absolute).
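A sketch of that calculation in C (the function and array names are illustrative, not the book's listing):

    #include <stdint.h>

    uint8_t memory[16];                         /* a 4-bit address reaches 16 locations */

    uint8_t opcode (uint8_t ir) { return ir >> 5; }            /* bits 7-5 */
    uint8_t amode  (uint8_t ir) { return (ir & 0x10) >> 4; }   /* bit 4    */
    uint8_t operand(uint8_t ir) { return ir & 0x0F; }          /* bits 3-0 */

    /* Source operand for LDA and ADD: the field itself (literal)
       or the contents of the addressed location (absolute). */
    uint8_t source(uint8_t ir)
    {
        return amode(ir) ? operand(ir) : memory[operand(ir)];
    }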
Most of the work done in the simulator takes place in the switch construct at the end of this program, where each instruction is interpreted.

7.2.2 CPU with a 16-bit instruction

We now describe a CPU that is much closer to the architecture of typical 8-bit microprocessors. The simulator uses an 8-bit memory with 256 locations. Each instruction occupies two consecutive memory locations—an 8-bit instruction followed by an 8-bit operand. This arrangement provides us with a much richer instruction set than the previous example. However, each fetch cycle requires two memory accesses. The first access is to fetch the op-code and the second to fetch the operand; that is:

    Copy contents of the PC to the MAR
    Increment contents of the PC
    Read the instruction from memory
    Move the instruction to the IR
    Save the op-code

    Copy contents of the PC to the MAR
    Increment contents of the PC
    Read the operand from memory
    Move the operand to the IR
    Save the operand

This multibyte instruction format is used by 8-bit and 16-bit microprocessors. Indeed, the 68K has one 10-byte instruction.

The architecture of this computer is memory to register or register to memory; for example, it supports both ADD D0,M and ADD M,D0 instructions. In addition to the direct and literal addressing modes, we have provided address register indirect addressing with a single A0 register. We have also provided program counter relative addressing (discussed in the next chapter), in which the operand is specified with respect to the current value of the program counter; for example, MOVE D0,12(PC) means store the contents of data register D0 12 bytes on from the location pointed at by the program counter.

The instruction itself is divided into four fields, as Fig. 7.7 demonstrates. A 4-bit op-code in bits 7, 6, 5, 4 provides up to 16 instructions. A 2-bit addressing mode in bits 1, 0 selects the way in which the current operand is treated. When the addressing mode is 00, the operand provides the address of the data to be used by the current instruction. When the addressing mode is 01, the operand provides the actual (i.e. literal) operand. Modes 10 and 11 provide indexed and program counter relative addressing respectively (i.e. the operand is added to the A0 register or the PC, respectively). Bit 2 of the instruction is a direction bit that determines whether the source operand is in memory or is provided by the data register; for example, the difference between MOVE D0,123 and MOVE 123,D0 is determined by the value of the direction bit.

[Figure 7.7: format of the CPU's instruction. Bits 7–4 of the first byte hold the op-code, bit 3 is not used, bit 2 is the direction bit (0 = register to memory, 1 = memory to register), and bits 1–0 hold the addressing mode (00 = absolute, 01 = literal/immediate, 10 = indexed, 11 = PC relative); the second byte holds the operand.]
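In C, the two memory accesses of the fetch phase map directly onto array reads; a sketch (register names follow the text, but this is not the book's listing):

    #include <stdint.h>

    uint8_t memory[256];
    uint8_t pc, mar, mbr, opcode, operand;   /* pc wraps naturally at 256 */

    void fetch(void)
    {
        mar = pc++;            /* copy PC to MAR, increment PC      */
        mbr = memory[mar];     /* read the instruction from memory  */
        opcode = mbr;          /* save the op-code                  */

        mar = pc++;            /* copy PC to MAR, increment PC      */
        mbr = memory[mar];     /* read the operand from memory      */
        operand = mbr;         /* save the operand                  */
    }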
Now that we have examined the sequence of events that take place during the execution of an instruction, the next step is to demonstrate how the binary code of an instruction is translated into the actions that implement the instruction. In the next two sections we describe two different types of control unit: the microprogrammed control unit and the so-called random logic control unit.
[Figure 7.8: the single-bus CPU. Registers are clocked by signals such as CMAR and gated onto the system bus by tri-state gates such as GMBR.]

comes only from the system bus. Note that this structure allows the memory to transfer data directly to or from any register; that is, all data does not have to pass through the MBR.

The memory receives the address of the location to be accessed directly from the MAR, whose output is permanently connected to the memory's address input. A dedicated connection between the MAR and memory is possible because the memory never receives an address input from a source other than the memory address register. A permanent connection removes the need for bus control circuits.

Two data paths link the memory to the system bus. In a read cycle, when memory control input R is asserted, data is transferred from the memory to the system bus via tri-state gate GMSR. During a memory write cycle, when memory control input W is asserted, data is transferred from the system bus directly to the memory.

The MBR, data register, program counter, and instruction register are each connected to the system bus in the same way. When one of these registers wishes to place data on the bus, its tri-state gate is enabled. Conversely, data is copied into a register from the bus by clocking the register. The instruction register (IR) receives data from the memory directly, without the data having to pass through the MBR.

The ALU receives data from two sources, the system bus and data register D0, and places its own output on the system bus. This arrangement begs the question, 'If the ALU gets data from the system bus, how can it put data on the same bus at the same time it is receiving data from this bus?' Figure 7.8 shows that the ALU contains an internal ALU register. When this register is clocked by CALU, the output from the ALU is captured and can be put on the system bus by enabling gate GALU.

Table 7.4 defines the 16 control signals in Fig. 7.8. Instruction decoding takes an instruction and uses it to create a sequence of 16-bit signals that control the system in Fig. 7.8. The ALU is controlled by a 2-bit code, F1, F0, which determines its function as defined in Table 7.5. These operations are representative of real instructions, although a practical ALU would implement, typically, 16 different functions.

Table 7.5 Decoding the ALU control code, F1, F0.
    F1  F0    Function
    0   0     Add P to Q
    0   1     Subtract Q
    1   0     Increment Q
    1   1     Decrement Q

In order to keep the design of a random logic control unit as simple as possible, we will construct a 3-bit operation code giving a total of eight instructions. The instruction set defined in Table 7.6 is very primitive indeed, but it does include the types of instruction found in real first-generation processors. We have defined explicit LOAD and STORE instructions rather than a single MOVE instruction, which does the work of both LOAD and STORE.

Having constructed an instruction set, we define each of the instructions in terms of RTL and determine the signals that must be asserted to execute it.
Table 7.6 A primitive instruction set for the CPU of Fig. 7.8.
Table 7.7 Interpreting the instruction set of Table 7.6 in RTL and microinstructions.
Consider the load D0 from memory operation; this requires the following two steps:

    [MAR] ← [IR]      EIR = 1, CMAR           Copy operand address to MAR
    [D0] ← [[MAR]]    R = 1, EMSR = 1, CD0    Read memory and copy to D0

We have to send the operand address in the instruction register to the memory address register by enabling the GIR gate and then clocking the data into the memory address register. Then we have to put the memory in read mode, put the data from the memory onto the bus by enabling the GMSR gate, and finally capture the data in D0 by clocking register D0.

Table 7.7 tells us what signals have to be asserted to execute the two operations required to interpret LOAD N. Table 7.8 gives all the signals in the form of a 16-component vector; that is, the two vectors are 0010000000100000 and 1000000010001000. A conditional branch is encoded in the same way; for example, the row of Table 7.8 for BEQ is:

    R  W  CMAR CMBR CPC CIR CD0 CALU EMSR EMBR EIR EPC ED0 EALU F1 F0
    BEQ (IF Z = 1 THEN [PC] ← [IR]):
    0  0  0    0    Z   0   0   0    0    0    1   0   0   0    0  0

Figure 7.9 shows the timing of the execution phase of this instruction. We have included only five of the 16 possible control signals, because all the other signals remain inactive during these two micro-operations.

7.3.2 From op-code to operation

In order to execute an instruction we have to do two things. The first is to convert the 3-bit op-code into one of eight possible sequences of action, and the second is to cause these actions to take place.

Figure 7.10 shows how the instructions are decoded; it is similar in operation to the 3-line to 8-line decoder described in Chapter 2. For each of the eight possible 3-bit op-codes, one and only one of the eight outputs is placed in an active-high condition. For example, if the op-code corresponding to ADD (i.e. 010) is loaded into the instruction register during a fetch phase, ADD line 2 from the AND gate array is asserted high while all other AND gate outputs remain low.

It's no good simply detecting and decoding a particular instruction. The control unit has to carry out the sequence of microinstructions that will execute the instruction. To do this we require a source of signals to trigger each of the microinstructions. A circuit that produces a stream of trigger signals is called a sequencer. Figure 7.11 provides the logic diagram of a simplified eight-step sequencer.

The outputs of three JK flip-flops, arranged as a 3-bit binary up-counter counting 000, 001, 010, ..., 111, are connected to eight three-input AND gates to generate timing signals T0 to T7. Figure 7.12 illustrates the timing pulses created by this circuit. Note that the timing decoder is similar to the instruction decoder of Fig. 7.10. As not all macroinstructions require the same number of microinstructions to interpret them, the sequencer of Fig. 7.11 has a reset input that can be used to return it to state T0.

The sequencer of Fig. 7.11 is illustrative rather than practical because, as it stands, the circuit may generate spurious timing pulses at its outputs due to the use of an asynchronous counter. All the outputs of an asynchronous counter don't change state at the same instant, and therefore the bit pattern at its output may pass through several states (if only for a few nanoseconds) before it settles down to its final value. Unfortunately, these transient states or glitches may last long enough to create spurious timing signals, which, in turn, may trigger undesired activity within the control unit. A solution to these problems is to disable the output of the timing pulse generator until the counter has settled down (or to use a synchronous counter).

The next step in designing the control unit is to combine the signals from the instruction decoder with the timing signals from the sequencer to generate the actual control signals. Figure 7.13 shows one possible approach.
There are nine vertical lines in the decoder of Fig. 7.13 (only three are shown). One vertical line corresponds to the fetch phase and each of the other eight lines is assigned to one of the eight instructions. At any instant one of the vertical lines from the instruction decoder (or fetch) is in a logical one state, enabling the column of two-input AND gates to which it is connected. The other inputs to the column of AND gates are the timing pulses from the sequencer.

As the timing signals T0 to T7 are generated, the outputs of the AND gates enabled by the current instruction synthesize the control signals required to implement the random logic control unit. The output of each AND gate corresponding to a particular microinstruction (e.g. CMAR) triggers the actual microinstruction (i.e. micro-operation). As we pointed out earlier, not all macroinstructions require eight clock cycles to execute them.

[Figure 7.9: timing of the execute phase of a LOAD N instruction. Micro-operation 1 (send IR to MAR): EIR enables the IR onto the bus and CMAR clocks the address into the MAR. Micro-operation 2 (read memory, latch into D0): R reads the operand from memory, EMSR enables the memory onto the bus, and CD0 clocks the data into D0. A new fetch phase follows.]
[Figure 7.10: the instruction decoder. The 3 op-code bits from the IR drive an array of AND gates; each of the eight op-codes (000 to 111, e.g. 111 = BEQ N) asserts exactly one output line.]
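In software, the decoder's one-hot behavior can be mimicked with a shift; a small C sketch:

    #include <stdint.h>

    /* The 3-line to 8-line instruction decoder of Fig. 7.10 modeled in C:
       exactly one output line is active-high for each 3-bit op-code. */
    uint8_t decode(uint8_t opcode)
    {
        return (uint8_t)(1u << (opcode & 7));   /* one-hot output */
    }
    /* Example: decode(2) == 0x04 -- the ADD (op-code 010) line is asserted. */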
[Figures 7.11 and 7.12: the eight-step sequencer built from a 3-bit counter and AND gates, and the eight timing pulses T0 to T7 that it generates (only some outputs are shown in the original diagrams for simplicity).]
[Figure 7.13: generating the control signals. A fetch–execute flip-flop (with set and reset inputs) selects either the fetch column or the execute columns of AND gates; the op-code from the IR enables the column for the current instruction, and the timing pulses T0, T1, T2, ... trigger microinstructions such as R = 1, EMSR = 1, CIR during the fetch phase and R = 1, EMSR = 1, CD0 or ED0 = 1, W = 1 during the execute phase.]
In the next section we look at how the design of a control unit can be simplified by putting the sequence of micro-operations in a table and then reading them from that table, rather than by synthesizing them in hard logic.

7.4 Microprogrammed control units

Before we describe the microprogrammed control unit, let's remind ourselves of the macro-level instruction, the micro-level instruction, and interpretation. The natural or native language of a computer is its machine code, whose mnemonic representation is called assembly language. Machine-level instructions are also called macroinstructions. Each macroinstruction is executed by means of a number of primitive actions called microinstructions. The process whereby a macroinstruction is executed by carrying out a series of microinstructions is called interpretation.

Let's begin with another simple computer. Consider Fig. 7.16. The internal structure of this primitive CPU differs slightly from that of Fig. 7.8 because there's more than one bus. The CPU in Fig. 7.16 includes the mechanisms by which information is moved within the CPU. Each of the registers (program counter, MAR, data register, etc.) is made up of D flip-flops. When the clock input to a register is pulsed, the data at the register's D input terminals is transferred to its output terminals and held constant until the register is clocked again. The connections between the registers are by means of m-bit wide data highways, which are drawn as a single bold line. The output from each register can be gated onto the bus by enabling the appropriate tri-state buffer. We have used a multiplexer, labeled MPLX, to select the program counter's input from either the incrementer or the operand field of the instruction register. The multiplexer is controlled by the 1-bit signal Mux, where Mux = 0 selects the incrementer path and Mux = 1 selects the branch target address from the address/operand field of the instruction register, IRaddress.

Suppose our computer performs a fetch–execute cycle in which the op-code is ADD N,D0. This instruction adds the contents of the memory location specified by the operand field N to the contents of the data register (i.e. D0) and deposits the result in D0. We can write down the sequence of operations that take place during the execution of ADD not only in terms of register transfer language, but also in terms of the enabling of gates and the clocking of flip-flops. Table 7.10 illustrates the sequence of microinstructions executed during the fetch–execute cycle of an ADD instruction. It should be emphasized that the fetch phase of all instructions is identical; it is only the execute phase that varies according to the nature of the op-code read during the fetch phase.
[Table: control-signal values asserted in each timing state of each instruction; the column headings naming the signals are not reproduced.]

    Fetch  T0  0 0 0 0 1 0 0 0 1 0 0 0 0 0 X X 0 0
           T1  1 0 0 0 0 0 1 0 0 0 1 0 0 0 X X 0 0
           T2  0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0
           T3  0 0 0 0 0 0 0 1 0 0 0 1 0 0 X X 0 0
           T4  0 0 0 1 0 0 0 0 1 0 0 0 0 0 X X 0 1
    LOAD   T0  1 0 0 0 0 0 1 0 0 0 0 0 1 0 X X 1 0
    STORE  T0  0 1 0 0 0 1 0 0 0 0 0 0 0 0 X X 1 0
    ADD    T0  1 0 0 0 0 0 1 0 0 1 0 0 0 0 X X 0 0
           T1  0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
           T2  0 0 0 0 0 0 0 1 0 0 0 0 1 0 X X 1 0
    SUB    T0  1 0 0 0 0 0 1 0 0 1 0 0 0 0 X X 0 0
           T1  0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0
           T2  0 0 0 0 0 0 0 1 0 0 0 0 1 0 X X 1 0
    INC    T0  1 0 0 0 0 0 1 0 0 1 0 0 0 0 X X 0 0
           T1  0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0
           T2  0 1 0 0 0 0 0 1 0 0 0 0 0 0 X X 1 0
    DEC    T0  1 0 0 0 0 0 1 0 0 1 0 0 0 0 X X 0 0
           T1  0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0
           T2  0 1 0 0 0 0 0 1 0 0 0 0 0 0 X X 1 0
    BRA    T0  0 0 0 1 0 0 0 0 0 0 0 1 0 0 X X 1 0
    BEQ    T0  0 0 0 1 0 0 0 0 0 0 0 Z 0 0 X X 1 0
7.4.1 The microprogram

Imagine that the output of the control unit in Fig. 7.16 consists of 10 signals that enable gates G1 to G10, the PC input multiplexer, two signals that control the memory, and five clock signals that pulse the clock inputs of the PC, MAR, MBR, IR, and D0 registers. Table 7.11 presents the 17 outputs of the control unit as a sequence of binary values that are generated during the fetch and execute phases of an ADD instruction. We have not included the ALU function signals in this table.

When the memory is accessed by E = 1, a memory read or write cycle may take place. The R/W (i.e. read/write) signal determines the nature of the memory access when E = 1. When R/W = 0 the cycle is a write cycle, and when R/W = 1 the cycle is a read cycle.

If, for each of the seven steps in Table 7.11, the 17 signals are fed to the various parts of the CPU in Fig. 7.16, then the fetch–execute cycle will be carried out. Real microprogrammed computers might use 64 to 200 control signals rather than the 17 in this example. One of the most significant differences between a microinstruction and a macroinstruction is that the former contains many fields and may provide several operands, while the macroinstruction frequently specifies only an op-code and one or two operands.

The seven steps in Table 7.11 represent a microprogram that interprets a fetch phase followed by an ADD instruction. We have demonstrated that a macroinstruction is interpreted by executing a microprogram, which comprises a sequence of microinstructions. Each of the CPU's instructions has its own microprogram.
[Figure 7.16: controlling the flow of information in a computer. The PC, MAR, MBR, IR, and D0 registers are built from clocked D flip-flops; gates G1 to G10 connect them to buses A, B, and C. A multiplexer MPLX selects the PC's input from the incrementer or the branch path; the memory is addressed by the MAR and controlled by E and R/W; a literal path carries the operand field of the IR to the data paths; and the ALU, with function inputs F1, F2, F3, drives bus C.]

Note 1: Where there is no entry in the column labeled 'Operations required', that operation happens automatically. For example, the output of the program counter is always connected to the input of the incrementer and therefore no explicit operation is needed to move the contents of the PC to the incrementer.
Note 2: Any three-state gate not explicitly mentioned is not enabled.
Note 3: Steps 1 and 1a are carried out simultaneously, as are 4 and 4a, and 7, 7a, and 7b.

Table 7.10 Interpreting a fetch–execute cycle for an ADD N,D0 instruction in terms of RTL.
Step Gate control signals and MPLX control Memory Register clocks
1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
2 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
3 0 0 1 0 0 0 0 0 1 0 0 1 1 0 0 1 0 0
4 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1
5 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
6 0 0 1 0 0 0 0 0 1 0 0 1 1 0 0 1 0 0
7 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 1 0
Table 7.11 Control signals generated during the fetch and execution phases of an ADD instruction.
Figure 7.17 The microprogrammed control unit. (The microprogram counter supplies an address to the microprogram memory, whose data output feeds the microinstruction register. A multiplexer selects the branch condition from four inputs: branch on zero and branch on not zero, which are signals from the ALU in the CPU, branch never (logical zero), and branch always (logical one).)
We now look at the microprogram itself and consider the hardware required to execute it. The microprogram is executed by the same type of mechanism used to execute the macroprogram (i.e. machine code) itself. This is a good example of the common expression wheels within wheels.
Figure 7.17 describes the basic structure of a microprogrammed control unit that has a microprogram counter, a microprogram memory, and a microinstruction register (this structure is typical of the 1980s). The microinstruction address from the microprogram counter is applied to the address input of the microprogram memory and the data output of the memory fed to the microinstruction register.
As we've said, the structure of the control unit that executes the macroinstruction is very much like the structure of the CPU itself. However, there is one very big difference between the macroinstruction world and the microinstruction world—the microinstruction register is very much longer than the macroinstruction register and the microinstruction's structure is much more complex than that of the macroinstruction.
Information in the microinstruction register is divided into four fields: next microinstruction address field, microprogram counter load control field, condition select field, and CPU control field. Most of the bits in the microinstruction register belong to the CPU control field, which controls the flow of information within the CPU by enabling tri-state gates and clocking registers as we've described; for example, all the control signals in Table 7.11 belong to this field. Our next task is to describe one of the principal differences between the micro- and macroinstruction. Each microinstruction is also a conditional branch instruction that determines the location of the next microinstruction to be executed. We will now explain how microinstructions are sequenced.

7.4.2 Microinstruction sequence control

If the microprogram counter were to step through the microprogram memory in the natural sequence, 0, 1, 2, 3, . . . etc., a stream of consecutive microinstructions would appear in the microinstruction register, causing the CPU to behave in the way described by Table 7.11. The CPU control bits of each microinstruction determine the flow of information within the CPU. However, just as in the case of the macroprogram control unit, it is often necessary to modify the sequence in which microinstructions are executed. For example, we might wish to repeat a group of microinstructions n times, or we may wish to jump from a fetch phase to an execute phase, or we may wish to call a (microinstruction) procedure.
Microinstruction sequence control is determined by the three left-hand fields of the microinstruction register in Fig. 7.17, enabling the microprogram counter to implement both conditional and unconditional branches to locations within the microprogram memory. We shall soon see that this activity is necessary to execute macroinstructions such as BRA, BCC, BCS, BEQ, etc.
In normal operation, the microprogram counter steps through microinstructions sequentially and the next microprogram address is the current address plus one. By loading the contents of the next microinstruction address field of the current microinstruction into the microprogram counter, a branch can be made to any point in the microprogram memory. In other words each microinstruction determines whether the next microinstruction is taken in sequence or whether it is taken from the next address field of the current microinstruction. The obvious question to ask is, 'What determines whether the microprogram counter continues in sequence or is loaded from the next microinstruction address field of the current microinstruction?'
The microprogram load control field in the microinstruction register tells the microprogram counter how to get the next microinstruction address. This next address can come from the incrementer and cause the microprogram to continue in sequence. The next address can also be obtained from the address mapper (see below) or from the address in the next microinstruction address field of the microinstruction register.
The condition select field in the microinstruction register implements conditional branches at the macroinstruction level by executing a conditional branch at the microinstruction level. In the simplified arrangement of Fig. 7.17, the condition select field directly controls a 4-to-1 multiplexer that selects one of four flag bits representing the state of the CPU. These flag bits are obtained from the ALU and are usually the flag bits in the condition code register (e.g. Z, N, C, V). The condition select field selects one of these flag bits for testing (in this example only the Z-bit is used). If the output of the multiplexer is true, a microprogram jump is made to the address specified by the contents of the next microinstruction address field, otherwise the microprogram continues sequentially. In Fig. 7.17 two of the conditions are obtained from the CCR and two bits are permanently true and false. A false condition implies branch never (i.e. continue) and a true condition implies branch always (i.e. goto).
To emphasize what we've just said, consider the hypothetical microinstruction of Fig. 7.18. This microinstruction is interpreted as

IF Z = 1 THEN [PC] ← ADD3 ELSE [PC] ← [PC] + 1

where PC indicates the microprogram counter.
Figure 7.18 A hypothetical microinstruction. (Its four fields are: the next address field, which gives the location of the next microinstruction to execute if a branch is taken; the load control field, which determines where the next microinstruction address comes from; the condition select field, which selects the condition to be tested when making a conditional branch; and the CPU control fields, which provide the CPU control signals that select source and destination registers, control buses, and determine the ALU function.)
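A sketch of the next-address logic just described, with the 4-to-1 condition multiplexer of Fig. 7.17; the field encodings and widths below are assumed, not taken from the book.

#include <stdint.h>
#include <stdbool.h>

/* Condition select field: chooses one input of the 4-to-1 multiplexer. */
enum cond { BRANCH_NEVER, BRANCH_ON_ZERO, BRANCH_ON_NOT_ZERO, BRANCH_ALWAYS };

/* Compute the address of the next microinstruction. */
uint16_t next_mpc(uint16_t mpc,        /* current microprogram counter   */
                  uint16_t next_addr,  /* next-address field             */
                  enum cond select,    /* condition select field         */
                  bool z_flag)         /* Z bit from the CPU's CCR       */
{
    bool take;
    switch (select) {                  /* the 4-to-1 multiplexer         */
    case BRANCH_ON_ZERO:     take =  z_flag; break;
    case BRANCH_ON_NOT_ZERO: take = !z_flag; break;
    case BRANCH_ALWAYS:      take = true;    break;  /* logical one  */
    default:                 take = false;   break;  /* logical zero */
    }
    /* If the selected condition is true, load the next-address field;
       otherwise continue in sequence via the incrementer.              */
    return take ? next_addr : (uint16_t)(mpc + 1);
}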
A conditional branch at the macroinstruction level (e.g. BEQ) is interpreted by microinstructions in the following way. The condition select field of the microinstruction selects the appropriate status bit of the CCR to be tested. For example, if the macroinstruction is BEQ the Z-bit is selected. The microprogram counter load control field contains the operation 'branch to the address in the microinstruction register on selected condition true'. Thus, if the selected condition is true (i.e. Z = 1), a jump is made to a point in the microprogram that implements the corresponding jump in the macroprogram. If the selected condition is false (i.e. Z = 0), the current sequence of microinstructions is terminated by the start of a new fetch–execute cycle.

Implementing the fetch–execute cycle
The first part of each microprogram executed by the control unit corresponds to a macroinstruction fetch phase that ends with the macroinstruction op-code being deposited in the instruction register. The op-code from the instruction register is first fed to the address mapper, which is a look-up table containing the starting address of the microprogram for each of the possible op-codes. That is, the address mapper translates the arbitrary bit pattern of the op-code into the location of the corresponding microprogram that will execute the op-code. After this microprogram has been executed, an unconditional jump is made to the start of the microprogram that interprets the macroinstruction fetch phase, and the process continues.
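The address mapper is nothing more than a look-up table indexed by the op-code. A minimal sketch, assuming an 8-bit op-code and invented table contents:

#include <stdint.h>

#define OPCODES 256   /* assume an 8-bit op-code */

/* One entry per op-code: the start address of the microprogram that
   interprets it.  The addresses here are invented for illustration.  */
static const uint16_t address_mapper[OPCODES] = {
    [0x00] = 0x010,   /* LOAD  */
    [0x01] = 0x018,   /* STORE */
    [0x02] = 0x020,   /* ADD   */
    [0x03] = 0x028,   /* BRA   */
    /* remaining op-codes default to 0, where an illegal-instruction
       microprogram could live                                        */
};

/* After the fetch microprogram deposits the op-code in the IR, the
   control unit jumps to the microprogram that executes it.           */
uint16_t map_opcode(uint8_t opcode) {
    return address_mapper[opcode];
}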
7.4.3 User-microprogrammed processors

Before the advent of today's powerful microprocessors, engineers in the 1980s requiring high performance sometimes constructed their own microprogrammed computers; that is, the engineer designed a CPU to their own specifications. This was fun because you could create your own architecture and instruction set. On the other hand, you ended up with a computer without an off-the-shelf operating system, compilers, or any of the other tools you take for granted when you use a mainstream CPU.
At the heart of many of these systems was the bit-slice component, which provided a middle path between microcomputer and mainframe. Bit-slice components, as their name suggests, are really subsections of a microprocessor that can be put together to create a custom CPU. For example, a 64-bit computer is made by putting together eight 8-bit bit-slice chips.
Bit-slice components are divided into two types corresponding to the functional division within the microprocessor (i.e. the microprogram control and ALU). By using several ALU and microprogram controller bit-slices plus some additional logic and a microprogram in ROM, a CPU with a user-defined instruction set and wordlength may be created. Of course, the designer doesn't have to construct a new CPU out of bit-slice components. You can emulate an existing microprocessor or even add machine-level instructions to enhance it.
Figure 7.19 describes a typical bit-slice arithmetic logic unit that can generate one of eight functions of two inputs R and S. These functions vary from R plus S to the exclusive NOR of R and S. The values of R and S may be selected from a register file of 16 general-purpose data registers, an external input, a Q register, or zero.
The bit-slice ALU is controlled (i.e. programmed) by a 9-bit input, which selects the source of the data taking part in an arithmetic or logical operation, determines the particular operation to be executed, and controls the destination (together with any shifting) of the result. Typical ALU operations are

[R7] ← [R7] + [R1]
[R6] ← [R6] − [R5]
[R9] ← [R9]·[R2]
[R7] ← [R7] + 1

An arithmetic unit of any length (as long as it is a multiple of 4) is constructed by connecting together bit-slice ALUs. Designers can use the ALU's internal registers in any way they desire. For example, they may choose to implement eight addressable data registers, two stack pointers (described later), two index registers, a program counter, and three scratchpad registers. Flexibility is the most powerful feature of bit-slice microprocessors.
This description of the microprogrammed control unit is highly simplified. In practice the microprogram might include facilities for dealing with interrupts, the memory system, input/output, and so on.
One of the advantages of a microprogrammed control unit is that it is possible to alter the content of the microprogram memory (sometimes called the control store) and hence design your own machine-level instructions. In fact it is perfectly possible to choose a set of microprograms that will execute the machine code of an entirely different computer. In this case the computer is said to emulate another computer. Such a facility is useful if you are changing your old computer to a new one whose own machine code is incompatible with your old programs. Emulation applies to programs that exist in binary (object) form on tape or disk. By writing microprograms (on the new machine) to interpret the machine code of the old machine, you can use the old software and still get the advantages of the new machine.
One of the greatest problems in the design of a bit-slice computer lies in the construction and testing of the microprogram.
Figure 7.19 A typical bit-slice ALU. (A 4-bit register-select input from the control unit addresses a RAM of 16 addressable registers with A data and B data outputs. An ALU data source selector chooses the R and S inputs of the 8-function ALU from the register outputs A and B, the external data input D, the Q register, or logical zero. The ALU takes a carry-in and produces a result F that can be written back to the RAM or, via the Q shifter, to the Q register, together with the status outputs carry C, zero Z, sign N, and overflow V.)
You can, of course, write a program to emulate the bit-slice processor on another computer. A popular method of developing a microprogram is to replace the microprogram ROM with read/write memory and to access this memory with a conventional microprocessor. That is, the microprogram memory is common to both the bit-slice system and the microprocessor. In this way, the microprocessor can input a microprogram in mnemonic form, edit it, assemble it, and then pass control to the bit-slice system. The microprocessor may even monitor the operation of the bit-slice system.
Such a microprogram memory is called a writable control store and once a writable control store was regarded as a big selling point of microprogrammed minicomputers and mainframes. However, we have already pointed out that a microprogrammable control store is of very little practical use due to the lack of applications software. Even if a computer user has the expertise to design new microprogrammed macroinstructions, it is unlikely that the system software and compilers will be able to make use of these new instructions. Finally, RISC technology (as we shall see) does not use microprogramming and interest in microprogramming is much less than it once was.
In the next chapter we look at how the performance of computers can be enhanced by three very different techniques. We begin with a brief introduction to the RISC revolution of the 1970s and 1980s and show how processors with regular instruction sets lend themselves to pipelining (the overlapping of instruction execution). We also look at cache memory and explain how a small quantity of very-high-speed random access memory can radically improve a computer's performance. Finally, we describe the multiprocessor—a system that uses more than one processing unit to accelerate performance.
■ SUMMARY
We have taken a step back from the complex CPU architecture we described in the previous chapters and have looked at how a simple processor can read an instruction from memory, decode it, and execute it. We did this by considering the sequence of events that takes place when an instruction is executed and the flow of information within the computer.
In principle, the computer is a remarkably simple device. The program counter contains the address of the next instruction to be executed. The computer reads the instruction from memory and decodes it. We have demonstrated that a typical instruction requires a second access to memory to fetch the data used by the instruction.
We have demonstrated how a simple computer that can execute only instructions that load and store data or perform arithmetic operations can implement the conditional behavior required for loop and if . . . then . . . else constructs.
The second part of this chapter looked at two ways of implementing a computer's control unit. We started with a simple computer structure and demonstrated the control signals required to implement several machine-level instructions. Then we showed how you can use relatively simple logic and a timing sequencer to generate the signals required to interpret an instruction.
Random logic control units are faster than their microprogrammed counterparts. This must always be so because the random logic control unit is optimized for its particular application. Moreover, a microprogrammed control unit is slowed by the need to read a microinstruction from the microprogram memory. Memory accesses are generally slower than basic Boolean operations.
Microprogramming offers a flexible design. As the microprogram lives in read-only memory, it can easily be modified at either the design or the production stage. A random logic control unit is strictly special purpose and cannot readily be modified to incorporate new features in the processor (e.g. additional machine-level instructions), and sometimes it is difficult to remove design errors without considerable modification of the hardware.
The highpoint of microprogramming was the early 1970s when main memory had an access time of 1–2 μs and the control store used to hold microprograms had an access time of 50–100 ns. It was then sensible to design complex machine-level instructions that were executed very rapidly as microcode. Today, things have changed and memories with access times of below 50 ns are the norm rather than the exception. Faster memory makes microprogramming less attractive because hard-wired random logic control units execute instructions much more rapidly than microcoded control units. Today's generation of RISC (reduced instruction set computers) and post-RISC architectures are not microprogrammed.

■ PROBLEMS
7.1 Within a CPU, what is the difference between an address path and a data path?
7.2 In the context of a machine-level instruction, what is an operand?
7.3 What is a literal operand?
7.4 How does a computer 'know' whether an operand in its instruction register is a literal or a reference to memory (i.e. an address)?
7.5 Why is the program counter a pointer and not a counter?
7.6 Explain the function of the following registers in a CPU:
(a) PC
(b) MAR
(c) MBR
(d) IR
7.7 What is the CCR?
7.8 Does a computer need data registers?
7.9 Some microprocessors have one general-purpose data register, some two, some eight, and so on. What do you think determines the number of such general-purpose data registers in any given computer?
7.10 What is the significance of the fetch–execute cycle?
7.11 What is the so-called von Neumann bottleneck?
7.12 Design a computer (at the register and bus level) to implement a zero address instruction set architecture.
7.13 In the context of CPU design, what is a random logic control unit? What is the meaning of the word random in this expression?
7.14 What is a microprogrammed control unit?
7.15 Microprogramming has now fallen into disfavor. Why do you think this is so?
7.16 For the computer structure of Fig. 7.20, state the sequence of micro-operations necessary to carry out the following instruction. Assume that the current instruction is in the IR.

ADDsquare D0, D1

This instruction reads the contents of register D0, squares that value, and then adds it to the contents of register D1.
Figure 7.20 A microprogrammed CPU. (This computer has two internal buses, A and B. The registers MAR, MBR, IR, PC, D0, and D1 each have a clock input (CMAR, CMBR, CIR, CPC, CD0, CD1) and a tri-state gate or enable (GMAR, EMBR, GIR and EIR, GPC and EPC, GD0 and ED0, GD1 and ED1). All registers capture data from a bus when they are clocked, and all tri-state gates can put data on a bus when they are enabled. The memory is controlled by a read signal and a write signal. Data inputs for the ALU come from two registers, latch 1 and latch 2, clocked by CL1 and CL2; the ALU computes f(P, Q) of one or both of its P and Q inputs depending on the state of its function-select inputs F2, F1, F0, and its output is gated by EALU. A bus-B-to-bus-A path is controlled by CB_to_A and EB_to_A.)
The result is put in register D1. The function codes F2, F1, and F0 are given below.

F2 F1 F0  Operation
0  0  0   Copy P to F          F = P
0  0  1   Add P to Q           F = P + Q
0  1  0   Subtract Q from P    F = P − Q
0  1  1   Add 1 to P           F = P + 1
1  0  0   Add 1 to Q           F = Q + 1
1  0  1   Multiply P by Q      F = P × Q

7.17 For the structure of Fig. 7.20 write a microprogram to implement the operation

D1 = [A] + [B] + [C] + 1

Assume that only one operand, A, is required by the instruction and that operands B and C are in the next consecutive two memory locations, respectively.
7.18 For the architecture of the hypothetical two-bus computer of Fig. 7.20, derive a microprogram to carry out the operation

MOVE D0,[D1]

This operation copies the contents of register D0 into the memory location whose address is given by the contents of register D1.
You should describe the actions that occur in plain English (e.g. 'Put data from this register on the B bus') and as a sequence of events (e.g. Read = 1, EMBR). The table in Question 16 defines the effect of the ALU's function code. All data has to pass through the ALU to get from bus B to bus A.
Note that the ALU has two input latches. Data has to be loaded into these latches before an ALU operation takes place.
7.19 For the computer of Fig. 7.20, what is the effect of the following sequence of micro-operations?

ED0 = 1, CL1
ED0 = 1, CL2
F2,F1,F0 = 0,0,1, EALU, CMBR
EMBR = 1, CL1
ED0 = 1, CL2
F2,F1,F0 = 0,0,1, EALU, CD1

Your answer should explain what each of the micro-operations does individually. You should also state what these actions achieve collectively; that is, what is the effect of the sequence as a whole?
8 Accelerating performance

INTRODUCTION
We want faster computers. In this chapter we examine three very different ways in which we can
take the conventional von Neumann machine described in the last chapter and increase its
performance with little or no change in the underlying architecture or its implementation.1
The development of the computer comprises three threads: computer architecture, computer
organization, and peripheral technology. Advances in each of these threads have contributed to
increasing the processing power of computers over the years. The least progress has been made in
computer architecture and the programming model of a modern microprocessor would probably
not seem too strange to someone who worked with computers in the 1950s. They would, however,
be astonished by developments in internal organization such as pipelining and instruction-level
parallelism. Similarly, someone from the 1940s would be utterly amazed by the development of
peripherals such as disk and optical storage. In 1940 people were struggling to store hundreds or
thousands of bits, whereas some home computers now have storage capacities of about 2⁴¹ bits.
We look at the way in which three particular techniques have been applied to computer design
to improve throughput. We begin with pipelining, a technique that increases performance by
overlapping the execution of instructions. Pipelining is the electronic equivalent of Henry Ford’s
production line where multiple units work on a stream of instructions as they flow through a
processor. We then look at the way in which the apparent speed of memory has been improved
by cache memory, which keeps a copy of frequently used data in a small, fast memory. Finally,
we provide a short introduction to multiprocessing where a problem can be subdivided into
several parts and run on an array of computers.
Before discussing how we speed up computers, we need to introduce the notion of computer
performance. We need to be able to measure how fast a computer is if we are to quantify the
effect of enhancements.
1 Although we introduce some of the factors that have made computers so much faster, we can't cover the advances in semiconductor physics and manufacturing technology that have increased the speed of processors, improved the density of electronic devices, and reduced the power consumption per transistor. These topics belong to the realm of electronic engineering.
RISC—REDUCED OR REGULAR?
What does the R in RISC stand for? The accepted definition of RISC is reduced instruction set computer. First-generation experimental RISC processors were much simpler devices than existing CISC processors like the Intel 8086 family or the Motorola 68K family. These RISCs had very simple instruction set architectures with limited addressing modes and no complex special-purpose instructions.
However, as time passed, RISC instruction sets grew in complexity; by the time the PowerPC was introduced, it had more variations on the branching instruction than some CISCs had instructions. However, RISC processors are still characterized by the regularity of their instruction sets; there are very few variations in the format of instructions.
data from A to B, it is not easy to implement. The source operand is in the memory location given by 12 + [A2] + [D0]. The processor has to extract the constant 12 and the register identifiers A2 and D0 from the op-code. Two registers have to be read and their values added to the literal 12 to get the address used to access memory (i.e. there is a memory access cycle to get the source operand). The value at this location is stored at the destination address pointed at by address register A1. Getting the destination address requires more instruction decoding and the reading of register A1. Finally, the destination operand uses autoincrementing, so the contents of register A1 have to be incremented by 2 and restored to A1. All this requires a large amount of work.
A reaction against the trend toward greater complexity began at IBM with their 801 architecture and continued at Berkeley where David Patterson and David Ditzel coined the term RISC to describe a new class of architectures that reversed earlier trends in microcomputer design. RISC architectures redeploy to better effect some of the silicon real estate used to implement complex instructions and elaborate addressing modes in conventional microprocessors of the 68K and 8086 generation.
Those who designed first-generation 8-bit architectures in the 1970s were striving to put a computer on a chip, rather than to design an optimum computing engine. The designers of 16-bit machines added sophisticated addressing modes and new instructions and provided more general-purpose registers. The designers of RISC architectures have taken the design process back to fundamentals by studying what many computers actually do and by starting from a blank sheet (as opposed to modifying an existing chip à la Intel).
Two factors that influenced the architecture of first- and second-generation microprocessors were microprogramming and the complex instruction sets created to help programmers. By complex instructions we mean operations like MOVE 12(A3,D0),D2 and ADD (A6)+,D3.
Microprogramming achieved its highpoint in the 1970s when ferrite core memory had a long access time of 1 μs or more and semiconductor high-speed random access memory was very expensive. Quite naturally, computer designers used the slow main store to hold the complex instructions that made up the machine-level program. These machine-level instructions are interpreted by microcode in the much faster microprogram control store within the CPU. Today, main stores use semiconductor memory with an access time of 40 ns or less and cache memory with access times below 5 ns. Most of the advantages of microprogramming have evaporated. The goal of RISC architectures is to execute an instruction in a single machine cycle. A corollary of this statement is that complex instructions cannot be executed by pure RISC architectures. Before we look at RISC architectures themselves, we provide an overview of the research that led to the hunt for better architectures.

8.2.1 Instruction usage
Computer scientists carried out extensive research over a decade or more in the late 1970s into the way in which computers execute programs. Their studies demonstrated that the relative frequency with which different classes of instructions are executed is not uniform and that some types of instruction are executed far more frequently than others. Fairclough divided machine-level instructions into eight groups according to type and compiled the statistics described by Table 8.1. The mean value represents the results averaged over both program types and computer architecture.
The eight instruction groups are
● data movement
● program modification (i.e. branch, call, return)
● arithmetic
● compare
● logical
● shift
● bit manipulation
● input/output and miscellaneous.
This data demonstrates that the most common instruction type is the data movement primitive of the form P ← Q in a high-level language or MOVE Q,P in a low-level language. The program modification group, which includes conditional and unconditional branches together with subroutine calls and returns, is the second most common group of instructions. The data movement and program modification groups account for 74% of all instructions. A large program may contain only 26% of instructions that are not data movement or program modification primitives. These results apply to measurements taken in the 1970s and those measurements were the driving force behind computer architecture development; more modern results demonstrate similar trends as Table 8.2 shows.
Table 8.1 Instruction usage by group (mean value, %).

Instruction group   1      2      3      4     5     6     7     8
Mean value         45.28  28.73  10.75  5.92  3.91  2.93  2.05  0.44
We now look at two of the fundamental aspects of the RISC architecture—its register set and pipelining. Multiple overlapping register windows have been implemented to reduce the need to transfer parameters between subroutines. Pipelining is a mechanism that permits the overlapping of instruction execution (i.e. internal operations are carried out in parallel). Note that many of the features of RISC processors are not new. They had been employed long before the advent of the microprocessor. The RISC revolution happened when all these performance-enhancing techniques were brought together and applied to microprocessor design.

The Berkeley RISC, SPARC, and MIPS
Although the CISC processors came from the large semiconductor manufacturers, one of the first RISC processors came from the University of California at Berkeley.6 The Berkeley RISC was not a commercial machine, but it had a tremendous impact on the development of other RISC architectures. Figure 8.1 describes the format of a Berkeley RISC instruction. Each of the 5-bit operand fields permits one of 32 internal registers to be accessed.
The Scc field determines whether the condition code bits are updated after the execution of an instruction; if Scc = 1, the condition code bits are updated after an instruction. The source 2 field uses an IM (immediate mode) bit to select one of two functions. When IM = 0, bits 5 to 12 are zeros and bits 0 to 4 provide the second source operand register. When IM = 1, the second source operand is a literal and bits 0 to 12 provide a 13-bit constant (i.e. immediate value).
Because five bits are allocated to each operand field, it follows that this RISC has 2⁵ = 32 internal registers. This last statement is emphatically not true, because the Berkeley RISC has 138 user-accessible general-purpose internal registers. The reason for the discrepancy between the number of registers directly addressable and the actual number of registers is due to a mechanism called windowing, which gives the programmer a view of only a subset of all registers at any instant.
The Berkeley RISC and several other RISC processors hardwire register R0 to zero. Although this loses a register because you can't change the contents of R0, it gains a constant. By specifying register R0 in an instruction, you force the value zero; for example, ADD R0,R1,R2 implements MOVE R1,R2.
The experimental Berkeley RISC led to the development of the commercial SPARC processor (Scalable Processor ARChitecture) by Sun Microsystems. SPARC is an open architecture and is also manufactured by Fujitsu. Similarly, a RISC project at Stanford led to the design of another classic RISC machine, the MIPS. Figure 8.2 illustrates the format of the MIPS instruction, which has three basic formats: a register-to-register format for all data processing instructions, an immediate format for either data processing instructions with a literal or load/store instructions with an offset, and a branch/jump instruction with a 26-bit literal that is concatenated with the six most-significant bits of the program counter to create a 32-bit address.

Register windows
An important feature of the Berkeley RISC architecture is the way in which it allocates new registers to subroutines; that is, when you call a subroutine, you get some new registers. Suppose you could create 12 registers out of thin air each time you call a subroutine. Each subroutine would have its own workspace for temporary variables, thereby avoiding relatively slow accesses to main store. Although only 12 or so registers are required by each invocation of a subroutine, the successive nesting of subroutines rapidly increases the total number of on-chip registers assigned to subroutines. You might think that any attempt to dedicate a set of registers to each new procedure is impractical, because the repeated calling of nested subroutines will require an unlimited amount of storage.
Although subroutines can be nested to any depth, research has demonstrated that on average subroutines are not nested to any great depth over short periods.
Figure 8.1 Format of the Berkeley RISC instruction. (A 32-bit word: bits 31–25 hold a 7-bit op-code; bit 24 is Scc; bits 23–19 are the 5-bit destination field; bits 18–14 the 5-bit source 1 field; bit 13 is IM. When IM = 0, bits 12–5 are zero and bits 4–0 specify the source 2 register; when IM = 1, bits 12–0 hold a 13-bit immediate value i12–i0.)
6 It would be unfair to imply that RISC technology came entirely from academia. As early as 1974 John Cocke was working on RISC-like architectures at IBM's Thomas J. Watson Research Center. The project was called '801' after the number of the building in which the researchers worked. Cocke's work led to IBM's RISC System/6000 and the PowerPC.
Figure 8.2 The three basic formats of the 32-bit MIPS instruction. (R-type, register to register: op-code (bits 31–26), source S (25–21), source T (20–16), destination (15–11), shift amount (10–6), and function (5–0). I-type, register with immediate operand: op-code, source S, source T, and an immediate value. J-type, jump to target: op-code and a 26-bit target.)
Figure 8.3 The depth of subroutine nesting plotted against time. (The trace rises at each call and falls at each return.)
Consequently, it is feasible to adopt a modest number of local register sets for a sequence of nested subroutines.
Figure 8.3 provides a graphical representation of the execution of a program in terms of the depth of nesting of subroutines as a function of time. The trace goes up each time a subroutine is called and down each time a return is made. Even though subroutines may be nested to considerable depths, there are long runs of subroutine calls that do not require a nesting level of greater than about five.
An ingenious mechanism for implementing local variable work space for subroutines was adopted by the Berkeley RISC. Up to eight nested subroutines could be handled using on-chip work space for each subroutine. Any further nesting forces the CPU to dump registers to main memory. Before we demonstrate the Berkeley RISC's windowing mechanism, we describe how the memory used by subroutines can be divided into four types.
Global space is directly accessible by all subroutines; it holds constants and data that may be required from any point within the program. Most conventional microprocessors have only global registers.
Local space is private to the subroutine. That is, no other subroutine can access the current subroutine's local address space from outside the subroutine. Local space is employed as temporary working space by the current subroutine.
Imported parameter space holds the parameters imported by the current subroutine from its parent. In RISC terminology these are called the high registers.
Exported parameter space holds the parameters exported by the current subroutine to its child. In RISC terminology these are called the low registers.
Consider the following fragment of C code. Don't worry if you aren't a C programmer—the fine details don't matter. What we are going to do is to demonstrate the way in which memory is allocated to parameters. The main program creates three variables x, y, and z. Copies of x and y are passed
to the function (i.e. subroutine) calc. The result is returned to the main program and assigned to z. Figure 8.4 illustrates a possible memory structure for the program.
Parameters x, y, and z are local to function main, and copies of x and y are sent to function calc as imported parameters. We will assume that copies of these parameters are placed on the stack before calc is called. The value returned by function calc is an exported parameter, and sum and diff are local variables in calc.
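The C fragment itself has not survived in this copy; the sketch below is consistent with the description (the names x, y, z, calc, sum, and diff come from the text; the arithmetic and values are assumed).

#include <stdio.h>

/* calc imports copies of x and y and exports a single result.
   sum and diff are local to calc.                              */
int calc(int a, int b) {
    int sum  = a + b;
    int diff = a - b;
    return sum * diff;       /* the exported parameter          */
}

int main(void) {
    int x = 3, y = 2, z;     /* x, y, and z are local to main   */
    z = calc(x, y);          /* copies of x and y are passed    */
    printf("%d\n", z);
    return 0;
}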
Windows and parameter passing
One reason for the high frequency of data movement operations is the need to pass parameters to subroutines and to receive them from subroutines. The Berkeley RISC architecture improves parameter passing by means of multiple overlapped windows. A window is the set of registers visible to the current subroutine. Figure 8.5 illustrates the structure of the RISC's overlapping windows.
Figure 8.5 The Berkeley RISC's overlapping register windows. (Registers R0 to R9 are global and visible to every window. Within window i, local registers R16 to R25 are unique to each window and can't be accessed from other windows. Registers R26 to R31 of window i−1 are the same physical registers as R10 to R15 of window i, and window i's R26 to R31 likewise overlap window i+1's R10 to R15.)
The total number of registers required to implement the Berkeley windowed register set is

10 global + 8 × 10 local + 8 × 6 parameter-transfer registers = 138 registers

Although windowed register sets are a good idea, there are flaws and only one commercial processor implements windowing, the SPARC. The major problem with windows is that the number of registers is finite. If there are more nested calls than register sets, then old register sets have to be moved from windowed registers to main store and later restored. When all windowed registers are in use, a subroutine call results in register overflow and the system software has to intervene. Register sets also increase the size of a task's environment. If the operating system has to switch tasks or deal with an exception, it may be necessary to save a lot of program context to main store.
The Berkeley RISC instruction set
Although the Berkeley RISC is not a commercial computer, we briefly look at its instruction set because it provides a template for later RISC-like processors such as the MIPS, the SPARC, and the ARM. The instruction set is given below. The effect of most instructions is self-explanatory. As you can see, instructions have a three-operand instruction format and the only memory operations are load and store. In the absence of byte, word, and longword operations, this RISC includes several memory reference operations designed to access 8-, 16-, and 32-bit values (bytes, half words, and words, respectively).
Load and store instructions use register indirect addressing with a constant and a pointer; for example, LDXW (Rx)S2,Rd loads destination register Rd with the 32-bit value at the address pointed at by register Rx plus offset S2. The value of the second source operand S2 is either a register or a literal. Because register R0 is always zero, we can write LDXW (R0)S2,R3 to generate LDXW S2,R3.

8.3 RISC architecture and pipelining

Historically, the two key attributes of RISC architectures are their uniform instruction sets and the use of pipelining to increase throughput by overlapping instruction execution. We now look at pipelining.
Figure 8.6 illustrates the machine cycle of a hypothetical microprocessor executing an ADD R1,R2,R3 instruction. Imagine that this instruction is executed in the following five phases.
Instruction fetch Read the instruction from the system memory and increment the program counter.
Instruction decode Decode the instruction read from memory during the previous phase. The nature of the instruction decode phase is dependent on the complexity of the instruction encoding. A regularly encoded instruction might be decoded in a few nanoseconds with two levels of gating whereas a complex instruction format might require ROM-based look-up tables to implement the decoding.
Operand fetch The operand specified by the instruction is read from the system memory or an on-chip register and loaded into the CPU. In this example, we have two operands.
Execute The operation specified by the instruction is carried out.
Operand store The result obtained during the execution phase is written into the operand destination. This may be an on-chip register or a location in external memory.
Each of these five phases may take a specific time (although the time taken is an integer multiple of the system's master clock period). Some instructions may not require all phases; for example, the CMP R1,R2 instruction, which compares R1 and R2 by subtracting R1 from R2, does not need an operand store phase.
Figure 8.6 Instruction execution. (One instruction passes through five phases: instruction fetch IF, instruction decode ID, operand fetch OF, execute E, and operand store OS. For ADD R1,R2,R3: IF reads this instruction from memory; ID decodes it into 'ADD' and registers 1, 2, 3; OF reads operands R2 and R3 from the register file; E adds the two operands together; OS stores the result in R1.)
The inefficiency in the arrangement of Fig. 8.6 is clear. Consider the execution phase of an instruction. This takes one-fifth of an instruction cycle leaving the instruction execution unit idle for the remaining 80% of the time. The same applies to the other functional units of the processor that also lie idle for 80% of the time. A technique called pipelining can be employed to increase the effective speed of the processor by overlapping the various stages in the execution of an instruction. For example, when a pipelined processor is executing one instruction, it is fetching the next instruction.
The way in which a RISC processor implements pipelining is described in Fig. 8.7. Consider the execution of two instructions. At time i instruction 1 begins execution with its instruction fetch phase. At time i+1 instruction 1 enters its instruction decode phase and instruction 2 begins its instruction fetch phase. This arrangement makes sense because it ensures that the functional units in a computer are used more efficiently.
Figure 8.8 illustrates the execution of five instructions in a pipelined system. We use a four-stage pipeline for the rest of this section because RISC processors don't need an instruction decode phase because their encoding is so simple. As you can see, the total execution time is eight cycles. After instruction 4 has entered the pipeline, the pipeline is said to be full and all stages are active.
Pipelining considerably speeds up a processor. Suppose an unpipelined processor has four stages and each operation takes 10 ns. It takes 4 × 10 ns = 40 ns to execute an instruction. If pipelining is used and a new instruction enters the pipeline every 10 ns, a completed instruction leaves the pipeline every 10 ns. That's a speedup of 400% without improving the underlying semiconductor technology.
Consider the execution of n instructions in a processor with an m-stage pipeline. It will take m clock cycles for the first instruction to be completed. This leaves n − 1 instructions to be executed at a rate of one instruction per cycle. The total time to execute the n instructions is, therefore, m + (n − 1) cycles.
Figure 8.7 Overlapping instruction execution in a pipeline. (At times i, i+1, i+2, … each instruction advances by one phase while the next instruction starts its fetch.)
If we do not use pipelining, it takes n·m cycles to execute n instructions, assuming that each instruction is executed in m phases. The speedup due to pipelining is, therefore,

S = n·m / (m + n − 1)

Let's put some numbers into this equation and see what happens when we vary the values of n (the code size) and m (the number of stages in the pipeline). Table 8.4 gives the results for m = 3, 6, and 12 with instruction blocks ranging from 4 to 1000 instructions. Table 8.4 demonstrates that pipelining produces a speedup that is the same as the number of stages when the number of instructions in a block is large. Small blocks of instructions running on computers with large pipelines do not demonstrate a dramatic performance improvement; for example, a 12-stage pipeline with four-instruction blocks has a speedup ratio of 3.2 rather than 12. We will return to the implications of this table when we've introduced the notion of the pipeline hazard.

8.3.1 Pipeline hazards
Table 8.4 demonstrates that pipelining can provide a substantial performance acceleration as long as the block of instructions being executed is much longer than the number of stages in the pipeline.
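A quick check of this formula: the sketch below simply evaluates S for a few block sizes and pipeline depths, reproducing the speedup of 3.2 quoted above for m = 12 and n = 4.

#include <stdio.h>

/* Speedup of an m-stage pipeline over unpipelined execution
   of a block of n instructions: S = n*m / (m + n - 1).        */
double speedup(int n, int m) {
    return (double)n * m / (m + n - 1);
}

int main(void) {
    int stages[] = {3, 6, 12};
    int blocks[] = {4, 10, 100, 1000};
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 4; j++)
            printf("m=%2d n=%4d S=%.2f\n",
                   stages[i], blocks[j], speedup(blocks[j], stages[i]));
    /* m=12, n=4 gives S=3.20, the figure quoted in the text. */
    return 0;
}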
Figure 8.9 The pipeline bubble. (Instruction i is BRA N, so the next instruction should be fetched from address N rather than from i+1. The two sequential instructions already fetched behind the branch are not executed, leaving a bubble in the pipeline before the instructions at N and N+1 flow through the IF, OF, OE, and OS stages.)
Figure 8.10 Delayed branch. (After BRA N the instruction in the delay slot is fetched and executed; a later sequential fetch is not executed, and the next useful instruction is fetched from the branch target address N.)
Figure 8.11 Data dependency. (Instruction 2, ADD R5,R2,R4, saves operand R5 only at its operand-store phase; instruction 3, SUB R6,R7,R5, has to wait for the previous instruction to save R5, so a bubble appears while it waits.)
This technique can seem strange in the eyes of a conventional assembly language programmer, who is not accustomed to seeing an instruction executed after a branch has been taken.
Unfortunately, it's not always possible to arrange a program in such a way as to include a useful instruction immediately after a branch. Whenever this happens, the compiler must introduce a no operation (NOP) instruction after the branch and accept the inevitability of a bubble. This mechanism is called a delayed jump or a branch-and-execute technique. Figure 8.10 demonstrates how a RISC processor implements a delayed jump. The branch described in Figure 8.10 is a computed branch whose target address is calculated during the execute phase of the instruction cycle.

8.3.2 Data dependency
Another problem caused by pipelining is data dependency in which an instruction cannot be executed because it requires a result from a previous operation that has not yet left the pipeline. Consider the following sequence of operations. These instructions are executed sequentially.
Figure 8.12 Internal forwarding. (Instruction 3, ADD R6,R7,R5, receives R5 directly from the execution unit of instruction 2 rather than from the register file. Although instruction 4, ADD R2,R1,R4, uses register R1 as a source operand that was generated by instruction 1, there is no need for internal forwarding in its case because R1 was stored in time slot 4, before instruction 4 reads it.)
However, a problem arises when the third instruction, SUB R6,R7,R5, is executed on a pipelined machine. This instruction uses R5, which is calculated by the preceding instruction, as a source operand. Clearly, the value of R5 will not have been stored by the previous instruction by the time it is required by the current instruction. Figure 8.11 demonstrates how data dependency occurs.
Figure 8.11 demonstrates that the pipeline is held up or stalled after the fetch phase of instruction 3 for two clock cycles. It is not until the end of time slot 5 that operand R5 is ready and execution can continue. Consequently a bubble must be introduced in the pipeline while an instruction waits for its data generated by the previous instruction.
Figure 8.12 demonstrates a technique called internal forwarding designed to overcome the effects of data dependency. The example provided corresponds to a three-stage pipeline like the RISC. The following sequence of operations is to be executed.

1. ADD R1,R2,R3
2. ADD R5,R2,R4
3. ADD R6,R7,R5
4. ADD R2,R1,R4

Instruction 2 generates a destination operand R5 that is required as a source operand by the next instruction. If the processor were to read the source operand requested by instruction 3 directly from the register file, it would see the old value of R5. By means of internal forwarding the processor transfers R5 from instruction 2's execution unit directly to the execution unit of instruction 3 (see Fig. 8.12).
In this example, instruction 4 uses an operand generated by instruction 1 (i.e. the contents of register R1). However, because of the intervening instructions 2 and 3, the destination operand generated by instruction 1 has time to be written into the register file before it is read as a source operand by instruction 4.

8.3.3 Reducing the branch penalty
If we're going to reduce the effect of branches on the performance of RISC processors, we need to determine the effect of branch instructions on the performance of the system. Because we cannot know how many branches a given program will contain, or how likely each branch is to be taken, we have to construct a probabilistic model for the system. We will make the following assumptions.
1. Each non-branch instruction is executed in one cycle.
2. The probability that a given instruction is a branch is pb.
3. The probability that a branch instruction will be taken is pt.
4. If a branch is taken, the additional penalty is b cycles.
5. If a branch is not taken, there is no penalty.
The average number of cycles executed during the execution of a program is the sum of the cycles taken for non-branch instructions, plus the cycles taken by branch instructions that are taken, plus the cycles taken by branch instructions that are not taken.
If the probability of an instruction being a branch is pb, the probability that an instruction is not a branch is 1 − pb because the two probabilities must add up to 1. Similarly, if pt is the probability that a branch will be taken, the probability that a branch will not be taken is 1 − pt.
The total cost (i.e. time) of an instruction is

Tave = (1 − pb)·1 + pb·pt·(1 + b) + pb·(1 − pt)·1 = 1 + pb·pt·b
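A worked instance of this result (the probabilities and penalty are invented for illustration):

#include <stdio.h>

int main(void) {
    double pb = 0.20;  /* probability an instruction is a branch   */
    double pt = 0.60;  /* probability a branch is taken            */
    double b  = 3.0;   /* extra cycles paid when a branch is taken */

    /* Tave = (1-pb)*1 + pb*pt*(1+b) + pb*(1-pt)*1 = 1 + pb*pt*b   */
    double t_ave = 1.0 + pb * pt * b;
    printf("average cycles per instruction = %.2f\n", t_ave); /* 1.36 */
    return 0;
}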
The expression 1 + pb·pt·b tells us that the number of branch instructions, the probability that a branch is taken, and the overhead per branch instruction all contribute to the branch penalty. We are now going to examine some of the ways in which pb·pt·b can be reduced.

Branch prediction
If we can predict the outcome of the branch instruction before it is executed, we can start filling the pipeline with instructions from the branch target address if the branch is going to be taken. For example, if the instruction is BRA N, the processor can start fetching instructions at locations N, N+1, N+2, etc., as soon as the branch instruction is fetched from memory. In this way, the pipeline is always filled with useful instructions.
This prediction mechanism works well with an unconditional branch like BRA N. Unfortunately, conditional branches pose a problem. Consider a conditional branch of the form BCC N (branch to N on carry bit clear). Should the RISC processor make the assumption that the branch will not be taken and fetch instructions in sequence, or should it make the assumption that the branch will be taken and fetch instructions at the branch target address N?
Conditional branches are required to implement various types of high-level language construct. Consider the following fragment of high-level language code.
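The fragment itself is missing from this copy; the sketch below matches the description, with a first conditional that compares J with K and a second conditional supplied by the loop test (the bodies and the bound of 100 are assumed).

int branch_demo(int J, int K, const int data[100]) {
    int total = 0;
    if (J < K)                    /* first conditional: compares J with K  */
        total = J;
    for (int I = 0; I < 100; I++) /* second conditional: the FOR loop test */
        total += data[I];         /* the loop repeats 100 times, so the    */
    return total;                 /* backward branch is taken far more     */
}                                 /* often than it falls through           */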
The first conditional operation compares J with K. Only the nature of the problem will tell us whether J is often less than K. The second conditional in this fragment of code is provided by the FOR construct, which tests a counter at the end of the FOR and then decides whether to jump back to the body of the construct or to terminate the loop. In this case, you could bet that the loop is more likely to be repeated than exited. Some loops are executed thousands of times before they are exited. Therefore, it might be a shrewd move to look at the type of conditional branch and then either fill the pipeline from the branch target if you think that the branch will be taken, or fill the pipeline from the instruction after the branch if you think that it will not be taken.
If we attempt to predict the behavior of a system with two outcomes (branch taken or branch not taken), there are four possibilities.
1. Predict branch taken and branch taken—successful outcome.
2. Predict branch taken and branch not taken—unsuccessful outcome.
3. Predict branch not taken and branch not taken—successful outcome.
4. Predict branch not taken and branch taken—unsuccessful outcome.
Suppose we apply a branch penalty to each of these four possible outcomes. The penalty is the number of cycles taken by that particular outcome, as Table 8.5 demonstrates.

Prediction        Result            Branch penalty
Branch taken      Branch taken      a
Branch taken      Branch not taken  b
Branch not taken  Branch taken      c
Branch not taken  Branch not taken  d

Table 8.5 The branch penalty.

For example, if we think that a branch will not be taken and get instructions following the branch and the branch is actually taken (forcing the pipeline to be loaded with instructions at the target address), the branch penalty in Table 8.5 is c cycles.
We now need to calculate the average penalty for a particular system. To do this we need more information about the system. The first thing we need to know is the probability that an instruction will be a branch (as opposed to any other category of instruction). Assume that the probability that an instruction is a branch is pb. The next thing we need to know is the probability that the branch instruction will be taken, pt. Finally, we need to know the accuracy of the prediction. Let pc be the probability that a branch prediction is correct. These values can be obtained by observing the performance of real programs. Figure 8.13 illustrates all the possible outcomes of an instruction. We can immediately write

1 − pb = probability that an instruction is not a branch
1 − pt = probability that a branch will not be taken
1 − pc = probability that a prediction is incorrect

These equations are obtained by using the principle that if one event or another must take place, their probabilities must add up to unity. The average branch penalty per branch instruction, Cave, is therefore

Cave = a·(p_branch_predicted_taken_and_taken) + b·(p_branch_predicted_taken_but_not_taken) + c·(p_branch_predicted_not_taken_but_taken) + d·(p_branch_predicted_not_taken_and_not_taken)
     = a·pt·pc + b·(1 − pt)·(1 − pc) + c·pt·(1 − pc) + d·(1 − pt)·pc
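A sketch that evaluates this expression; the penalties a to d and the probabilities are invented for illustration.

#include <stdio.h>

/* Average branch penalty per branch instruction:
   Cave = a*pt*pc + b*(1-pt)*(1-pc) + c*pt*(1-pc) + d*(1-pt)*pc   */
double cave(double a, double b, double c, double d,
            double pt, double pc) {
    return a * pt * pc
         + b * (1 - pt) * (1 - pc)
         + c * pt * (1 - pc)
         + d * (1 - pt) * pc;
}

int main(void) {
    /* Example penalties (cycles) for Table 8.5's four outcomes. */
    double a = 0, b = 2, c = 2, d = 0;
    double pt = 0.6, pc = 0.8;
    printf("Cave = %.2f cycles\n", cave(a, b, c, d, pt, pc));
    return 0;
}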
(A related figure shows that a load instruction introduces a bubble in the pipeline because of the external memory access it requires.)
The Berkeley RISC implements two addressing modes: indexed and program counter relative. All other addressing modes must be synthesized from these two primitives. The effective address in the indexed mode is given by

effective address = [Rx] + S2

where Rx is the index register (one of the 32 general-purpose registers) and S2 is an offset. The offset can be either a general-purpose register or a 13-bit constant.
Thirteen-bit and 19-bit immediate fields may sound a little strange at first sight. However, because 13 + 19 = 32, RISC permits a full 32-bit value to be loaded into a window register in two operations. A typical microprocessor might take the same number of instruction bits to perform the same action (i.e. a 32-bit operation code field followed by a 32-bit literal). The following describes some of the addressing modes that can be synthesized from the RISC's basic addressing modes.
Conditional instructions do not require a destination address and therefore the 5 bits, 19 to 23, normally used to specify a destination register are used to specify the condition (one of 16 because bit 23 is not used by conditional instructions).
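The arithmetic behind the 13 + 19 = 32 observation, as a sketch; the two-instruction sequence itself (a load of the high 19 bits followed by an OR of the low 13 bits) is assumed rather than quoted from the book.

#include <stdint.h>

/* Build a 32-bit constant from a 19-bit 'high' immediate and a
   13-bit 'low' immediate - two operations on a machine with      */
/* 19-bit and 13-bit immediate fields.                            */
uint32_t make_constant(uint32_t hi19, uint32_t lo13) {
    uint32_t r = (hi19 & 0x7FFFF) << 13;  /* first operation: high 19 bits */
    r |= (lo13 & 0x1FFF);                 /* second operation: low 13 bits */
    return r;
}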
Figure 8.14 Using latches to implement pipelining. (The PC addresses the instruction memory and the instruction is latched into the IR. The source 1 and source 2 fields read operands S1 and S2 from the registers into operand latches O1 and O2; the ALU combines the latched operands and its output is captured by a result flip-flop before being written back to the destination register D. The op-code passes through a T delay on its way to the ALU and the destination field through a 2T delay on its way to the register file, so that each arrives in step with its instruction. The four stages are: read instruction from memory, fetch source operands, calculate result, store result.)
Figure 8.15 Timing diagram for a pipelined computer. (On successive clock cycles the PC output steps through i, i+1, i+2, i+3, i+4. The IR output lags the PC by one cycle, the source operands by two, and the ALU result by three: read instruction, read operands, generate result, latch result.)
During cycle T0 the output of the program counter interrogates the program memory and instruction i is read from memory.
When the next clock pulse appears at the beginning of cycle T1, instruction i is latched in the instruction register and held constant for a clock cycle. The program counter is incremented and instruction i+1 is read from memory. The instruction in the instruction register is decoded and used to read its two source operands during cycle T1.
At the end of cycle T1, the two source operands for instruction i appear at the operand latches (just as instruction i+1 appears at the IR).
During cycle T2, the operands currently in the operand latches are processed by the ALU to produce a result that is captured by the result latch at the beginning of cycle T3. At this point, instruction i has been executed. Note that in time slot T2, the program counter contains the address of instruction i+2 and the instruction register contains the op-code for instruction i+1.
In Fig. 8.14 there is a block marked T delay in the path between the op-code field of the IR and the ALU, and a block marked 2T delay between the destination field of the op-code and the register file. These delays are necessary to ensure that data arrives at the right place at the right time. For example, the operand data that goes to the ALU passes through the operand latches, which create a one-cycle delay. Consequently, the op-code has to be delayed for a cycle to avoid the data for instruction i getting to the ALU at the same time as the op-code for instruction i+1.
During cycle T3 the result of instruction i from the ALU is latched into the register file. In cycle T3, instruction i+3 is in the program counter, instruction i+2 is in the instruction
register, instruction i + 1 is being executed, and the result of instruction i is being written back into the register file. A new instruction is completed (or retired) on each further clock pulse.
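The overlap is easy to tabulate. The following sketch (an illustration, with made-up stage names matching the four stages above) prints which instruction occupies each stage on every clock cycle:

```python
STAGES = ["fetch", "operands", "execute", "store"]

def pipeline_schedule(n_instructions, n_cycles):
    # In cycle t, stage s holds instruction t - s (if it has been issued).
    for t in range(n_cycles):
        row = []
        for s, name in enumerate(STAGES):
            i = t - s
            row.append(f"{name}:i{i}" if 0 <= i < n_instructions else f"{name}:-")
        print(f"T{t}: " + "  ".join(row))

pipeline_schedule(n_instructions=5, n_cycles=8)
# Once the pipeline fills (cycle 3), one instruction is retired per cycle.
```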
There is little point in increasing the speed of the processing if memory cannot deliver data and instructions when they are needed. This is a particularly critical issue in computer design because memory speed has not kept up with processor speed. In the next section we look at how the effective speed of main store can be increased.

8.4 Cache memory

We now look at the cache memory, which can dramatically increase the performance of a computer system at relatively little cost.

Cache memory provides system designers with a way of exploiting high-speed processors without incurring the cost of large high-speed memory systems. The word cache is pronounced 'cash' or 'cash-ay' and is derived from the French word meaning hidden. Cache memory is hidden from the programmer and appears as part of the system's memory space. There's nothing mysterious about cache memory—it's simply a quantity of very high-speed memory that can be accessed rapidly by the processor. The element of magic stems from the fact that a system with a tiny cache memory (e.g. 512 kbytes of cache in a system with 2 Gbytes of DRAM) can expect the processor to make over 95% of its accesses to the cache rather than to the slower DRAM.

First-generation microprocessors had truly tiny cache memories of, for example, 256 bytes. Up to the mid-1990s, cache sizes of 8 to 32 kbytes were common. By the end of the 1990s, PCs had internal on-chip caches of 128 kbytes and external second-level caches of up to 1 Mbyte; by 2004 there were on-chip cache memories of 2 Mbytes in systems with up to 4 Gbytes of main store.

Cache memory can be understood in everyday terms by analogy with a diary or notebook used to jot down telephone numbers. A telephone directory contains hundreds of thousands of telephone numbers and nobody carries a telephone directory around with them. However, lots of people have a notebook with a hundred or so telephone numbers that they keep with them. Although the fraction of all possible telephone numbers in someone's notebook might be less than 0.01%, the probability that their next call will be to a number in the notebook is high because they frequently call the same people. Cache memory operates on exactly the same principle, by locating frequently accessed information in the cache memory rather than in the much slower main memory. Unfortunately, unlike the personal notebook, the computer cannot know, in advance, what data is most likely to be accessed. You could say that computer caches operate on a learning principle. By experience they learn what data is most frequently used and then transfer it to the cache.

The general structure of a cache memory is provided in Fig. 8.16. A block of cache memory sits on the processor's address and data buses in parallel with the much larger main memory. The implication of parallel in the previous sentence is that data in the cache is also maintained in the main memory. To return to the analogy with the telephone notebook, writing a friend's number in the notebook does not delete their number in the directory.

[Figure 8.16 The general structure of a cache memory: the CPU, the cache, and the main store share the address and data buses, with the cache in parallel with the much larger main memory.]

Cache memory relies on the same principle as the notebook with telephone numbers. The probability of accessing the next item of data in memory isn't a random function. Because of the nature of programs and their attendant data structures, the data required by a processor is often highly clustered. This aspect of memories is called locality of reference and makes the use of cache memory possible (it is of course the same principle that underlies virtual memory).

A cache memory requires a cache controller to determine whether the data currently being accessed by the CPU resides in the cache or whether it must be obtained from the main
memory. When the current address is applied to the cache controller, the controller returns a signal called hit, which is asserted if the data is currently in the cache. Before we look at how cache memories are organized, we will demonstrate their effect on a system's performance.

8.4.1 Effect of cache memory on computer performance

A key parameter of a cache system is its hit ratio (h), which defines the ratio of hits to all accesses. The hit ratio is determined by statistical observations of a real system and cannot readily be calculated. Furthermore, the hit ratio depends on the specific nature of the programs being executed. It is possible to have some programs with very high hit ratios and others with very low hit ratios. Fortunately, the effect of locality of reference usually means that the hit ratio is very high—often in the region of 95%. Before calculating the effect of a cache memory on a processor's performance, we need to introduce some terms.

    Access time of main store      t_m
    Access time of cache memory    t_c
    Hit ratio                      h
    Miss ratio                     m
    Speedup ratio                  S

The figure of merit of a computer with cache is called the speedup ratio, which indicates how much the cache accelerates the memory's access time. The speedup ratio is defined as the ratio of the memory system's access time without cache to its access time with cache.

N accesses to a system without cache memory require N·t_m seconds. N accesses to a system with cache require N(h·t_c + m·t_m) seconds; that is, the time spent accessing the cache plus the time spent accessing the main memory multiplied by the total number of memory accesses. We can express m in terms of h as m = 1 − h, because if an access is not a hit it must be a miss. Therefore the total access time for a system with cache is given by N(h·t_c + (1 − h)·t_m). The speedup ratio is therefore given by

    S = N·t_m/(N(h·t_c + (1 − h)·t_m)) = t_m/(h·t_c + (1 − h)·t_m) = 1/(h·k + (1 − h)), where k = t_c/t_m

Figure 8.17 provides a plot of S as a function of the hit ratio (h). As you might expect, when h = 0 and all accesses are made to the main memory, the speedup ratio is 1. Similarly, when h = 1 and all accesses are made to the cache, the speedup ratio is t_m/t_c = 1/k. The most important conclusion to be drawn from Fig. 8.17 is that the speedup ratio is a sensitive function of the hit ratio. Only when h approaches about 90% does the effect of the cache memory become really significant. This result is consistent with common sense. If h drops below about 90%, the accesses to main store take a disproportionate amount of time and accesses to the cache have little effect on system performance.

Life isn't as simple as these equations suggest. Computers are clocked devices and run at a speed determined by the clock. Memory accesses take place in one or more whole clock cycles. If a processor accesses main store in one clock cycle, adding cache memory is not going to make the system faster. If we assume that a computer has a clock cycle time t_cyc, and accesses cache memory in p clock cycles (i.e. access time p·t_cyc) and main store in q clock cycles, its speedup ratio is

    S = q·t_cyc/(p·h·t_cyc + (1 − h)·q·t_cyc) = q/(p·h + (1 − h)·q)

If q = 4 and p = 2, the speedup ratio is given by 1/(2h/4 + 1 − h) = 2/(2 − h).
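Both forms of the speedup ratio are easy to evaluate. A minimal sketch, using only the formulas just derived (the sample values are illustrative):

```python
def speedup(h, k):
    """S = t_m / (h*t_c + (1-h)*t_m), with k = t_c/t_m."""
    return 1.0 / (h * k + (1.0 - h))

def speedup_clocked(h, p, q):
    """S = q / (p*h + (1-h)*q): cache takes p cycles, main store q cycles."""
    return q / (p * h + (1.0 - h) * q)

print(speedup(h=0.95, k=0.1))            # 95% hit ratio, cache 10x faster
print(speedup_clocked(h=0.9, p=2, q=4))  # 2/(2 - 0.9) = 1.818...
```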
In practice, we are more concerned with the performance of the entire system. A computer doesn't spend all its time accessing memory. The following expression gives a better picture of the average cycle time of a computer because it takes into account the number of cycles the processor spends performing internal (i.e. non-memory reference) operations:

    t_average = F_internal·N·t_cyc + F_memory·t_cyc·(h·t_cache + (1 − h)(t_cache + t_delay))
where

    F_internal  fraction of cycles the processor spends doing internal operations
    N           average number of cycles per internal operation
    t_cyc       processor cycle time
    F_memory    fraction of cycles the processor spends doing memory accesses
    t_delay     additional penalty clock cycles caused by a cache miss
    h           hit ratio
    t_cache     cache memory access time (in clock cycles)

Note that, by convention, the main memory access time is given by the number of cycles to access the cache plus the additional number of cycles (i.e. the penalty) to access the main store.
If we put some figures into this equation, we get

    t_average = 40% × 2 × 20 ns + 60% × 20 ns × (0.9 × 1 + 0.1 × (1 + 3))
              = 16 ns + 15.6 ns = 31.6 ns
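Substituting the same figures into a short function confirms the arithmetic:

```python
def average_cycle_time(f_int, n, t_cyc, f_mem, h, t_cache, t_delay):
    # h*t_cache + (1-h)*(t_cache + t_delay) is the average number of
    # clock cycles per memory access.
    memory_cycles = h * t_cache + (1 - h) * (t_cache + t_delay)
    return f_int * n * t_cyc + f_mem * t_cyc * memory_cycles

# 40% internal operations of 2 cycles, 60% memory accesses, 20 ns clock,
# 90% hit ratio, 1-cycle cache access, 3-cycle miss penalty.
print(average_cycle_time(0.4, 2, 20, 0.6, 0.9, 1, 3))   # -> 31.6 (ns)
```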
The effect of cache memory on the performance of a computer depends on many factors including the way in which the cache is organized and the way in which data is written to main memory when a write access takes place. We will return to some of these considerations when we have described how cache systems are organized.
8.4.2 Cache organization

There are at least three ways of organizing a cache memory—direct-mapped, associative-mapped, and set associative-mapped cache. Each of these systems has its own performance:cost trade-off.

Direct-mapped cache

The easiest way of organizing a cache memory employs direct mapping, which relies on a simple algorithm to map data block i from the main memory into data block j in the cache. For the purpose of this section we will regard the smallest unit of data held in a cache as a line, which is made up of typically two or four consecutive words. The line is the basic unit of data that is transferred between the cache and main store and varies between 4 and 32 bytes.

Figure 8.18 illustrates the structure of a highly simplified direct-mapped cache. As you can see, the memory space is divided into sets and the sets into lines. This memory is composed of 32 words and is accessed by a 5-bit address bus from the CPU. For the purpose of this discussion we need only consider the set and line (as it doesn't matter how many words there are in a line). The address in this example has a 2-bit set field, a 2-bit line field, and a 1-bit word field. The cache memory holds 2^2 = 4 lines of two words. When the processor generates an address, the appropriate line in the cache is accessed. For example, if the processor generates the 5-bit address 10100₂, word 0 of line 2 in set 2 is accessed.

[Figure 8.18 The direct-mapped cache: the 5-bit address from the CPU is divided into a 2-bit set field, a 2-bit line field, and a 1-bit word field; the line address selects the same line (line 2) in each of the four sets of the main store, and a line in the cache may come from any one of the four sets.]

A glance at Fig. 8.18 reveals that there are four possible lines numbered two—a line 2 in set 0, a line 2 in set 1, a line 2 in set 2, and a line 2 in set 3. In this example the processor accessed line 2 in set 2. The obvious question is, 'How does the system know whether the line 2 accessed in the cache is the line 2 from set 2 in the main memory?'

Figure 8.19 shows how a direct-mapped cache resolves the contention between lines. Each line in the cache memory has a tag or label, which identifies the set to which this particular line belongs. When the processor accesses line 2, the tag belonging to line 2 in the cache is sent to a comparator. At the same time the set field from the processor is also sent to the comparator. If they are the same, the line in the cache is the desired line and a hit occurs.

If they are not the same, a miss occurs and the cache must be updated. The old line 2 from set 1 is either simply discarded or rewritten back to main memory, depending on how the updating of main memory is organized.
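A minimal model of the example in Fig. 8.18 shows how the 5-bit address is split and how the stored tag determines a hit. This is a sketch only; the cache contents are made up for illustration:

```python
def split_address(addr):
    """Split a 5-bit address into (set, line, word): 2 + 2 + 1 bits."""
    word = addr & 0b1
    line = (addr >> 1) & 0b11
    set_field = (addr >> 3) & 0b11
    return set_field, line, word

# cache_tags[line] holds the set number (the tag) of the line currently cached.
cache_tags = {0: 0, 1: 3, 2: 2, 3: 1}    # arbitrary illustrative contents

addr = 0b10100                  # set 2, line 2, word 0 (the example above)
s, l, w = split_address(addr)
hit = (cache_tags[l] == s)
print(f"set={s} line={l} word={w} hit={hit}")   # -> set=2 line=2 word=0 hit=True
```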
Figure 8.20 provides a skeleton structure of a direct-mapped cache memory system. The cache memory itself is nothing more than a block of very high-speed random access read/write memory. The cache tag RAM is a fast combined memory and comparator that receives both its address and data inputs from the processor's address bus. The cache tag RAM's address input is the line address from the processor, which is used to access a unique location (one for each of the possible lines). The data in the cache tag RAM at this location is the tag associated with that location. The cache tag RAM also has a data input that receives the tag field from the processor's address bus. If the tag field from the processor matches the contents of the tag (i.e. set) field being accessed, the cache tag RAM returns a hit signal.

As Fig. 8.20 demonstrates, the cache tag RAM is nothing more than a high-speed random access memory with a built-in data comparator. Some of the major semiconductor manufacturers have implemented single-chip cache tag RAMs.

The advantage of the directly mapped cache is almost self-evident. Both the cache memory and the cache tag RAM are widely available devices which, apart from their speed, are no more complex than other mainstream devices. Moreover, the direct-mapped cache requires no complex line replacement algorithm. If line x in set y is accessed and a miss takes place, line x from set y in the main store is loaded into the frame for line x in the cache memory and the tag set to y. That is, there is no decision concerning which line has to be rejected when a new line is to be loaded.

Another important advantage of the direct-mapped cache is its inherent parallelism. Because the cache memory holding the data and the cache tag RAM are entirely independent, they can both be accessed simultaneously. Once the tag has been matched and a hit has occurred, the data from the cache will also be valid (assuming the cache data and cache tag memories have approximately equal access times).

The disadvantage of direct-mapped cache is almost a corollary of its advantage. A cache with n lines has one restriction—at any instant it can hold only one line numbered x. What it cannot do is hold a line x from set p and a line x from set q. This restriction exists because there is one page frame in the cache for each of the possible lines. Consider a fragment of code that repeatedly calls a Get_data routine and then a Compare routine, reading data and comparing it with another string until a match is found. Suppose that the Get_data routine is in set x, line y and that part of the Compare routine is in set z, line y. Because a direct-mapped cache can hold only one line y at a time, the frame corresponding to line y must be reloaded twice for each path through the loop. Consequently, the performance of a direct-mapped cache can sometimes be poor. Statistical measurements on real programs indicate, however, that this very poor worst-case behavior rarely occurs in practice.
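The reload count in this worst case is easy to demonstrate; the addresses below are hypothetical, and only the counting matters:

```python
def count_misses(accesses):
    """Count misses in a direct-mapped cache, given (set, line) pairs."""
    tags = {}                       # line -> set currently held in that frame
    misses = 0
    for s, line in accesses:
        if tags.get(line) != s:     # tag mismatch: the frame must be reloaded
            misses += 1
            tags[line] = s
    return misses

# Get_data in (set 0, line 1) and Compare in (set 3, line 1): each call
# evicts the other routine's copy of line 1.
loop = [(0, 1), (3, 1)] * 10
print(count_misses(loop))           # -> 20 misses for 20 accesses
```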
[Figures 8.19 and 8.20: the CPU's line address selects a location in both the cache memory and the cache tag RAM, which sit on the address and data buses in parallel with the main store.]
Associative-mapped cache

One way of organizing a cache memory that overcomes the limitations of the direct-mapped cache is described in Fig. 8.21. Ideally, we would like a cache that places no restrictions on what data it can contain. The associative cache is such a memory.

An address from the processor is divided into three fields: the tag, the line, and the word. Like the direct-mapped cache, the smallest unit of data transferred into and out of the cache is the line. Unlike the direct-mapped cache, there's no predetermined relationship between the location of lines in the cache and lines in the main memory. Line p in the memory can be put in line q in the cache with no restrictions on the values of p and q. Consider a system with 1 Mbyte of main store and 64 kbytes of associatively mapped cache. If the size of a line is four 32-bit words (i.e. 16 bytes), the main memory is composed of 2^20/16 = 64K lines and the cache is composed of 2^16/16 = 4096 lines. Because an associative cache permits any line in the main store to be loaded into any one of its lines, line i in the associative cache can be loaded with any one of the 64K possible lines in the main store. Therefore, line i requires a 16-bit tag to uniquely label it as being associated with line i from the main store.

[Figure 8.21 Associative-mapped cache: the tag from the address bus is compared with all tags in the cache simultaneously, and a line in the cache may come from any line in the main store.]

When the processor generates an address, the word bits select a word location in both the main memory and the cache. Unlike the direct-mapped cache memory, the line
address from the processor can't be used to address a line in the associative cache. Why? Because each line in the direct-mapped cache can come only from one of n lines in the main store (where n is the number of sets). The tag resolves which of the lines is actually present. In an associative cache, any of the 64K lines in the main store can be located in any of the lines in the cache. Consequently, the associative cache requires a 16-bit tag to identify one of the 2^16 lines from the main memory. Because the cache's lines are not ordered, the tags are not ordered and cannot be stored in a simple look-up table like the direct-mapped cache's. In other words, when the CPU accesses line i, it may be anywhere in the cache or it may not be in the cache at all.

Associative cache systems employ a special type of memory called associative memory. An associative memory has an n-bit input but not necessarily 2^n unique internal locations. The n-bit address input is a tag that is compared with a tag field in each of its locations simultaneously. If the input tag matches a stored tag, the data associated with that location is output. Otherwise the associative memory produces a miss output. An associative memory is not addressed in the same way that a computer's main store is addressed. Conventional computer memory requires the explicit address of a location, whereas an associative memory is accessed by asking, 'Do you have this item stored somewhere?'

Associative cache memories are efficient because they place no restriction on the data they hold. In Fig. 8.21 the tag that specifies the line currently being accessed is compared with the tag of each entry in the cache simultaneously. In other words, all locations are accessed at once. Unfortunately, large associative memories are not yet cost effective. Once the associative cache is full, a new line can be brought in only by overwriting an existing line, which requires a suitable line replacement policy (as in the case of virtual memories).

Set associative-mapped cache

Most computers employ a compromise between the direct-mapped cache and the fully associative cache called a set associative cache. A set associative cache memory is nothing more than multiple direct-mapped caches operated in parallel. The simplest arrangement is called a two-way set associative cache and consists of two direct-mapped cache memories, so that each line in the cache system is duplicated. For example, a two-way set associative cache has two line 5s, and it's possible to store one line 5 from set x and one line 5 from set y.

If the cache has n parallel sets, an n-way comparison is performed in parallel against all members of the set. Because n is small (typically 2 to 16), the logic required to perform the comparison is not complex.

Figure 8.22 describes the common four-way set associative cache. When the processor accesses memory, the appropriate line in each of four direct-mapped caches is accessed simultaneously. Because there are four lines, a simple associative match can be used to determine which (if any) of the lines in the cache is to supply the data. In Fig. 8.22 the hit output from each direct-mapped cache is fed to an OR gate, which generates a hit if any of the caches generates a hit.

[Figure 8.22 Set associative-mapped cache: the address from the CPU accesses four direct-mapped caches in parallel, and a composite hit occurs if any one of the four caches responds to the access.]

Level 2 cache

The memory hierarchy can be expanded further by dividing the cache memory into a level 1 and a level 2 cache. A level 1 cache is normally located on the same chip as the CPU itself; that is, it is integrated with the processor. Over the years, level 1 caches have grown in size as semiconductor technology has advanced and more memory devices can be integrated on a chip. A level 2 cache was once invariably located off the processor chip, but modern high-performance devices have on-chip level 1 and level 2 caches. By 2005 Intel Pentium processors were available with 2 Mbyte level 2 caches.

When the processor makes a memory access, the level 1 cache is searched first. If the data isn't there, the level 2 cache is searched. If it isn't in the level 2 cache, the main store is accessed. The average access time is given by

    t_ave = h_L1·t_c1 + (1 − h_L1)·h_L2·t_c2 + (1 − h_L1)(1 − h_L2)·t_memory

where h_L1 and h_L2 are the hit ratios of the level 1 and level 2 caches, and t_c1 and t_c2 are the access times of the L1 and L2 caches, respectively.

Consider a system with a hit ratio of 0.90, a single-level cache access time of 4 ns, and a main store access time of 50 ns. The speedup ratio is given by 1/(h·k + 1 − h) = 5.81 (with k = 4/50 = 0.08).
Suppose we add a level 2 cache to this system and that the level 2 cache has a hit ratio of 0.7 and an access time of 8 ns. In this case, the access time is

    t_ave = h_L1·t_c1 + (1 − h_L1)·h_L2·t_c2 + (1 − h_L1)(1 − h_L2)·t_memory
          = 0.9 × 4 + 0.1 × 0.7 × 8 + 0.1 × 0.3 × 50 = 5.66 ns

The speedup ratio with a level 2 cache is 50 ns/5.66 ns = 8.83.
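The two-level figures can be checked in a couple of lines (values from the example above):

```python
def two_level_tave(h1, t1, h2, t2, t_mem):
    """Average access time for a two-level cache hierarchy."""
    return h1 * t1 + (1 - h1) * h2 * t2 + (1 - h1) * (1 - h2) * t_mem

t = two_level_tave(h1=0.9, t1=4, h2=0.7, t2=8, t_mem=50)
print(t)          # -> 5.66 ns
print(50 / t)     # -> about 8.83x speedup over the 50 ns main store
```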
8.4.3 Considerations in cache design

Apart from choosing the structure of a cache system and the line replacement policy, the designer has to consider how write cycles are to be treated. Should write accesses be made only to the cache, with the main store updated when the line is replaced (a writeback policy)? Or should the main memory also be updated each time a word in the cache is modified (a writethrough policy)? The writethrough policy allows the cache to be written rapidly and the main memory updated over a longer span of time (if there is a write buffer to hold the data until the bus becomes free). A writethrough policy can lead to more memory write accesses than are strictly necessary.

When a cache miss occurs, a line of data is fetched from the main store. Consequently, the processor may read a byte from the cache and then the cache requires a line of, say, 8 bytes from the main store. As you can imagine, a miss on an access to cache carries an additional penalty because an entire line has to be filled from memory. Fortunately, modern memories, CPUs, and cache systems support a burst-fill mode in which a burst of consecutive data elements can be transferred between the main store and the cache memory. Let's look at cache access times again.

If data is not in the cache, it must be fetched from memory and loaded in the cache. If t_l is the time taken to reload the cache on a miss, the effective average access time is given by

    t_ave = h·t_c + (1 − h)·t_m + (1 − h)·t_l

The term (1 − h)·t_l is the additional time required to reload the cache following each miss. This expression can be rewritten as

    t_ave = h·t_c + (1 − h)(t_l + t_m)

The term (t_l + t_m) corresponds to the time taken to access main memory and to load a line into the cache following a miss. However, because accessing the element that caused the miss and filling the cache take place in parallel, we can note that t_l > t_m and simplify the equation to get

    t_ave = h·t_c + (1 − h)·t_l

The performance of cache memory systems is also determined by the relative amounts of read and write accesses and the different ways in which read and write cache accesses are treated. Relatively few memory accesses are write operations (about 5 to 30% of memory accesses). If we take into account the action taken on a miss during a read access and on a miss during a write access, the average access time for writethrough memory is given by

    t_ave = h·t_c + (1 − h)(1 − w)·t_l + (1 − h)·w·t_c

where w is the fraction of write accesses and t_l is the time taken to reload the cache on a miss. The (1 − h)(1 − w)·t_l term is the time taken to reload the cache on a read access and (1 − h)·w·t_c represents the time taken to access the cache on a write miss. This equation is based on the assumption that writes occur infrequently and therefore the main store has time to store writethrough data between two successive write operations.
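The read-miss and write-miss formulas can be compared numerically; the figures below (hit ratio, write fraction, and access times) are assumed for illustration:

```python
def t_ave_read_only(h, t_c, t_l):
    """h*t_c + (1-h)*t_l: the reload time dominates on a miss."""
    return h * t_c + (1 - h) * t_l

def t_ave_writethrough(h, w, t_c, t_l):
    """h*t_c + (1-h)(1-w)*t_l + (1-h)*w*t_c for a writethrough cache."""
    return h * t_c + (1 - h) * (1 - w) * t_l + (1 - h) * w * t_c

# Assumed figures: 10 ns cache, 120 ns line reload, 95% hits, 20% writes.
print(t_ave_read_only(0.95, 10, 120))         # -> 15.5 ns
print(t_ave_writethrough(0.95, 0.2, 10, 120))  # -> 14.4 ns
```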
Another aspect of cache memories that has to be taken into account in sophisticated systems is cache coherency. As we know, data in the cache also lives in the main memory. When the processor modifies data it must modify both the copy in the cache and the copy in the main memory (although not necessarily at the same time). There are circumstances in which the existence of two copies (which can differ) of the same item of data causes problems. For example, an I/O controller using DMA might attempt to move an old line of data from the main store to disk without knowing that the processor has just updated the copy of the data in the cache but has not yet updated the copy in the main memory. Cache coherency is also known as data consistency.

8.5 Multiprocessor systems

One way of accelerating the performance of a computer without resorting either to new technology or to a new architecture is to use a multiprocessor system; that is, you take two or more CPUs and divide the work between them. Here we introduce some basic concepts of multiprocessing hardware and the topology of multiprocessors.

The speedup ratio, Sp, of a multiprocessor system using p processors is defined as Sp = T1/Tp, where T1 is the time taken to perform the computation on a single processor and Tp is the time taken to perform the same computation on p processors. The value of Sp must fall in the range 1 ≤ Sp ≤ p. The lower limit on Sp corresponds to a situation in which the parallel system cannot be exploited and only one processor can be used. The upper limit on Sp corresponds to a problem that can be divided equally between the p processors. The efficiency, Ep, of a multiprocessor system is defined as the ratio between the speedup factor and the number of processors; that is, Ep = Sp/p = T1/(p·Tp). The efficiency, Ep, must fall in the range 1 (all processors used fully) to 1/p (only one processor out of p used).

Whenever I think of multiprocessing I think of air travel. Suppose you want to get from central London to downtown
Manhattan. The time taken is the sum of the time to travel to Heathrow airport, the check-in time, the transatlantic journey, the baggage reclaim time, and the time to travel from JFK to downtown Manhattan. The approximate figures (in hours) are 0.9 + 1.5 + 6.5 + 0.5 + 1 = 10.4. Suppose you now decide to speed things up and travel across the Atlantic in a supersonic aircraft that takes only 3 hours; the new times are 0.9 + 1.5 + 3 + 0.5 + 1 = 6.9 hours. The speedup ratio between these two modes of travel is 10.4/6.9 = 1.51. Increasing the speed of the aircraft by a factor of 2.17 has resulted in a speedup ratio of only 1.51, because all the other delays have not been changed.

The same problem affects multiprocessing—the speedup ratio is profoundly affected by the parts of a problem that cannot be computed in parallel. Consider, for example, the product (P + Q)(P − Q). The operations P + Q and P − Q can be carried out simultaneously in parallel, whereas their product can be computed serially only after P + Q and P − Q have been evaluated.

Figure 8.23 shows how a task may have components that must be executed serially, Ps, and tasks that can be executed in parallel, Pp. If each task in Fig. 8.23 requires t seconds, the total time required by a serial processor is 8t. Because three pairs of tasks can be carried out in parallel, the total time taken on a parallel system is 4t.

Suppose a task consists of a part that must be computed serially and a part that can be computed by processors in parallel. Let the fraction of the task executed serially be f and the fraction executed in parallel be (1 − f). The time taken to process the task on a parallel processor is f·T1 + (1 − f)·T1/p, where T1 is the time required to execute the task on a single processor and p is the number of processors. The speedup ratio is

    Sp = T1/(f·T1 + (1 − f)·T1/p) = p/(1 + (p − 1)·f)

This equation is Amdahl's law and tells us that increasing the number of processors in a system is futile unless the value of f is very low.

Figure 8.24 demonstrates the relationship between the speedup ratio, S(f), and f (the fraction of serial processing) for a system with 16 processors. The horizontal axis is the fraction of a task that is executed in parallel, 1 − f. As you can see, the speedup ratio rises very rapidly as 1 − f approaches 1.
[Figure 8.23 Executing a task in serial and parallel: Start → Ps → Ps → three pairs of parallel processes Pp → End; if each process takes t seconds, the total time taken is 4t.]
[Figure 8.24 The speedup ratio S(f) plotted against the fraction of parallel execution for a 16-processor system.]
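Amdahl's law is easily tabulated. This sketch reproduces the behavior plotted in Fig. 8.24 for a 16-processor system:

```python
def amdahl_speedup(p, f):
    """S_p = p / (1 + (p - 1)*f), where f is the serial fraction."""
    return p / (1 + (p - 1) * f)

for f in (0.0, 0.01, 0.05, 0.1, 0.5):
    print(f"f = {f:4.2f}: S_16 = {amdahl_speedup(16, f):5.2f}")
# Even 5% serial code limits 16 processors to a speedup of about 9.1.
```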
You can’t just plug an extra processor into an existing sys- system requires very large amounts of computer processing
tem to convert it into a multiprocessor. The global implica- power with relatively little I/O activity or disk access.
tions for the system hardware and its software are not trivial, Obviously it is reasonable to try to solve the problem by
because the individual processors have to share the available means of multiprocessing. For example, as one processor is
resources (i.e. memory and input/output). An effective mul- updating a target’s current position, another processor can be
tiprocessor system must be able to allocate resources to con- calculating its future position.
tending processors without seriously degrading the The preceding problem is described as classic, because it is
performance of the system. so well suited to multiprocessing. There are several ways of
Some multiprocessor systems are termed reconfigurable, allocating the mathematics involved in the radar calculations
because the structure of the hardware itself can be modified to the various processors. It is, unfortunately, much less easy
by the operating system. For example, the way in which mem- to decompose a general task into a number of subtasks that
ory is distributed between the individual processors or the can be run in parallel. Often it is necessary for the program-
paths between the processors can be changed dynamically mer to write programs in such a way that they involve the
under software control. Similarly, interrupt handling can be greatest amount of parallel activity. Other problems well
dynamically partitioned between the various processors to suited to parallel processing are the simulation of complex
maximize efficiency. We do not discuss reconfigurable archi- dynamic systems such as the atmosphere or the motion of
tectures further here. liquids.
Although the architecture of a stored-program computer
(i.e. a von Neumann machine) can be defined quite precisely,
there is no similar definition of a multiprocessor system.
8.5.1 Topics in multiprocessor systems
Multiprocessor systems come in many different flavors and a A key parameter of a multiprocessor system is its topology,
configuration suitable for one particular application may be which defines how the processors are arranged with respect
useless for another. The only really universal characteristic to each other and how they communicate. A more important
common to all multiprocessor systems is that they have more parameter of a multiprocessor system is the degree of cou-
than one processor. We shall soon examine the various classes pling between the various processors. We will discuss proces-
of multiprocessor system. sor coupling first and then look at multiprocessor topologies.
Multiprocessor systems design is not easy; there are a lot Processors with facilities for exchanging large quantities of
of factors to take into account; for example, the distribution data very rapidly are said to be tightly coupled. Such com-
of tasks between processors, the interconnection of the puters share resources like buses or blocks of memory. The
processors (i.e. the topology of the multiprocessor system), advantage of tightly coupled systems is their potential speed,
the management of the memory resources, the avoidance of because one processor doesn’t have to wait long periods of
deadlock, and the control of input/output resources. time while data is transferred from another. Their disadvan-
Deadlock occurs when two or more processors cannot tage arises from the complexity of the hardware and software
continue because each is blocking the other. necessary to coordinate the processors. If they share a bus or
The distribution of tasks between processors is of crucial memory, an arbiter is needed to determine which processor is
importance in selecting the architecture of the processor sys- permitted to access the resource at any time.
tem itself. In turn, the distribution of tasks is strongly deter- Although not a problem associated entirely with multi-
mined by the nature of the problem to be solved by the processors, the avoidance of deadlock must feature in the
computer. In other words, the architecture of a multipro- design of some classes of multiprocessor. Deadlock describes
cessor system can be optimized for a certain type of problem. the situation in which two tasks are unable to proceed
Conversely, a class of programs that runs well on one multi- because each task holds something needed by the other. In a
processor system may not run well on another. real-time system, the sequential tasks (i.e. the software)
A classic problem that can be solved by multiprocessing require resources (memory, disk drives, I/O devices, etc.),
belongs to the world of air-traffic control. A radar system whereas in a multiprocessor system these resources are
receives a periodic echo from the targets (i.e. aircraft) being required by the individual processors.
tracked. Each echo E, is a function of the bearing, and Every multiprocessor system, like every single-processor
distance or range, r, of the target. Due to noise and imperfec- system, has facilities for input or output transactions. We
tions in the system, there is an uncertainty or error, , associ- therefore have the problem of how I/O transactions are to be
ated with each echo. A new echo is received every few treated in a multiprocessor system. Does each processor have
millisecond. Given this stream of inputs, Er, , r, ,, the its own I/O arrangements? Is the I/O pooled between the
computer connected to the radar receiver has to calculate the processors, with each processor asking for I/O facilities as
current positions of the targets and then estimate the future they are needed? Finally, is it possible to dedicate one or more
track of each target and report any possible conflicts. Such a processors solely to the task of I/O processing?
In a similar vein, the designer of a multiprocessor may need to construct an appropriate interrupt-handling system. When an I/O device interrupts a processor in a single-processor system, there is not a lot to decide. Either the processor services the interrupt or it is deferred. In a multiprocessor system we have to decide which processor will service an interrupt, which in turn begs the question, 'Do we pool interrupts or do we allocate certain types of interrupt to specific processors?' If interrupts are pooled, the interrupt-handling software must also be pooled, as processor A must deal with an interrupt from device X in exactly the same way that processor B would deal with the same interrupt. In addition to interrupts generated by I/O devices, it is possible for one processor to interrupt another processor.

Like any other computer, the multiprocessor requires an operating system. There are two basic approaches to the design of operating systems for multiprocessors. One of the simplest arrangements is the master–slave operating system, in which a single operating system runs on the master processor and all other processors receive tasks that are handed down from the master. The master–slave operating system is little more than the type of operating system found in conventional single-processor systems.

The distributed operating system provides each processor with its own copy of the operating system (or at least a processor can access the common operating system via shared memory). Distributed operating systems are more robust than their master–slave counterparts because the failure of a single processor does not necessarily bring about a complete system collapse.

The problems we have just highlighted serve to emphasize that a multiprocessor system cannot easily be built in a vacuum. Whenever we are faced with the design of a multiprocessor system, it is necessary to ask, 'Why do we need the multiprocessor system and what are its objectives?' and then to configure it accordingly. In other words, almost all design aspects of a multiprocessor system are very much problem dependent.

8.5.2 Multiprocessor organization

Although there is an endless variety of multiprocessor architectures, we can identify broad groups whose members have certain features in common. One possible approach to the classification of multiprocessor systems, attributed to Michael J. Flynn, is to consider the type of the parallelism (i.e. architecture or topology) and the nature of the interprocessor communication. Flynn's four basic multiprocessor architectures are referred to by the abbreviations SISD, SIMD, MISD, and MIMD and are described later. However, before continuing, we must point out that Flynn's topological classification of multiprocessor systems is not the only one possible, as multiprocessors may be categorized by a number of different parameters. One broad classification of multiprocessors depends on the processor's relationship to memory and to other processing elements. Multiprocessors can be classified as processor to memory (P to M) structures or as processing element to processing element (PE to PE) structures. Figure 8.25 describes these two structures. A P to M architecture has n processors, an interconnection network, and n memory elements. The interconnection network allocates processor X to memory Y. The more general PE to PE architecture uses n processors, each with its own memory, and permits processing element X to communicate with PE Y via an interconnection network. The multiprocessors described in this chapter best fit the PE to PE model.

[Figure 8.25 The P to M and PE to PE multiprocessor structures, each built around an interconnection network.]

SISD (single instruction single data-stream)

The SISD machine is nothing more than the conventional single-processor system. It is called single instruction because only one instruction is executed at a time, and single data-stream because there is only one task being executed at any instant.

SIMD (single instruction multiple data-stream)

The SIMD architecture executes instructions sequentially, but on data in parallel. The idea of a single instruction operating on parallel data is not as strange as it may sound. Consider vector mathematics. A vector is a multicomponent data structure; for example, the four-component vector A might be (0.2, 4.3, 0.2, 0.1). A very frequent operation in most branches of engineering is the calculation of the inner product of two n-component vectors, A and B:

    s = A·B = Σ a_i·b_i

For example, if A is (1, 4, 3, 6) and B is (4, 6, 2, 3), the inner product A·B is 1 × 4 + 4 × 6 + 3 × 2 + 6 × 3 = 4 + 24 + 6 + 18 = 52.
multiprocessor system and what are its objectives?’ and then The inner product can be expressed as single operation (i.e.
to configure it accordingly. In other words, almost all design s (A.B), but involves multiple data elements (i.e. the
aspects of a multiprocessor system are very much problem {aI.bi}. Such calculations are used extensively in computer
dependent. graphics and image processing. One way of speeding up the
calculation of an inner product is to assign a processor to the
generation of each of the individual elements, the {ai.bi}. The
8.5.2 Multiprocessor organization simultaneous calculation of ai . bi for i 1 to n requires
Although there is an endless variety of multiprocessor archi- n processors, one for each component of the vector.
tectures, we can identify broad groups whose members have Such an arrangement consists of a single controller that
certain features in common. One possible approach to the steps through the program (i.e. the single instruction-stream)
classification of multiprocessor systems, attributed to and an array of processing elements (PEs) acting on the com-
Michael J. Flynn, is to consider the type of the parallelism (i.e. ponents of a vector in parallel (i.e. the multiple data-stream).
architecture or topology) and the nature of the interpro- Often, such PEs are number crunchers or high-speed ALUs,
cessor communication. Flynn’s four basic multiprocessor rather than the general-purpose microprocessors we have
architectures are referred to by the abbreviations SISD, been considering throughout this text.
SIMD, MISD, and MIMD and are described later. However, The SIMD architecture, or array processor, has a high per-
before continuing, we must point out that Flynn’s topological formance/cost ratio and is very efficient, as long as the task
classification of multiprocessor systems is not the only one running on it can be decomposed largely into vector opera-
possible, as multiprocessors may be categorized by a number tions. Consequently, the array processor is best suited to the
air-traffic control problem discussed earlier, to the processing of weather information (this involves partial differential equations), and to computerized tomography, where the output of a body scanner is processed almost entirely by vector arithmetic. As the SIMD architecture is generally built around a central processor controlling an array of special-purpose processors, the SIMD architecture is not discussed in any further detail here. However, we provide an example of a SIMD architecture to illustrate one of its applications.

Figure 8.26 demonstrates the application of a SIMD architecture to image smoothing—an operation performed on images to remove noise (we discuss image processing further when we introduce computer graphics).7 In this example, an image from a noisy source such as a spacecraft camera is smoothed or filtered to yield a relatively noise-free image. Consider an input array, I, of 512 × 512 pixels, which is to be smoothed to produce an output array S.

    7 Example from Large-scale Parallel Processing Systems, Microprocessors and Microsystems, January 1987, by Howard J. Siegel et al., pp. 3–20.

A pixel is an 8-bit unsigned integer representing one of 256 gray levels from 0 (white) to 255 (black). Each pixel S_i,j in the smoothed array, S, is the average of the gray levels of its eight nearest neighbors. By averaging the pixels in this way, the effects of noise bursts are reduced. The neighbors of S_i,j in the input array are I_i−1,j−1, I_i−1,j, I_i−1,j+1, I_i,j−1, I_i,j+1, I_i+1,j−1, I_i+1,j, and I_i+1,j+1. The top, bottom, and left- and right-edge pixels of S are set to zero, because their corresponding pixels in I do not have eight adjacent neighbors.

The following example demonstrates how this smoothing algorithm operates. The left-hand array represents the near neighbors of a pixel before it is smoothed. The right-hand array shows the pixel after smoothing. As you can see, the value 6 has been reduced to 2.25, the average of its eight neighbors.

    2 3 3               2 3    3
    2 6 3               2 2.25 3
    2 1 2               2 1    2

    Before smoothing    After smoothing

If the smoothing algorithm were performed serially, it would be necessary to compute the value for each of the 512 × 512 pixels by looking at its eight nearest neighbors. Parallel processing allows us to process groups of pixels at the same time.

[Figure 8.26 An SIMD array of 1024 processing elements arranged as 32 × 32 PEs, each holding a 16 × 16 pixel block of a 512 × 512 pixel image; the detail shows the inter-PE transfer of edge pixels.]

Assume that an SIMD array has 1024 processing elements (PEs), logically arranged as an array of 32 × 32 PEs as shown in Fig. 8.26. Each PE stores a 16 × 16 pixel sub-image block of the 512 × 512 pixel image I. For example, PE0 stores the 16 × 16 pixel sub-image block composed of columns 0 to 15 and rows 0 to 15; PE1 stores the pixels in columns 16 to 31 of rows 0 to 15, etc. Each PE smoothes its own subimage, with all
PEs operating on their subimages concurrently. At the edges of each 16 × 16 subimage, data must be transferred between adjacent PEs in order to calculate the smoothed value. The necessary data transfers for PE_j are shown in Fig. 8.26. Transfers between different PEs can take place simultaneously. For example, when PE_j−1 sends its upper right corner pixel to PE_j, PE_j can send its own upper right corner pixel to PE_j+1, and so on.

To perform a smoothing operation on a 512 × 512 pixel image by the parallel smoothing of 1024 subimage blocks of 16 × 16 pixels, 256 parallel smoothing operations are executed. However, the neighbors of each subimage edge pixel must be transferred between adjacent PEs, and the total number of parallel data transfers required is (4 × 16) + 4 = 68 (i.e. 16 for each of the top, bottom, and left- and right-side edges, plus the four corner pixels). The corresponding serial algorithm needs no data transfers between PEs, but 512² = 262 144 smoothing calculations must be executed. If no data transfers were needed, the parallel algorithm would be faster than the serial algorithm by a factor of 262 144/256 = 1024. If the inter-PE data transfer time is included and it is assumed that each parallel data transfer requires at most as much time as one smoothing operation, then the time factor improvement is 262 144/(256 + 68) = 809.
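A serial version of the smoothing rule is easy to sketch (pure Python on a small array; an SIMD machine would apply the same rule to every subimage simultaneously):

```python
def smooth(image):
    """Return a smoothed copy: each interior pixel becomes the average
    of its eight nearest neighbors; edge pixels are set to zero."""
    n, m = len(image), len(image[0])
    out = [[0] * m for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            neighbors = [image[i + di][j + dj]
                         for di in (-1, 0, 1) for dj in (-1, 0, 1)
                         if (di, dj) != (0, 0)]
            out[i][j] = sum(neighbors) / 8
    return out

test = [[2, 3, 3],
        [2, 6, 3],
        [2, 1, 2]]
print(smooth(test)[1][1])   # -> 2.25, matching the worked example above
```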
The last step in the smoothing process is to set the edge pixels of S to zero. This creates an additional (although negligible) overhead, which is to enable only the appropriate PEs when the zero values are stored for these edge pixels (only enabled PEs execute the instructions broadcast to the PEs). Serially, this would require (4 × 512) − 4 = 2044 stores. The SIMD architecture can be implemented by means of arrays of relatively primitive processing elements (e.g. ALUs). It is not usually necessary to make each processing element as complex as a CPU.

MISD (multiple instruction single data-stream)

The MISD architecture performs multiple operations concurrently on a single stream of data and is associated with the pipeline processor. We described the concept of the pipeline when we introduced the RISC processor. The difference between a MISD pipeline and a RISC pipeline is one of scale.

In multiprocessor terms, the various processors are arranged in a line and are synchronized so that each processor accepts a new input every t seconds. If there are n processors, the total execution time of a task is n·t seconds. At each epoch, a processor takes a partially completed task from a downstream processor and hands on its own task to the next upstream processor. As a pipeline processor has N processors operating concurrently and each task may be in one of the N stages, it requires a total of N + (K − 1) time slots, each of t seconds, to process K tasks. The MISD architecture is not suited to multiprocessor systems based on general-purpose microprocessors. MISD systems are highly specialized and require special-purpose architectures; they have never been developed to the same extent as SIMD and MIMD architectures.

MIMD (multiple instruction multiple data-stream)

The MIMD architecture is really the most general-purpose form of multiprocessor system and is represented by systems in which each processor has its own set of instructions operating on its own data structures. In other words, the processors are acting in a largely autonomous mode. Each individual processor may be working on part of the main task and does not necessarily need to get in touch with its
neighbors until it has finished its subtask. The PE to PE architecture described in Fig. 8.25 can be thought of as a generic MIMD machine.

Because of the generality of the MIMD architecture, it can be said to encompass both the relatively tightly coupled arrangements to be discussed shortly and the very loosely coupled, geographically distributed LANs. Figure 8.27 provides a graphical illustration of the classification of multiprocessor systems according to E. T. Fathi and A. M. Krieger (Multiple Microprocessor Systems: What, Why and When, Computer, March 1983, pp. 23–32).

[Figure 8.27 The classification of computers into serial and parallel systems, after Fathi and Krieger.]

8.5.3 MIMD architectures

Although the array processor or the pipeline processor is likely to be constructed from very special units, the more general MIMD architecture is much more likely to be built from widely available off-the-shelf microprocessors. Therefore, the major design consideration in the production of such a multiprocessor concerns the topology of the system, which describes the arrangement of the communications paths between the individual processors.

Figures 8.28 to 8.32 depict the five classic MIMD topologies. Multiprocessor structures are described both by their topology and by their interconnection level. The level of interconnection is a measure of the number of switching units through which a message must pass when going from processor X to processor Y. The four basic topologies are the unconstrained topology, the bus, the ring, and the star, although, of course, there are many variants of each of these pure topologies.

The unconstrained topology

The unconstrained topology is so called because it is a random arrangement in which a processor is linked directly to each processor with which it wishes to communicate (Fig. 8.28(a)). The unconstrained topology is not practicable for any but the simplest of systems. As the number of processors grows, the number of buses between processors becomes prohibitive. Figure 8.28(b) shows the limiting case of this topology, called the fully connected topology, because each processor is connected to every other processor. The advantage of the unconstrained topology is the very high degree of coupling that can be achieved. As all the buses are dedicated to communication between only two processors, there is no conflict between processors waiting to access the same bus.

[Figure 8.28 The unconstrained topology: (a) the unconstrained topology; (b) the fully connected topology.]

The bus topology

The bus (Fig. 8.29) is the simplest of topologies because each processor is connected to a single common data highway—the bus. The bus is a simple topology, not least because it avoids the problem of how to route a message from processor X to processor Y. All traffic between processors must use the bus.
[Figures 8.29 to 8.31: the bus, ring, and star topologies; the star has a central node.]

[Figure 8.32 Hypercubes with (a) n = 1, (b) n = 2, (c) n = 3, and (d) n = 4; each node is labeled with its binary address.]
Each node in a hypercube of dimension n is connected to exactly n other neighbors. Figure 8.32 illustrates the hypercube for n = 1, 2, 3, and 4.

Each processor in a hypercube has an n-bit address in the range 0...00 to 1...11 (i.e. 0 to 2^n − 1) and has n nearest neighbors whose addresses differ from the node's address by only 1 bit. If n = 4 and a node has the address 0100, its four nearest neighbors have addresses 1100, 0000, 0110, and 0101.

A hypercube of dimension n is constructed recursively by taking a hypercube of dimension n − 1, prefixing all its node addresses by 0, and adding to this another hypercube of dimension n − 1 whose node addresses are all prefixed by 1. In other words, a hypercube of dimension n can be subdivided into two hypercubes of dimension n − 1, and these two subcubes can, in turn, be divided into four subcubes of dimension n − 2, and so on.

The hypercube is of interest because it has a topology that makes it relatively easy to map certain groups of algorithms onto the hypercube. In particular, the hypercube is well suited to problems involving the evaluation of fast Fourier transforms (FFTs)—used in sound and video signal processing. The first practical hypercube multiprocessor was built at Caltech in 1983. This was called the Cosmic Cube and was based on 64 8086 microprocessors plus 8087 floating-point coprocessors.
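The neighbor rule (flip exactly one bit of the node address) is a one-line function:

```python
def hypercube_neighbors(node, n):
    """Addresses of the n neighbors of `node` in an n-dimensional hypercube."""
    return [node ^ (1 << bit) for bit in range(n)]

print([format(x, "04b") for x in hypercube_neighbors(0b0100, 4)])
# -> ['0101', '0110', '0000', '1100']  (cf. the 0100 example above)
```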
Hybrid topologies

In addition to the above pure network topologies, there are very many hybrid topologies, some of which are described in Figs 8.33 to 8.36. Figures 8.33(a) and (b) both illustrate the dual-bus multiprocessor, although this topology may be extended to include any number of buses. In Fig. 8.33(a) the processors are split into two groups, with one group connected to bus A and one connected to bus B. A switching unit connects bus A to bus B and therefore allows a processor on one bus to communicate with a processor on the other. The advantage of the dual-bus topology is that the probability of bus contention is reduced, because both buses can be operated in parallel (i.e. simultaneously). Only when a processor connected to one bus needs to transfer data to a processor on the other does the topology become equal to a single-bus topology.

The arrangement of Fig. 8.33(b) also employs two buses, but here each processor is connected directly to both buses via suitable switches. Two communication paths always exist between any pair of processors, one using bus A and one using bus B. Although the provision of two buses reduces the bottleneck associated with a single bus, it requires more connections between the processors and the two buses, and more complex hardware is needed to determine which bus a processor is to use at any time.

[Figure 8.33 The dual-bus multiprocessor: (a) two groups of processing elements, each on its own bus, linked by a switch; (b) a twin-bus multiprocessor with dual bus access from each processor.]

The crossbar network

Another possible topology, described in Fig. 8.34, is the so-called crossbar switching architecture, which has its origin in the telephone exchange, where it is employed to link subscribers to each other.

The processors are arranged as a single column (processors Pc1 to Pcm) and a single row (processors Pr1 to Prn). That is, there are a total of m + n processors. Note that the processors may be processing elements or just simple memory elements. Each processor in a column is connected to a horizontal bus and each processor in a row is connected to a vertical bus. A switching network, Sr,c, connects the processor on row r to
the processor on column c. This arrangement requires m × n switching networks for the m + n processors.
The advantage of the crossbar matrix is the speed at which the interconnection between two processors can be set up. Furthermore, it can be made highly reliable by providing alternative connections between nodes, should one of the switch points fail. Reliability is guaranteed only if the switches are failsafe and always fail in the off or no-connection position.
If the switches at the crosspoints are made multiway (vertical to vertical, horizontal to horizontal, or horizontal to vertical), we can construct a number of simultaneous pathways through the matrix. The provision of multiple pathways considerably increases the bandwidth of the system.
In practice, the crossbar matrix is not widely found in general-purpose systems, because of its high complexity. Another penalty associated with this arrangement is its limited expandability. If we wish to increase the power of the system by adding an extra processor, we must also add another bus, together with its associated switching units.

The binary tree
An interesting form of multiprocessor topology is illustrated in Fig. 8.35. For obvious reasons this structure is called a binary tree, although I am not certain whether it is really a special case of the unconstrained topology of Fig. 8.26, or a trivial case of the star topology (using three processors), repeatedly iterated! Any two processors (nodes) in the tree communicate with each other by traversing the tree right to left until a processor common to both nodes is found, and then traversing the tree left to right. For example, Fig. 8.35 shows how processor P0110 communicates with processor P0100, by establishing backward links from P0110 to P01 and then forward links from P01 to P010 to P0100.
The topology of the binary tree has the facility to set up multiple simultaneous links (depending on the nature of each of the links), because the whole tree is never needed to link any two points. In practice, a real system would implement additional pathways to relieve potential bottlenecks and to guard against the effects of failure at certain switching points. The failure of a switch in a right-hand column, for example, P0010, causes the loss of a single processor, whereas the failure of a link at the left-hand side, for example, P0, immediately removes half the available processors from the system.

Cluster topology
Figure 8.36 illustrates the cluster topology, which is a hybrid star–bus structure. The importance of this structure lies in its application in highly reliable systems. Groups of processors and their local memory modules are arranged in the form of a cluster. Figure 8.36 shows three processors per cluster in an arrangement called triple modular redundancy. The output of each of the three processors is compared with the output of the other two processors in a voting network. The output of the voting circuit (or majority logic circuit) is taken as two out of three of its inputs, on the basis that the failure of a single module is more likely than the simultaneous failure of two modules.
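For a single output bit, the two-out-of-three voter implements the standard Boolean majority function (a worked illustration, not part of the original figure): if the three processor outputs are A, B, and C, the voter's output V is

V = A·B + B·C + A·C

so that any two agreeing inputs determine the output. For example, if A = B = 1 and C = 0 (one module has failed), V = 1.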
Figure 8.35 The binary tree topology: the root P0 branches through P00 and P01, then P000 to P011, to the leaf processors P0000 to P0111. In the example shown, processor P0100 communicates with processor P0110.
Figure 8.36 The cluster topology: each cluster contains three processors (P), each with its own memory module (M), connected to the system bus through a switch.
The design of a clustered triple modular redundancy system is not as easy as might be first thought. One of the major problems associated with modular redundancy arises from a phenomenon called divergence. Suppose that three identical processors have identical hardware and software and that they receive identical inputs and start with the same initial conditions at the same time; therefore, unless one processor fails, their outputs are identical, as all elements of the system are identical.
In actual fact, the above statement is not entirely true. In order to create truly redundant systems, each of the three processors in a cluster must have its own independent clock and I/O channels. Therefore, events taking place externally will not be seen by each processor at exactly the same time. If these events lead to conditional branches, the operation of a processor in the cluster may diverge from that of its neighbors quite considerably after even a short period of operation. In such circumstances, it becomes very difficult to tell whether the processors are suffering from divergence or whether one of them has failed.
The problem of divergence can be eliminated by providing synchronizing mechanisms between the processors and by comparing their outputs only when they all wish to access the system bus for the same purpose. Once more it can be seen that, although the principles behind the design of multiprocessor systems are relatively straightforward, their detailed practical design is very complex due to a considerable degree of interaction between hardware and software. As we have already pointed out, topologies for multiprocessor systems are legion.

Coupling
Up to now we have been looking at the topology of multiprocessor systems with little or no consideration of the nuts and bolts of the actual connections between the processors. Possibly more than any other factor, the required degree of coupling between processors in a multiprocessor system determines how the processors are to be linked. A tightly coupled multiprocessor system passes data between processors either by means of shared memory or by allowing one processor to access the other processor's data, address, and control buses directly. When shared memory, sometimes called dual-port RAM, is employed to couple processors, a block of read/write memory is arranged to be common to both processors. One processor writes data to the block and
the other reads that data. Data can be transferred as fast as each processor can execute a memory access.
The degree of coupling between processors is expressed in terms of two parameters: the transmission bandwidth and the latency of the interprocessor link. The transmission bandwidth is defined as the rate at which data is moved between processors and is expressed in bits/s. For example, if a microprocessor writes a byte of data to an 8-bit parallel port every 1 μs, the bandwidth of the link is 8 bits/1 μs or 8 Mbits/s. However, if a 32-bit port is used to move words at the same rate, the bandwidth rises to 32 Mbits/s.
The latency of an interprocessor link is defined as the time required to initiate a data transfer. That is, latency is the time that elapses between a processor requesting a data transfer and the time at which the transfer actually takes place. A high degree of coupling is associated with large transmission bandwidths and low latencies. As might be expected, tightly coupled microprocessor systems need more complex hardware than loosely coupled systems.

■ SUMMARY
It's taken a concerted attempt to make computers run as fast as they do today. This chapter has demonstrated three ways in which the performance of the computer has been enhanced over the years. We have concentrated on three aspects of computer acceleration: pipelining to improve the throughput of the CPU, cache memory to reduce the effective access time of the memory system, and increased parallelism to improve performance without modifying the instruction set architecture.
The movement towards RISC processors in the 1980s was driven by a desire to exploit instruction-level parallelism by overlapping, or pipelining, the execution of instructions. We have seen how pipelining can be very effective, with an n-stage pipeline providing an n-fold increase in performance—in theory. In practice, the ultimate performance of pipelined architectures is degraded by the branch penalty and data dependency. Instructions that alter the flow of control (branches, subroutine calls, and returns) throw away the instructions that are already in the pipeline.
The second part of this chapter introduced the cache memory, which can radically improve the performance of a computer system for relatively little cost. Cache memory uses a small amount of high-speed memory to hold frequently used data. Cache memory is effective because most programs may have large program or data sets but, for 80% of the time, they access only 20% of the data. We looked at how the performance of cache memory can be calculated and how cache memory is organized. We described the three forms of cache organization: directly mapped, associative, and set associative. Direct-mapped cache is easy to design but is limited by its restriction on what data can be stored. Associative memory provides the optimum performance in theory, but it is impractical to construct in large sizes and you have to implement an algorithm to replace old data once the cache is full. Set associative cache is only slightly more complex than direct-mapped cache and achieves a performance close to associative cache.
The final part of this chapter introduced the multiprocessor, which uses two or more computers operating in parallel to improve performance. The speedup ratio of a parallel processor (i.e. multiprocessor) is the ratio of the time taken by one processor to solve a task to the time taken by p processors to solve the same task. Ideally the speedup factor is p. However, Amdahl's law states that the speedup ratio in a multiprocessor system with p processors is given by p/(1 + (p − 1)f), where f is the fraction of the code that is executed serially; for example, with p = 16 and f = 0.1 the speedup is only 16/(1 + 15 × 0.1) = 6.4.
We introduced the topology of multiprocessor systems, which describes the way the individual processors are interconnected. Multiprocessor topologies like the hypercube, the crossbar switching network, and the binary tree are well suited for solving particular classes of problem. However, a class of problem that is well suited to one type of topology may be ill suited to a different type of topology.

■ PROBLEMS
8.1 The power of computers is often quoted in MIPS and megaflops. Some computer scientists believe that such figures of merit are, at best, misleading and, at worst, downright dishonest. Why?
Why can't you compare two different processors on the basis of their clock speeds? Surely, a processor with a 4 GHz clock is twice as fast as a processor with a 2 GHz clock.
8.2 What are the characteristics of a CISC processor?
8.3 The most frequently executed class of instruction is the data move instruction. Why is this? What are the implications for computer design?
8.4 The 68020 microprocessor has a BFFFO (bit field find first one) bit-field instruction. This instruction scans a string of up to 32 bits at any point in memory (i.e. the string does not have to start on any 8-bit boundary) and returns the location of the first bit set to 1 in the string. For example, BFFFO (A0){D1:D2},D0 takes the byte at the address pointed at by address register A0 and locates the start of the bit string at the number of bits in D1 away from the most-significant bit at this address. The string, whose length is in register D2, is scanned and the location of the first 1 is deposited in D0 (this is a simplified description of the BFFFO instruction).
In order to demonstrate the complexity of a BFFFO instruction, write the equivalent 68K assembly language code to implement BFFFO (A0){D1:D2},D0.
8.5 The Berkeley RISC has a 32-bit architecture and yet provides only a 13-bit literal. Why is this and does it really matter?
8.6 What are the advantages and disadvantages of register windowing?
8.7 Some RISC processors with 32 registers, r0 to r31, force register r0 to contain zero. That is, if you read the contents of r0, the value returned is always 0. Why have the designers wasted a register by making it read-only and permanently setting its content to 0?
8.8 What is pipelining and how does it increase the performance of a computer?
8.9 Consider the expression
(A + 1)(A + 2)(A + B)(A + B + C)(A + B + C + 1)(A + B + C − 1)(D + E)(A − B)(A − B − C)
Assuming a simple three-operand format with instructions ADD, SUB, MULT, and DIV (and that all data is in registers), write the assembly language code to implement this expression with the minimum data dependency (assuming that dependency extends to the next instruction only).
8.10 The code of a computer is examined and it is found that, on average, for 70% of the time the runlength between instructions that change the flow of control is 15 instructions. For the remainder of the time, the runlength is 6 instructions. Other cases can be neglected.
This computer has a five-stage pipeline and no special techniques are used to handle branches.
What is the speedup ratio of this computer?
8.11 A pipeline is defined by its length (i.e. the number of stages that can operate in parallel). A pipeline can be short or long. What do you think are the relative advantages of long and short pipelines?
8.12 What is data dependency in a pipelined system and how can its effects be overcome?
8.13 RISC architectures don't permit operations on operands in memory other than load and store operations. Why?
8.14 The average number of cycles required by a RISC to execute an instruction is given by
Tave = 1 + pb·pt·b = 1 + pe·b
where
the probability that a given instruction is a branch is pb
the probability that a branch instruction will be taken is pt
if a branch is taken, the additional penalty is b cycles
if a branch is not taken, there is no penalty
pe is the effective probability of a taken branch (pb·pt).
The efficiency of a pipelined computer is defined as the average number of cycles per instruction without branches divided by the average number of cycles per instruction with branches. This is given by 1/Tave.
Draw a series of graphs of the average number of cycles per instruction as a function of pe for b = 1, 2, 3, and 4. The horizontal axis is the effective probability of a branch instruction and ranges from 0 to 1.
8.15 What is branch prediction and how can it be used to reduce the so-called branch penalty in a pipelined system?
8.16 A computer has main memory with an access time of 60 ns and cache memory with an access time of 15 ns. If the average hit ratio is 92%, what is the maximum theoretical speedup ratio?
8.17 A computer has main memory with an access time of 60 ns and cache memory with an access time of 15 ns. The computer has a 50 MHz clock and all operations require at least two clock cycles. If the hit ratio is 92%, what is the theoretical speedup ratio for this system?
8.18 A computer has main memory with an access time of 60 ns and cache memory with an access time of 15 ns. The computer has a 50 MHz clock and all operations require two clock cycles. On average the computer spends 40% of its time accessing memory and 60% performing internal operations (an internal operation is a non-memory access). If the hit ratio is 92%, what is the speedup ratio for this system?
8.19 What is the fundamental limitation of a direct-mapped cache?
8.20 How can the performance of a direct-mapped cache memory be improved?
8.21 A computer has main memory with an access time of 50 ns and cache memory with an access time of 10 ns. The cache has a line size of 16 bytes and the computer's memory bus is 32 bits wide. The cache controller operates in a burst mode and can transfer 32 bytes between cache and main memory in 80 ns. Whenever a miss occurs the cache must be reloaded with a line. If the average hit ratio is 90%, what is the speedup ratio?
8.22 What is cache coherency and why is it important only in sophisticated systems?
8.23 What are the similarities and differences between memory cache and so-called disk cache?
8.24 For the following ideal systems, calculate the hit ratio (h) required to achieve the stated speedup ratio S.
(a) tm = 60 ns, tc = 10 ns, S = 1.1
(b) tm = 60 ns, tc = 10 ns, S = 1.5
(c) tm = 60 ns, tc = 10 ns, S = 3.0
(d) tm = 60 ns, tc = 10 ns, S = 4.0
8.25 Draw a graph of the speedup ratio for an ideal system for k = 0.5, k = 0.2, and k = 0.1 (plot the three lines on the same graph). The value of k defines the ratio of cache to main store access times (tc/tm).
8.26 What is the meaning of speedup ratio and efficiency in the context of multiprocessor systems?
8.27 In a multiprocessor with p processors, the ideal speedup factor is p and the efficiency is 1. In practice, both of these ideal values are not achieved. Why?
8.28 What is Amdahl's law and why is it so important? Is it the same as 'the law of diminishing returns'?
8.29 If a system has 128 processors and the fraction of code that must be carried out serially is 0.1, what is the speedup ratio of the system?
8.30 A computer system has 32 microprocessors and the fraction of code that is carried out serially is 5%. Suppose you wish to run the same code on a system with 24 processors. What fraction of the code may be executed serially to maintain the same speedup ratio?
8.31 In the context of a multiprocessor system, define the meaning of the following terms.
(a) Topology
(b) Deadlock
(c) Tightly coupled
(d) Arbitration
(e) Latency
8.32 What are the relative advantages and disadvantages of the unconstrained topology, the bus, and the ring multiprocessor topologies?
8.33 A fully connected multiprocessor topology is one in which each of the p processors is connected directly to each of the other processors. Show that the number of connections between processors is given by p(p − 1)/2.
9 Processor architectures
CHAPTER MAP
INTRODUCTION
When we introduced the CPU and assembly language programming, we were forced to limit the
discussion to one processor to avoid confusing readers by presenting a range of different processor
architectures. Because students should at least appreciate some of the differences between
architectures, we now look at two variations on the von Neumann architecture. We begin with the
microcontroller, which is a descendant of the first-generation 8-bit microprocessor. Our aim is
only to provide students with an idea of how the instruction set of a microcontroller differs from
that of more sophisticated architectures like the 68K.
The second processor to be introduced here is the ARM, a high-performance RISC processor. This
device has a 32-bit architecture with the characteristics of traditional RISC processors like MIPS
but which has some very interesting architectural facilities. We look at the ARM in greater detail
than the microcontroller and provide development tools on the disk accompanying this text.
This chapter is not essential reading for all computer architecture courses—but it is worth skimming through just to appreciate some of the architectural differences between processors.
large register sets (the Itanium has 128 general-purpose 64-bit registers).
Processors with few registers use special instructions to indicate the register, such as LDX or LDY to load the X or Y register. Processors with many registers number their registers sequentially and implement instructions such as ADD R1,R2,R3.
The zero-address format doesn't require operand addresses because operations are applied to the element or elements at the top of the stack. This format is used only by some calculators designed for arithmetic operations and some experimental processors; for example, performing the operation (Y + Z)·X might be implemented by the following hypothetical code.
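A sketch of such hypothetical zero-address code (the mnemonics and the result destination W are illustrative):

PUSH Y    ;push the value of Y onto the stack
PUSH Z    ;push Z on top of Y
ADD       ;pop Y and Z, push their sum Y + Z
PUSH X    ;push X on top of the sum
MUL       ;pop X and Y + Z, push the product (Y + Z)·X
POP  W    ;pop the result into W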
The one-address format was adopted by first-generation processors and is still used by microcontrollers.
The two-address instruction format is used by the mainstream CISC processors such as the 68K or the Pentium. This is often called a 'one-and-a-half-address' format because one or more of the operands must be a register; that is, memory-to-memory operations are not generally permitted.
The three-address format is used by RISC processors such as MIPS, the PowerPC, and the ARM. Real processors require three register addresses. Typically, the only memory accesses permitted by RISC processors are load and store.
Instruction sets also differ from one processor to another; for example, a processor may not include a multiplication instruction, which means that you have to write a routine to perform multiplication by shifting and adding. Some processors have instructions that are not necessary; for example, one processor might implement CLR D0 to load the contents of D0 with 0, whereas another processor may require you to write MOVE #0,D0 or SUB D0,D0 to do the same thing.
The trend to complex instruction sets in the 1980s led to instructions such as the 68020's BFFFO, which scans a sequence of bits and returns the location of the first bit that was set to 1. Such complex instructions appeared to be dying out with the advent of the high-speed, streamlined RISC architectures of the 1980s.
MICROCONTROLLER FAMILIES
High-performance microcomputers are like jet aircraft; their development costs are so high that there are relatively few different varieties. The same is not true of microcontrollers and there are more members of microcontroller families than varieties of salad dressing. Because microcontrollers are a very low-cost circuit element, they have been optimized for very specific applications. You can select a particular version of a microcontroller family with the RAM, ROM, and I/O you require for your application.
The generic Motorola microcontroller families are as follows.
6800 This was the original 8-bit Motorola microprocessor. The 6800 is not a microcontroller because it lacks internal memory and peripherals.
6805 Motorola's 6805 was their first microcomputer with an architecture very similar to the 6800. It was initially aimed at the automobile market.
68HC11 The 68HC11 is one of the best-selling microcontrollers of all time. It has a 6800-like architecture but includes ROM, RAM, and peripherals.
68HC12 The 68HC12 is an extension of the 68HC11. It has more instructions, enhanced addressing modes, and some 16-bit capabilities.
68HC16 The 68HC16 has a 16-bit architecture and is an enhanced 68HC12 rather than a new architecture.
In recent years, the trend towards simplicity has reversed with the advent of the so-called SIMD (single instruction, multiple data) instruction. Such an instruction acts on multiple data elements at the same time; for example, a 64-bit register may contain eight 8-bit bytes that can be added in parallel to another eight bytes at the same time. These instructions are used widely in multimedia applications where large numbers of data elements representing sound or video are processed (e.g. Intel's MMX extensions).

9.1.4 Addressing modes
An important element of an instruction set is the addressing mode used to access operands. First-generation microprocessors used absolute addressing, literal addressing, and indexed (register indirect) addressing. The generation of 16- and 32-bit CISC processors widened addressing modes by providing a richer set of indexed addressing modes; for example, the 68K provided autoincrementing with (A0)+, and double indexing with a literal displacement with 12(A0,D0).

9.1.5 On-chip peripherals
Microprocessor families also differ in terms of the facilities they offer. The processor intended for use in workstations or high-end PCs is optimized for performance. Peripherals such as I/O ports and timers are located on the motherboard. A microcontroller intended for use in an automobile, cell phone, domestic appliance, or toy is a one-chip device that contains a wealth of peripherals as well as the CPU itself. For example, a microcontroller may contain an 8-bit CPU, random access memory, user-programmable read-only memory, read/write RAM, several timers, parallel and serial I/O devices, and even analog-to-digital converters. The microcontroller can be used to implement a complete computer system costing less than a dollar.
We look at two processors. We begin with the 68HC12 microcontroller to demonstrate what an embedded controller looks like. Then we introduce the ARM, a RISC processor with a simple instruction set and some interesting architectural features.

9.2 The microcontroller
One of the first major competitors to Intel's 8080 8-bit microprocessor was the Motorola 6800, which has a significantly simpler register model than the 8080. The 6800 has a single 8-bit accumulator and 16-bit index register, which limits its performance because you have to load the accumulator, perform a data operation, and then store the result before you can reuse the accumulator.
First-generation microprocessors had 16-bit program counters that supported only 64 kbytes of directly addressable memory. Although 64 kbytes is tiny by today's standards, in the mid-1970s 64 kbytes was considered as positively gigantic.
Motorola later introduced the 6809, an architecturally advanced 8-bit processor, to overcome the deficiencies of the 6800. Unfortunately, the 6809 appeared just as the 68K was about to enter the market; few wanted a super 8-bit processor when they could have a 16- or 32-bit device.¹
¹ 'Better late than never'. No way! Motorola seems to have been very unlucky. The high-performance 68K with a true 32-bit architecture lost out to Intel's 16-bit 8086 when IBM adopted the Intel architecture because IBM couldn't wait for the 68K. Similarly, the 6809 appeared just as the world of high-performance computing was moving from 8 bits to 16/32 bits.
Motorola created a range of microcontrollers aimed at the low-cost high-volume industrial microcontroller market. We are going to describe the architecture of the popular 8-bit
M68HC12, which is object code compatible with Motorola's 8-bit MC68HC11 but with more sophisticated addressing modes and 16-bit arithmetic capabilities.
Before we look at microcontroller register sets, we will say a few words about one of the differences between Intel-style processors and Motorola-style processors. The bits of an instruction are precious, because we wish to cram as many different instructions into an 8-bit instruction set as possible. Intel processors reduce the number of bits required to specify a register by using dedicated registers. For example, if arithmetic operations can be applied only to one accumulator, it's not necessary to devote op-code bits to specifying the register. On the other hand, this philosophy makes life difficult for programmers, who have to remember what operations can be applied to what registers. Moreover, programmers with limited registers have to spend a lot of time moving data between registers.
Motorola-style processors employed fewer specialized registers than Intel-style processors. This approach reduced the number of different instructions, because more bits have to be devoted to specifying which register is to take part in an operation. Equally, it makes it easier to write assembly language code.

9.2.1 The M68HC12
In this chapter we are interested in the instruction set architecture of microprocessors. The MC68HC12 microcontroller family provides an interesting contrast with the 68K because of its simplicity and its similarity to first-generation 8-bit microprocessors. We are not able to discuss the microcontroller's most important features—its on-chip peripherals that let you implement a complete computer in one single chip.
Figure 9.1 The structure of an MC68HC12 microcontroller: the block diagram shows 32-Kbyte flash EEPROM/ROM, 1-Kbyte RAM, 768-byte EEPROM, an analog-to-digital converter (ATD), a timer, SPI and msCAN serial interfaces, and the address, data, and general-purpose I/O ports.
MC68HC12’s capabilities. The microcontroller contains Note how the 8-bit mnemonic combines the operation
both read–write memory for scratchpad storage and flash and the operand; for example, INX increments the X register
EPROM to hold programs and fixed data. There is a wealth of and INCA and INCB increment the A and B accumulators.
input and output ports, serial interfaces, and counter timers. The M68HC12 uses load and store mnemonics to move data
Such a chip can be used to control a small system (e.g. a between memory and registers, rather than the 68K’s more
digital camera or a cell phone) with very little additional generic MOVE instruction; for example, LDX and STX load and
hardware. store data in the X register.
Figure 9.2 illustrates a user register model of the Eight-bit code uses variable-length instructions. Common
M68HC12 microcomputer family. Unlike the first 8-bit operations like INCA are encoded as a single byte. The
processors, it has two general-purpose 8-bit accumulators A equivalent operation, ADDA #1, takes two bytes—one for the
and B. The inclusion of a second accumulator reduces the op-code and one for the literal.
Table 9.1 The M68HC11’s indexed addressing modes (you can replace A by B and X by Y).
Arithmetic group
Add M to A                     [A] ← [A] + [EA]          ADDA
Add M to B                     [B] ← [B] + [EA]          ADDB
Add M to D                     [D] ← [D] + [EA]          ADDD
Add B to X                     [X] ← [X] + [B]           ABX
Add B to Y                     [Y] ← [Y] + [B]           ABY
Add B to A                     [A] ← [A] + [B]           ABA
Add M to A with carry          [A] ← [A] + [EA] + [C]    ADCA
Add M to B with carry          [B] ← [B] + [EA] + [C]    ADCB
Subtract M from A              [A] ← [A] − [EA]          SUBA
Subtract M from B              [B] ← [B] − [EA]          SUBB
Subtract M from D              [D] ← [D] − [EA]          SUBD
Subtract B from A              [A] ← [A] − [B]           SBA
Subtract M from A with carry   [A] ← [A] − [EA] − [C]    SBCA
Subtract M from B with carry   [B] ← [B] − [EA] − [C]    SBCB
Clear M                        [EA] ← 0                  CLR
Clear A                        [A] ← 0                   CLRA
Clear B                        [B] ← 0                   CLRB
Negate M                       [EA] ← 0 − [EA]           NEG
Negate A                       [A] ← 0 − [A]             NEGA
Negate B                       [B] ← 0 − [B]             NEGB
Multiply A by B                [D] ← [A] × [B]           MUL
Compare A with M               [A] − [EA]                CMPA
Compare B with M               [B] − [EA]                CMPB
Compare D with M               [D] − [EA]                CMPD
Compare A with B               [A] − [B]                 CBA
Test M                         [EA] − 0                  TST
Test A                         [A] − 0                   TSTA
Test B                         [B] − 0                   TSTB
Microcontrollers of this kind are intended for use as embedded controllers. They are rather less effective when asked to perform numeric operations on floating point data or when they execute compiled programs written in modern high-level languages.
The MC68HC12's instruction set is conventional, with just a few special instructions intended to accelerate some applications; for example, instructions are provided to extract the maximum and minimum of two values. Consider the following example, which compares the memory location pointed at by the X register and the unsigned contents of the 16-bit D register and puts the larger value in the memory location or D register.

EMAXM 0,X   [[X]] ← max([[X]],[D])   maximum value in memory
EMAXD 0,X   [D] ← max([[X]],[D])     maximum value in D register

Similarly, EMIND 0,X puts the lower of the memory location and the D register in the D register.
Consider the following fragment of code, which uses the 8-bit minimum function and indexed addressing with postincrementing to find the minimum value in a four-element vector.
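A minimal sketch of such a fragment (the label Vector is illustrative; MINA places the smaller of A and a memory operand in A and performs an unsigned comparison):

        LDX  #Vector   ;X points at the four-element vector
        LDAA 1,X+      ;get the first element and advance the pointer
        MINA 1,X+      ;A ← min(A, element 2)
        MINA 1,X+      ;A ← min(A, element 3)
        MINA 1,X+      ;A ← min(A, element 4); A now holds the minimum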
Sample MC68HC12 code
In many ways, MC68HC12 code is not too dissimilar to 68K code; it's just rather more verbose. The lack of on-chip registers means that you have to frequently load one of the two accumulators from memory, perform an operation, and then restore the result to memory. The following example demonstrates a program to find the maximum of a table of 20 values.

LDAA #0          ;clear A
STAA Maximum     ;set up dummy maximum value of 0
LDX  #Table+N-1  ;X points to the end of the table
LDAB #N-1        ;register B is the element counter set to count down
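A possible completion of the loop (a sketch: the labels Loop and Next are assumptions, and the comparison is unsigned):

Loop LDAA 1,X-       ;get an element and post-decrement the pointer
     CMPA Maximum    ;compare it with the maximum so far
     BLS  Next       ;skip if it is lower or the same
     STAA Maximum    ;otherwise record the new maximum
Next DBNE B,Loop     ;decrement the counter and repeat until it reaches zero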
9.3 The ARM—an elegant RISC processor
The ARM was originally designed at Acorn Computers; a separate company, Advanced RISC Machines, was later set up to develop further generations of RISC processors (called the ARM family). In 1998 the company was floated on the stock market and became ARM Ltd. We are going to use the ARM processor to illustrate the RISC philosophy because it is easy to understand and it incorporates some interesting architectural features.

SHADOW REGISTERS
When an interrupt occurs, a processor is forced to suspend the current program and to carry out a task defined by the operating system. This means that registers being used by the pre-interrupt program might be overwritten by the interrupting program.
A simple solution is for interrupt handlers to save data in registers, use the registers, and then restore the data before returning from the interrupt. This process takes time.
Some devices like the ARM provide shadow registers. These are copies of a register that are associated with an interrupt. When an interrupt occurs and the processor handles it, an old register is 'switched out' and a new one switched in. When the interrupt has completed, the old register is switched in again. In this way, no data has to be moved.
All operands are 32 bits wide, except for some multiplication instructions that generate a 64-bit product in two registers, and byte and halfword accesses (64-bit products and halfword accesses are available only on some members of the ARM family). The ARM has 16 user-accessible general-purpose registers called r0 to r15 and a current program status register (CPSR) that's similar to the condition code register we've described earlier.
The ARM doesn't divide registers into address and data registers like the 68K—you can use any register as an address register or a data register. Most 32-bit RISC processors have 32 general-purpose registers, which require a 5-bit operand field in the instruction. By reducing the number of bits in an instruction used to specify a register, the ARM has more bits available to select an op-code. The ARM doesn't provide lots of different instructions like a CISC processor. Instead, it provides flexibility by allowing instructions to do two or more things at once (as we shall soon see). In some ways, the ARM is rather like a microprogrammed CPU.
The ARM's registers are not all general purpose, because two serve special purposes. Register r15 contains the program counter and register r14 is used to save subroutine return addresses. In ARM programs you can write pc for r15 and lr (link register) for r14. Because r15 is as accessible to the programmer as any other register, you can easily perform computed gotos; that is, MOV pc,r10 forces a jump to the address in register r10. By convention, ARM programmers reserve register r13 as a stack pointer, although that is not mandatory.
The ARM has more than one program status register (CPSR—see Fig. 9.3). In normal operation the CPSR contains the current values of the condition code bits (N, Z, C, and V) and eight system status bits. The I and F bits are used to disable interrupt requests and fast interrupt requests, respectively. Status bits M0 to M4 indicate the processor's current operating mode. The T flag is implemented only by the Thumb-compatible versions of the ARM family. Such processors implement two instruction sets, the 32-bit ARM instruction set and a compressed 16-bit Thumb instruction set.²
² The ARM's Thumb mode is designed to make the processor look like a 16-bit device in order to simplify memory circuits and bus design in low-cost applications such as cell phones.
When an interrupt occurs, the ARM saves the pre-exception value of the CPSR in a stored program status register (there's one for each of the ARM's five interrupt modes).
The ARM runs in its user mode except when it switches to one of its other five operating modes. These modes correspond to interrupts and exceptions and are not of interest to us in this chapter. Interrupts and exceptions switch in new r13 and r14 registers (the so-called fast interrupt switches in new r8 to r12 registers as well as r13 and r14). When a mode switch occurs, registers r0 to r12 are unmodified. For our current purposes we will assume that there are just 16 user-accessible registers, r0 to r15. Figure 9.3 describes the ARM's register set.
The current processor status register is accessible to the programmer in all modes. However, user-level code can't modify the I, F, and M0 to M4 bits (this restriction is necessary to enable the ARM to support a protected operating system). When a context switch occurs between states, the CPSR is saved in the appropriate SPSR (saved processor state register). In this way, a context switch does not lose the old value of the CPSR.

Summary of the ARM's register set
● The ARM has 16 accessible 32-bit registers called r0 to r15.
● Register r15 acts as the program counter, and r14 (called the link register) stores the subroutine return address.
● You can write pc for r15 in ARM assembly language, lr for r14, and sp for r13.
● By convention, register r13 is used as a stack pointer. However, there is no hardware support for the stack pointer.
● The ARM has a current program status register (CPSR), which holds the condition codes.
● Some registers are not unique because processor exceptions create new instances of r13 and r14.
● Because the return address is not necessarily saved on the stack by a subroutine call, the ARM is very fast at implementing subroutine returns.
Figure 9.3 The ARM's register set (the CPSR contains the condition flag bits and the interrupt control bits).
As most readers will have read the chapter on the CISC processor and are now familiar with instruction sets and addressing modes, we provide only a short introduction to the ARM's instruction set before introducing some of its development tools and constructing simple ARM programs.

9.3.2 ARM instructions
The basic ARM instruction set is not, at first sight, exciting. A typical three-operand register-to-register instruction has the format

ADD r1,r2,r3

and is interpreted as [r1] ← [r2] + [r3]. Note the order of the operands—the destination appears first (left to right), then the first source operand, and finally the second source operand. Table 9.3 describes some of the ARM's data processing instructions.
The ARM has a reverse subtract; for example, SUB r1,r2,r3 is defined as [r1] ← [r2] − [r3], whereas the reverse subtract operation RSB r1,r2,r3 is defined as [r1] ← [r3] − [r2]. A reverse subtract operation is useful because you can do things like the following.
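One classic use (an illustrative example) is negation: the ARM has no separate negate instruction, so a register is negated by reverse-subtracting it from zero.

      RSB r0,r1,#0    ;[r0] ← 0 - [r1], i.e. r0 = -r1

The same trick performs constant-minus-register operations in a single instruction.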
Table 9.3 The ARM data processing and data move instructions.
The ARM has two compare instructions. The conventional CMP Rn,Rs evaluates [Rn] − [Rs] and sets the condition codes in the CPSR register. The compare negated instruction, CMN Rn,Rs, also performs a comparison, except that the second operand is negated before the comparison is made.
The ARM has a test equivalence instruction, TEQ Rn,Rs, which tests whether two values are equivalent. If the two operands are equivalent, the Z-bit is set to 1. This instruction is very similar to CMP, except that the V-bit isn't modified by a TEQ.
The test instruction, TST Rn,Rs, tests two operands by ANDing their operands bit by bit and then setting the condition code bits. The TST instruction allows you to mask out bits of the operand you wish to test. For example, if r0 contains 0...0001111 (binary), the effect of TST r1,r0 is to mask the contents of r1 to the four least-significant bits and then to compare those bits with 0.
The bit clear instruction BIC performs the operation AND NOT, so that BIC r1,r2,r3 is defined as [r1] ← [r2] • ¬[r3]. Consider the effect of BIC r1,r2,r3 on the operands [r2] = 11001010 and [r3] = 11110000. The result loaded into r1 is 00001010, because each bit in the second operand set to 1 clears the corresponding bit of the first operand.
The multiply instruction, MUL, has two peculiarities. First, the destination (i.e. result) register must not be the same as the first source operand register; for example, MUL r0,r0,r1 is illegal whereas MUL r0,r1,r0 is legal. Second, the MUL instruction may not specify an immediate value as the second operand.
The multiply and accumulate instruction MLA performs a multiplication and adds the result to a running total. It has the four-operand form MLA Rd,Rm,Rs,Rn. The RTL definition of MLA is [Rd] ← [Rm] × [Rs] + [Rn]. The result is truncated to 32 bits; for example, MLA r0,r1,r2,r0 adds the product [r1] × [r2] to a running total in r0.

The ARM's built-in shift mechanism
ARM data processing instructions can combine an arithmetic or logical operation with a shift operation. The shift is applied to operand 2 rather than the result. For example, the ARM instruction ADD r1,r2,r3, LSL #4 shifts the 32-bit operand in register r3 left by four places before adding it to the contents of register r2 and depositing the result in register r1. In RTL terms, this instruction is defined as [r1] ← [r2] + [r3] × 16.
Figure 9.4 illustrates the format of a data processing instruction. As you can see, the encoding of an ARM instruction follows the general pattern of other RISC architectures: an opcode, some control bits, and three operands. Operands Rn and Rd specify registers.
Figure 9.4 Format of the ARM's data processing instructions (bits 31–28 cond; 27–26 = 00; 25 = #; 24–21 op-code; 20 = S; 19–16 Rn; 15–12 Rd; 11–0 operand 2; if bit 25 is zero, operand 2 is a shifted register: bits 11–7 number of shifts, 6–5 shift type, bit 4 = 0, bits 3–0 Rm).
Operand 2 in bits 0 to 11 of the op-code in Fig. 9.4 selects either a third register or a literal. The ARM's designers use this field to provide a shift function on all data processing instructions.
When bit 25 of an op-code is 0, operand 2 selects both a second operand register and a shift operation. Bits 5 to 11 specify one of five types of shift and the number of places to be shifted. The shifts supported by the ARM are LSL (logical shift left), LSR (logical shift right), ASR (arithmetic shift right), ROR (rotate right), and RRX (rotate right extended by one place). The RRX shift is similar to the 68K's ROXR (rotate right extended), in which the bits are rotated and the carry bit is shifted into the vacated position. These shifts are similar to the corresponding 68K shifts and are defined as follows.
LSL  The operand is shifted left by 0 to 31 places. The vacated bits at the least-significant end of the operand are filled with zeros.
LSR  The operand is shifted right 0 to 31 places. The vacated bits at the most-significant end of the operand are filled with zeros.
ASL  The arithmetic shift left is identical to the logical shift left. This multiplies a number by 2 for each shift.
ASR  The operand is shifted right 0 to 31 places. The vacated bits at the most-significant end of the operand are filled with zeros if the original operand was positive, or with 1s if it was negative (i.e. the sign-bit is replicated). This divides a number by 2 for each place shifted.
ROR  The operand is rotated by 0 to 31 places right. The bit shifted out of the least-significant end is copied into the most-significant end of the operand. This shift preserves all bits; no bit is lost by the shifting.
RRX  The operand is rotated by one place right. The bit shifted out of the least-significant end of the operand is shifted into the C-bit. The old value of the C-bit is copied into the most-significant end of the operand; that is, shifting takes place over 33 bits (i.e. the operand plus the C-bit).
For example, the instruction ADD r3,r3,r3, LSL #3 shifts the second operand in r3 three places left to multiply it by 8. This value is added to operand 1 (i.e. r3) to generate 8 × [r3] + [r3] = 9 × [r3]. However, instructions such as ADD r3,r3,r3, LSL #3 take an extra cycle to complete, because the ARM can read only two registers from the register file in a single cycle.
This ability to scale operands is useful when dealing with tables. Suppose that a register contains a pointer to a table of 4-byte elements in memory and we wish to access element number i. What is the address of element i? The address of the ith element is the pointer plus 4 × i. If we assume that the pointer is in register r0 and the offset is in r1, the pointer to the required element, r2, is given by ADD r2,r0,r1, LSL #2. We have been able to scale the offset by 4 (because each integer requires 4 bytes) before adding it to r0 in a conventional way. This instruction performs the operation [r2] ← [r0] + [r1] × 4.
The ARM also permits dynamic shifts, in which the number of places shifted is specified by the contents of a register. In this case the instruction format is similar to that of Fig. 9.4, except that bits 8 to 11 specify the register that defines the number of shifts, and bit 4 is 1 to select the dynamic shift mode. If register r4 specifies the number of shifts, we can write, for example, ADD r1,r2,r3, LSL r4.
9.3.3 ARM branch instructions
One of the ARM's most interesting features is that each instruction is conditionally executed. Bits 28 to 31 of each ARM instruction provide a condition field that defines whether the current instruction is to be executed—see Table 9.4. The 16 conditions described in Table 9.4 are virtually the same as those provided by many other microprocessors. One condition, always, is the default case and means that the current instruction is always executed. The special case never is reserved by ARM for future expansion and should not be used. In order to indicate the ARM's conditional mode to the assembler, all you have to do is to append the appropriate condition to a mnemonic. Consider the following example, in which the suffix EQ is appended to the mnemonic ADD to get

ADDEQ r1,r2,r3

The ARM's ability to make the execution of each instruction conditional makes it easy to write compact code. Consider the following extension of the previous example.
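A sketch of the kind of extension the text has in mind (the test on r0 is illustrative): a two-way choice is made without a single branch instruction.

      CMP   r0,#0       ;set the condition codes from r0
      ADDEQ r1,r2,r3    ;executed only if r0 was zero
      ADDNE r1,r2,r4    ;executed only if r0 was non-zero

Each instruction after the CMP carries its own condition, so the pipeline never has to take a branch.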
There is, of course, nothing to stop you combining conditional execution and shifting, because the condition and shift fields of an instruction are independent. You can write, for example, ADDEQ r1,r2,r3, LSL #2, which is interpreted as: if Z = 1, then [r1] ← [r2] + [r3] × 4.
An ARM literal is encoded as an 8-bit value together with a 4-bit alignment field. If the 8-bit literal is N and the 4-bit alignment is n in the range 0 to 12, the value of the literal is given by N × 2^(2n); note that the scale factor is 2^(2n). This mechanism is, of course, analogous to the way in which floating point numbers are represented. Scaling is performed automatically by the assembler whenever you write an immediate value.
instructions are word aligned on a 32-bit boundary. Consequently, the byte and halfword parts of the offset do not have to be stored, as they will always be zero.
The simple unconditional branch has the single-letter mnemonic B, as in B Target, which transfers control to the instruction labeled Target. You can implement a loop construct in the way sketched below.
Because the branch with link instruction can be made conditional, the ARM implements a full set of conditional subroutine calls. You can write, for example, BLLT Sub. The mnemonic BLLT is made up of B (branch), L (branch with link), and LT (execute on condition less than).
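A minimal loop sketch (the counter value and labels are illustrative):

      MOV  r0,#8        ;set up a loop counter
Loop  ...               ;body of the loop
      SUBS r0,r0,#1     ;decrement the counter and update the flags
      BNE  Loop         ;branch back to Loop until the counter reaches zero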
The ADR (load address) pseudo-instruction loads a register with a 32-bit address; for example, the instruction ADR r0,table loads the contents of register r0 with the 32-bit address 'table'. The ARM assembler treats the ADR as a pseudo instruction and then generates the code that causes the appropriate action to be carried out. The ADR instruction attempts to generate a MOV, MVN, ADD, or SUB instruction to load the address into a register.
Figure 9.6 demonstrates how the ARM assembler treats an ADR instruction. We have used ARM's development system to show the source code, the disassembled code, and the registers during the execution of the program (we'll return to this system later). As you can see, the instruction ADR r5,table1 has been assembled into the instruction ADD r5,pc,0x18, because table1 is 0x18 bytes onward from the current contents of the program counter in r15. That is, the address table1 has been synthesized from the value of the PC plus the constant 0x18.
The ARM assembler also supports a similar pseudo operation. The construct LDR rd,=value is used to load value into register rd. The LDR pseudo instruction uses MOV or MVN instructions, or it places the constant in memory and uses program counter relative addressing to load the constant.

Accessing memory
The ARM implements two flexible memory-to-register and register-to-memory data transfer operations, LDR and STR. Figure 9.7 illustrates the structure of the ARM's memory reference instructions. Like all ARM instructions, the memory access operations LDR and STR have a conditional field and can, therefore, be executed conditionally.
The ARM's load and store instructions use address register indirect addressing to access memory. ARM literature refers to this as indexed addressing. Any of the ARM's 16 registers can act as an address (i.e. index) register.
Bit 20 of the op-code determines whether the instruction is a load or a store, and bit 25, the # bit, determines the type of the offset used by indexed addressing. Let's look at some of the various forms of these instructions. Simple versions of the load and store operations that provide indexing can be written as LDR r0,[r1] and STR r2,[r3].
Figure 9.6 The treatment of the ADR pseudo instruction (the annotations point out the ADR instruction in the source code and the actual code stored in memory).
Figure 9.7 The structure of the ARM's memory reference instructions (bits 19–16 hold the base register and bits 15–12 the source/destination register; bits 11–0 hold either a 12-bit immediate offset when the # bit is 0, or a register-based offset with a shift length and type when the # bit is 1).
These addressing modes correspond exactly to the 68K's address register indirect addressing modes MOVE.L (A1),D0 and MOVE.L D2,(A3), respectively.
The simple indexed addressing mode can be extended by providing an offset to the base register; for example, LDR r0,[r1,#8] loads r0 from the location 8 bytes beyond the address in r1. If the instruction is written with a trailing ! (i.e. LDR r0,[r1,#8]!), pre-indexing with writeback takes place and the pointer register is also incremented by 8. By modifying the above syntax slightly, we can perform post-indexing by accessing the operand at the location pointed at by the base register and then incrementing the base register, as the following demonstrates: LDR r0,[r1],#8 first loads r0 from the address in r1 and then adds 8, the offset, to r1.
The B-bit can be set to force a byte operation rather than a word. Whenever a byte is loaded into a 32-bit register, bits 8 to 31 are set to zero (i.e. the byte is not sign-extended). Halfwords and signed bytes can, however, be loaded and sign-extended. Typical load/store instructions are

LDRH   Load unsigned halfword (i.e. 16 bits)
LDRSB  Load signed byte
LDRSH  Load signed halfword
STRH   Store halfword

The P- and W-bits control the ARM's auto-indexing modes. When W = 1 and P = 1, pre-indexed addressing is performed. When W = 0 and P = 0, post-indexed addressing is performed.
Consider the following example, which calculates the total of a table of bytes terminated by zero.
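A sketch of such a loop (assuming r0 points at the table and r1 is free to accumulate the total):

      MOV   r1,#0          ;clear the running total
Loop  LDRB  r2,[r0],#1     ;get a byte and post-increment the pointer
      CMP   r2,#0          ;test for the zero terminator
      ADDNE r1,r1,r2       ;if not the terminator, add the byte to the total
      BNE   Loop           ;and continue with the next byte

Note how conditional execution (ADDNE) avoids a branch around the addition.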
The middle of the string is located when either the left pointer is one less than the right pointer or the left pointer is equal to the right pointer.
We can easily write a fragment of code that scans the string. In the following code (written in the form of a subroutine), register r0 points to the left-hand end of the string and register r1 points to the right-hand end of the string. Remember that the pointers are updated automatically as each character is fetched.

again  LDRB r3,[r0],#1   ;get left hand character and update pointer
       LDRB r4,[r1],#-1  ;get right hand character and update pointer
       CMP  r3,r4        ;compare the characters at the ends of the string
       BNE  notpal       ;if characters different then fail
       .
       .                 ;test for middle of string
       .
       BNE  again        ;if middle not found then repeat
waspal                   ;end up here if string is palindrome
notpal MOV  pc,lr        ;return from subroutine
The code we wrote to scan the palindrome automatically updates the pointers when they are used to fetch characters (e.g. the left pointer is used and updated by LDRB r3,[r0],#1 and the right pointer is updated by LDRB r4,[r1],#-1). This means that both pointers are updated during the character-fetch operations, and therefore we have to take account of this when comparing the pointers. We can fix the problem in three ways: update the pointers only after the test for the middle, take copies of the pointers and move them back before comparing them, or perform a new test on the copies for left_pointer = right_pointer + 2 and left_pointer = right_pointer + 1. We will use the first option.
The following code provides the complete program to test a string. We begin by scanning the string (which is terminated by a 0) to find the location of the right-hand character. The subroutine either returns 0 in r10 (not palindrome) or 1 (is palindrome).
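A minimal sketch of such a program, using the first option for the middle test (the label names, the test string, and the simplified directives are illustrative; the three lines labeled stop are the exit sequence published in ARM's literature):

        ENTRY                    ;entry point (header directives simplified)
Main    ADR   r0,String          ;r0 points at the left-hand end of the string
        MOV   r1,r0              ;r1 will locate the right-hand end
Scan    LDRB  r2,[r1],#1         ;get a character and advance the pointer
        CMP   r2,#0              ;is it the terminator?
        BNE   Scan               ;no: keep scanning
        SUB   r1,r1,#2           ;yes: step back to the last real character
        BL    pal                ;test the string; r10 returns 1 or 0
stop    MOV   r0,#0x18           ;halt the program by calling an
        LDR   r1,=0x20026        ;operating system function
        SWI   0x123456           ;(ARM's published exit sequence)
pal     MOV   r10,#0             ;assume failure: clear the palindrome flag
Next    LDRB  r3,[r0]            ;get the left-hand character
        LDRB  r4,[r1]            ;get the right-hand character
        CMP   r3,r4              ;compare the two ends
        BNE   Done               ;if they differ, the string is not a palindrome
        ADD   r0,r0,#1           ;update the pointers only after
        SUB   r1,r1,#1           ;the characters have been tested
        CMP   r0,r1              ;test for the middle of the string
        BLT   Next               ;left pointer still below right pointer: continue
        MOV   r10,#1             ;success: the string is a palindrome
Done    MOV   pc,lr              ;return from the subroutine
String  DCB   "ABCBA",0          ;the test string, terminated by 0
        END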
Note the three lines of code labeled by stop. I copied the code from ARM's literature because it offers a means of halting program execution by calling an operating system function. Other versions of the ARM simulator may require different termination mechanisms. You can always terminate a program by implementing an infinite loop:
Finish  B Finish
Having written the program (using an ASCII editor), we assemble it with the command ARMASM. If the program is called PROG1.s, it is assembled by
ARMASM -g PROG1.s
The assembly process produces a new object file called PROG1.o. The '-g' option generates debugging information for later use. If no errors are found during the assembly phase, the object code must be linked to produce the binary code that can be executed by an ARM processor (or simulated on a PC). The command used to link a program is ARMLINK. In this case we write
ARMLINK PROG1.o -o PROG1
The linker creates a new file called PROG1, which can be loaded into the ARM simulator.
Once we have created a file to run, we can call the Windows-based ARM debugger by clicking on the ADW icon (assuming you've loaded ARM's package on your system). This loads the development system and creates the window shown in Fig. 9.10. By selecting the File item on the top toolbar, you get a pull-down menu whose first item is Load image (see Fig. 9.11). Clicking on Load image invokes the window used to open a file and lists the available files (see Fig. 9.12). In this case, we select the file called Prog1. Figure 9.13 shows the situation after this program has been loaded.
The Execution window in Fig. 9.13 shows the code loaded into the debugger. Note that the ARM development system creates a certain amount of header code in addition to your program. We are not interested in this code. Figure 9.13 shows address 0x00008008 highlighted—this is the point at which execution is to begin (i.e. the initial value of the program counter). However, you can also start
Figure 9.10 The initial window after loading the ARM debugger.
the program by setting a breakpoint at 0x8080 and then running the code to the breakpoint. Doing this executes the start-up code and then stops simulation at the appropriate point.
We can view other windows beside the Execution window. In Fig. 9.14 we have selected the View command on the top toolbar and have chosen Registers from the pull-down list to give a second pull-down list of registers.
Figure 9.15 shows the debugger with the register window active. You can modify the contents of any register in this window by double clicking on the appropriate register. Figure 9.16 shows how the current contents of a register appear in a Modify Item window. In this diagram the PC contains 0x00008008, which we alter to 0x00008080 (the address of the start of prog1). This address (i.e. 0x8080) is a feature of the ARM development system I used.
Figure 9.17 shows the state of the system after the PC has been reloaded. As you can see, the code that we originally entered is now displayed. In Fig. 9.17 we have resized the windows to make best use of the available space in order to see as much as possible of the program's comment field (without losing the Registers window). The first instruction to be executed is highlighted.
We can now begin to execute the program's instructions to test whether a string is a palindrome. There are several ways of running a program in the ARM debugger; for example, we can run the whole program until it terminates, execute a group of instructions, or execute a single instruction at a time. If you click on the step-in icon on the toolbar, a single instruction at a time is executed. The effect of program execution can be observed by monitoring the contents of the registers in the Registers window.
In Fig. 9.18 we have begun execution and have reached the second instruction of the subroutine 'pal'. In Fig. 9.19 we have executed some more instructions and have reached line number 25 in the code.
Let's return to the View pull-down menu on the toolbar to display more information about the program. In Fig. 9.20 we have pulled down the menu and in Fig. 9.21 we have selected the Disassembly mode and have been given the disassembly address window.
Figure 9.22 shows the Disassembly window. You can see the contents of the memory locations starting at 0x00008080.
Figure 9.19 The situation after executing part of the subroutine pal.
Figure 9.20 Using the View function to select the Disassembly display.
Note that the symbolic labels are displayed, although the text string is interpreted as instructions.

Simplifying the code
We can simplify the code we've developed to test for a palindrome; that's one of the advantages of writing a program in assembly language. The following provides an improved version (without the header, data, and termination mechanism, which don't change).
We've used two improvements. The first is to use r10 (the success/fail flag) to test for the terminator at the end of the string. In this way, we begin the subroutine with [r10] = 0 and save an instruction. The major change is in the test for the middle of the string. If we automatically increment the left pointer and decrement the right pointer when they are used, we will have one of two situations when we reach the middle. If the string is even, the left and the right hand pointers will have swapped over. If the string is odd, the two pointers will be pointing at the same character. The code subtracts the left pointer from the right pointer and stops on zero or negative.

Further simplification
Steve Furber at Manchester University pointed out that the code can be simplified even further. Look at the way I handled a return if the string wasn't a palindrome.

       CMP r3,r4     ;compare the ends of the string
       BNE notpal    ;if different then fail
       .
       .
notpal MOV pc,lr     ;return

We test two characters and then branch to notpal if they aren't the same. From notpal, we perform a return by placing the return address in the link register into the pc. Steve uses conditional execution to combine these two instructions; that is,

       CMP   r3,r4   ;compare the ends of the string
       MOVNE pc,lr   ;if not same then return

Steve's final version is

pal    LDRB  r3,[r0],#1   ;get left hand character
       LDRB  r4,[r1],#-1  ;get right hand character
       CMP   r3,r4        ;compare the ends of the string
       MOVNE pc,lr        ;if not same then return
       CMP   r0,r1        ;compare pointers
       BMI   pal          ;not finished
       MOV   r10,#1
       MOV   pc,lr        ;return (success)

■ SUMMARY
When we first introduced the computer, we used Motorola's 68K as a teaching vehicle because it is both powerful and easy to understand. In this chapter, we have looked at two contrasting microprocessors: a simple 8-bit device used in products ranging from toys to cell phones, and a more sophisticated 32-bit RISC processor, the ARM.
The 8-bit M68HC12 looks very much like the first-generation processors that go back to the late 1970s. These processors have relatively few internal registers and you have only two general-purpose accumulators. However, they have a wealth of on-chip I/O ports, which means that they provide a single-chip solution to many computing problems.
The ARM processor is a 32-bit machine with a register-to-register (or load/store) architecture with instructions like ADD r1,r2,r3. We introduced the ARM because it has some very interesting features. The program counter is one of the processor's general-purpose registers, which means that the programmer can access the PC like any other register. This feature can be exploited in returning from a subroutine because you can transfer the return address to the PC without having to perform a memory access.

Another feature of the ARM is its ability to shift the second operand as part of a normal data processing instruction. This mechanism provides a limited degree of parallel processing because you can execute two instructions at once (provided one is a shift).

One of the most interesting features of the ARM is its conditional execution, where an instruction is executed only if a condition is met. This facility makes it possible to generate very compact code.

■ PROBLEMS

9.1 What are the advantages and disadvantages of microprocessor wordlengths that are not powers of 2 (e.g. 12 bits and 24 bits)?

9.2 We said that all processors permit register-to-memory, memory-to-register, and register-to-register moves, whereas few microprocessors permit direct memory-to-memory moves. What are the advantages and disadvantages of direct memory-to-memory moves?

9.3 Some computers have a wide range of shift operations (e.g. logical, arithmetic, and rotate). Some computers have very few shift operations. Suppose that your computer had only a single logical shift left operation. How would you synthesize all the other shifts using this instruction and other appropriate operations on the data?

9.4 Some microprocessors implement simple unconditional procedure (i.e. subroutine) calls with a BSR (branch to subroutine) instruction. Other microprocessors have a conditional branch to subroutine instruction that lets you call a subroutine conditionally. What are the relative merits and disadvantages of these two approaches to instruction design?

9.5 Some registers in a microprocessor are part of its architecture, which is visible to the programmer, whereas other registers belong to the processor's organization and are invisible to the programmer. Explain what this statement means.

9.6 The MC68HC12 instruction set of Table 9.2 has a very large number of instructions. Design a new instruction set that performs the same operations but uses fewer instruction types (e.g. employ a MOVE instruction to replace many of the 6809's existing data transfer instructions).

9.7 What are the relative advantages and disadvantages of variable-length instructions (in contrast with fixed-length instructions)?

9.8 In what significant ways does the ARM differ from the 68K?

9.9 Most RISC processors have 32 user-accessible registers, whereas the ARM has only 16. Why is this so?

9.10 Construct an instruction set that has the best features of a CISC processor like the 68K and a RISC processor like the ARM. Write some test programs for your architecture and compare them with the corresponding pure 68K and ARM programs.

9.11 All ARM instructions are conditional, which means that they are executed only if a defined condition is met; for example, ADDEQ means 'add if the last result set the zero flag'. Explain how this feature can be exploited to produce very compact code. Give examples of the use of this feature to implement complex conditional constructs.

9.12 What is the effect of the following ARM instructions?
(a)
(b)
(c)
(d)
(e)
(f)
(g)
(h)

9.13 The ARM has a wealth of move multiple register instructions, which copy data between memory and several registers. The load versions of these instructions are

LDMIA, LDMIB, LDMDA, LDMDB, LDMFD, LDMFA, LDMED, LDMEA

What do these instructions do? You will need to look up ARM literature to answer this question.

9.14 How are subroutines handled in ARM processors?

9.15 Implement a jump table in ARM assembly language. A jump table is used to branch to one of a series of addresses stored in a table. For example, if register r3 contains the value i, a jump (i.e. branch) will be made to the address of the ith entry in the table. Jump tables can be used to implement the case or switch construct in high-level languages.

9.16 Consider the fragment of C code if (p == 0) q = q + 1; else q = q * 4; How can conditional execution be exploited by the compiler for this code?

9.17 A 32-bit IEEE floating point number is packed and contains a sign bit, biased exponent, and fractional mantissa. Write an ARM program that takes a 32-bit IEEE floating point number and returns a sign bit (most significant bit of r1), a true exponent in r2, and a mantissa with a leading 1 in register r3.
Write a program to convert an unsigned 8-digit decimal integer into a 32-bit IEEE floating point number. The 8-digit decimal integer is stored at the memory location pointed at by r1 and the result is to be returned in r2. The decimal number is right justified and leading digits are filled with zeros; for example, 1234 would be stored as 00001234.
10 Buses and input/output mechanisms
INTRODUCTION
Computers receive data from a wide variety of sources such as the keyboard and mouse, the
modem, the scanner, and the microphone. Similarly, computers transmit data to printers,
displays, and modems. Computer peripherals can be discussed under two headings. The first
is the techniques or strategies whereby information is moved into and out of a computer
(or even within the computer). The second is the peripherals themselves; their characteristics,
operating modes, and functions. We first look at the way in which information is moved
into and out of the computer and in the next chapter we describe some important
peripherals.
We begin with the bus, the device that distributes information within a computer and between
a computer and external peripherals. We describe both high-speed parallel buses and slower,
low-cost buses such as the USB bus that connects keyboards and similar devices to the computer.
We introduce a very unusual bus, the IEEE488 bus, which illustrates many important aspects of
I/O technology.
The middle part of this chapter looks at the strategies used to implement I/O such as programmed
I/O and interrupt-driven I/O.
This chapter concludes with a description of two peripheral chips that automate the
transmission of data between a computer and peripheral. One interface chip handles parallel
data and the other serial data. The precise details of these chips are not important; their operating principles are, because these chips demonstrate how much of the complexity associated with input and output transactions can be moved from the CPU to an interface.
10.1 The bus

We've examined the internal structure and operation of the computer's central processing unit. The next step is to show how the computer communicates with the outside world. In this chapter we look at how information gets into and out of a computer; in the next chapter we turn our attention to devices like the printer and the display that are connected to the computer.

This chapter begins with the bus that distributes information both within a computer and between a computer and external devices. We then demonstrate how the CPU implements input and output transactions—the CPU doesn't dirty its hands with the fine details of input/output (I/O) operations. The CPU hands over I/O operations to special-purpose interface chips; for example, the computer sends data to one of these chips and logic within the chip handles the transfer of data between the chip and the external device. We describe the operation of two typical interface chips—one that handles I/O a byte (or a word) at a time and one that handles I/O a bit at a time.

Figure 10.1 The structure of a bus: address, data, and control sub-buses, together with an arbitration bus and an interrupt bus.

A bus of the kind shown in Fig. 10.1 consists of a data transfer bus, an arbitration bus that determines which device gets access to the bus, and an interrupt bus that deals with requests for attention from peripherals. The data transfer bus is, itself, composed of sub-buses; for example, there's an address bus to communicate the address of the memory location being accessed, a data bus to carry data between memory and CPU, and a control bus, which determines the sequence of operations that take place during a data transfer.

Buses are optimized for their specific application; for example, speed (throughput), functionality, or cost (e.g. the USB bus). A computer such as the PC may have several buses. Figure 10.2 illustrates the structure of a PC with buses that are linked by bridges (i.e. circuits) that control the flow of traffic between buses that might have widely different parameters.

In Fig. 10.2 a system bus links together the processor and its memory. This is the fastest bus in the system because the computer cannot afford to wait for instructions or data from memory. The system bus is connected to a local bus that deals with data transfers between slower devices such as audio subsystems or interfaces to external peripherals. A logic system that may be as complex as a CPU is used to connect the system bus to the local bus.
Figure 10.2 A system with multiple buses. The processor, its cache, main memory, and the video system share the fast system bus, which has to handle CPU-to-memory transfers; a bridge allows signals on one bus to be transferred to another bus; the local bus handles slower data transfers between the CPU and peripherals such as the scanner and audio subsystems.

In an open-ended data transfer, data is transmitted and its reception assumed. Open-ended data transfers correspond to the basic level of service offered by the mail system. A letter is written and dropped into a mailbox. The sender believes that after a reasonable delay, the letter will be received. However, the sender doesn't know whether the letter was received.

In many circumstances the open-ended transfer of data is perfectly satisfactory. The probability of data getting lost or corrupted is very small and its loss may be of little importance. If Aunt Mabel doesn't get a birthday card, the world doesn't come to an end. Consider now the following exchange of information between a control tower and an aircraft.

Approach control 'Cherokee Nine Four Six November cleared for straight in approach to runway 25. Wind 270 degrees 10 knots. Altimeter 32 point 13. Report field in sight.'

Aircraft 'Straight in runway 25. 32 point 13. Cherokee Nine Four Six November.'

The aircraft acknowledges receipt of the message and reads back any crucial data (i.e. the identification of the runway is 25 and the altimeter pressure setting is 32.13 inches of mercury). This data transfer demonstrates the operation of a closed-loop system. In the computer world, a closed-loop data transfer simply indicates that data has been received (the data itself isn't read back).

Open-loop data transfer

Figure 10.3 illustrates an open-loop data transfer between a computer and a peripheral. Figure 10.3(a) shows a computer and peripheral with a data path and a 1-bit control signal, DAV, Fig. 10.3(b) gives a timing diagram for an open-loop write in which data is sent from the computer to the peripheral, and Fig. 10.3(c) provides a transaction (protocol) diagram that presents the sequence of actions in the form of messages.

At point A data from the computer becomes valid (the shading before point A indicates that the data is invalid). At point B the computer asserts the DAV (data valid) control signal to indicate that the data from the computer is valid. The peripheral must read the data before it vanishes at point D. DAV is negated at point C to inform the peripheral that the data is no longer valid. This data transfer is called open loop because the peripheral doesn't communicate with the CPU and doesn't indicate that it has received the data.

Closed-loop data transfer

In a closed-loop data transfer, the device receiving the data acknowledges its receipt. Figure 10.4 illustrates a closed-loop data transfer between a computer and peripheral. Initially, the computer (i.e. originator of the data) makes the data available and then asserts DAV at point B to indicate that the data is valid just as in an open-loop data transfer. The peripheral receiving the data sees that DAV has been asserted, indicating that new data is ready. The peripheral asserts its acknowledgement, DAC (data accepted), at point C and reads the data.
Figure 10.3 An open-loop data transfer: the timing diagram shows DAV asserted while the data is valid and negated when the data is no longer available.

Figure 10.4 A closed-loop data transfer: the transaction (protocol) diagram between computer and peripheral shows the data becoming valid at A, DAV asserted at B, DAC asserted at C, and DAV and DAC negated before the data is finally removed.
The data accepted signal is a reply to the computer informing it that the data has been accepted. Once the data has been read by the peripheral, the DAV and DAC signals may be negated and the data removed. This sequence of events is known as handshaking. Apart from indicating the receipt of data, handshaking also caters to slow peripherals, because the transfer is held up until the peripheral indicates its readiness by asserting DAC.
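To make the sequence concrete, the following C sketch models a closed-loop transfer in software. It is a minimal sketch under our own assumptions: the DAV and DAC flags, the data_bus variable, the wait_for() helper, and the timeout limit are all hypothetical (real handshaking uses hardware signals, not shared variables, and the receiver runs concurrently); the timeout anticipates the HANG UPS box below.

    #include <stdbool.h>

    /* Hypothetical bus state: in hardware these are electrical signals. */
    static volatile bool DAV = false;          /* data valid (transmitter) */
    static volatile bool DAC = false;          /* data accepted (receiver) */
    static volatile unsigned char data_bus;    /* read by the receiver     */

    /* Wait for a signal to reach the required level, giving up after
       'limit' polls. Returns true on success, false on a timeout. */
    static bool wait_for(volatile bool *signal, bool level, long limit)
    {
        while (*signal != level) {
            if (--limit == 0) return false;    /* see the HANG UPS box */
        }
        return true;
    }

    /* One closed-loop (handshaken) transfer from transmitter to receiver.
       A receiver running concurrently would read data_bus when it sees
       DAV asserted and then assert DAC. */
    bool transmit(unsigned char value)
    {
        data_bus = value;                      /* point A: data becomes valid   */
        DAV = true;                            /* point B: assert DAV           */
        if (!wait_for(&DAC, true, 100000))     /* point C: receiver asserts DAC */
            return false;                      /* timeout: abort the transfer   */
        DAV = false;                           /* data no longer valid          */
        wait_for(&DAC, false, 100000);         /* receiver completes the cycle  */
        return true;
    }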
HANG UPS

In data transfers with handshaking, a problem arises when the transmitter asserts DAV, but DAC isn't asserted by the receiver in turn (because the equipment is faulty or the receiver is not switched on). When the transmitter wishes to send data, it starts a timer concurrently with the assertion of DAV. If the receiver doesn't assert DAC after a given time has passed, the operation is aborted. The period of time between the start of an action and the declaration of a failure state is called a timeout. When a timeout occurs, an interrupt (see Section 10.2.2) is generated, forcing the computer to take action. In a poorly designed system without a timeout mechanism, the non-completion of a handshake causes the transmitter to wait for DAC forever and the system is then said to hang up.
Figure 10.5 shows how the handshaking process can be taken a step further in which the acknowledgement is itself acknowledged, to create a fully interlocked data transfer. The term fully interlocked means that each stage in the handshaking procedure can continue only when the previous stage has been acknowledged. At point A in Fig. 10.5 the data becomes valid and at point B the transmitter asserts DAV indicating the availability of data. At C the receiver asserts DAC indicating that DAV has been observed and the data accepted. So far this is the same procedure as in Fig. 10.4.

The transmitter sees that DAC is asserted and de-asserts (i.e. negates) DAV at D, indicating that data is no longer valid and that it is acknowledging that the receiver has accepted the data. Finally, at E the receiver de-asserts (i.e. negates) DAC to complete the cycle, and to indicate that it has seen the transmitter's acknowledgement of its receipt of data.

The difference between the handshaking and fully interlocked handshaking of Figs. 10.4 and 10.5 should be stressed. Handshaking merely involves an acknowledgement of data, which implies that the assertion of DAV is followed by the assertion of DAC. What happens after this is undefined. In fully interlocked handshaking, each action (i.e. the assertion or negation of a signal) takes place in a strict sequence that ends only when all signals have finally been negated. Interlocked handshaking is a two-way process because the receiver acknowledges the assertion of DAV by asserting DAC whereas the transmitter acknowledges the assertion of DAC by negating DAV. Moreover, because fully interlocked handshaking also acknowledges negations, it is said to be delay insensitive.

Many real systems employing closed-loop data transfers make the entire handshaking sequence automatic in the sense that it is carried out by special-purpose hardware. The computer itself doesn't get involved in the process. Only if something goes wrong does the processor take part in the handshaking.

How fast should an interface operate? As fast as it can—any faster and it wouldn't be able to keep up with the data—any slower and it would waste time waiting for data. Unfortunately, most real interfaces don't transfer data at anything like an optimum speed. In particular, data can sometimes arrive so fast that it's impossible to process one element before the next is received.

Doctors have a similar problem. If a doctor took exactly m minutes to treat a patient and a new patient arrived every m minutes, all should be well. However, even if patients arrive on average every m minutes and a consultation takes on average m minutes, the system wouldn't work because some patients arrive at approximately the same time. Doctors have solved this problem long ago by putting new patients in a waiting room until they can be dealt with. Sometimes the waiting room becomes nearly full when patients enter more rapidly than average.

The solution used by doctors can be applied to any I/O process. Data is loaded into a FIFO (first-in first-out) memory that behaves almost exactly like a waiting room. Data arrives at the memory's input port and is stored in the same sequence in which it arrives. Data leaves the memory's output port when it is required. Like the doctor's waiting room, the FIFO can fill with data during periods in which data arrives faster than it can be processed. It's up to the designer to provide a FIFO with sufficient capacity to deal with the worst case input burst. There is, however, one significant difference between the FIFO and the waiting room. FIFOs aren't littered with piles of battered 10-year-old copies of National Geographic. Saving data in a store until it is required is called buffering and the FIFO store is often called a buffer. Some interfaces incorporate a buffer into their input or output circuits to control the flow of data.
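A FIFO of the kind just described is easy to model in software as a circular (ring) buffer. The sketch below is illustrative only; the names and the fixed 64-byte capacity are our own assumptions, and a hardware FIFO does the same job in logic.

    #include <stdbool.h>

    #define FIFO_SIZE 64              /* capacity for the worst-case burst */

    typedef struct {
        unsigned char store[FIFO_SIZE];
        int head;                     /* next free slot (input port)  */
        int tail;                     /* oldest item (output port)    */
        int count;                    /* number of items waiting      */
    } fifo;

    /* Add an item at the input port; fails if the waiting room is full. */
    bool fifo_put(fifo *f, unsigned char item)
    {
        if (f->count == FIFO_SIZE) return false;   /* overflow: data lost */
        f->store[f->head] = item;
        f->head = (f->head + 1) % FIFO_SIZE;
        f->count++;
        return true;
    }

    /* Remove the oldest item at the output port, preserving arrival order. */
    bool fifo_get(fifo *f, unsigned char *item)
    {
        if (f->count == 0) return false;           /* nothing waiting */
        *item = f->store[f->tail];
        f->tail = (f->tail + 1) % FIFO_SIZE;
        f->count--;
        return true;
    }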
Bus terminology

Bus technology has its own vocabulary. Before we continue it's necessary to introduce some of the concepts and terminology associated with computer buses.

Arbitration Arbitration is a process whereby a device on the bus competes with other devices for control of the bus and is granted access to the bus. A simple bus-based system with only one processor and no other bus master doesn't require bus arbitration because the CPU permanently controls the bus.

Backplane Parallel buses fall into two groups: passive backplanes and motherboards. A motherboard is a printed circuit board that includes the CPU and its associated circuitry; for example, the motherboard found in the PC. A backplane contains the bus and slots (sockets) into which modules such as memory cards, processors, and peripherals can be plugged. The backplane is passive because it provides information paths but not functionality; that is, there is no CPU or other subsystem on the backplane. A backplane is more versatile than a motherboard and is generally found in commercial or professional systems.
Bandwidth The bandwidth of a bus is a measure of its throughput, the rate at which data is transmitted over the bus. Bandwidth is normally expressed in bytes/s and is proportional to the width of the data bus; for example, if an 8-bit data bus can transfer 200 Mbytes/s, increasing the bus's width to 64 bits increases the bandwidth to 1.6 Gbytes/s.

Bus architecture Just as we speak about processor architecture or memory architecture, we can refer to a bus's architecture. The architecture of a bus (by analogy with the CPU) is an expression of its functionality and how it appears to the user. Bus architecture includes a bus's topology, its data exchange protocols, and its functionality such as its arbitration and interrupt-handling capabilities.

Bus contention When two or more devices attempt to access a common bus at the same time, bus contention takes place. This situation is resolved by arbitration, the process that decides which of the contenders is going to gain access to the bus.

Bus driver Logic systems are wired to bus lines via gates. Special gates called bus drivers have been designed to interface the CPU to a bus or other logic. A bus driver is a digital circuit with the added property that its output terminal can provide the voltage swing and current necessary to drive a bus up to a 1 state or down to a 0 state. Bus drivers are required because of the electrical characteristics of bus lines.

Bus master A bus master is a device that can actively take control of a bus and use it to transfer data. CPUs are bus masters. A bus slave, on the other hand, is a device that is attached to a bus but which can only be accessed from a bus master. A bus slave cannot initiate a bus access.
Bus protocol A bus is defined by the electrical characteristics of its signals (i.e. what levels are recognized as 1s and 0s) and the sequence of signals on the various lines of the bus used to carry out some transaction. The rules governing the sequencing of signals during the exchange of data are known as a protocol.

Bus termination A bus can be quite long and extend the width of a computer system. Signals put on the bus propagate along the bus at close to the speed of light (the actual speed is given by the electrical properties of the bus lines and insulators between them). When a pulse reaches the end of a bus, it may be reflected back towards its source just like a wave that hits the side of a swimming pool. If you place a terminating network across the ends of a bus, it can absorb reflections and stop them bouncing from end to end and triggering spurious events.

Bus topology The topology of a bus is a description of the paths that link devices together.

Latency A bus's latency is the time the bus takes to respond to a request for a data transfer. Typically, a device requests the bus for a data transfer (or a burst of data transfers) and then waits until the bus has signaled that it is ready to perform the transfer. This waiting period is the bus's latency.

Motherboard A motherboard is similar to a backplane because it contains a bus and sockets that accept modules such as memory and peripherals. The difference between a backplane and motherboard is that the motherboard is active; it contains a processor and control logic. Modern PCs have such sophisticated motherboards that they can operate without any cards plugged into the system bus because the motherboard implements I/O, sound, and even the video display.

Multiplexed bus Some data transfer buses have separate address and data sub-buses; that is, the address bus sends the location of the next word to be accessed in memory and the data bus either transmits information to the memory in a write cycle or receives information in a read cycle. Some computer buses use the same lines to carry both addresses and data. This arrangement, called multiplexing, reduces the number of lines required by the bus at the expense of circuit complexity. A multiplexed bus works in two or more phases; an address is transmitted on the common address/data lines and then the same lines are used to transfer data.

10.1.3 The PC bus

You might think it would be easier to wire peripherals and memory directly to a PC's own address and data bus. Indeed, some single-board microcontrollers do take this approach. Connecting the processor to memory and peripherals is not viable in sophisticated systems for several reasons. First, a processor chip cannot provide the electrical energy to drive lots of memory or peripheral chips. Second, a bus can be standardized and equipment from different manufacturers plugged into it. Third, if we didn't have buses, all interface circuits would have to be modified whenever a new processor was introduced.

A bus makes a computer system independent of processor, memory, or peripheral characteristics and allows independent development of CPU or processor technology.

The history of the IBM PC and its clones is as much the history of its bus as its central processing unit. Indeed, the PC's bus structure has advanced more radically than its processor architecture.
Figure 10.6 The evolution of the PC bus: bus width plotted against time, from the 8-bit ISA bus (about 1980), through the 16-bit ISA and MCA buses (about 1990) and the 32-bit PCI bus, to the 64-bit PCI and PCI-X buses (about 2000) and serial express (2004).
Figure 10.6 describes some of the steps along the path of the PC's bus architecture. When the PC was first created, its bus was very limited in terms of its speed, width, and functionality. The original XT bus supported the Intel 8088, a processor with a 16-bit internal architecture and an 8-bit external data bus. The XT bus operated with a 4.77 MHz clock and could access 1 Mbyte of memory. The 8088 was soon replaced by the 8086, a processor with an identical architecture but with a true 16-bit data bus. A new version of the PC with a 16-bit bus called ISA (Industry Standard Architecture) was created.

As performance increased, the ISA bus rapidly became obsolete and was replaced by three competing buses, forcing PC users to choose between one of these mutually incompatible systems. IBM produced its high-performance proprietary Micro Channel Architecture bus, which was protected by patents. This bus died because it was uncompetitive. Two other PC buses were the VESA and EISA buses.

In 1992 Intel announced the PCI (peripheral component interconnect) bus to provide higher performance, to provide a path for future expansion, and to gain control of the bus market. The PCI 2.0 bus was 32 bits wide and had a speed of 33 MHz. The original PCI 2.0 specification was replaced by the PCI 2.1 specification and the PCI bus was so successful that it rapidly replaced all other buses in PCs.¹

In 2004 the PCI express bus was introduced. This is a major departure from conventional backplane buses because it uses a pair of serial data paths operating at 2.5 Gbits/s in each direction. Such a pair of buses is called a lane and the PCI express may use multiple lanes to increase the overall data rate.

Figure 10.7 illustrates the structure of the PCI bus in a typical PC system. The processor bus is also called the host bus, or in PC terminology, the front side bus. The logic system that connects the processor bus to the PCI bus is called a north bridge. The circuits that implement inter-bus interfaces in a PC environment have come to be known colloquially as chipsets.

Figure 10.8 describes Intel's 875 chipset, which uses an 82875 MCH chip to provide a north bridge interface between the processor and memory and AGP (the advanced graphics card slot that provides a fast dedicated interface to a video card) and an ICH5 chip, which provides an interface to the PCI bus, LAN, and other subsystems. This chipset provides much of the functionality that was once provided on plug-in PCI cards such as an audio interface, a USB bus interface, and a LAN interface.

The PCI bus operates at clock speeds of 33 or 66 MHz and supports both 32- and 64-bit systems. Data can be transferred in an efficient high-speed burst mode by sending an address and then transferring a sequence of data bytes. A 64-bit-wide bus operating at 66 MHz can transfer data at a maximum rate of 66 × 8 = 528 Mbytes/s.

The PCI supports arbitration; that is, a PCI card can take control of the PCI bus and access other cards on the PCI bus. We now look at the IEEE488 bus, which was designed for use in professional systems in commercial environments such as instrumentation and control.

¹ Computer buses did not originate in the PC world. Professional systems had long since used standardized computer buses such as Motorola's VMEbus or the Multibus.
Figure 10.7 The structure of the PCI bus in a typical PC: the processor and its cache memory sit on the processor bus, a high-speed bus that operates at the same rate as the CPU; a host-to-PCI bridge links the processor bus, the DRAM memory bus, and the PCI bus; an ISA bridge connects the PCI bus to the legacy ISA bus, which is used to support older peripherals (obsolete today).
Figure 10.8 The Intel 875 PCI chipset. The Pentium 4 processor communicates with the 82875P MCH (north bridge) at 6.4 Gbytes/s; the MCH connects to DDR main memory at 6.4 Gbytes/s and to the AGP8X video port at 2.0 Gbytes/s. The ICH5 chip provides Serial ATA ports (150 Mbytes/s), the PCI bus (133 Mbytes/s), 6-channel audio, a LAN interface, and the BIOS interface.
LEGACY DEVICES

The term legacy device describes facilities that were incorporated in all PCs but which have now become largely obsolete. For example, the ISA bus is obsolete. However, because there are many ISA cards such as modems still in existence, some PCs contain both PCI and ISA buses to enable users to keep their old modem cards. As time passes, fewer and fewer systems have ISA buses. Similarly, the growth of the high-performance and flexible USB interface has largely rendered the traditional serial and parallel PC interfaces used by modems and printers unnecessary. These interfaces are also called legacy devices and are omitted from many modern high-performance PCs.
Some students may omit this section because the IEEE488 bus is very specialized—we have included it because it illustrates several important aspects of bus design and operation.

10.1.4 The IEEE 488 bus

The IEEE 488 bus dates from 1967 when the Hewlett Packard Company began to look for a standard bus to link together items of control and test instrumentation² in automatic test environments in industry. We cover it here because it has two interesting facets. First, it implements an unusual patented three-line data transfer protocol. You have to have a license from the patent holders to use the IEEE 488 bus. Second, it transmits control messages in two ways: via special control signals and via encoded data messages.

Figure 10.9 illustrates the relationship between the IEEE bus, the IEEE interface, and the devices that communicate with each other via the bus. As this diagram demonstrates, the IEEE standard covers only the bus and the interfaces but not the devices connected to the interfaces. This distinction is important because we shall soon discover that the IEEE bus implements different communication methods between devices and between interfaces.

The IEEE bus supports three types of device: the controller, the talker, and the listener. A talker (transmitter) can put data on the bus, a listener (receiver) can read data from the bus, and a controller is a device that manages the bus and determines which device may talk and which may listen. Only one controller may be active at any given time. An active controller can give up control of the bus by permitting another controller to take control. In general, the controller is part of the host computer on which the applications program is being run. Furthermore, this computer invariably has the functions of controller, talker, and listener.

At any instant only one talker can send messages over the IEEE bus, although several listeners may receive the messages from the talker. The ability to support a single talker and multiple listeners simultaneously demonstrates a fundamental difference between typical backplane buses and the IEEE bus. Backplane buses transfer data between a master and a single slave, whereas the IEEE bus is able to transfer data between a master (talker) and several slaves (listeners) in a broadcast mode of operation.

The IEEE bus uses 16 information lines that are divided into three distinct groups—the data bus, the data bus control lines, and the bus management lines (see Fig. 10.9). The data lines carry two types of information: bus control information and information sent from one bus user to another. The IEEE bus supports the following three data transmission modes.

1. A byte of user data is called a multiline message and is transmitted over the 8-bit data bus. The message doesn't directly affect the operation of the bus itself or the IEEE bus interface and its meaning depends only on the nature of the devices sending and receiving it.

2. A byte of IEEE bus interface control information can be transmitted over the data bus. Control information acts on the interfaces in the devices connected to the bus or affects the operation of the devices in some predetermined fashion defined in the IEEE 488 standard.

3. A single bit of information can be transmitted over one of the five special-purpose bus management lines. Certain bus management lines may be used concurrently with the operations on the data bus.

Information flow on DIO1 to DIO8 is managed by three control lines, NRFD, DAV, and NDAC (i.e. not ready for data, data available, and not data accepted). All data exchanges between a talker and one or more listeners are fully interlocked, and, if a talker is sending information to several listeners, the data is transmitted at a rate determined by the slowest listener. The operation of the three data bus control lines is controlled by the bus interfaces in the devices connected to the bus, and is entirely transparent to the user.

The bus management lines, IFC, ATN, SRQ, REN, and EOI, perform functions needed to enhance the operation of the bus.

² The IEEE standard was introduced in 1976 and revised in 1978. An updated version of the standard, IEEE 488.2, includes changes to the software environment but no significant modifications to the underlying physical layer. The IEEE 488 bus is known by several names: the General Purpose Interface Bus (GPIB), the Hewlett Packard Instrument Bus (HPIB), the IEC 625-1 bus, the ANSI MC1-1 bus, or, more simply, the IEEE bus.
Figure 10.9 The IEEE 488 bus and its interfaces: the controller, talker, and listener are each connected to the data bus (DIO1–8), to the device byte transfer control (handshake) lines, and to the general interface management lines.
In a minimal implementation of the IEEE 488 bus, only ATN is absolutely necessary. The functions of the bus management lines are summarized as follows.

ATN (attention) The ATN line distinguishes between data and control messages on the eight data lines. When ATN is true (i.e. electrically low), the information on DIO1 to DIO8 is interpreted as a control message. When ATN is false (i.e. electrically high) the message is a device-dependent message from a talker to one or more listeners. The expression device-dependent data means that the data is in a format that has a meaning only to the device using the IEEE bus. Only the controller can assert the ATN line (or the IFC or REN lines).
IFC (interface clear) The controller uses the IFC line to place the bus in a known state. Asserting IFC resets the IEEE bus interfaces but not the devices connected to them. After an IFC message has been transmitted by a controller for at least 100 ms, any talker and all listeners are disabled and the serial poll mode (if active) is aborted.

SRQ (service request) The SRQ line performs the same role as an interrupt request and is used by a device to indicate to the controller that it wants attention. The controller must perform a serial poll to identify the device concerned, using a specified protocol.

REN (remote enable) The REN line is used by the controller to select between two alternative sources of device control. When REN is true a device is controlled from the IEEE bus, and when false it is controlled locally. In general, local control implies that the device is operated manually from its front panel. The REN line allows a device to be attached to the IEEE bus, or to be removed from it. In the world of automated testing, the assertion of REN turns a manually controlled instrument into one that is remotely controlled.

EOI (end or identify) The EOI line serves two, mutually exclusive, purposes. Although the mnemonic for this line is EOI, it is frequently written END (end) or IDY (identify), depending on the operation being carried out. When asserted by a talker, END indicates the end of a sequence of device-dependent messages. When a talker is transmitting a string of device-dependent messages on DIO1 to DIO8, the talker asserts EOI concurrently with the last byte to indicate that it has no more information to transmit. When asserted by the controller in conjunction with the ATN line, the EOI line performs the identify (IDY) function and causes a parallel poll in which up to eight devices (or groups of devices) may indicate simultaneously whether they require service.

Data transfer

Data transfers on the IEEE data bus, DIO1 to DIO8, are interesting because they involve a patented three-line, fully interlocked handshaking procedure. The signals used by the IEEE bus are all active-low, with an electrically high level representing a negated level and an electrically low level representing an asserted level. Active-low signal levels make it possible to take advantage of the wired-OR property of the open-collector bus driver (i.e. if any open-collector circuit pulls the line down to ground, the state of the line is a logical one). The definitions of the three signals controlling data movement on the IEEE bus are as follows.

DAV (data valid) When true (i.e. electrically low), DAV indicates to a listener or listeners that data is available on the eight data lines.

NRFD (not ready for data) When true, NRFD indicates that one or more listeners are not ready to accept data.

NDAC (not data accepted) When true, NDAC indicates that one or more listeners have not accepted data.

The timing diagram of a data transfer between a talker and several listeners is given in Fig. 10.10. Suppose the bus is initially quiet with no transmitter activity and that three active receivers are busy and have asserted NRFD to inform the transmitter that they are busy. In this state, the NRFD line will be pulled down by open-collector bus drivers into a logical one state (remember that the IEEE bus uses negative logic in which the true or asserted state is the electrically low state). When one of the listeners becomes free, it releases (i.e. negates) its NRFD output. The negation of NRFD by a listener has no effect on the state of the NRFD line, as other listeners are still holding it down. This situation is shown by dotted lines in Fig. 10.10. When, at last, all listeners have released their NRFD outputs, the NRFD line is negated, signifying that the listeners are all not 'not ready for data'—that is, they are ready for data. Now the talker can go ahead with a data transfer.

The talker places data on DIO1 to DIO8 and asserts DAV. As soon as the listeners detect DAV asserted, they assert NRFD to indicate that they are once more busy.
Figure 10.10 The three-wire handshake. The timing diagram shows the DIO1–DIO8 data lines (talker), DAV (talker), and NDAC (listener) switching between the 0.8 V and 2.4 V logic levels, marking the points at which all devices are ready for data and at which the last device accepts the data.
Meanwhile, the listeners assert their NDAC outputs electrically low to indicate that they have not accepted data. When a listener detects that DAV has been asserted, it reads the data off DIO1 to DIO8 and negates its NDAC output. That is, if its 'not data accepted' output is negated, then it must be signifying data accepted.

Because all listeners must negate their NDAC outputs before the NDAC line can rise to an electrical high state, the talker does not receive a composite data-accepted signal until the last listener has released NDAC. The talker terminates the data transfer cycle when it releases DAV and the receivers release NDAC in turn.
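The wired-OR behaviour of NRFD and NDAC is easy to model: each open-collector line is asserted (pulled low) if any listener asserts it. The following C sketch is our own illustration, not part of the IEEE 488 standard; it simply computes the composite line states seen by the talker.

    #include <stdbool.h>

    #define LISTENERS 3

    /* Each listener drives its own open-collector NRFD and NDAC outputs.
       true means 'asserted', i.e. the line is pulled electrically low. */
    static bool nrfd_out[LISTENERS];   /* not ready for data */
    static bool ndac_out[LISTENERS];   /* not data accepted  */

    /* A wired-OR line is asserted if ANY device pulls it down. */
    static bool line_asserted(const bool out[], int n)
    {
        for (int i = 0; i < n; i++)
            if (out[i]) return true;
        return false;
    }

    /* The talker may transfer data only when no listener is busy... */
    bool all_ready(void)
    {
        return !line_asserted(nrfd_out, LISTENERS);
    }

    /* ...and knows the data has been taken only when the last listener
       has negated NDAC. */
    bool all_accepted(void)
    {
        return !line_asserted(ndac_out, LISTENERS);
    }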
Configuring the IEEE bus

Before the IEEE bus can be used by the devices connected to it, the controller must first assign one device as a talker and one or more devices as listeners. The controller communicates with all other devices either by uniline messages (asserting one of the bus management lines), or by multiline messages (asserting ATN and transmitting a message via DIO1 to DIO8). Multiline messages can be further subdivided into those intended for all devices (universal commands) and those intended for specific devices (addressed commands). Remember that all messages use only 7 bits of an 8-bit byte, enabling 7-bit ISO characters to be assigned to the control messages.

Three multiline messages are used by the controller to configure talkers and listeners on the bus: MLA (my listen address), MTA (my talk address), and MSA (my secondary address). Consider first the action of the MLA command. Before a device may listen to device-dependent traffic on the bus, it must be addressed to listen by the controller. The 31 my listen address codes from 00100000 to 00111110 select 31 unique listener addresses. Each listener has its own address, determined either at the time of its manufacture or by manually setting switches, generally located on its rear panel. By sending a sequence of MLAs, a group of devices can be configured as active listeners. The 32nd listener address, 00111111, has a special function called unlisten (UNL). Whenever the UNL command is transmitted by the controller, all active listeners are disabled. An unlisten command is issued before a string of MLAs to disable any listeners previously configured for some other purpose.

Having set up the listeners, the next step is to configure a talker, which is done by transmitting an MTA. There are 31 my talk address codes from 01000000 to 01011110. As only one device can be the active talker at any given time, the act of issuing a new MTA has the effect of automatically disabling the old (if any) talker. The special code 01011111 is called UNT (untalk) and deactivates the current talker. Once a talker and one or more listeners have been configured, data can be transmitted from the talker to the listener(s) at the rate of the slowest device taking part in the exchange and without the aid (or intervention) of the controller. The format and interpretation of this data is outside the scope of the IEEE 488 standard, but, as we have said, is frequently represented by ISO (ASCII) characters. Note that the controller is acting as an intermediary between talkers and listeners, in contrast to other buses in which potential talkers and listeners are usually autonomous.

Serial and parallel polling

Like many other buses, the IEEE 488 bus provides facilities for devices to request service from controllers (i.e. an interrupt mechanism). The IEEE bus supports two forms of supervisor request—the serial poll and the parallel poll, although the parallel poll cannot strictly be classified as an interrupt.

A device connected to the IEEE bus can request attention by asserting the SRQ (service request) bus management line. The controller detects the service request and may respond by initiating a serial poll. A service request, in IEEE bus terminology, corresponds to an interrupt request in conventional computer terminology. As the controller does not know which device initiated the service request, it must poll all devices sequentially. The recommended sequence of actions to be carried out by the controller in response to a service request is defined by the standard. After entering the serial poll mode the controller transmits successive talk addresses (MTAs) and examines the service messages from each of the devices addressed to talk, until an affirmative response is obtained. The controller ends the polling sequence by an SPD (serial poll disable) command.

A parallel poll is initiated by the controller and involves several devices concurrently. The controller sets up the parallel poll by assigning individual data bus lines to devices (or groups of devices). For example, device 5 may be told to respond to a parallel poll by asserting DIO3. Then, the controller initiates the parallel poll and the configured devices respond.

The controller asserts the ATN and IDY (identify) lines simultaneously to carry out a parallel poll. Whenever the IEEE bus is in this state with ATN and IDY asserted, the predetermined devices place their response outputs on the assigned data lines and the controller then reads the contents of the data bus. A parallel poll can be completed in only a few microseconds, unlike the serial poll.
10.1.5 The USB serial bus

First-generation PCs suffered from poor connectivity. PCs had an RS232C serial port for modems and a parallel port for printers. All external systems had to be interfaced to these relatively slow interfaces that had not been designed to be flexible. You could plug a special card into the PC's motherboard to support a particular interface or use the expensive SCSI bus designed for hard-disk interfaces.

Two of the greatest advances in PC technology were the USB interface and the plug-and-play philosophy. The USB, or universal serial bus, interface is a low-cost plug and socket arrangement that allows you to connect devices from printers and scanners to digital cameras and flash-card readers to a PC with minimal effort. Moreover, the USB is expandable—you can connect a USB port to a hub and that hub can provide other USB connectors. A processor with a USB port lets you connect up to 127 devices to the computer. Plug-and-play allows the device connected to the USB port to negotiate with the operating system running on the host and to supply the necessary drivers and set-up parameters.

Figure 10.11 USB packets. A token packet consists of Sync, PID, Address, ENDP, CRC, and EOP fields; a data packet of Sync, PID, Data, CRC, and EOP fields; a handshake packet of Sync, PID, and EOP fields; and a start-of-frame packet of Sync, PID, Frame number, CRC, and EOP fields.
The first-generation USB implementation supported a data rate of 12 Mbps whereas the USB 2.0 replacement that emerged in 2000 supports data transfer rates of 1.5, 12, and 480 Mbps.

A USB connector has four pins. Two provide a 5 V power supply and two transmit the data. The power supply can be used by a USB device as long as its power requirements are modest. This arrangement allows devices like keyboards, mice, flashcard readers, etc. to be connected to a USB port without the need for their own power supply or batteries.

Data on the USB is transmitted differentially; that is, the signal on the two data lines is transmitted as the pair (+0.1 V, −0.1 V) or (−0.1 V, +0.1 V) so that the information content lies in the potential difference between the data terminals, which is either +0.2 V or −0.2 V. Information encoding is called NRZ1 (non-return to zero 1) where the voltage between the data lines is unchanged to transmit a 1 and it is switched to transmit a 0; that is, information is transmitted by switching polarity whenever there is a 0 in the data stream.

Information is transmitted without a reference clock, leaving the receiver to extract data from the incoming stream of pulses. If you transmit a long string of 1s, there are no transitions in the data stream from which you can extract timing information. Consequently, whenever six 1s are transmitted, a 0 is automatically transmitted to force a data transition to help create a synchronizing signal. If you receive six 1s, you know the next bit must be a 0 and you simply drop it. This mechanism is called bit stuffing.
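The transmitter's side of bit stuffing can be sketched in C as follows. This is our own illustration with hypothetical names; for clarity each byte of the input array holds a single bit, and the receiver would apply the inverse rule (drop the 0 that follows six 1s).

    #include <stddef.h>

    /* Insert a 0 after every run of six 1s. 'in' holds n input bits
       (one bit per byte for clarity); the stuffed stream is written to
       'out', which must be large enough (worst case n + n/6 bits).
       Returns the number of output bits. */
    size_t bit_stuff(const unsigned char *in, size_t n, unsigned char *out)
    {
        size_t ones = 0, m = 0;
        for (size_t i = 0; i < n; i++) {
            out[m++] = in[i];
            if (in[i] == 1) {
                if (++ones == 6) {      /* six 1s in a row...        */
                    out[m++] = 0;       /* ...so force a transition  */
                    ones = 0;
                }
            } else {
                ones = 0;               /* a 0 resets the run length */
            }
        }
        return m;
    }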
The individual bits transmitted across the USB bus are grouped into units called packets or frames. Figure 10.11 illustrates four of the 10 USB packets. Packets begin with a synchronizing field followed by a packet identification (PID) field, which defines the type of the current packet. Packets are terminated by an end-of-packet (EOP) field.

Other packet fields in Fig. 10.11 are the data field used to transport applications-oriented data between the host computer and USB device, and the CRC field, which is used to detect transmission errors. The ENDP field defines the packet's endpoint, which provides a destination (one of four) for the packet within a USB device. This arrangement allows the USB to treat packets as belonging to four different types of stream or pipe. The four pipes supported by the USB are the default or control pipe, the bulk pipe used for raw data transmission, the interrupt pipe, and the isochronous pipe for streaming video or audio. Note that each pipe consists of two pipes in opposite directions for host-to-USB device and USB device-to-host data transfers.

Most data transfers use the bulk data pipe where information is sent in units of up to 64 bytes. Isochronous data transfers provide a guaranteed bandwidth that is needed for video or audio links. These data transfers don't use error checking because there's nothing that can be done if an error occurs in a real-time video or audio stream.

Setting up the USB

The universal serial bus is a dynamic system that can adapt to changing circumstances; that is, you can hot-plug devices into the USB bus at any time without powering down and you can remove devices from the bus at any time.

When a device is plugged into the USB, the host detects that a new device has been connected and then waits 100 ms to ensure that the new device has had time to be properly inserted and powered up. The host then issues a reset command to place the new device in its default state and allow it to respond to address zero (the initial default address).
The host then asks the newly connected device for the first 64 bytes of its device descriptor. Each USB device is able to supply a device descriptor that defines the device to the host processor; for example, the descriptor includes information about the product and its vendor, its power requirements, the number of interfaces it has, endpoint information, and so on. Once the full device descriptor has been transmitted, the host is able to communicate with the USB device using the appropriate device drivers. The host can now assign an address to the new device.
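As a sketch of what such a descriptor looks like, the C structure below follows the field layout of the 18-byte standard USB device descriptor. The field names mirror those used in the USB specification; treat the details as illustrative rather than definitive.

    #include <stdint.h>

    /* Layout of the standard USB device descriptor (18 bytes). */
    #pragma pack(push, 1)
    typedef struct {
        uint8_t  bLength;            /* descriptor size in bytes (18)     */
        uint8_t  bDescriptorType;    /* 1 = device descriptor             */
        uint16_t bcdUSB;             /* USB version, e.g. 0x0200 for 2.0  */
        uint8_t  bDeviceClass;       /* class code                        */
        uint8_t  bDeviceSubClass;    /* subclass code                     */
        uint8_t  bDeviceProtocol;    /* protocol code                     */
        uint8_t  bMaxPacketSize0;    /* max packet size for endpoint 0    */
        uint16_t idVendor;           /* identifies the vendor             */
        uint16_t idProduct;          /* identifies the product            */
        uint16_t bcdDevice;          /* device release number             */
        uint8_t  iManufacturer;      /* index of manufacturer string      */
        uint8_t  iProduct;           /* index of product string           */
        uint8_t  iSerialNumber;      /* index of serial number string     */
        uint8_t  bNumConfigurations; /* number of configurations          */
    } usb_device_descriptor;
    #pragma pack(pop)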
10.2 I/O fundamentals

Computer I/O covers several topics because input and output transactions involve the host processor, its software, and the peripherals sending or receiving data. We can divide I/O into three areas.

1. The strategy by which data is moved into or out of the computer.

2. The interface circuit that actually moves the data into or out of the computer.

3. The input/output devices themselves that convert data into a form that can be used by an external system or that take data from the outside world and convert it into a form that can be processed digitally. Data may be converted into an almost infinite number of representations, from a close approximation to human speech to a signal that opens or closes a valve in a chemical factory. Input/output devices are frequently called peripherals.

The difference between these three aspects of I/O can be illustrated by two examples. Consider first a computer connected to a keyboard and an LCD display. Data is moved into or out of the computer by a strategy called programmed data transfer. Whenever the computer wants to send data to the display, an instruction in the program writes data into the output port that communicates with the display. Similarly, when the computer requires data, an instruction reads data from the input port connected to the keyboard. The term port indicates a gateway between the computer and an external I/O device. Programmed data transfer or programmed I/O represents the strategy by which the information is moved but tells us nothing about how the data is moved—that is handled by the interface between the computer and external peripheral. In this example the keyboard and display are the I/O devices proper (i.e. peripherals).

Consider data that's sent from a computer to a remote display terminal (see Fig. 10.12). When the computer sends data to its output port, the output port transmits that data to the display. The output port is frequently a sophisticated integrated circuit whose complexity may approach that of the CPU itself. Such a semi-intelligent device relieves the computer of the tedious task of communicating with the LCD display directly, and frees it to do useful calculations.

The connection between a computer and a display may consist of a twisted pair (two parallel wires twisted at regular intervals). Because the data written into the output port by the CPU is in parallel form, the output port must serialize the data and transmit it a bit at a time over the twisted pair to the display. Moreover, the output port must supply start and stop bits to enable the display to synchronize itself with the stream of bits from the computer. Chapter 14 deals in more detail with serial data transmission. We can now see that the output port is the device that is responsible for moving the data between the processor and the peripheral.

The display terminal is the output device proper. It accepts serial data from the computer, reconstitutes it into a parallel form, and uses the data to select a character from a table of symbols. The symbols are then displayed on a screen. Sometimes the transmitted character performs a control function (e.g. carriage return, line-feed, or backspace) that determines the layout of the display.
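The serialization just described can be sketched in C as follows. The framing shown (one 0 start bit, eight data bits sent least-significant bit first, one 1 stop bit) is the classic asynchronous format, and the send_bit() routine is a hypothetical stand-in for the port's output logic.

    void send_bit(int bit);    /* hypothetical: drives the serial line */

    /* Transmit one byte as a 10-bit asynchronous frame. */
    void send_frame(unsigned char ch)
    {
        send_bit(0);                     /* start bit wakes the receiver  */
        for (int i = 0; i < 8; i++)      /* data bits, LSB first          */
            send_bit((ch >> i) & 1);
        send_bit(1);                     /* stop bit returns line to idle */
    }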
Figure 10.12 Relationship between a computer and a peripheral. A program in the computer (MOVE.B data,D0 followed by MOVE.B D0,output) writes a character to the output port, whose parallel-to-serial converter sends serial data over the transmission path to the serial-to-parallel converter in the display controller.
Figure 10.13 illustrates the relationship between the CPU, the peripheral interface chip, and the peripheral device itself. As you can see, the peripheral interface chip looks just like a memory location to the CPU (i.e. you read or write data to it). However, this chip contains specialized logic that allows it to communicate with the peripheral.

The way in which a block of data is written to a disk drive provides another example of the relationship between I/O strategy, the I/O interface, and the peripheral. It's impractical to use programmed data transfers for disk I/O because that is too slow. The I/O strategy most frequently used is direct memory access (DMA) in which the data is transferred from the computer's memory to a peripheral, or vice versa, without passing through the CPU's registers. The CPU tells the DMA hardware to move a block of data and the DMA hardware gets on with the task, allowing the CPU to continue its main function of information processing. This strategy (i.e. DMA) requires special hardware to implement it.

An interface chip called a DMA controller (DMAC) is responsible for moving the data between the memory and the peripheral. The DMAC provides addresses for the source or destination of data in memory, and informs the peripheral that data is needed or is ready. Furthermore, the DMAC must grab the computer's internal data and address buses for the duration of a data transfer. Data transfer by DMA must be performed while avoiding a conflict with the CPU for the possession of the buses. In this example the peripheral is a disk drive—a complex mixture of electronics and high-precision mechanical engineering designed to store data by locally affecting the magnetic properties of the surface of a disk rotating at a high speed.
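In software terms, setting up a DMA transfer amounts to writing the source address, destination address, and byte count into the DMAC's registers and then starting it. The register layout and base address below are entirely hypothetical (real DMA controllers differ), but the sketch captures the idea.

    #include <stdint.h>

    /* Hypothetical memory-mapped DMAC registers. */
    typedef struct {
        volatile uint32_t source;   /* address to read from        */
        volatile uint32_t dest;     /* address to write to         */
        volatile uint32_t count;    /* number of bytes to move     */
        volatile uint32_t control;  /* bit 0 = start, bit 1 = busy */
    } dmac_regs;

    #define DMAC ((dmac_regs *)0xFFFF0000)   /* hypothetical base address */

    /* Ask the DMAC to move a block; the CPU is free to continue working. */
    void dma_copy(uint32_t src, uint32_t dst, uint32_t nbytes)
    {
        DMAC->source  = src;
        DMAC->dest    = dst;
        DMAC->count   = nbytes;
        DMAC->control = 1;          /* start the transfer */
        /* The DMAC now arbitrates for the buses and moves the data;
           completion is typically signalled by an interrupt. */
    }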
An interface chip called a DMA controller (DMAC) is ence instructions like MOVE D0,IO_PORT (to output data)
responsible for moving the data between the memory and the and MOVE IO_PORT,D0 (to input data). A disadvantage of
peripheral. The DMAC provides addresses for the source or memory-mapped I/O is that memory space available to
destination of data in memory, and informs the peripheral programs and data is lost to the I/O system.
that data is needed or is ready. Furthermore, the DMAC must Figure 10.14 describes the organization and memory map
grab the computer’s internal data and address buses for the of an I/O port. An output port located at address 800016 is
duration of a data transfer. Data transfer by DMA must be connected to a display device. Data is transmitted to the dis-
performed while avoiding a conflict with the CPU for the play by storing it in memory location 800016. As far as the
possession of the buses. In this example the peripheral is a processor is concerned, it’s merely storing data in memory.
disk drive—a complex mixture of electronics and high-preci- The program in Table 10.1 sends 128 characters (starting at
sion mechanical engineering designed to store data by locally 200016) to the display. Note that we’ve provided both conven-
affecting the magnetic properties of the surface of a disk tional comments and RTL definitions of the instructions.
rotating at a high speed. The numbers in the right-hand column in Table 10.1 give
the time to execute each instruction in microseconds, assum-
ing a clock rate of 8 MHz. To output the 128 characters takes
10.2.1 Programmed I/O approximately 128 (8 8 8 10)/8 544 s, which is
Programmed I/O takes place when an instruction in the pro- a little over 1⁄2 thousandth of a second. Data is transferred at a
gram performs the data transfer; for example, a programmer rate of one character per 41⁄4 s.
writes MOVE.B Keyboard,D0 to read a byte of data from Although the program in Table 10.1 looks as if it should
the keyboard and puts it in D0. Some microprocessors have work, it’s unsuited to almost all real situations involving pro-
special instructions that are used only for I/O; for example, grammed output. Most peripherals connected to an output
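Table 10.1 itself is not reproduced here; the following minimal sketch (our own code, not the book's listing) shows the kind of copying loop it describes. The port address 8000₁₆ and the source address 2000₁₆ are taken from the text above:

        LEA     $2000,A0        ; A0 points at the block of characters to send
        MOVE.W  #128,D0         ; D0 counts the 128 characters
LOOP    MOVE.B  (A0)+,$8000     ; copy the next character to the memory-mapped port
        SUB.W   #1,D0           ; one character fewer to send
        BNE     LOOP            ; repeat until all 128 characters have been sent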
[Figure 10.13: The relationship between a computer and a peripheral. On the CPU side, the address, data, and control buses connect the CPU to a peripheral interface chip; on the peripheral side, a peripheral bus connects the interface chip to the peripheral device itself. The interface chip is part of the computer; the peripheral is an external device.]
Although the program in Table 10.1 looks as if it should work, it's unsuited to almost all real situations involving programmed output. Most peripherals connected to an output port are slow devices and sending data to them at this rate would simply result in almost all the data being lost. Some interfaces can deal with short bursts of high-speed data because they store data in a buffer; they can't deal with a continuous stream of data at high speeds because the buffer fills up and soon overflows.

You can deal with a mismatch in speed between the computer and a peripheral by asking the peripheral if it's ready to receive data, and not sending data to it until it is ready to receive it. That is, we introduce a software handshaking procedure between the peripheral and the interface.

Almost all memory-mapped I/O ports occupy two or more memory locations. One location is reserved for the actual data to be input or output, and one holds a status byte associated with the port. For example, let 8000₁₆ be the location of the port to which data is sent and let 8002₁₆ be the location of the status byte. Suppose that bit 0 of the status byte is a 1 if the port is ready for data and a 0 if it is busy. The fragment of program in Table 10.2 implements memory-mapped output at a rate determined by the peripheral. The comments at the beginning of the program describe the data transfer in pseudocode.
Table 10.2 Using the polling loop to control the flow of data.
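The listing itself is not reproduced here; a minimal sketch of the loop it contains (our own code, with comments keyed to the line numbers discussed below) is:

        LEA     $8002,A2        ; line 8: A2 points at the interface's status byte
WAIT    MOVE.B  (A2),D2         ; line 10: read the status byte into D2
        AND.B   #1,D2           ; line 11: mask it down to the least-significant bit
        BEQ     WAIT            ; line 12: while bit 0 is 0 the port is busy, so poll again
        MOVE.B  (A0)+,$8000     ; the port is now ready: send the next character (A0 as in Table 10.1)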
The program in Table 10.2 is similar to the previous example in Table 10.1 except for lines 8 to 12 inclusive. In line 8 an address register, A2, is used to point to the status byte of the interface at address 8002₁₆. In line 10 the status byte of the interface is read into D2 and masked down to the least-significant bit (by the action of AND.B #1,D2 in line 11). If the least-significant bit of the status byte is zero, a branch back to line 10 is made by the instruction in line 12. When the interface becomes free, the branch to WAIT is not taken and the program continues exactly as in Table 10.1.

Lines 10, 11, and 12 constitute a polling loop, in which the output device is continually polled (questioned) until it indicates that it is free, allowing the program to continue. A slow mechanical printer might operate at 30 characters/second, or approximately 1 character per 33 000 μs. Because the polling loop takes about 3 μs, the loop is executed 11 000 times per character.

Operating a computer in a polled input/output mode is grossly inefficient because so much of the computer's time is wasted waiting for the port to become free. If the microcomputer has nothing better to do while it is waiting for a peripheral to become free (i.e. not busy), polled I/O is perfectly acceptable. Many first-generation PCs were operated in this way. However, a more powerful computer working in a multiprogramming environment can attend to another task or program during the time the I/O port is busy. In this case a better I/O strategy is to ignore the peripheral until it is ready for a data transfer and then let the peripheral ask the CPU for attention. Such a strategy is called interrupt-driven I/O.

Note that all the I/O strategies we are describing use memory-mapped I/O. By the way, if you are designing a computer with memory-mapped I/O and a memory cache,³ you have to tell the cache controller not to cache the port's status register. If you don't do this, the cache memory would read the status once, cache it, and then return the cached value on successive accesses to the status. Even if the status register in the peripheral changes, the old value in the cache is frozen.

³ Cache memory is very fast memory that contains a copy of frequently accessed data. We looked at cache memory in Chapter 8.

10.2.2 Interrupt-driven I/O

A computer executes instructions sequentially unless a jump or a branch is made. There is, however, an important exception to this rule called an interrupt, an event that forces the CPU to modify its sequence of actions. This event may be a signal from a peripheral (i.e. a hardware interrupt) or an internally generated call to the operating system (i.e. a software interrupt). The term exception describes both hardware and software interrupts.

Most microprocessors have an active-low interrupt request input, IRQ, which is asserted by a peripheral to request attention. The word request implies that the interrupt request may or may not be granted. Figure 10.15 illustrates the organization of a system with a simple interrupt-driven I/O mechanism.
[Figure 10.15: A system with interrupt-driven I/O. The CPU and memory share the address and data buses with an I/O port whose data register, status register, and interrupt vector register (IVR) are read by the CPU to determine the peripheral's status. The port's IRQ output ('I want attention') informs the CPU, via the interrupt request line, that the peripheral wants attention.]
In Figure 10.15 an active-low interrupt request line connects all peripherals to the CPU. A peripheral asserts its IRQ output when it requires attention. This system is analogous to the emergency handle in a train. When the handle is pulled in one of the carriages, the driver knows that a problem has arisen but doesn't yet know who pulled the handle. Similarly, the CPU doesn't know which peripheral caused the interrupt or why. When the CPU detects that its IRQ input has been asserted, the following sequence of events takes place.

● The CPU finishes its current instruction because microprocessors cannot be stopped in mid-instruction. Individual machine code instructions are indivisible and must always be executed to completion.⁴
● The contents of the program counter and the processor status word are pushed onto the stack. The processor status must be saved because the interrupt routine will almost certainly modify the condition code bits.
● Further interrupts are disabled to avoid an interrupt being interrupted (we will elaborate on this partially true statement later).
● The CPU deals with the interrupt by executing a program called an interrupt handler.
● The CPU executes a return from interrupt instruction at the end of the interrupt handler. Executing this instruction pulls the PC and processor status word off the stack and execution then continues normally—as if the interrupt had never happened.

⁴ This statement is not true of all microprocessors. It is possible to design microprocessors that can save sufficient state information to interrupt an instruction and then continue from the point execution had reached.

Figure 10.16 illustrates the sequence of actions taking place when an interrupt occurs. In a 68K system the processor status word consists of the system byte plus the condition code register. The system byte is used by the operating system and interrupt processing mechanism.

Interrupt-driven I/O requires a more complex program than programmed I/O because the information transfer takes place not when the programmer wants or expects it, but when the data is available. The software required to implement interrupt-driven I/O is frequently part of the operating system. A fragment of a hypothetical interrupt-driven output routine in 68K assembly language is provided in Table 10.3 (a sketch of such a routine appears below). Each time the interrupt handling routine is called, data is obtained from a buffer and passed to the memory-mapped output port at $008000. In a practical system some check would be needed to test for the end of the buffer.

Because the processor executes this code only when a peripheral requests an I/O transaction, interrupt-driven I/O is very much more efficient than the polled I/O we described earlier.

Although the basic idea of interrupts is common to most computers, there are considerable variations in the precise nature of the interrupt-handling structure from computer to computer. We are now going to look at how the 68K deals with interrupts because this microprocessor has a particularly comprehensive interrupt handling facility.

Prioritized interrupts

Computer interrupts are almost exactly analogous to interrupts in everyday life. Suppose two students interrupt me when I'm lecturing—one with a question and the other because they feel unwell. I will respond to the more urgent of the two requests. Once I've dealt with the student who's unwell, I answer the other student's question and then continue my teaching. Computers behave in the same way.
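Table 10.3, referred to above, is not reproduced here; the following minimal sketch (our own code, under the assumptions just stated) shows the shape of such an interrupt-driven output routine:

PORT    EQU     $008000         ; the memory-mapped output port
HANDLER MOVE.L  A0,-(SP)        ; save the working register on the stack
        MOVEA.L BUFPTR,A0       ; A0 points at the next character in the buffer
        MOVE.B  (A0)+,PORT      ; pass one character to the output port
        MOVE.L  A0,BUFPTR       ; update the pointer (no end-of-buffer test here)
        MOVE.L  (SP)+,A0        ; restore the working register
        RTE                     ; return from exception
BUFPTR  DS.L    1               ; holds the address of the current buffer position

Because the routine ends with RTE rather than RTS, the processor status word saved when the interrupt was accepted is restored along with the program counter.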
[Figure 10.16: Interrupt sequence. During normal processing an interrupt causes the processor status and return address to be pushed onto the stack; the interrupt handling routine saves its working registers, does its work, restores the working registers, and returns by pulling the PC and processor status off the stack, leaving the stack as it was before the interrupt.]
Most computers have more than one interrupt request input. Some interrupt request pins are connected to peripherals requiring immediate attention (e.g. a disk drive), whereas others are connected to peripherals requiring less urgent attention (e.g. a keyboard). For the sake of accuracy, we should point out that the processor's interrupt request input is connected to the peripheral's interface, rather than the peripheral itself. If the disk drive is not serviced when its data is available, the data will be lost because it will be replaced by new data. In such circumstances, it is reasonable to assign a priority to each of the interrupt request pins.

The 68K supports seven interrupt request inputs, from IRQ7, the most important, to IRQ1, the least important. Suppose an interrupt is caused by the assertion of IRQ3 and no other interrupts are pending. The interrupt on IRQ3 will be serviced. If an interrupt at a level higher than IRQ3 occurs, it will be serviced before the level 3 interrupt service routine is completed. However, interrupts generated by IRQ1 or IRQ2 will be stored pending the completion of IRQ3's service routine.

The 68K does not have seven explicit IRQ1 to IRQ7 interrupt request inputs (simply because such an arrangement would require seven precious pins). Instead, the 68K has a 3-bit encoded interrupt request input, IPL0 to IPL2. The 3-bit value on IPL0 to IPL2 reflects the current level of interrupt request from 0 (i.e. no interrupt request) to 7 (the highest level, corresponding to IRQ7). Figure 10.17 illustrates some of the elements involved in the 68K's interrupt handling structure. A priority encoder chip is required to convert an interrupt request on IRQ1 to IRQ7 into a 3-bit code on IPL0 to IPL2.
The priority encoder automatically prioritizes interrupt requests and its output reflects the highest interrupt request level asserted.

The 68K doesn't automatically service an interrupt request. The processor status byte in the CPU in Fig. 10.17 controls the way in which the 68K responds to an interrupt. Figure 10.18 describes the status byte in more detail. The 3-bit interrupt mask field in the processor status byte, I2, I1, I0, determines how the 68K responds to an interrupt. The current interrupt request is serviced if its level is greater than that of the interrupt mask; otherwise the request is ignored. For example, if the interrupt mask has a current value of 4, only interrupt requests on IRQ5 to IRQ7 will be serviced.

When the 68K services an interrupt, the interrupt mask bits are reset to make them equal to the level of the interrupt currently being serviced. For example, if the interrupt mask bits were set to 2 and an interrupt occurred at level IRQ5, the mask bits would be set to 5. Consequently, the 68K can now be re-interrupted only by interrupt levels 6 and 7. After the interrupt has been serviced, the old value of the processor status byte saved on the stack, and therefore the interrupt mask bits, are restored to their original level.

Non-maskable interrupts

Microprocessors sometimes have a special interrupt request input called a non-maskable interrupt request. The term non-maskable means that the interrupt cannot be turned off (i.e. delayed or suspended) and must be serviced immediately. Non-maskable interrupts are necessary when the interrupt is caused by a critical event that must not be missed; for example, an interruption of the power supply. When power is lost, the system still functions for a few milliseconds on energy stored in capacitors (devices found in all power supplies). A non-maskable interrupt generated at the first sign of a power loss […] but never gets it. We next demonstrate how some processors allow the peripheral that requested attention to identify itself by means of a mechanism called the vectored interrupt.
[Figure 10.17: The 68K interrupt handling structure. Interrupt request inputs IRQ1 to IRQ7 from peripherals drive a priority encoder whose output feeds the 68000's encoded interrupt request inputs IPL0 to IPL2. The function code outputs FC0, FC1, FC2 indicate the type of bus cycle (1,1,1 = IACK cycle). In an IACK cycle the CPU puts the level of the interrupt acknowledge on address lines A01 to A03, and an IACK encoder converts this level into one of the interrupt acknowledge outputs IACK1 to IACK7 to the peripherals ('I want attention'/'You've got it'). A peripheral returns its interrupt vector number, IVEC, to acknowledge the interrupt. Memory holds the stack pointer, the reset vector, and up to 256 interrupt vectors. The interrupt mask bits I2, I1, I0 in the processor status byte set the level below which interrupts will not be processed.]

[Figure 10.18: The 68K's processor status word. The system byte contains the trace bit T, the supervisor bit S, and the interrupt mask I2, I1, I0; the condition code register contains the N, X, Z, V, and C flags.]
Vectored interrupts

Following the detection and acceptance of an interrupt, the appropriate interrupt-handling routine must be executed. You can test each of the possible interrupters, in turn, to determine whether they were responsible for the interrupt. This operation is called polling and is the same mechanism used for programmed I/O. We now look at how the 68K deals with the identification of an interrupt request that came from one of several possible devices. However, before we do this it's instructive to consider how first-generation microprocessors performed the task of isolating the cause of an interrupt request.

[Figure 10.19: A memory-mapped data and status port. The data port occupies address 8000₁₆ and the status byte address 8002₁₆. The status byte contains a ready flag RDY (set if this device is ready to take part in an I/O transaction), an interrupt flag IRQ (set if this device generated an interrupt request), and an error status bit ERR (set to indicate an error).]
Figure 10.19 shows the structure of a memory-mapped I/O port with a data port at address 8000₁₆ and a status byte at location 8002₁₆. We have defined 3 bits in the status byte:

● RDY (ready) indicates that the port is ready to take part in a data transaction.
● IRQ indicates that the port has generated an interrupt.
● ERR indicates that an error has occurred (i.e. the input or output operation could not be completed correctly).

In a system with vectored interrupts the interface itself identifies its interrupt-handling routine, thereby removing the need for interrupt polling. Whenever the 68K detects an interrupt, the 68K acknowledges it by transmitting an interrupt acknowledge (called IACK) message to all the interfaces that might have originated the interrupt.

The 68K uses function code outputs, FC0, FC1, FC2, to inform peripherals that it's acknowledging an interrupt (see Fig. 10.17). These three function code outputs tell external devices what the 68K is doing. For example, the function code tells the system whether the 68K is reading an instruction or an operand from memory. The special function code 1,1,1 indicates an interrupt acknowledge.

Because the 68K has seven levels of interrupt request, it's necessary to acknowledge only the appropriate level of interrupt.
[Figure 10.20: How the 68K responds to a level 6 vectored interrupt. A peripheral requests an interrupt at level 6 on IRQ6; the priority encoder drives the encoded (active-low) interrupt request inputs IPL2, IPL1, IPL0 with 0, 0, 1, corresponding to level 110₂ = 6. The 68K replies with an IACK cycle: the function code FC2, FC1, FC0 = 1,1,1 indicates the IACK cycle, and address lines A03, A02, A01 = 1, 1, 0 carry the level being acknowledged. A 3-line to 8-line decoder generates IACK6, the interrupt acknowledge at level 6. The peripheral's interrupt vector register then supplies the interrupt vector IVEC = 40₁₆ on the data bus during the IACK cycle. Since 4 × 40₁₆ = 100₁₆, the 68K reads the interrupt vector table entry at address 100₁₆, which contains the handler address 00001234₁₆.]
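Expressed in the RTL notation the book uses for instruction definitions, the dispatch that Figure 10.20 illustrates amounts to a doubly indirect jump through the vector table (the concrete values are those of the figure):

    [PC] ← [M(4 × IVEC)]    ; with IVEC = 40₁₆: [PC] ← [M(100₁₆)] = 00001234₁₆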
It would be unfair if a level 2 and a level 6 interrupt occurred nearly simultaneously and the interface requesting a level 2 interrupt thought that its interrupt was about to be serviced. The 68K indicates which level of interrupt it's acknowledging by providing the level on the three least-significant bits of its address bus (A01 to A03). External logic detects FC0, FC1, FC2 = 1, 1, 1 and uses A01 to A03 to generate seven interrupt acknowledge signals, IACK1 to IACK7.

After issuing an interrupt request, the interface waits for an acknowledgement on its IACK input. When the interface detects IACK asserted, it puts out an interrupt vector number on data lines d00 to d07. That is, the interface responds with a number ranging from 0 to 255. When the 68K receives this interrupt vector number, it multiplies it by 4 to get an entry into the 68K's interrupt vector table; for example, if an interface responds to an IACK cycle with a vector number of 100, the CPU multiplies it by 4 to get 400. In the next step, the 68K reads the contents of memory location 400 to get a pointer to the location of the interrupt-handling routine for the interface that initiated the interrupt. This pointer is loaded into the 68K's program counter to start interrupt processing.

Because an interface can supply one of 256 possible vector numbers, it's theoretically possible to support 256 unique interrupt-handling routines for 256 different interfaces. We say theoretically, because it's unusual for 68K systems to dedicate all 256 vector numbers to interrupt handling. In fact, the 68K itself uses vector numbers 0 to 63 for purposes other than handling hardware interrupts (these vectors are reserved for other types of exception).

The 68K multiplies the vector number by 4 because each vector number is associated with a 4-byte pointer in memory. The interrupt vector table itself takes up 4 × 256 = 1024 bytes of memory. Figure 10.20 illustrates the way in which the 68K responds to a level 6 vectored interrupt.

Daisy-chaining

The vectored interrupt scheme we've just described has a flaw. Although there are 256 interrupt vector numbers, the 68K supports only seven levels of interrupt. A mechanism called daisy-chaining provides a means of increasing the number of interrupt levels by linking the peripherals together in a line. When the CPU acknowledges an interrupt, a message is sent to the first peripheral in the daisy chain. If this peripheral doesn't require attention, it passes the IACK down the line to the next peripheral.

Figure 10.21 shows how interrupt requesters at a given priority level are prioritized by daisy chaining. Each peripheral has an IACK–IN input and an IACK–OUT output. The IACK–OUT pin of a peripheral is wired to the IACK–IN pin of the peripheral on its right. Suppose an interrupt request at level 6 is issued and acknowledged by the 68K. The interface at the left-hand side of the daisy chain, closest to the 68K, receives the IACK signal first from the CPU. If this interface generated the interrupt, it responds with an interrupt vector. If the interface did not request service, it passes the IACK signal to the device on its right. That is, IACK–IN is passed out on IACK–OUT. The IACK signal ripples down the daisy chain until a device responds with an interrupt vector.

Daisy-chaining permits an unlimited number of interfaces to share the same level of interrupt and each interface to have its own interrupt vector number.
[Figure 10.21: Daisy-chained interfaces sharing a common IRQ line to the CPU.]
Individual interfaces are prioritized by their position with respect to the CPU. The closer to the CPU an interface is, the more chance it has of having its interrupt request serviced in the event of multiple interrupt requests at this level.

10.3 Direct memory access

The third I/O strategy, called direct memory access (DMA), moves data between a peripheral and the CPU's memory without the direct intervention of the CPU itself. DMA provides the fastest possible means of transferring data between an interface and memory, as it requires no CPU overhead and leaves the CPU free to do useful work. DMA is complex to implement and requires a relatively large amount of hardware. Figure 10.22 illustrates the operation of a system with DMA.

DMA works by grabbing the data and address buses from the CPU and using them to transfer data directly between the peripheral and memory. During normal operation of the computer in Fig. 10.22, bus switch 1 is closed and bus switches 2 and 3 are open. The CPU controls the buses, providing an address on the address bus and reading data from memory or writing data to memory via the data bus.

When a peripheral wishes to take part in an I/O transaction it asserts the TransferRequest input of the DMA controller (DMAC). In turn, the DMA controller asserts DMArequest to request control of the buses from the CPU; that is, the CPU is taken offline. When the CPU returns DMAgrant to the DMAC, a DMA transfer takes place. Bus switch 1 is opened and switches 2 and 3 are closed. The DMAC provides an address to the address bus and hence to the memory. At the same time, the DMAC provides a TransferGrant signal to the peripheral, which is then able to write to, or read from, the memory directly. When the DMA operation has been completed, the DMAC hands back control of the bus to the CPU.

A real DMA controller is a very complex device. It has several internal registers, with at least one to hold the address of the next memory location to access and one to hold the number of words to be transferred. Many DMACs are able to handle several interfaces, which means that their registers must be duplicated. Each interface is referred to as a channel, and typical single-chip DMA controllers handle up to four channels (i.e. peripherals) simultaneously.

Figure 10.23 provides a protocol flowchart for the sequence of operations taking place during a DMA operation. This figure shows the sequence of events that takes place in the form of a series of transactions between the peripheral, the DMAC, and the CPU.

DMA operates in one of two modes: burst mode or cycle stealing. In the burst mode the DMA controller seizes the system bus for the duration of the data transfer operation (or at least for the transfer of a large number of words). Burst mode DMA allows data to be moved into memory as fast as the weakest link in the memory/bus/interface chain permits. The CPU is effectively halted in the burst mode because it cannot use its data and address buses.

In the cycle steal mode described by Fig. 10.24, DMA operations are interleaved with the computer's normal memory accesses. As the computer does not require access to the system buses for 100% of the time, DMA can take place when they are free. This free time occurs while the CPU is busy generating an address ready for a memory read or write cycle.
[Figure 10.22: The operation of a system with DMA. Bus switch 1 connects the CPU to the address and data buses; bus switches 2 and 3 connect the DMAC in its place when DMA is enabled.]

[Figure 10.24: Cycle stealing. On alternate cycles of the system clock the address bus carries a DMA address then a CPU address, and the data bus carries DMA data then CPU data.]
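Software sees a DMAC simply as a set of registers to load before a transfer. The following sketch is entirely hypothetical (the register addresses and layout are our own invention; real DMACs differ), but it shows the style of operation the text describes:

DMAADR  EQU     $810000         ; hypothetical register: next memory address to access
DMACNT  EQU     $810004         ; hypothetical register: number of words to transfer
DMACTL  EQU     $810006         ; hypothetical control register: writing 1 starts the transfer

        MOVE.L  #$002000,DMAADR ; the block of data starts at memory location $002000
        MOVE.W  #512,DMACNT     ; 512 words are to be moved
        MOVE.W  #1,DMACTL       ; start the DMA; the CPU is now free to do other work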
10.4 Parallel and serial interfaces

[…] circuits with 8-bit data interfaces, allowing them to be used with 8-bit processors.

10.4.1 The parallel interface

The peripheral interface adapter (PIA) is an integrated circuit with two independent 8-bit ports. It contains all the logic needed to control the flow of data between an external peripheral and a computer. A port's eight pins may be programmed individually to act as inputs or outputs; for example, an 8-bit port can be configured with two input lines and six output lines. The PIA can automatically perform handshaking with devices connected to its ports.

Figure 10.25 gives a block diagram of the PIA, from which it can be seen that the two I/O ports, referred to as the A side and the B side, appear symmetrical. In general this is true, but small differences in the behavior of these ports are described when necessary. Each port has two control pins that can transform the port from a simple I/O latch into a device capable of performing a handshake or initiating interrupts, as required.

The interface between the PIA and the CPU is conventional; the PIA's CPU side looks like a block of four locations in RAM to the CPU. CPU-side pins comprise a data bus and its associated control circuits. Two register-select pins, RS0 and RS1, are connected to the lower-order bits of the CPU's address bus and discriminate between the PIA's internal registers.

The PIA has two independent interrupt request outputs, one for each port. When the PIA is powered up, the contents of all its internal registers are put in a zero state. In this mode the PIA is in a safe state with all its programmable pins configured as inputs. It would be highly dangerous to permit the PIA to assume a random initial configuration, because any random output signals might cause havoc elsewhere.

To appreciate how the PIA operates, we have to understand the function of its six internal registers. The PIA has two peripheral data registers (PDRA and PDRB), two data-direction registers (DDRA and DDRB), and two control registers (CRA and CRB). The host computer accesses a location within the PIA by putting the appropriate 2-bit address on register select lines RS0 and RS1.
[Figure 10.25: Block diagram of the PIA (B side shown). Data bus buffers connect data bus lines d7–d0 to output register B (bits 7–0), which drives peripheral interface B and its pins PB0–PB7.]
Because RS0 and RS1 can directly distinguish between only four of the six internal registers, we need a means of accessing the other registers. The PIA uses bit 2 in the control registers (CRA2 or CRB2) as a pointer to either the peripheral data register or the data-direction register. Table 10.4 demonstrates how this arrangement works. Register select input RS1 determines which of the two 8-bit I/O ports of the PIA is selected, and RS0 determines whether the control register or one of the pair of registers formed by the peripheral data register and the data-direction register is selected. The control registers can always be unconditionally accessed when RS0 = 1, but to select a peripheral data register or a data-direction register, bit 2 of the appropriate control register must be set or cleared, respectively.

Table 10.4 The register selection scheme of the PIA.

RS1  RS0  CRA2  CRB2   Location selected             Address
0    0    1     X      Peripheral data register A    BASE
0    0    0     X      Data direction register A     BASE
0    1    X     X      Control register A            BASE+2
1    0    X     1      Peripheral data register B    BASE+4
1    0    X     0      Data direction register B     BASE+4
1    1    X     X      Control register B            BASE+6

X = don't care; BASE = base address of the memory-mapped PIA; RS0 = register select 0; RS1 = register select 1; CRA2 = bit 2 of control register A; CRB2 = bit 2 of control register B.

The peripheral data registers provide an interface between the PIA and the outside world. When one of the PIA's 16 I/O pins is programmed as an input, data is moved from that pin through the peripheral data register onto the CPU's data bus during a read cycle. Conversely, when a pin is acting as an output, the CPU latches a 1 or 0 into the appropriate bit of the peripheral data register to determine the state of the corresponding output pin.

The data-direction registers determine the direction of data transfer at the PIA's I/O pins. Writing a zero into bit i of DDRA configures bit i of the A side peripheral data register as an input. Conversely, writing a one into bit i of DDRA configures bit i of the A side peripheral data register as an output. The pins of the PIA's A side or B side ports may be defined as inputs or outputs by writing an appropriate code into DDRA or DDRB, respectively. The PIA's I/O pins can be configured dynamically and the direction of data transfer altered during the course of a program. The DDRs' bits are cleared during a power-on reset to avoid accidentally forcing any pin into an output mode.

Table 10.5 demonstrates how side A of a PIA memory-mapped at address $80 0000 is configured as an input and side B as an output. The registers are accessed at $80 0000, $80 0002, $80 0004, and $80 0006. Consecutive addresses differ by 2 rather than 1 because the 68K's data bus is 16 bits (2 bytes) wide whereas the PIA is 8 bits wide. Once the PIA has been configured, data can be read from side A of the PIA into data register D0 by a MOVE.B PDRA,D0 instruction, and data may be written into side B by writing to the PIA with a MOVE.B D0,PDRB instruction.
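Table 10.5 is not reproduced here; a minimal sketch of the kind of initialization it describes (our own code, using the addresses above and the CRA2/CRB2 mechanism of Table 10.4) is:

PDRA    EQU     $800000         ; peripheral data register A (shares its address with DDRA)
CRA     EQU     $800002         ; control register A
PDRB    EQU     $800004         ; peripheral data register B (shares its address with DDRB)
CRB     EQU     $800006         ; control register B

        MOVE.B  #$00,CRA        ; clear CRA2 so that address PDRA selects DDRA
        MOVE.B  #$00,PDRA       ; zeros in DDRA make all side A pins inputs
        MOVE.B  #$04,CRA        ; set CRA2 so that address PDRA selects PDRA itself
        MOVE.B  #$00,CRB        ; clear CRB2 so that address PDRB selects DDRB
        MOVE.B  #$FF,PDRB       ; ones in DDRB make all side B pins outputs
        MOVE.B  #$04,CRB        ; set CRB2 so that address PDRB selects PDRB itself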
Controlling the PIA

The control registers control the special-purpose pins associated with each port of the PIA. Pins CA1 and CA2 control the flow of information between the peripheral's A side and the PIA by providing any required handshaking between the peripheral and PIA. Similarly, side B has control pins CB1 and CB2.

The bits of control register A (CRA) can be divided into four groups according to their function. Bits CRA0 to CRA5 define the PIA's operating mode (Fig. 10.26). Bits CRA6 and CRA7 are interrupt status bits that are set or cleared by the PIA itself. Bit CRA7 is interrupt request flag 1 (IRQA1), which is set by an active transition at the CA1 input pin. Similarly, CRA6 corresponds to the IRQA2 interrupt request flag and is set by an active transition at the CA2 input pin. We now examine the control register in more detail.

[Figure 10.26: Structure of the PIA's side A control register. Bit 7 is IRQA1 and bit 6 is IRQA2 (the interrupt request flags associated with the IRQA output to the CPU); bits 5–3 are the CA2 control field, bit 2 is the DDRA access bit, and bits 1–0 are the CA1 control field. CA1 and CA2 are the handshake controls to the peripheral; the register is loaded from the 8-bit data bus.]

CA1 control  Bits CRA0 and CRA1 determine how the PIA responds to a change of level (0-to-1 or 1-to-0) at the CA1 control input. The relationship between the CA1 control input, CRA0, CRA1, and the interrupt flag IRQA1 is described in Table 10.6. CRA1 determines the sense (i.e. up or down) of the transition on CA1 that causes the CRA7 interrupt flag (i.e. IRQA1) to be set. CRA0 determines whether an active transition on CA1 generates an interrupt request by asserting the IRQA output. CA1 can be used as an auxiliary input if bit CRA0 is clear, or as an interrupt request input if CRA0 is set.

Whenever an interrupt is caused by an active transition on CA1, the interrupt flag in the control register, IRQA1, is set and the IRQA output pin goes low. After the CPU has read the contents of peripheral data register A, interrupt flag IRQA1 is automatically reset. In a typical application of the PIA, CA1 is connected to a peripheral's RDY output so that the peripheral can request attention when it is ready to take part in a data transfer.

For example, if CRA1, CRA0 is set to 0, 1, a negative (falling) edge at the CA1 control input sets the IRQA1 status flag in control register CRA to 1, and the PIA's IRQA interrupt request output is asserted to interrupt the host processor. CRA1 determines the sense of the transition on CA1 that sets the interrupt flag status, and CRA0 determines whether the PIA will interrupt the host processor when the interrupt flag is set.

Table 10.6 Effect of CA1 control bits.

CRA1  CRA0   Transition of CA1 control input   IRQA1 interrupt flag status   Status of side A interrupt request
0     0      negative edge                     set on negative edge          masked
0     1      negative edge                     set on negative edge          enabled
1     0      positive edge                     set on positive edge          masked
1     1      positive edge                     set on positive edge          enabled (asserted)
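As a concrete illustration (our own code, reusing the register address of Table 10.5), the following write arms CA1 as a falling-edge interrupt input while keeping the peripheral data register selected:

CRA     EQU     $800002          ; side A control register
        MOVE.B  #%00000101,CRA   ; CRA1,CRA0 = 0,1: set IRQA1 on a negative edge of CA1
                                 ; and assert IRQA; CRA2 = 1 keeps PDRA selected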
Data direction access control (CRA2)  When register select input RS0 is 0, the data-direction access control bit determines whether data-direction register A or peripheral data register A is selected. When the PIA is reset, CRA2 is 0 so that the data-direction register is always available after a reset.

CA2 control (CRA3, CRA4, CRA5)  The CA2 control pin may be programmed as an input that generates an interrupt request in a similar way to CA1, or it may be programmed as an output. Bit 5 of the control register determines CA2's function. If bit 5 is 0, CA2 is an interrupt request input (Table 10.7), and if bit 5 is 1, CA2 is an output (Table 10.8). Table 10.7 demonstrates that the behavior of CA2, when acting as an interrupt-request input, is entirely analogous to that of CA1.

Table 10.7 Effect of CA2 control bits when CRA5 = 0 (note that E is the PIA's clock).

CRA5  CRA4  CRA3   Transition of CA2 control input   IRQA2 interrupt flag status   Status of interrupt request
0     0     0      negative edge                     set on negative edge          masked
0     0     1      negative edge                     set on negative edge          enabled (asserted)
0     1     0      positive edge                     set on positive edge          masked
0     1     1      positive edge                     set on positive edge          enabled (goes low)

When CA2 is programmed as an output with CRA5 = 1, it behaves in the manner defined in Table 10.8.

Case 0 (CRA5 = 1, CRA4 = 0, CRA3 = 0). This is the handshake mode used when a peripheral is transmitting data to the CPU via the PIA. A timing diagram of the action of the handshake mode of CA2 is given in Fig. 10.27, together with an […]
Table 10.8 Effect of CA2 control bits when CRA5 = 1, i.e. with CA2 programmed as an output (the caption and layout are reconstructed here).

Case 0 (CRA5,CRA4,CRA3 = 1,0,0): CA2 goes low on the falling edge of the E clock after a CPU read side A data operation; it goes high when interrupt flag bit CRA7 is set by an active transition of the CA1 input.
Case 1 (1,0,1): CA2 goes low on the falling edge of E after a CPU read side A data operation; it goes high on the negative edge of the first E pulse occurring during a deselect state.
Case 2 (1,1,0): CA2 goes low when CRA3 goes low as a result of a CPU write to CRA; it remains low as long as CRA3 is low, and goes high on a CPU write to CRA that changes CRA3 to a 1.
Case 3 (1,1,1): CA2 remains high as long as CRA3 is high, and is cleared on a CPU write to CRA that clears CRA3; it goes high when CRA3 goes high as a result of a CPU write to CRA.
[Figure 10.27: The PIA input handshake mode (case 0 in Table 10.8). The timing diagram shows the PIA clock and the CA2 output: the peripheral signals the PIA on CA1, and at point C the PIA brings CA2 low after the CPU has read the data, telling the peripheral that the data has been accepted.]
Case 3 (CRA5 = 1, CRA4 = 1, CRA3 = 1). Now CA2 is set to a high level and remains in that state until CRA3 is cleared. Cases 2 and 3 demonstrate the use of CA2 as an additional output, set or cleared under program control.

10.4.2 The serial interface

We now describe the serial interface device that connects a computer to a modem or a similar device. Although the serial interface was once used to connect PCs to a wide range of external peripherals, the USB and FireWire interfaces have largely rendered the serial interface obsolete in modern PCs.

Serial data transmission is used by data transmission systems that operate over distances greater than a few meters, and Chapter 14 will have more to say on the subject of data transmission. Here we're more interested in the asynchronous communications interface adapter (ACIA), which connects a CPU to a serial data link.

The serial interface transfers data into and out of the CPU a bit at a time along a single wire; for example, the 8-bit value 10110001₂ would be sent in the form of eight or more pulses one after the other. Serial data transfer is slower than the parallel data transfer offered by a PIA, but is inexpensive because it requires only a single connection between the serial interface and the external world (apart from a ground return).

We are not concerned with the fine details of the ACIA's internal operation, but rather with what it does and how it is used to transmit and receive serial data. When discussing serial transmission we often use the term character to refer to a unit of data rather than byte, because many transmission systems are designed to transmit information in the form of ISO/ASCII-encoded characters.

Figure 10.29 demonstrates how a 7-bit character is transmitted bit by bit asynchronously. During a period in which no data is being transmitted from an ACIA, the serial output is at a high level, which is called the mark condition. When a character is to be transmitted, the ACIA's serial output is put in a low state (a mark-to-space transition) for a period of one bit time. The bit time is the reciprocal of the rate at which successive serial bits are transmitted and is measured in Baud. In the case of a two-level binary signal, the Baud corresponds to bits/s. The initial bit is called the start bit and tells the receiver that a stream of bits, representing a character, is about to be received. If data is transmitted at 9600 Baud, each bit period is 1/9600 s = 0.1042 ms.

During the next seven time slots (each of the same duration as the start bit) the output of the ACIA depends on the value of the character being transmitted. The character is transmitted bit by bit. This data format is called non-return to zero (NRZ) because the output doesn't go to zero between individual bits. After the character has been transmitted, a further two bits (a parity bit and a stop bit) are appended to the end of the character.

At the receiver, a parity bit is generated locally from the incoming data and then compared with the received parity bit. If the received and locally generated parity bits differ, an error in transmission is assumed to have occurred. A single parity bit can't correct an error once it has occurred, nor detect a pair of errors in a character. Not all serial data transmission systems employ a parity bit error detector.

The stop bit (or optionally two stop bits) indicates the end of the character. Following the reception of the stop bit(s), the transmitter output is once more in its mark state and is ready to send the next character. The character is composed of 10 bits but contains only 7 bits of useful information.

The key to asynchronous data transmission is that once the receiver has detected a start bit, it has to maintain synchronization only for the duration of a single character.
[Figure 10.29: Format of an asynchronous serial character. The line idles in the mark state; one character consists of a start bit (a move to the space state for time T), 7 data bits, a parity bit, and a stop bit. The lower trace shows an example character, the bit pattern 1 0 1 1 0 0 1 0, framed by the start and stop bits.]
The receiver examines successive received bits by sampling the incoming signal at the center of each pulse. Because the clock at the receiver is not synchronized with the clock at the transmitter, each received data bit will not be sampled exactly at its center.

Figure 10.30 provides the internal arrangement of a typical ACIA, a highly programmable interface whose parameters can be defined under software control. The ACIA has a single receiver input pin and a single transmitter output pin.

The ACIA's peripheral side pins

The ACIA communicates with a peripheral via seven pins, which may be divided into three groups: receiver, transmitter, and modem control. At this point, all we need say is that the modem is a black box that interfaces a digital system to the public switched telephone network and therefore permits digital signals to be transmitted across the telephone system. A modem converts digital signals into audio (analog) tones. We'll look at the modem in more detail in Chapter 14.

Receiver  The receiver part of the ACIA has a clock input and a serial data input. The receiver clock is used to sample the incoming data bits and may be 64, 16, or 1 times the bit rate of the received data; for example, an ACIA operating at 9600 bits/s might use a ×16 receiver clock of 153 600 Hz. The serial data input receives data from the peripheral to which the ACIA is connected. Most systems require a special interface chip between the ACIA and the serial data link to convert the signal levels at the ACIA to the signal levels found on the data link.

Transmitter  The transmitter part of the ACIA has a clock input from which it generates the timing of the transmitted data pulses.

Modem control  The ACIA communicates with a modem or similar equipment via three active-low pins (two inputs and one output). The ACIA's request to send (RTS) output may be set or cleared under software control and is used by the ACIA to tell the modem that it is ready to transmit data to it.

The two active-low inputs to the ACIA are clear-to-send (CTS) and data-carrier-detect (DCD). The CTS input is a signal from the modem to the ACIA that inhibits the ACIA from transmitting data if the modem is not ready (because the telephone connection has not been established or has been broken). If the CTS input is high, a bit is set in the ACIA's status register, indicating that the modem (or other terminal equipment) is not ready for data.

The modem uses the ACIA's DCD input to tell the ACIA that the carrier has been lost (i.e. a signal is no longer being received) and that valid data is no longer available at the receiver's input. A low-to-high transition at the DCD input sets a bit in the status register and may also initiate an interrupt if the ACIA is so programmed. In applications of the ACIA that don't use a modem, the CTS and DCD inputs are connected to a low level and not used.

The ACIA's internal registers

The ACIA has four internal registers: a transmitter data register (TDR), a receiver data register (RDR), a control register (CR), and a status register (SR). Because the ACIA has a single register-select input, RS, only two internal registers can be directly accessed by the CPU. Because the status and receiver data registers are always read from, and the transmitter data register and control register are always written to, the ACIA's R/W input distinguishes between the two pairs of registers. The addressing arrangement of the ACIA is given in Table 10.9.

The control register is a write-only register that defines the operating characteristics of the ACIA, particularly the format of
the transmitted or received data. Table 10.10 defines the control register's format. The counter division field, CR0 and CR1, determines the relationship between the transmitter and receiver bit rates and their respective clocks (Table 10.11). When CR1 and CR0 are both set to one, the ACIA is reset and all internal status bits, with the exception of the CTS and DCD flags, are cleared. The CTS and DCD flags are entirely dependent on the signal level at the respective pins. The ACIA is initialized by first writing ones into bits CR1 and CR0 of the control register, and then writing one of the three division ratio codes into these positions. In the majority of systems CR1 = 0 and CR0 = 1 for a divide-by-16 ratio.

Table 10.9 Register selection scheme of the ACIA.

RS   R/W   Type of register   ACIA register
0    0     Write only         Control
0    1     Read only          Status
1    0     Write only         Transmitter data
1    1     Read only          Receiver data

Table 10.10 Format of the ACIA's control register.

Bit(s)   7                          6, 5                  4, 3, 2       1, 0
Field    Receive interrupt enable   Transmitter control   Word select   Counter division

Table 10.11 Relationship between CR1, CR0, and the division ratio.

CR1  CR0   Division ratio
0    0     1
0    1     16
1    0     64
1    1     Master reset

The word select field, CR2, CR3, CR4, defines the format of the received or transmitted characters. These three bits allow the selection of eight possible arrangements of number of bits per character, type of parity, and number of stop bits (Table 10.12). For example, if you require a word with 8 bits, no parity, and 1 stop bit, control bits CR4, CR3, CR2 must be set to 1, 0, 1.

Table 10.12 The word select bits.

CR4  CR3  CR2   Word length   Parity   Stop bits   Total bits
0    0    0     7             Even     2           11
0    0    1     7             Odd      2           11
0    1    0     7             Even     1           10
0    1    1     7             Odd      1           10
1    0    0     8             None     2           11
1    0    1     8             None     1           10
1    1    0     8             Even     1           11
1    1    1     8             Odd      1           11

The transmitter control field, CR5 and CR6, determines the level of the request to send (RTS) output, and the generation of an interrupt by the transmitter portion of the ACIA. Table 10.13 gives the relationship between these control bits and their functions. RTS can be employed to tell the modem that the ACIA has data to transmit.

Table 10.13 Function of transmitter control bits CR5, CR6.

CR6  CR5   RTS    Transmitter interrupt
0    0     Low    Disabled
0    1     Low    Enabled
1    0     High   Disabled
1    1     Low    Disabled—a break level is placed on the transmitter output

The transmitter interrupt mechanism can be enabled or disabled depending on whether you are operating the ACIA in an interrupt-driven or in a polled data mode. If the transmitter interrupt is enabled, a transmitter interrupt is generated whenever the transmitter data register (TDR) is empty, signifying the need for new data from the CPU. If the ACIA's clear-to-send input is inactive-high, the TDR empty flag bit in the status register is held low, inhibiting any transmitter interrupt.

The effect of setting both CR6 and CR5 to a logical one requires some explanation. If both these bits are high, a break (space level) is transmitted until the bits are altered under software control. A break can be used to generate an interrupt at the receiver because the asynchronous format of the serial data precludes the existence of a space level for more than about 10 bit periods.

The receiver interrupt enable field consists of bit CR7 which, when clear, inhibits the generation of interrupts by the receiver portion of the ACIA. Whenever bit CR7 is set, a receiver interrupt is generated by the receiver data register (RDR) flag of the status byte going high, indicating the presence of a new character ready for the CPU to read. A receiver interrupt can also be generated by a low-to-high transition at the data-carrier-detect (DCD) input, signifying the loss of a carrier. CR7 is a composite interrupt enable bit. It is impossible to enable either an interrupt caused by the RDR being full or an interrupt caused by a positive transition on the DCD pin alone.
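To make these fields concrete, consider a worked example (ours, not a value from the original text): the control byte %10010101 selects a divide-by-16 clock (CR1,CR0 = 0,1), a word of 8 bits with no parity and 1 stop bit (CR4,CR3,CR2 = 1,0,1), RTS low with the transmitter interrupt disabled (CR6,CR5 = 0,0), and the receiver interrupt enabled (CR7 = 1).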
Configuring the ACIA

The following 68000 assembly language listing demonstrates how the ACIA is initialized before it can be used to transmit and receive serial data (a sketch of such a listing appears below).

The status register  The status register has the same address as the control register, but is distinguished from it by being a read-only register. Table 10.14 gives the format of the status register. Let's look at the function of these bits.

Bit 0—receiver data register full (RDRF)  When set, the RDRF bit indicates that the receiver data register is full and a character has been received. If the receiver interrupt is enabled, the interrupt request flag, bit 7, is also set whenever RDRF is set. Reading the data in the receiver data register clears the RDRF bit. Whenever the DCD input is high, the RDRF bit remains at a logical zero, indicating the absence of any valid input. The RDRF bit is used to detect the arrival of a character when the ACIA is operated in a polled input mode.

Bit 1—transmitter data register empty (TDRE)  This flag is the transmitter counterpart of RDRF. A logical 1 in TDRE indicates that the contents of the transmitter data register (TDR) have been transmitted and the register is now ready for new data. The IRQ bit is also set whenever the TDRE flag is set, if the transmitter interrupt is enabled. The TDRE bit is 0 when the TDR is full, or when the CTS input is high, indicating that the terminal equipment is not ready for data. The fragment of code below demonstrates how the TDRE flag is used when the ACIA is operated in a polled output mode.

Bit 2—data carrier detect (DCD)  The DCD bit is set whenever the DCD input is high, indicating that a carrier is not present. The DCD pin is normally employed only in conjunction with a modem. When the signal at the DCD input makes a low-to-high transition, the DCD bit in the status register is set, and the IRQ bit is also set if the receiver interrupt is enabled. The DCD bit remains set even if the DCD input returns to a low state. To clear the DCD bit, the CPU must read the contents of the ACIA's status register and then the contents of the data register.

Bit 3—clear to send (CTS)  The CTS bit directly reflects the status of the ACIA's CTS input. A low level on the CTS input indicates that the modem is ready for data. If the CTS bit is set, the transmitter data register empty bit is inhibited (clamped at zero) and no data may be transmitted by the ACIA.
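Neither the initialization listing nor the polled-output fragment referred to above survives in this reproduction. The following minimal sketches (our own code; the ACIA's base address and the symbol names are assumptions) show what such routines might look like. The control value is built from the fields of Tables 10.11 to 10.13, with both interrupts disabled for polled operation.

ACIAC   EQU     $008040          ; assumed address: control register (write)/status register (read)
ACIAD   EQU     $008042          ; assumed address: transmit data (write)/receive data (read)

INIT    MOVE.B  #%00000011,ACIAC ; CR1,CR0 = 1,1: master reset
        MOVE.B  #%00010101,ACIAC ; divide-by-16 clock, 8 data bits, no parity, 1 stop bit;
                                 ; RTS output low, both interrupts disabled
        RTS                      ; return to the caller

* Polled output: wait until TDRE (status bit 1) is set, then send the byte in D0
PUTCHAR MOVE.B  ACIAC,D1         ; read the ACIA's status register
        AND.B   #%00000010,D1    ; isolate TDRE, the transmitter data register empty flag
        BEQ     PUTCHAR          ; poll until the transmitter is ready for new data
        MOVE.B  D0,ACIAD         ; write the character to the transmitter data register
        RTS

* Polled input: wait until RDRF (status bit 0) is set, then read a character into D0
GETCHAR MOVE.B  ACIAC,D1         ; read the status register
        AND.B   #%00000001,D1    ; isolate RDRF, the receiver data register full flag
        BEQ     GETCHAR          ; poll until a character has been received
        MOVE.B  ACIAD,D0         ; reading the data register also clears RDRF
        RTS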
…10 instructions/s. How many instructions can be used in an interrupt handling routine if the overall interrupt handling efficiency is to be greater than 70%?

10.21 What is DMA and why is it so important in high performance systems?

10.22 What are the advantages and disadvantages of memory-mapped I/O in comparison with dedicated I/O that uses special instructions and signals?

10.23 The PIA has six internal registers and two register select lines. How does the PIA manage to select six registers with only two lines?

10.24 Can you think of any other way of implementing a register select scheme (other than the one used by the PIA)?

10.25 In the context of the PIA, what is a data direction register (DDR), and how is it used?

10.26 How does the PIA use its CA1 and CA2 control lines to implement handshaking?

10.27 How are the characters transmitted over a serial data link divided into individual characters and bits?

10.28 What are the functions of the ACIA's DCD and CTS inputs?

10.29 What is the difference between a framing error and an overrun error?

10.30 The 68K's status register (SR) contains the value $2601. How is this interpreted?
Computer peripherals 11
INTRODUCTION
Humans communicate with each other by auditory and visual stimuli; that is, we speak, gesticulate,
write to each other, and use pictures. You would therefore expect humans and computers to
communicate in a similar way. Computers are good at communicating visually with people; they
can generate sophisticated images, although they are rather less good at synthesizing natural-
sounding speech. Unfortunately, computers can’t yet reliably receive visual or sound input directly
from people. Hardware and software capable of reliably understanding speech or recognizing visual
input does not yet exist—there are systems that can handle speech input and systems that can
recognize handwriting, but the error rate is still too large for general-purpose use.¹ Consequently,
people communicate with computers in a different way than they communicate with other people.
The keyboard and video display are the principal input and output devices used by personal
computers. The terms input and output refer here to the device as seen from the CPU; that is, a
keyboard provides an output, which, in turn, becomes the CPU’s input.
The CRT (cathode ray tube) display is an entirely electronic device that’s inexpensive to produce.
It is cheap because it relies on semiconductor technology for its electronics and on tried-and-tested
television technology for its display. By 2000 the more compact but expensive LCD panel was
beginning to replace the CRT display. Less than 4 years later, the trend had accelerated and large, high-
quality, high-resolution LCD displays were widely available. By 2005 CRT displays were in decline.
This chapter looks at keyboards, pointers, displays, and printers. We also look at input devices
that do more than communicate with people; we show how physical parameters from
temperature and pressure to the concentration of glucose in blood can be measured.
The second part of this chapter demonstrates how the digital computer interacts with the
analog world by converting analog values into digital representations and vice versa. At the end of
this chapter we provide a very brief insight into how a computer can be used to control real-world
analog systems with the aid of digital signal processing (DSP).
¹ You can buy speech recognition programs but they have to be trained to match a particular voice. Even then, their accuracy is less than perfect. Similarly, hand-held computers provide handwriting recognition provided you write in a rather stylized way.
against the pressure of a spring. As it moves down, the plunger forces two wires together to make a circuit—the output of this device is inherently binary (on or off). A small stainless steel snap-disk located between the plunger and base of the switch produces an audible click when bowed downwards by the plunger. A similar click is made when the plunger is released. This gives depressing a keyswitch a positive feel because of its tactile feedback.

Figure 11.3(b) describes the membrane switch, which provides a very-low-cost mechanical switch for applications such as microwave oven control panels. A thin plastic membrane is coated with a conducting material and spread over a printed circuit board. Either by forming the plastic membrane into slight bubbles or by creating tiny pits in the PCB, it is possible to engineer a tiny gap between contacts on the PCB and the metal-coated surface of the membrane. Pressure on the surface of the membrane due to a finger pushes the membrane against a contact to close a circuit. The membrane switch can be hermetically sealed for ease of cleaning and is well suited to applications in hazardous or dirty environments (e.g. mines). Equally, the membrane switch suffers all the disadvantages of other types of low-cost mechanical switch.

Another form of mechanical switch employs a plunger with a small magnet embedded in one end. As this magnet is pushed downwards, it approaches a reed relay composed of two gold-plated iron contacts in a glass tube. These contacts become magnetized by the field from the magnet, attract each other, and close the circuit (Fig. 11.3(c)). Because the contacts are in a sealed tube, the reed relay is one of the most reliable types of mechanical switch.

Non-mechanical switches

Non-mechanical switches have been devised to overcome some of the problems inherent in mechanical switches. Three of the most commonly used non-mechanical switches are the […]
Although the mechanical switch has some excellent ergonomic properties, it has rather less good electrical properties. In particular, the contacts get dirty and make intermittent contact, or they tend to bounce when brought together, producing a series of pulses rather than a single, clean make. This effect is called contact bounce. You can eliminate the effects of contact bounce by connecting the switch to the S input of an RS flip-flop. When the switch first closes, the flip-flop is set and its Q output goes high. Even if the contacts bounce and S goes low, the Q output remains high.

[The accompanying circuit shows a switch that pulls the S input of an RS flip-flop (two cross-coupled gates, G1 and G2, with pull-up resistors to +5 V) to ground; the timing diagram contrasts the bouncing switch signal with the clean Q output.]
[Figure 11.3 The mechanical switch: (a) the basic switch (plunger and spring), (b) the membrane switch, (c) the reed relay.]
[Figure 11.6: State of the keyboard encoder with one key pressed. An output port drives the switch matrix, whose sense lines d0 (LSB) to d7 (MSB) are held at logical 1 by pull-up resistors to +5 V; the pushed button pulls the corresponding input bit low, so the input port sees 11111110.]

11.1.2 Pointing devices

Although the keyboard is excellent for inputting text, it can't be used efficiently as a pointing device to select an arbitrary point on the screen. PCs invariably employ one of three pointing devices: the joystick, the mouse, and the trackball (Fig. 11.7). Portable computers normally use either an eraser pointer or a trackpad as a pointing device in order to conserve space.

The joystick is so called because it mimics the pilot's joystick. A joystick consists of a vertical rod that can be moved simultaneously in a left–right or front–back direction. The computer reads the position of the stick and uses it to move a cursor on the screen in sympathy. You don't look at the joystick when moving it; you look at the cursor on the […]
MODERN MICE

Nothing stands still in the world of the PC. The humble mouse has developed in three ways. Its sensing mechanism has developed from crude mechanical motion sensors to precise optical sensors. Its functionality has increased with the addition of two buttons (left click and right click) to permit selection, followed by a third button and a rotating wheel that let you scroll through menus or documents. Some mice let you move the wheel left or right to provide extra capabilities. Finally, the mouse interface has changed from the original PS2 serial interface to the USB serial interface to wireless interfaces that allow cordless mouse operation.

The gyro mouse is the most sophisticated of all computer mice and is both cordless and deskless. You can use it in space simply by waving it about (as you might do when giving a presentation to an audience). The gyro mouse employs solid-state gyroscopic sensors to determine the mouse's motion in space. Consequently, you are not restricted to dragging it across a surface.
A first-generation optical mouse reflects a light beam off the grid and counts the number of horizontal and vertical lines crossed as the mouse moves about the pad. An optical mouse does not require intimate contact and friction with the surface, although it does require that the surface have an optical texture; you can't use a mirror. The resolution of an optical mouse is higher than that of a mechanical mouse.

[Figure: the ball and disk arrangement of a mechanical mouse; an LED and phototransistor sense X motion and Y motion, and the time difference between the main and quadrature outputs determines the direction of rotation.]

Second-generation optical mouse technology uses a light-emitting diode to illuminate the surface underneath the mouse. Light reflected back from the surface is picked up and focused on a sensor array (Fig. 11.9). The image from the array consists of a series of pixels. As the mouse moves, the image changes unless the surface beneath the mouse is blank. A special-purpose signal-processing chip in the mouse compares consecutive surface images and uses the difference between them to calculate the movement of the mouse in the x and y planes. A new image is generated once every 1/2000 s. This type of optical mouse has no moving parts to wear out or become clogged with dust. The second-generation optical mouse is a miracle of modern engineering because it has an image sensing system, an image processing system, and a computer interface in a package that sells for a few dollars.
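The image-comparison step can be illustrated with a toy version of what the mouse's signal-processing chip does. In the following sketch the image size, search window, and test pattern are all invented for illustration; the code simply finds the (dx, dy) shift that minimizes the difference between two consecutive surface images.

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

#define N 8          /* toy image size                */
#define MAXSHIFT 2   /* search window, in pixels      */

/* Sum of absolute differences between image b shifted by (dx, dy)
   and image a, over the region where the two images overlap. */
static long sad(const unsigned char a[N][N], const unsigned char b[N][N],
                int dx, int dy) {
    long sum = 0;
    for (int y = 0; y < N; y++)
        for (int x = 0; x < N; x++) {
            int xs = x + dx, ys = y + dy;
            if (xs >= 0 && xs < N && ys >= 0 && ys < N)
                sum += abs((int)a[y][x] - (int)b[ys][xs]);
        }
    return sum;
}

/* Estimate motion as the shift that best aligns consecutive images. */
static void estimate_motion(const unsigned char prev[N][N],
                            const unsigned char curr[N][N],
                            int *best_dx, int *best_dy) {
    long best = LONG_MAX;
    for (int dy = -MAXSHIFT; dy <= MAXSHIFT; dy++)
        for (int dx = -MAXSHIFT; dx <= MAXSHIFT; dx++) {
            long s = sad(prev, curr, dx, dy);
            if (s < best) { best = s; *best_dx = dx; *best_dy = dy; }
        }
}

int main(void) {
    unsigned char prev[N][N] = {0}, curr[N][N] = {0};
    prev[3][3] = 255;   /* a single bright surface feature...      */
    curr[4][5] = 255;   /* ...that has moved by dx = 2, dy = 1     */
    int dx = 0, dy = 0;
    estimate_motion(prev, curr, &dx, &dy);
    printf("estimated motion: dx=%d dy=%d\n", dx, dy);
    return 0;
}

A real chip performs this comparison 2000 times a second in dedicated hardware, but the principle is the same.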
[Figure: an analog joystick; sliders connected to the joystick by mechanical linkages, fed from a +V supply, report X motion and Y motion.]

…between the aircraft and pilot. The force feedback joystick combines a joystick with motors or solenoids that generate a computer-controlled force on the joystick as you move it. Now, the computer can restore the tactile feedback lacking in conventional joysticks.
The document scanner was once a very expensive device and has now become a very-low-cost but high-precision input device. A document is placed on a glass sheet that forms the top of a largely empty box. A light source on tracks is moved along the length of the document and photo sensors read light intensity along a scan line. The scanner is able to create a high-resolution color image of the document being scanned (typically 2400 dpi). Indeed, the accuracy of some document scanners is sufficient to give national mints a headache because of the potential for forgery. Some banknotes are designed with color schemes that are difficult to copy precisely by document copiers.

11.2 CRT, LED, and plasma displays

General-purpose computers communicate with people via a screen, which may be a conventional CRT or a liquid crystal display. We begin by describing the cathode ray tube (CRT) display that lies at the heart of many display systems. A CRT is little more than a special type of the vacuum tube that was once used in all radios and TVs before they were replaced by transistors.

The way in which human visual perception operates is important to those designing displays; for example, we can see some colors better than others, and we cannot read text if it is too small nor can we read it rapidly if it is too large. Colors themselves are described in terms of three parameters: hue is determined by the wavelength of the light, saturation is determined by the amount of white light present in the color, and intensity is determined by the brightness of the light. Objects on a screen are viewed against background objects—the luminosity of an object in comparison with its background is called its contrast. All these factors have to be taken into account when designing an effective display.

[Figure 11.11: the cathode ray tube; a glass envelope with conductive coating encloses the heater, cathode, grid, and focus electrodes, and deflection coils steer the electron beam onto the phosphor coating.]

Figure 11.11 describes the construction of the cathode ray tube (CRT).6 It is a remarkably simple device that uses a technology discovered early in the twentieth century. The CRT is a glass tube from which all the air has been removed. A wire coil, called a heater, is located at one end of the CRT and becomes red-hot when a sufficiently large current flows through it—exactly like the element in an electric fire. The heater raises the temperature of a cylinder, called the cathode, which is coated with a substance that gives off electrons when it is hot. The negatively charged electrons leaving the surface of the cathode are launched into space unimpeded by air molecules because of the high vacuum in the CRT.

6 The cathode ray tube was originally invented in Germany by Karl Braun in 1910 and later developed by Vladimir Zworykin in 1928.

When the negatively charged electrons from the CRT's cathode boil off into space, they don't get very far because they are pulled back to the positively charged cathode. To overcome the effect of the positive charge on the cathode, the surface and sides of the glass envelope at the front of the CRT are coated with a conducting material connected to a very high positive voltage with respect to the cathode. The high positive voltage (over 20 000 V) attracts electrons from the cathode to the screen. As the electrons travel along the length of the CRT, they accelerate and gain kinetic energy. When they hit phosphors coating the front of the screen, their energy is dissipated as light. The color and intensity of the light depend on chemical characteristics of the phosphor coating and the speed and quantity of the electrons. For the time being, we will assume that the composition of the phosphor gives out a white light; that is, the display is black and white or monochrome.

The beam of electrons from the cathode flows through a series of cylinders and wire meshes located near the cathode. Using the principle that like charges repel and unlike charges
attract, various electrical potentials are applied to these cylinders and meshes to control the flow of the beam from the cathode to the screen and to focus the electrons to a tight spot—the smaller the spot, the better the resolution of the display. The cathode and focusing electrodes are called a gun.

A wire mesh called a control grid is placed in the path of the electron beam and connected to a negative voltage with respect to the cathode. The stronger the negative voltage on the grid, the more the electrons from the cathode are repelled, and the fewer get through to the screen. By changing or modulating the voltage on the grid, the number of electrons hitting the screen and, therefore, the brightness of the spot, can be controlled.

11.2.1 Raster-scan displays

Two scanning coils at right angles (called a yoke) are placed around the neck of the CRT. Passing a current through one coil creates a magnetic field that deflects the beam along the horizontal axis and passing a current through the other coil causes a deflection along the vertical axis. These coils let you deflect the beam up–down and left–right to strike any point on the screen.

The magnetic field in the coil that deflects the beam in the horizontal axis is increased linearly to force the spot to trace out a horizontal line across the face of the CRT. This line is called a scan line or a raster. When the beam reaches the right-hand side, it is rapidly moved to the left-hand edge, ready for the next horizontal scan.

While the beam is being scanned in the horizontal direction, another linearly increasing current is applied to the vertical deflection coils to move the beam downward. The rate at which the beam moves vertically is a fraction of the rate at which it moves horizontally. During the time it takes the beam to scan from top to bottom, it makes hundreds of scans in the horizontal plane. A scan in the vertical axis is called a frame. Figure 11.12(a) shows the combined effects of the fast horizontal and slow vertical scans—eventually, the beam covers or scans the entire surface of the screen.

As the beam scans the surface of the screen, the voltage on the grid is varied to change the brightness of the spot to draw an image. Figure 11.12(b) demonstrates how the letter 'A' can be constructed by switching the beam on and off as it scans the screen. The scanning process is carried out so rapidly that the human viewer cannot see the moving spot and perceives a continuous image. Typically, the horizontal scan rate is in the region of 31 000 lines/s and the vertical scan rate is 50 to 100 fields/s. We will return to the scanning process later when we describe the system used to store images.

The simplest CRT screen would be a hemisphere, because any point on its surface is a constant distance from the focusing mechanism. Such a screen is unacceptable, and, over the years, CRT screens have become both flatter and squarer at the cost of ever more sophisticated focusing mechanisms. The CRT's screen is not square; its width:height or aspect ratio is the same as a television, 4:3.

The CRT is an analog device employing electrostatic and electromagnetic fields to focus an electron beam to a point on a screen. The engineering problems increase rapidly with the size of the screen and large CRTs are difficult to construct and expensive. The weight of the CRT also increases dramatically with screen size. In the early 1990s the cost of a 17-inch screen was about four times that of a 14-inch screen and a 19-inch screen cost over 10 times as much as a 14-inch screen. The CRT was one of the last components of the computer to experience falling prices. However, by the late 1990s the price of CRTs had dramatically fallen; not least because of competition with LCD displays. By 2003, the LCD was beginning to replace the CRT in domestic TV displays.

Figure 11.12 The raster scan: (a) the raster-scan display, in which the fast horizontal and slow vertical scans cover one frame; (b) modulating the beam to create an image.

11.2.2 Generating a display

The next step is to explain how an image is generated. Figure 11.13 provides a more detailed description of the raster-scan display based on the CRT. A sawtooth waveform is applied to the vertical scan coils of a CRT to cause the spot to move from the top of the screen to the bottom of the screen at a constant linear rate. When the spot reaches the bottom of the screen, it rapidly flies back to the top again. At the same time, a second sawtooth waveform is applied to the horizontal scanning coils to cause the beam to scan from the left-hand side to the right-hand side before flying back again. A negative pulse is applied to the grid during the flyback period to turn off the beam.

As the beam is scanned across the surface of the screen, passing every point, the voltage on the grid can be changed to modify the spot's brightness. Although at any instant a TV screen consists of a single spot, the viewer perceives a complete image for two reasons. First, the phosphor coating continues to give out light for a short time after the beam has struck it, and, second, a phenomenon called the persistence of vision causes the brain to perceive an image for a short time after it has been removed.
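To get a feel for the scan rates quoted above, a few lines of C turn them into raster lines per frame (only the figures just given are used):

#include <stdio.h>

int main(void) {
    double h_rate = 31000.0;           /* horizontal rate, lines/s */
    double v_rates[2] = {50.0, 100.0}; /* vertical rates, frames/s */
    for (int i = 0; i < 2; i++) {
        /* lines drawn during one vertical sweep = h_rate / v_rate */
        printf("%3.0f frames/s: about %3.0f lines per frame\n",
               v_rates[i], h_rate / v_rates[i]);
    }
    return 0;  /* 50 frames/s gives 620 lines; 100 frames/s gives 310 */
}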
A raster-scan display system can be constructed by mapping the screen onto memory. As the beam scans the physical display screen, the corresponding location in the computer memory is interrogated and the resulting value used to determine the brightness of the spot. Figure 11.14 provides a highly simplified arrangement of a system that generates an n column by m row display.

In Fig. 11.14 a clock called a dot clock produces pulses at the dot rate (i.e. it generates a pulse for each dot or pixel on the display). The dot clock is fed into a divide-by-n circuit that produces a single pulse every time n dots along a row are counted. It also produces a dot number in the range 0, 1, 2, . . . , n − 1. The output of the divide-by-n circuit is a pulse at the row (i.e. raster) rate, which is fed to a divide-by-m circuit.

The output of the divide-by-m circuit is a pulse at the frame rate (i.e. a pulse for each complete scan of the screen). This pulse is fed to the CRT's control circuits and is used to lock or synchronize the scanning circuits in the CRT unit with the dot clock. The divide-by-m circuit produces an output in the range 0, 1, 2, . . . , m − 1 corresponding to the current row. The column and row address combiner takes the current column and row addresses from the two dividers and generates the address of the corresponding pixel in the video memory. The pixel at this address is fed to the CRT to either turn the beam on (a white dot), or to turn the beam off (no dot).

Figure 11.13 Details of the raster-scan display: sawtooth voltages Vy and Vx drive the vertical and horizontal scan circuits and the deflection coils, while the voltage on the grid sets the brightness of the raster track on the screen.

[Figure 11.14: the dot clock drives a divide-by-n counter (dot number, i.e. column address) and a divide-by-m counter (line number, i.e. row address); the row and column address combiner addresses the n column by m row video memory, whose data feed the CRT.]
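The counter chain of Fig. 11.14 can be modeled in a few lines of C. In this sketch the toy values of n and m are arbitrary; the dot (column) number is the dot-clock count modulo n, the row number is the row count modulo m, and the combiner forms the video memory address as row × n + column.

#include <stdio.h>

#define N 8  /* pixels per raster line (the divide-by-n) */
#define M 4  /* raster lines per frame (the divide-by-m) */

int main(void) {
    /* t counts pulses of the dot clock. */
    for (unsigned long t = 0; t < 2UL * N * M; t++) {
        unsigned col   = t % N;                 /* divide-by-n output */
        unsigned row   = (t / N) % M;           /* divide-by-m output */
        unsigned frame = t / ((unsigned long)N * M);
        unsigned addr  = row * N + col;         /* address combiner   */
        if (col == 0 && row == 0)               /* frame-rate pulse   */
            printf("frame %u begins at video memory address %u\n",
                   frame, addr);
    }
    return 0;
}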
TV DISPLAYS

Because the computer display grew out of the television, let's look at some of the details of a TV display. In the USA a TV image uses 60 vertical scans a second and each vertical scan (called a field) is composed of 262½ lines. A frame is made up of two consecutive fields containing 262½ odd-numbered lines and 262½ even-numbered lines. The total number of lines per frame is 2 × 262½ = 525. In Europe there are 50 vertical scans a second and each vertical scan is composed of 312½ lines. The total number of lines per frame is 2 × 312½ = 625.

A display composed of consecutive fields of odd and even lines is called an interlaced display and is used to reduce the rate at which lines have to be transmitted. However, interlaced displays are effective only with broadcast TV and generate unacceptable flicker when used to display text in a computing environment.

A real display system differs from that of Fig. 11.14 in several ways. Probably the most important component of the display generator is the
video memory (sometimes called VRAM), which holds the image to be displayed. Figure 11.15 shows the structure of a dual-ported video memory. We call this memory dual ported because it can be accessed by both an external CPU and the display generator simultaneously. The CPU needs to access the video memory in order to generate and modify the image being displayed.

One of the problems of video display design is the high rate at which its pixels are accessed. Consider a super VGA display with a resolution of 1024 × 768 pixels and a refresh rate (frame rate) of 70 Hz. In one second, the system must access 1024 × 768 × 70 = 55 050 240 pixels. The time available to access a single pixel is approximately 1/55 000 000 s ≈ 18 ns, which is too short for typical video memory. In practice even less time is available to access pixels, because some time is lost to left- and right-hand margins and the flyback.
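The arithmetic above generalizes to any display and is worth a one-off calculation in C (the numbers are those of the example just given):

#include <stdio.h>

int main(void) {
    double cols = 1024, rows = 768, refresh = 70.0;  /* super VGA   */
    double pixels_per_s = cols * rows * refresh;     /* 55 050 240  */
    double ns_per_pixel = 1e9 / pixels_per_s;        /* about 18 ns */
    printf("pixel accesses per second: %.0f\n", pixels_per_s);
    printf("time available per pixel:  %.1f ns\n", ns_per_pixel);
    return 0;
}

Raising the resolution or the refresh rate shrinks the 18 ns budget further, which is why the shift-register scheme described next matters.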
A practical video display system reads a group of pixels from video memory at a time and then sends them to the CRT one at a time. Figure 11.15 shows how the video memory performs this operation. The address from the display generator selects a row of pixels that are loaded into a shift register once per row clock. This arrangement means that the video memory is accessed by the display generator only once per row, rather than once per pixel. Consequently, the memory doesn't require such a low access time. The individual pixels of a row are read out of the shift register at the dot (i.e. pixel) rate. A shift register is capable of much higher speed operation than a memory.

Modern display systems permit more sophisticated images than the simple on/off dot displays of a few years ago. Several video memories (called planes) are operated in parallel, with each memory plane contributing one bit of the current pixel. If there are q memory planes, the q bits can be fed to a q-bit digital-to-analog converter to generate one of 2^q levels of brightness (i.e. a gray scale), or they can be used to select one of 2^q different colors (we discuss color later).

PC display systems

You can integrate a PC's display electronics onto the motherboard or you can locate the display subsystem on a plug-in board. High-performance computers use a plug-in display card because a card can be optimized for video applications. Figure 11.16 illustrates the organization of a typical display card. Because it's necessary to transfer large volumes of data between the CPU and the video display, PCs have a special interface slot that provides a very-high-speed data bus between the display card and the CPU; this is called the AGP bus.

Figure 11.16 The video display card.

It is impossible to cover display cards in any depth in this book. They are very complex devices that contain their own massively powerful special-purpose processors with 128-bit-wide internal buses. These processors free the host processor on the motherboard from display processing.

The video display memory forms part of the display card and is not normally part of the processor's memory space. This means that video and processor memory can be accessed in parallel. Some low-cost computers integrate the video controller on the motherboard and use system memory for display memory.

The original PC display was 640 × 480 pixels. Over the years, the average size of mainstream PC displays has increased. By the 1990s most PCs had 1024 × 768 displays and Web applications frequently used 800 × 600 display formats. Today, displays with resolutions of 1280 × 1024 are common and some LCD displays have a resolution of 1600 × 1200.

In practical terms, a 640 × 480 display can present a page of rather chunky text. A display with 1600 × 1200 pixels can display two word-processed pages side by side. Figure 11.17 illustrates the growth in the size (resolution) of displays for the PC.

11.2.3 Liquid crystal and plasma displays

For a long time, the CRT remained the most popular display mechanism. The late 1980s witnessed the rapid growth of a rival to the CRT, the liquid crystal display or LCD. By the mid-1990s color LCD displays were widely found in laptop computers.
POLARIZING MATERIAL

The polarizing materials we use today were first produced by Edwin Land in 1932. Dr Land embedded crystals of iodoquinine sulfate in a transparent film. These crystals are all oriented in the same direction and form a grating that allows only light polarized in a specific direction to pass through the film. If you place two crossed polarizing films in series with one rotated at 90° with respect to the other, no light passes through the pair, because one filter stops the passage of vertically polarized light and the other stops the passage of horizontally polarized light. Dr Land marketed his filters under the trade name Polaroid and later went on to develop the instant picture camera that took pictures and developed the film while in the camera.
The liquid crystal display mimics the behavior of the CRT; that is, it creates a pixel that can be switched on or off. All we have to do is to make a sandwich of a polarizing substance and a liquid crystal—light will pass through it if the liquid crystal is polarized in the same plane as the polarizing material. Otherwise, the two polarizing filters block the transmission of light.

We now have all the ingredients of a flat panel display: a polarizing filter that transmits light polarized in one plane only and a liquid crystal cell that can rotate the polarization of light by 90° (or more) electronically. Figure 11.20 demonstrates the structure and operation of a single-pixel LCD cell.

In Fig. 11.20 light is passed first through a polarizer arranged with its axis of polarization at 0°, then through the liquid crystal, and finally through a second polarizer at 90°. If the liquid crystal does not rotate the plane of polarization of the light, all the light is blocked by the second polarizer. If, however, an electrostatic field is applied to the two electrodes, the liquid crystal rotates the polarization of the light by 90° and the light passes through the second polarizer. Consequently, a light placed behind the cell will be visible or invisible.

Figure 11.19 The LCD cell with an applied electric field: when an electric field is applied between the plates, the liquid crystal molecules change the alignment set by the ridges at the top and bottom of the cell and line up in the direction of the field.

Figure 11.20 Displaying a pixel: (a) the liquid crystal cell has no effect on the light passing through it, and the polarizing filters stop light passing through them; (b) the liquid crystal cell rotates the polarization of the light by 90°, allowing the light to pass through the lower polarizing filter.

An entire LCD display is made by creating rows and columns of cells like those of Fig. 11.20. Each cell is selected or addressed by applying a voltage to the row and the column in which it occurs. The voltage is connected to each cell by depositing transparent conductors on the surface of the glass sandwich that holds the liquid crystals.

Because an LCD cell can be only on or off, it's impossible to achieve different levels of light transmission directly (i.e. you can display only black or white). However, because you can rapidly switch a cell on and off, you can generate intermediate light levels by modulating the time for which a cell transmits light. Typical LCDs can achieve 64 gray levels.

LCD displays can be operated in one of two modes. The reflective mode relies on the ambient light that falls on the cell, whereas the transmissive mode uses light that passes through the cell. Ambient light displays have a very low contrast ratio (i.e. there's not a lot of difference between the on state and the off state) and are often difficult to read in poor light, but they do consume very little power indeed. Reflective LCD displays are found in pocket calculators, low-cost toys, and some personal organizers. Displays operating in a transmissive mode using back-lighting are easier to read, but require a light source (often a fluorescent tube) that consumes a considerable amount of power. Ultimately, it's the power taken by this fluorescent tube that limits the life of a laptop computer's battery.

Plasma displays

Plasma technology was invented in the 1960s. The plasma display uses a flat panel that shares many similarities with the LCD display. Both displays consist of an array of pixels that are addressed by x, y coordinates. The LCD cell uses a liquid crystal light-gate to switch a beam
of light on or off, whereas the plasma panel uses a tiny fluorescent light to generate a pixel.

Each pixel in a plasma display is a tiny neon light. A cell (i.e. pixel) contains a gas at a low pressure. Suppose a voltage can be applied to a pair of electrodes across the cell. At low voltages, nothing happens. As the voltage increases, electrons are pulled off the atoms of the gas in the cell by the electric field. As the voltage increases further, these electrons collide with other atoms and liberate yet more electrons. At this point an avalanche effect occurs and a current passes through the cell. The passage of a current through the gas in a cell generates light or UV radiation. First-generation plasma display panels used cells containing neon, which glows orange-red when energized. By about 2000 color plasma panels were beginning to appear; these use cells with phosphors, which glow red, green or blue when bombarded by UV light. By 2003 plasma panel production was rapidly increasing and plasma panels offered a realistic alternative to large CRT displays.

Plasma technology offers significant advantages over LCD technology; in particular, plasma displays have a higher contrast ratio (i.e. the ratio of black to white) and plasma panels are brighter because they operate by generating light rather than by gating it through filters and a liquid crystal.

Figure 11.21 illustrates the structure of a plasma cell. The display is constructed from two sheets of glass separated by a few hundred microns. Ribs are molded on one of the plates to provide the cell structure. The phosphors that define the colors are deposited on the surface of the glass sheet. When the unit is assembled it is filled with xenon at low pressure. A display is initiated by applying a voltage across two electrodes to break down the gas and start the discharge. Once the
cell has been activated, a lower voltage is applied to the keep-alive electrode to maintain the discharge.

Figure 11.21 The plasma display cell: a face plate and back plate enclose ribbed cells containing the discharge, with the phosphor and the electrodes (including a data electrode) deposited on the plates.

Although plasma panels have advantages over LCD displays, they also have disadvantages. They consume more power than LCDs, which can cause problems with unwanted heat. The phosphors gradually degrade and the display slowly loses its brightness over a few years. Finally, the individual cells suffer a memory effect. If a cell is continuously energized for a long time, it can lose some intensity. This means that a static picture can be burnt onto the screen; you see this effect with plasma displays in airports where the ghostly details of some flights are permanently burnt onto the screen.

The contrast ratio of a display is the ratio of the lightest and darkest parts of the display. Although many regard the plasma panel as having a better contrast ratio than an LCD, the situation is more complicated. Contrast ratio is affected not only by the transmissivity of an LCD or the on/off light ratio of a plasma cell, but also by ambient light. Figure 11.22 gives the contrast ratios of LCDs and plasma displays as a function of ambient light. This demonstrates that the plasma display is far superior in a dark room but is less good in bright light.

[Figure 11.22: contrast ratio (100 to 500) of LCD and plasma displays against ambient light, from dark room through living room and office to outdoors; the plasma display starts higher but falls below the LCD as ambient light increases.]
ORGANIC DISPLAYS

Conventional electronics uses semiconductors fabricated from silicon or compounds such as gallium arsenide. Organic semiconducting materials have been discovered that have many interesting properties. Unfortunately, organic semiconducting materials are often difficult to work with. They suffer from low electron and hole mobility and are limited to low-speed applications. From a manufacturing standpoint, they are sensitive to high temperatures, cannot be soldered into conventional circuits, and degrade when exposed to atmospheric moisture and oxygen.

On the other hand, organic electronics have remarkable commercial potential. They offer the promise of flexible, paper-thin circuits and displays because they can be deposited on flexible plastic foils rather than the heavy and fragile glass surfaces required by LCDs. OLEDs (organic light-emitting diodes) have far lower power consumption than LCDs.

The OLED, pioneered by Kodak in the mid-1980s, is already in large-scale production. The first applications of this technology were in automobile instrument panels, digital watches, and small, low-power displays in mobile phones. The OLED uses organic materials that have weak intermolecular bonds that give them the properties of both semiconductors and insulators. The organic molecules of an OLED are sandwiched between conductors (anode and cathode). When a current flows through the molecules of the OLED, electron and hole charge carriers recombine to give off light in a process called fluorescence.

OLEDs can be constructed in the form of TOLEDs (transparent OLEDs) for use in automobile or aircraft windshields to provide 'head-up' displays, or in the form of FOLEDs (flexible OLEDs), which can be bent or rolled into any shape. By vertically stacking TOLEDs in layers with each layer having a different color, you can create the SOLED (stacked OLED), which forms the basis of a multicolor display.
…software. There are two types of image: the bit-mapped image and the parameterized image. All images in the video memory are bit mapped in the sense that each pixel in the display corresponds to a pixel in the video memory (strictly speaking, perhaps we should use the term pixel mapped). Photographs and TV pictures are examples of bit-mapped images.

A parameterized image is defined in terms of an algorithm; for example, you might describe a line as running from point (4, 12) to point (30, 45), or as the equation y = 4x + 25 for 9 ≤ x ≤ 70. The graphics software is responsible for taking the parameterized image and converting it into a bit-mapped image in the display's memory. We now look at the parameterized image because it can be specified with relatively few bits and it can easily be manipulated.

Figure 11.23 demonstrates how a line is mapped onto a display by evaluating the equation of the line and then selecting every pixel that passes through the line. In a practical system, relatively few pixels will lie exactly on the line and it is necessary to select pixels close to the line.

Jack Bresenham invented a classic line-drawing algorithm in 1965. A straight line can be expressed as ay = bx + c, where x and y are variables, and a, b, and c constants that define the line. The line's slope is given by b/a and the point at which the line crosses the y axis (x = 0) is given by y = c/a. For the sake of simplicity, consider a line that goes through the origin, so that ay = bx, where a = 1 and b = 0.5. Figure 11.24 illustrates how the line corresponding to this equation goes through some pixels (shown black in Fig. 11.24). All other pixels are either above or below the line. If the equation of the line is rearranged in the form ay − bx = 0, the pixels above the line (light in Fig. 11.24) satisfy the relation ay − bx > 0, and those below the line (dark in Fig. 11.24) satisfy the relation ay − bx < 0.

[Figure 11.24: pixels on and around the line y = 0.5x; pixels the line passes through are shown black, those above the line light, and those below the line dark.]
The Bresenham algorithm draws lines with a slope m = b/a in the range 0 to 1. This algorithm evaluates the sign of ay − bx at regular intervals. By monitoring the sign of the function, you can determine whether you are above or below the line. The line is drawn from its starting point from, say, left to right. Suppose we select a pixel somewhere along this line. Bresenham's algorithm tells us how to go about selecting the next pixel. Figure 11.25 demonstrates that the new pixel is selected to be either the pixel directly to the right of the current pixel or the pixel both above and to the right of the current pixel.

The algorithm evaluates the value of the function at the midpoint between the two candidates for the new pixel along the line. The pixel selected to be the next pixel is the one that lies closest to the line, as Fig. 11.25 demonstrates. The details of the Bresenham algorithm are more complex than we've described. The algorithm must handle lines that don't pass through the origin and lines that don't have a slope in the range 0 to 1.

The following fragment of pseudocode implements the Bresenham line-drawing algorithm. Assume that a straight line with a slope between 0 and 1 is to be drawn from x1, y1 to x2, y2. At each step in drawing the line we increment the value of the x coordinate by x_step. The corresponding change in y is x_step * (y2 − y1)/(x2 − x1). The Bresenham algorithm eliminates this calculation by either making or not making a fixed step along the y axis. If the line's slope is greater than 1, we can use the same algorithm by simply swapping x and y.
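The pseudocode listing itself has not survived in this copy of the text, so the following C fragment reconstructs the standard integer form of the algorithm for slopes between 0 and 1; set_pixel() is a stand-in for a routine that writes one dot to the video memory.

#include <stdio.h>

/* Stand-in for a routine that lights one pixel in the video memory. */
static void set_pixel(int x, int y) { printf("(%d, %d)\n", x, y); }

/* Draw a line from (x1, y1) to (x2, y2), where 0 <= slope <= 1 and
   x1 < x2. The error term tracks the sign of (ay - bx) at the midpoint
   between the two candidate pixels, using integer arithmetic only. */
static void draw_line(int x1, int y1, int x2, int y2) {
    int dx = x2 - x1;
    int dy = y2 - y1;
    int error = 2 * dy - dx;  /* decision value at the first midpoint */
    int y = y1;

    for (int x = x1; x <= x2; x++) {
        set_pixel(x, y);
        if (error > 0) {      /* line passed above the midpoint:      */
            y++;              /* select the pixel up and to the right */
            error -= 2 * dx;
        }
        error += 2 * dy;      /* advance one pixel to the right       */
    }
}

int main(void) {
    draw_line(0, 0, 8, 4);    /* the line y = 0.5x of Fig. 11.24 */
    return 0;
}

Note that only additions, subtractions, and a sign test appear in the loop, which is what makes the algorithm so cheap to implement in hardware or software.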
Figure 11.25 Searching along a line: (a) select next pixel up and right; (b) select next pixel right. The function is evaluated at the midpoint between the two candidate pixels.

Antialiasing

The Bresenham and similar algorithms generate lines that suffer from a step-like appearance due to the nature of the line-following mechanism. These steps are often termed jaggies and spoil the line's appearance. The effect of finite pixel size and jaggies is termed aliasing, a term taken from the world of signal processing to indicate the error introduced when analog signals are sampled at too low a rate to preserve fidelity.
Figure 11.26 demonstrates how we can reduce the effects of aliasing. The antialiasing technique in Fig. 11.26 uses pixels with gray-scale values to create a line that appears less jagged to the human eye. A pixel that is on the line is made fully black. Pixels that are partially on the line are displayed as less than 100% black. When the human eye sees the line from a distance, the eye–brain combination perceives a line free of jaggies. That is, the brain averages or smoothes the image.
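There are many ways of grading the pixels. The sketch below uses one simple scheme, not necessarily that of Fig. 11.26, in which a pixel's blackness falls off linearly with its vertical distance from the ideal line y = 0.5x:

#include <stdio.h>
#include <math.h>

int main(void) {
    double slope = 0.5;  /* the line y = 0.5x used earlier */
    for (int x = 0; x <= 8; x++) {
        double ideal_y = slope * x;
        for (int y = (int)ideal_y - 1; y <= (int)ideal_y + 1; y++) {
            /* a pixel on the line is 100% black; one a full pixel away
               is white; in between, blackness falls off linearly */
            double dist  = fabs(y - ideal_y);
            double black = dist >= 1.0 ? 0.0 : 1.0 - dist;
            if (black > 0.0)
                printf("pixel (%d,%d): %3.0f%% black\n", x, y, 100 * black);
        }
    }
    return 0;
}

At x = 1, for example, the ideal y is 0.5, so the pixels at y = 0 and y = 1 are each printed as 50% black; viewed from a distance the pair reads as a single smooth segment.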
Now that we’ve looked at how images are created on a
screen, we examine how they are printed on paper.
11.3 The printer

Printers contain moving parts, and so they are less reliable than the purely electronic devices such as CRT and LCD displays.

Like all the other subsystems of a PC, the printer has seen a remarkable drop in its price over the last three decades. In 1976 a slow, unreliable dot matrix printer cost about $800 and by 2005 you could buy a laser printer for $150 and a photo-quality inkjet printer for about $60.

Printers come in a wider range than other display devices because there are more printer technologies. Mechanical printers must perform the following basic functions.

1. Move the paper to a given line.
2. Move the print-head to a given point along a line.
3. Select a character or symbol to be printed.
4. Make a mark on the paper corresponding to that character.

The first and last of these functions are relatively easy to explain and are dealt with first. Depending on the application, paper is available in single sheet, continuous roll, or fan-fold form. Paper can be moved by friction feed, in which the paper is trapped between a motor-driven roller and pressure rollers that apply pressure to the surface of the paper. As the roller (or platen) moves, the paper is dragged along with it. An alternative paper feeding mechanism is the tractor or sprocket feed where a ring of conical pins round the ends of the platen engage in perforations along the paper's edge. As the platen rotates, the paper is accurately and precisely pulled through the printer. The rise of the PC has seen the decline of fan-fold paper. Today's printers found in the home or small office use plain paper. Some banks and similar institutions still employ fan-fold paper to print statements (and other pre-printed forms).

11.3.1 Printing a character

Printers are described by the way in which marks on the paper are made; for example, the matrix printer, the inkjet printer, and the laser printer. The earliest method of marking paper used the impact of a hard object against an ink-coated ribbon, to make an imprint in the shape of the object. This is how the mechanical office typewriter operates. The tremendous reduction in the cost of laser and inkjet printers in the early 1990s rendered impact printers obsolete, except in specialized applications.

Non-impact printers form characters without physically striking the paper. The thermal printer employs special paper coated with a material that turns black or blue when heated to about 110°C. A character is formed by heating a combination of dots within a matrix of, typically, 7 by 5 points. Thermal printers are very cheap, virtually silent in operation, and are used in applications such as printing receipts in mobile ticket dispensers. A similar printing mechanism uses black paper coated with a thin film of shiny aluminum. When a needle electrode is applied to the surface and a large current passed through it, the aluminum film is locally vaporized to reveal the dark coating underneath.

Another method of printing involves spraying a fine jet of ink at the paper. As this technique also includes the way in which the character is selected and formed, it will be dealt with in detail later.

The hardware that actually prints the character is called the print head. There are two classes of print head: the single print head and the multiple print head found in line printers. Typewriters employ a fixed print head and the paper and platen move as each new character is printed. A fixed print head is unsuitable for high-speed printing, as the platen and paper have a large mass and hence a high inertia, which means that the energy required to perform a high-speed carriage return would be prohibitive. Because the mass of the print head is very much less than that of the platen, most printers are arranged so that the paper stays where it is and the print head moves along the line.

One way of moving the print head is to attach it to a nut on a threaded rod (the lead screw). At the end of the rod is a stepping motor, which can rotate the rod through a fixed angle at a time. Each time the rod rotates the print head is moved left or right (depending on the direction of rotation). In another arrangement the print head is connected to a belt, moved by the same technique as the paper itself. The belt passes between two rollers, one of which moves freely and one of which is controlled by a stepping motor.

11.3.2 The inkjet printer

Inkjet printers were originally developed for professional applications and the technology migrated to low-cost PC applications. The original continuous inkjet owes more to the CRT for its operation than the impact printer. A fine jet of ink is emitted from a tiny nozzle to create a high-speed stream of ink drops. The nozzle is vibrated ultrasonically so that the ink stream is broken up into individual drops. As each drop leaves the nozzle it is given an electrical charge, so that the stream of drops can be deflected electrostatically, just like the beam of electrons in a CRT. By moving the beam, characters can be written on to the surface of the paper. The paper is arranged to be off the central axis of the beam, so that when the beam is undeflected, the ink drops do not strike the paper and are collected in a reservoir for re-use.

Continuous inkjet printers are high-speed devices, almost silent in operation, and are used in high-volume commercial applications. The original inkjet printer was very expensive and was regarded with suspicion because it had suffered a number of problems during its development. In particular,
they were prone to clogging of the nozzle. Many of the early problems have now been overcome.

Drop-on-demand printing

The modern drop-on-demand inkjet printer is much simpler than its continuous jet predecessor. In fact, it's identical to a dot matrix printer apart from the print head itself. The print head that generates the inkjet also includes the ink reservoir. When the ink supply is exhausted the head assembly is thrown away and a new head inserted. Although this looks wasteful, it reduces maintenance requirements and increases reliability. Some inkjet printers do have permanent print heads and just change the ink cartridge.

In the 1980s inkjet printers had a maximum resolution of 300 dpi (dots per inch) and by the late 1990s inkjet printers with resolution of over 1400 dpi were available at remarkably low cost. In 2004 you could buy inkjet printers with a resolution of 5700 dpi that could print photographs that were indistinguishable from conventional color photographs. Later we will look at the color inkjet printer, which created the mass market in desktop digital photography.

The drop-on-demand print head uses multiple nozzles, one for each of the dots in a dot matrix array. The head comes into contact with the paper and there's no complex ink delivery and focusing system. The holes or capillary nozzles through which the ink flows are too small to permit the ink to leak out. Ink is forced through the holes by creating a shock wave in the reservoir that expels a drop of ink through the nozzle to the paper.

One means of creating a shock wave is to place a thin film of piezoelectric crystal transducer in the side of the reservoir (see Fig. 11.27(a)). When an electric field is applied across a piezoelectric crystal, the crystal flexes. By applying an electrical pulse across such a crystal in a print head, it flexes and creates the shock wave that forces a single drop of ink through one of the holes onto the paper (see Fig. 11.27(b)). Note that there is a separate crystal for each of the holes in the print head.

Figure 11.27 Structure of the ink jet: (a) structure of the head assembly (only one nozzle shown), an ink reservoir with a piezoelectric transducer and an exit point; (b) a voltage pulse applied to the transducer makes it flex and expel a drop of ink.
Dot matrix printer A dot matrix printer forms characters from a matrix of dots in much the same way as a CRT. The dots are generated by a needle pressing an inked ribbon onto the paper, or the needles may be used with spark erosion techniques or may be replaced by heating elements in a thermal printer. The dot matrix printer was very popular in the 1970s and 1980s when it offered the only low-cost means of printing.

Cylinder, golf-ball and daisy-wheel printers The cylinder print head is a metallic cylinder with symbols embossed around it. The ribbon and paper are positioned immediately in front of the cylinder, and a hammer is located behind it. The cylinder is rotated about its vertical axis and is moved up or down until the desired symbol is positioned next to the ribbon. A hammer, driven by a solenoid, then strikes the back of the cylinder, forcing the symbol at the front onto the paper through the inked ribbon.

The golf-ball head was originally used in IBM electric typewriters. Characters are embossed on the surface of a metallic sphere. The golf-ball rotates in the same way as a cylinder, but is tilted rather than moved up or down to access different rows of characters. The golf-ball is propelled towards the ribbon and the paper by a cam mechanism, rather than by a hammer striking it at the back.

The daisy-wheel printer has a disk with slender petals arranged around its periphery. An embossed character is located at the end of each of these spokes. The wheel is made of plastic or metal and is very lightweight, giving it a low inertia. A typical daisy wheel has 96 spokes, corresponding to the upper and lower case subsets of the ISO/ASCII code. The daisy wheel rotates in the vertical plane in front of the ribbon. As the wheel rotates, each of the characters passes between a solenoid-driven hammer and the ribbon. When the desired character is at a print position, the hammer forces the spoke against the ribbon to mark the paper.

Line printer A line printer prints a whole line of text at one go. Line printers are expensive, often produce low quality output, and are geared to high-volume, high-speed printing. A metal drum extends along the entire width of the paper in front of the ribbon. The character set to be printed is embossed along the circumference of the drum. This character set is repeated, once for each of the character positions, along the drum. A typical line printer has 132 character positions and a set of 64 characters. As the drum rotates, the rings of characters pass over each of the 132 print positions, and a complete set of characters passes each printing point once per revolution. A mark is made on the paper by a hammer hitting the paper and driving it into the head through the ribbon. By controlling the instant at which the hammer is energized, any particular character may be printed. As there is one hammer per character position, a whole line may be printed during the course of a single revolution of the drum.
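The hammer timing described above is easy to simulate. In the following sketch the 64-character ring and the text are invented for illustration; during one revolution, each hammer fires at the drum step in which its required character passes its print position.

#include <stdio.h>
#include <string.h>

#define POSITIONS 132  /* print positions across the line      */
#define CHARSET    64  /* characters embossed around each ring */

/* The drum step at which ch passes under the hammers (or -1). */
static int drum_step(const char *ring, char ch) {
    const char *p = strchr(ring, ch);
    return p ? (int)(p - ring) : -1;
}

int main(void) {
    /* A made-up 64-character drum ring. */
    const char *ring =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 .,:;!?()+-*/='\"#$%&@<>[]^_|";
    const char *line = "HELLO WORLD";

    /* One revolution of the drum. */
    for (int step = 0; step < CHARSET; step++)
        for (int pos = 0; pos < (int)strlen(line) && pos < POSITIONS; pos++)
            if (drum_step(ring, line[pos]) == step)
                printf("step %2d: fire hammer %d ('%c')\n",
                       step, pos, line[pos]);
    return 0;
}

Every hammer fires exactly once per required character, so the whole line is complete after a single revolution, just as the text describes.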
XEROGRAPHY

Xerography has a long history. In 1935 a patent attorney, Chester Carlson, had an idea for a cheap copying process that didn't require the wet and messy chemicals then used in photography. While looking for a dry process that allowed photocopying, Carlson turned his attention to the phenomenon of photoconductivity (i.e. the relationship between light and the electrical conductivity of materials). He was awarded a patent on electrophotography in 1937.

Carlson's first experiments used a metal plate covered with sulfur (a photoconductive material). The sulfur was electrically charged by rubbing it, and then a glass plate was placed over it. Carlson wrote on the glass. In the next step, a bright light was shone on the assembly for several seconds. The effect of the light was to cause the sulfur to conduct and permit the electrostatic charge to leak away to the metal plate under the sulfur. The glass was removed and a fine powder dusted on the sulfur-coated plate. This powder clung to the regions that retained their charge because they hadn't been illuminated by the light (i.e. the writing). Finally, a sheet of waxed paper was placed on the powder-covered plate and pressed against it. A copy of the writing on the glass plate was now impressed on the wax paper.

It took until 1944 for Carlson to find someone who was interested in his invention—the Battelle Memorial Institute. Battelle's scientists rapidly discovered that selenium had far better photoconductive properties than sulfur and that a fine-pigmented resin could easily be fused onto the surface of paper to create the copy.

Battelle developed the process further in conjunction with the Haloid Company. They decided that Carlson's electrophotography was too cumbersome a term and asked a professor of Greek to come up with a better name. He suggested 'dry writing' because the process did not involve liquids. The corresponding Greek word was xerography. Haloid changed its name to Haloid Xerox in 1958 and then to the Xerox Corporation in 1961.
Some inkjet printers employ a fine wire behind the nozzle to instantaneously heat the ink to 300°C, well above its boiling point, which creates a bubble that forces out a tiny drop. These printers are called bubble jet printers.

Although inkjet printers are capable of high resolution with over 1000 dpi, the ink drops spread out on paper due to the capillary action of fibers on the paper's surface (this effect is called wicking). Specially coated paper considerably reduces the effect of wicking, although such paper is expensive. Canon's photo paper has a four-layered structure with a mirror-finished surface. The outer surface provides an ink-absorption layer, consisting of ultrafine inorganic particles. By instantly absorbing the ink, this layer prevents ink from spreading and preserves the round ink dots. The second layer reflects light. The third layer is the same material used in conventional photographic paper. The bottom layer is a back coating, which counteracts the stresses placed on the paper by the upper layers, to prevent curling.

11.3.3 The laser printer

The dot matrix printer brought word processing to the masses because it produced acceptable quality text at a low cost. The laser printer has now brought the ability to produce high-quality text and graphics to those who, only a few years ago, could afford no more than a medium-quality dot matrix printer. In fact, the quality of the laser printer is sufficiently high to enable a small office to create artwork similar to that once produced by the professional typesetter; that is, desktop publishing (DTP).

The laser printer is just a photocopier specially modified to accept input from a host computer. The principle of the photocopier and the laser printer is both elegant and simple. At the heart of a laser printer lies a precisely machined drum, which is as wide as the sheet of paper to be printed. The secret of the drum lies in its selenium coating.7 Selenium is an electrical insulator with an important property—when selenium is illuminated by light, it becomes conductive.

7 Modern drums don't use selenium; they use complex organic substances that have superior photo-electric properties.

A photocopier works by first charging up the surface of the drum to a very high electrostatic potential (typically 1000 V with respect to ground). By means of a complex arrangement of lenses and mirrors, the original to be copied is scanned by a very bright light and the image projected onto the rotating drum. After one rotation, the drum contains an invisible image of the original document. If the image is invisible we are entitled to ask ourselves, 'What form does this image take?'

Black regions of the source document reflect little light and the corresponding regions on the drum receive no light. The selenium coating in these regions is not illuminated, doesn't become conducting, and therefore retains its electrical charge. Light regions of the document reflect light onto the drum, causing the coating to become conducting and to lose its charge. In other words, the image on the drum is painted with an electrostatic charge, ranging from high voltage (black) to zero voltage (white).

One of the effects of an electrostatic charge is its ability to attract nearby light objects. In the next step the drum is rotated in close proximity to a very fine black powder called the toner. Consequently, the toner is attracted to those parts of the drum with a high charge. Now the drum contains a true positive image of the original. The image is a positive image because black areas on the original are highly charged and pick up the black toner.
The drum is next rotated in contact with paper that has an even higher electrostatic charge. The charge on the paper causes the toner to transfer itself from the drum to the paper. Finally, the surface of the paper is heat-treated to fix the toner on to it. Unfortunately, not all toner is transferred from the drum to the paper. Residual toner is scraped off the drum by rotating it in contact with a very fine blade. Eventually, the drum becomes scratched or the selenium no longer functions properly and it must be replaced. In contrast with other printers, the laser printer requires the periodic replacement of some of its major components. Low-cost laser printers sometimes combine the drum and the toner, which means that the entire drum assembly is thrown away once the toner has been exhausted. This approach to printer construction reduces the cost of maintenance while increasing the cost of consumables.

Unlike the photocopier, the laser printer has no optical imaging system. The image is written directly onto the drum by means of an electromechanical system. As the drum rotates, an image is written onto it line by line in very much the same way that a television picture is formed in a cathode ray tube.

Figure 11.28(a) illustrates the organization of the laser scanner and Fig. 11.28(b) provides details of the scanning mechanism. A low-power semiconductor laser and optical system produces a very fine spot of laser light. By either varying the intensity of the current to the laser or by passing the beam through a liquid crystal whose opacity is controlled electronically (i.e. modulated), the intensity of the light spot falling on the drum can be varied.

The light beam strikes a multi-sided rotating mirror. As the mirror turret rotates, the side currently in the path of the light beam sweeps the beam across the surface of the selenium-coated drum. By modulating the light as the beam sweeps across the drum, a single line is drawn. This scanning process is rather like a raster-scan mechanism found in a CRT display. After a line has been drawn, the next mirror in the rotating turret is in place and a new line is drawn below the previous line, because the selenium drum has moved by one line.

The combined motions of the rotating mirror turret and the rotating selenium drum allow the laser beam to scan the entire surface of the selenium drum. Of course, the optical circuits required to perform the scanning are very precise indeed. The resolution imposed by the optics and the laser beam size provided low-cost first-generation laser printers with a resolution of about 300 dots per inch. Such a resolution is suitable for moderately high-quality text but is not entirely suitable for high-quality graphics. Second-generation laser printers with resolutions of 600 or 1200 dpi became available in the mid-1990s.

Figure 11.28 The laser printer: (a) the print mechanism, in which a corona wire charges the drum, light from the optical system writes the image, toner is applied, paper moves from the feeder hopper past the heater to the printed paper hopper, and a cleaning blade removes unused toner; (b) the scanning mechanism, in which a laser light source, a modulator, and a rotating mirror drum sweep the beam along the direction of scan across the drum.

Not all laser printers employ the same optical arrangement, because the rotating mirror turret is complex and requires careful alignment. An alternative technique designed by Epson uses an incandescent light source behind
a stationary liquid crystal shutter. The liquid crystal shutter has a linear array of 2000 dots, each of which can be turned on and off to build up a single line across the drum. By writing a complete line in one operation, the only major moving part involved in the scanning process is the photosensitive drum itself. Technically, a laser printer without a laser scanner isn't really a laser printer. However, the term laser printer is used to describe any printer that generates an image by using an electrostatic charge to deposit a fine powder (the toner) on paper.

Other ways of avoiding the complex rotating drum mirror turret are a linear array of light-emitting diodes (LEDs) in an arrangement similar to the liquid crystal shutter, or a CRT projection technique that uses a CRT to project a line onto the photosensitive drum.

Laser printers can print dot-map pictures; that is, each pixel of the picture is assigned a bit in the printer's memory. A linear resolution of 300 dpi requires 300 × 300 = 90 000 dots/square inch. A sheet of paper measuring 11 inches by 8 inches (i.e. 88 square inches) can hold up to 88 × 90 000 = 7 920 000 dots or just under 1 Mbyte of storage.

Having introduced the principles of monochrome displays and printers, we are going to look at how color displays and printers are constructed.

11.4 Color displays and printers

It's been possible to print color images for a long time, although color printers were astronomically expensive until relatively recently. Low-cost printers began to appear in the early 1990s (largely based on inkjet technology) although the quality was suitable only for draft work. By the late 1990s high-quality low-cost color printers were widely available and the new term photorealistic was coined to indicate that they were almost able to match the quality of color photographs. Before we discuss color printers we need to say a little about the nature of color.

11.4.1 Theory of color

Light is another type of electromagnetic radiation just like X-rays and radio waves. The eye is sensitive to electromagnetic radiation in the visible spectrum and light waves of different frequencies are perceived as different colors. This visible spectrum extends from violet to red (i.e. wavelengths from 400 nm to 700 nm). Frequencies lower than red are called infra-red and frequencies higher than violet are called ultra-violet. Both are invisible to the human eye, although they play important roles in our life.

A single frequency has a pure color or hue and we perceive its intensity in terms of its brightness. In practice, we see few pure colors in everyday life. Most light sources contain visible radiation over a broad spectrum. If a light source contains approximately equal amounts of radiation across the entire visual spectrum we perceive the effect as white light. In practice light often consists of a mixture of white light together with light containing a much narrower range of frequencies. The term saturation describes the ratio of colored light to white light; for example, pink is unsaturated red at about 700 nm plus white light. An unsaturated color is sometimes referred to as a pastel shade.

Most light sources contain a broad range of frequencies (e.g. sunlight and light from an incandescent lamp). Sources that generate a narrow band of visible frequencies are gas discharge lamps and LEDs; for example, the sodium light used to illuminate streets at night emits light with two very closely spaced wavelengths at about 580 nm (i.e. yellow).
COLOR TERMINOLOGY

Hue This describes the color of an object. The hue is largely dependent on the dominant wavelength of light emitted from or reflected by an object.

Saturation This describes the strength of a color. A color may be pure or it may be blended with white light; for example, pink is red blended with white.

Luminance This measures the intensity of light per unit area of its source.

Gamma This expresses the contrast range of an image.

Color space This provides a means of encoding the color. The following color spaces are used to define the color of objects.

RGB The red, green, blue color space defines a color as the amount of its red, blue, and green components. This color space is based on the additive properties of colors.

CMY The cyan, magenta, yellow color space is used in situations in which color is applied to a white background such as paper. For example, an object appears yellow because it absorbs blue but reflects red and green. Suppose you wanted to create blue using a CMY color space. Blue is white light with red and green subtracted. Because green is absorbed by cyan and red is absorbed by magenta, combining cyan and magenta leads to the absorption of green and red; that is, blue.

HSB The HSB model defines light in the way we perceive it (H = hue or color, S = saturation, B = brightness or intensity).

Pantone matching system This is an entirely arbitrary and proprietary commercial system. A very large number of colors are defined and given reference numbers. You define a color by matching it against the colors in the Pantone system.
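The additive/subtractive relationship in the panel can be captured in a couple of lines; a sketch with normalized components in the range 0 to 1 (the simple one-minus rule is an idealization of real inks, not a claim about any particular device):

```python
def rgb_to_cmy(r, g, b):
    # Each subtractive primary removes one additive primary from white:
    # cyan absorbs red, magenta absorbs green, yellow absorbs blue.
    return (1.0 - r, 1.0 - g, 1.0 - b)

# Blue is white light with red and green subtracted, so it needs full
# cyan (to absorb red) and full magenta (to absorb green), no yellow.
print(rgb_to_cmy(0.0, 0.0, 1.0))  # (1.0, 1.0, 0.0)
```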
Monitor size   Image size      Resolution    Resolution (dpi)
15-inch        270 × 200 mm    640 × 480     60
                               800 × 600     75
                               1024 × 768    96
17-inch        315 × 240 mm    640 × 480     51
                               800 × 600     63
                               1024 × 768    85
21-inch        385 × 285 mm    640 × 480     42
                               800 × 600     52
                               1024 × 768    85
                               1280 × 1024   84
                               1600 × 1200   106

Table 11.2 Monitor size, image size, and resolution.
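The resolution column of Table 11.2 follows directly from the image width and the pixel count; a sketch of the calculation (the 15-inch row is used; small differences from the table are rounding):

```python
def dots_per_inch(pixels_across, image_width_mm):
    return pixels_across / (image_width_mm / 25.4)  # 25.4 mm per inch

# 15-inch monitor with a 270 mm wide image, as in Table 11.2.
for pixels in (640, 800, 1024):
    print(pixels, round(dots_per_inch(pixels, 270)))  # 60, 75, 96 dpi
```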
Color LCDs

Color LCDs operate on a similar principle to the color CRT and generate a color pixel from three cells with three primary colors. The individual cells of a color LCD display include red, green, or blue filters to transmit light of only one color. As in the case of the color CRT, the three primary colors are clustered to enable other colors to be synthesized by combining red, green, or blue. Color LCD displays are divided into two types: active and passive. Both active and passive displays employ the same types of LCD cells—the difference lies in the ways in which the individual cells of a matrix are selected.

The so-called passive liquid crystal matrix of Fig. 11.31 (this arrangement applies to both monochrome and color displays) applies a large pulse to all the cells (i.e. pixels) of a given row. This pulse is marked 2V in Fig. 11.31 and is currently applied to row 2. A smaller pulse that may be either positive or negative is applied to each of the columns in the array. The voltages from the row and column pulses are applied across each cell in a row, and are summed to either polarize the cell or to leave it unpolarized.
Figure 11.31 The passive matrix. (Row lines 0–3 and column lines 0–3; the selected row is driven with a 2V pulse and each column with +V or −V, so that the voltage across a cell is either 2V − V = V or 2V + V = 3V; the 3V cells are the selected cells.)

This arrangement displays an entire row of pixels at a time. Once a row has been drawn, a pulse is applied to the next row, and so on. Each cell is connected to one row line and to one column line. In Fig. 11.31 a pulse of level 2V is applied to one terminal of all the cells in row 2 of the matrix. A pulse of level +V or −V is then applied in turn to each of the column cells, 0, 1, 2, and 3. The voltage across each cell in the matrix must be either 0 (neither row nor column selected), V (row selected with 2V, column selected with +V), or 3V (row selected with 2V, column selected with −V). The matrix is designed so that the 3V pulse across a cell is sufficient to polarize the liquid crystal and turn the cell on.
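The selection rule amounts to summing the row and column pulses at each cell; a minimal sketch (pulse levels in units of V, as in Fig. 11.31; undriven rows are taken here to sit at 0, so their cells see only the small ±V column pulse, which is below the 3V polarization threshold):

```python
def cell_voltage(row_driven, column_selected):
    row = 2.0 if row_driven else 0.0           # the driven row carries 2V
    column = -1.0 if column_selected else 1.0  # columns carry -V or +V
    return row - column                        # voltage across the cell

for row_driven in (True, False):
    for column_selected in (True, False):
        v = cell_voltage(row_driven, column_selected)
        polarized = abs(v) >= 3.0              # only 3V turns the cell on
        print(row_driven, column_selected, v, polarized)
```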
The passive matrix suffers from cross-talk caused by the pulse on one row leaking into cells on adjacent rows. Furthermore, if the matrix has m rows, each row is driven (i.e. accessed) for only 1/m of the available time. These limitations produce a display that has low contrast, suffers from smearing when moving images are displayed, and has a less bright image than the TFT active matrix alternative to be described next. Although passive matrix displays were popular in the 1990s, improvements in active matrix manufacturing technology have rendered them obsolete.

Figure 11.32 The active matrix. (Each cell is connected to its row and column lines through a thin-film transistor (TFT) switch.)
A better arrangement is the active matrix of Fig. 11.32; the cell structure is exactly the same as that of a passive display, only the means of addressing a cell is different. A transistor, which is simply an electronic switch, is located at the junction of each row and column line; that is, there is one transistor for each cell. The transistor can be turned on or off by applying a pulse to its row and column lines. However, the electrical capacitance of each cell is able to store a charge and maintain the cell in the on or off condition while the matrix is addressing another transistor. That is, a transistor can be accessed and turned on or off, and that transistor will maintain its state until the next time it is accessed. The active matrix array produces a sharper and more vivid picture. The lack of cross-talk between adjacent cells means that the active matrix suffers less smearing than the passive array equivalent.

The transistors that perform the switching are not part of a silicon chip but are laid down in thin films on a substrate—hence the name TFT (thin film transistor). It takes 3 × 1024 × 768 thin film transistors to make an XVGA active matrix display, and, if just a few of these transistors are faulty, the entire display has to be rejected. The manufacturing yield of good arrays is not 100%, which means that the cost of a TFT active matrix array is considerably higher than the passive equivalent.
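The impact of the transistor count on yield can be seen with a back-of-the-envelope model; a sketch (the per-transistor defect rates are invented for illustration—the text quotes no figures):

```python
# Fraction of defect-free panels if each of the 3 x 1024 x 768
# transistors fails independently with probability p.
n_transistors = 3 * 1024 * 768          # one TFT per sub-pixel
for p in (1e-8, 1e-7, 1e-6):            # hypothetical defect rates
    panel_yield = (1.0 - p) ** n_transistors
    print(p, round(panel_yield, 3))     # 0.977, 0.79, 0.094
```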
11.4.3 Color printers

Color printers don't employ the same RGB (red, green, blue) principle used by the color CRT. Suppose we look at an object that we call red, which is illuminated by white light. The red object absorbs part of the white light and reflects some of the light to the observer. If all the light is reflected we call the object white and if all the light is absorbed we call the object black. However, if all frequencies are absorbed except red, we call the object red. In other words, if we wish to print images we have to consider what light is absorbed rather than what light is generated (as in a CRT).

The RGB model is called additive because a color is created by adding three primary colors. The CMY (cyan, magenta, yellow) color model used in printing is called subtractive because a color is generated by subtracting the appropriate components from white light. Cyan (blue–green) is the absence of red, magenta the absence of green, and yellow the absence of blue. Mixing equal amounts of cyan, magenta, and yellow subtracts all colors from white light to leave black. To create a color such as purple the printer generates a pattern of magenta and cyan dots. The saturation can be controlled by leaving some of the underlying white paper showing through.

Adding the three subtractive primaries together doesn't produce a satisfactory black; it creates a dark muddy looking color. Although the human eye is not terribly sensitive to slight color shifts, it is very sensitive to any color shift in black (black must be true black). Printers use a four-color model, CMYK, where K indicates black. Including a pure black as well as the three subtractive primaries considerably improves the image.

Printing color is much more difficult than displaying it on a CRT. Each of the red, green, and blue beams can be modulated to create an infinite range of colors (although, in practice, a digital system is limited to a finite number of discrete colors). When you print an image on paper, you have relatively little control over the size of the dot. Moreover, it's not easy to ensure that the dots created from the different subtractive primaries are correctly lined up (or registered). You can generate different levels or shades of a color by dithering (a technique that can also be applied to black and white printers to create shades of gray).

Dithering operates by dividing the print area into an array of, say, 2-by-2 matrices of 4 dots. Figure 11.33 provides a simple example of dithering in black and white. Because the dots in the matrices are so small, the eye perceives a composite light level and the effect is to create one of five levels of gray from black to white.

Dithering isn't free. If you take a 3 × 3 matrix to provide 10 levels of intensity, the effective resolution of an image is divided by three; for example, a 300 dpi printer can provide a resolution of 300 dpi or a resolution of 100 dpi with a 10-level gray scale. In other words, there's a trade-off between resolution and the range of tones that can be depicted.

The principle of dithering can be extended to error diffusion, where dots are placed at random over a much larger area than the 2 by 2 matrix. This technique is suited to printing areas of color that require subtle changes of gradation (e.g. skin tones).
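A minimal sketch of the 2-by-2 scheme described above (the threshold matrix is a standard ordered-dither pattern, chosen here for illustration):

```python
# Map a gray level (0.0 = white, 1.0 = black) onto a 2 x 2 dot pattern,
# giving the five composite levels 0%, 25%, 50%, 75%, and 100% black.
THRESHOLDS = [[0.125, 0.625],
              [0.875, 0.375]]

def dither_cell(gray):
    return [[1 if gray > t else 0 for t in row] for row in THRESHOLDS]

for level in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(level, dither_cell(level))
```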
An alternative approach to dithering that provides more tones without reducing resolution is to increase the number of colors. This technique was introduced by some manufacturers to provide the photorealism required to print the output from digital cameras. One enhanced subtractive inkjet printer uses six inks: cyan, magenta, yellow, light magenta, light cyan, and black. The lighter colors make it possible to render skin tones more realistically. Another printer uses cyan, magenta, yellow, two types of black, plus red and blue.
Figure 11.33 Dithering. (Five 2 × 2 dot patterns producing 0% black, 25% black, 50% black, 75% black, and 100% black.)

Color inkjet printers

The color inkjet printer is virtually the same as its black and white counterpart. The only difference lies in the multiple heads. Typical color printers use a separate black cartridge and a combined color cartridge. Because the head and ink reservoirs form a combined unit, the cartridge has to be thrown away when the first of the color inks runs out.
Some printers use separate print heads and reservoirs, and only the ink cartridge need be replaced.

Inkjet printer ink can be dye based or pigment based. A dye is a soluble color dissolved in a solvent and is used by most printers. A pigment is a tiny insoluble particle that is carried in suspension. Pigment-based inks are superior to dyes because pigments are more resistant to fading and can provide more highly saturated colors. Pigment-based inks have less favorable characteristics from the point of view of the designer; for example, the pigments can separate out of the liquid.

Inkjet printers can be prone to banding, an effect where horizontal marks appear across the paper due to uneven ink distribution from the print head.

Apart from its cost, the advantage of color inkjet printing is the bright, highly saturated colors. However, good results can be achieved only with suitable paper. The type of plain paper used in laser printers gives poor results. The drops of ink hit the paper and soak into its surface to mix with adjacent drops. This mixing effect reduces the brightness and quality of the colors.

By about 2000, advances in inkjet printing, ink technology, and inkjet papers, and the advent of the digital camera, had begun to wipe out the large photographic industry based on traditional silver halide photographic paper, the optical camera, and the developing, enlarging, and printing process. Moreover, the availability of digital-image processing programs such as Photoshop gave amateurs a level of control over the photographic image that only professionals had a few years ago.

Thermal wax and dye sublimation printers

The thermal wax printer is rather like the dot matrix printer with heat-sensitive paper. The print head extends the length of the paper and contains a matrix of thousands of tiny pixel-size heaters. Instead of a ribbon impregnated with ink, a sheet of material coated with colored wax is placed between the head and the paper. When the individual elements are heated to about 80°C, the wax is melted and sticks to the paper. An entire line of dots is printed at a time. The paper must make three or more passes under the print head to print dots in each of the primary (subtractive) colors. The sheet containing the wax consists of a series of bands of color.

Dye sublimation is similar to the thermal wax technique but is more expensive and is capable of a higher quality result. Electrical elements in the print head are heated to 400°C, which vaporizes the wax. These special waxes undergo sublimation when heated; that is, they go directly from the solid state to the gaseous state without passing through the liquid state.

By controlling the amount of heating, the quantity of wax transferred to the paper can be modified, making it possible to generate precise colors without having to resort to techniques such as dithering. Unlike the thermal wax process, which deposits drops of wax on the paper, dye sublimation impregnates the paper with the wax. Dye sublimation can create very-high-quality images on special paper. The cost of the consumables (waxed sheets and special paper) makes sublimation printing much more expensive than inkjet printing.

The phase-change printer

The phase-change printer falls somewhere between the inkjet printer and the thermal wax printer. The basic organization is that of the inkjet printer. The fundamental difference is that the print head contains a wax that is heated to about 90°C to keep it in liquid form. The wax is bought in the form of sticks, which are loaded into the print head.

The print head itself uses a piezo-electric crystal to create a pressure wave that expels a drop of the molten wax onto the paper. The drops freeze on hitting the paper, causing them to adhere well without spreading out. You can print highly saturated colors on plain paper. Because the paper is covered with lumpy drops, some phase-change printers pass the paper through pressure rollers to flatten the drops.

The color laser

Color laser printers are almost identical to monochrome printers. Instead of using a black toner, they use separate toners in each of the subtractive primary colors. The image is scanned onto a drum using a toner with the first primary color and then transferred to paper. The same process is repeated three more times using a different color toner after each scan. Advances in color laser technology have produced color lasers that cost as much today in real terms as monochrome lasers did a decade ago. However, the consumables for color lasers (i.e. three color toners plus a black toner) are still expensive.

11.5 Other peripherals

We've looked at several peripherals found in a personal computer. We now describe some of the peripherals that can be connected to a digital computer. Computers are used mainly in embedded control systems—the PC is just the tip of a very large iceberg. Embedded controllers take information from the outside world, process it, and control something. We begin by looking at some of the sensors that can measure properties such as temperature and pressure.

11.5.1 Measuring position and movement

The most fundamental measurement is that of position; for example, the position of the gas pedal in a car or the position of the arm in a crane. Position is measured either as a rotation (i.e. the position of a dial or knob) or as a linear position (i.e. the position along a line).
An ideal black body emits radiation proportional to the fourth power of the body's temperature (Stefan's law), and the wavelength of the radiation with the greatest amplitude falls as the temperature rises (Wien's law). Once the temperature rises sufficiently, the radiation falls into the visible band and we say that the object has become red hot.

Real materials don't have ideal black body radiation characteristics. If the emissivity of a body is less than that of a black body, it is called a gray body. If the emissivity varies with temperature, it is called a non-gray body.
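Both laws are easy to evaluate numerically; a sketch using the standard physical constants (the constants themselves are not quoted in the text):

```python
SIGMA = 5.67e-8    # Stefan-Boltzmann constant, W m^-2 K^-4
WIEN_B = 2.898e-3  # Wien's displacement constant, m K

for temperature in (310.0, 1000.0, 6000.0):  # ear, red hot, sunlight
    power = SIGMA * temperature**4            # Stefan's law, W per m^2
    peak_nm = WIEN_B / temperature * 1e9      # Wien's law, nanometers
    print(temperature, round(power), round(peak_nm))
# A 310 K body peaks at ~9300 nm (infra-red), which is why an in-ear
# thermometer works; only a ~6000 K source peaks in the visible band.
```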
The temperature of a body can be measured by examining the radiation it emits. This temperature measurement technique is non-contact because it does not affect or disturb the temperature of the object whose temperature is being recorded. Moreover, you can measure the temperature of moving objects. In recent years this technique has been used to measure human body temperatures by measuring the radiation produced inside the ear.

11.5.3 Measuring light

Once it was discovered that certain chemicals change on exposure to light, the effect was quickly exploited to create photography. In the early 1900s Max Planck suggested that light consists of individual packets containing a discrete amount of energy called photons. When a photon hits an atom, an electron may be knocked out of its orbit round the nucleus. If this atom is metallic, the movement of electrons generates a current that flows in the metal. Some light detectors operate by detecting the flow of electrons when light interacts with the atoms of a material.

The photodiode is a semiconductor photosensor comprising a junction between two differently doped regions of silicon. The photons of light falling on the junction create a current in the device. These devices are sensitive to light in the region 400 nm to 1000 nm (this includes infra-red as well as visible light). Another means of measuring light intensity exploits the photovoltaic effect in silicon and selenium.

Light intensity can also be measured by the photoresistor. Certain substances such as cadmium sulfide change their electrical resistance when illuminated.
Oxygen diffusing through the membrane sets up a charge between the electrodes. By measuring the current flow you can determine the oxygen concentration in the liquid surrounding the cell (Fig. 11.41(b)).

This technique can be extended to detect more exotic molecules such as glucose. A third membrane can be used to surround the membrane containing the electrodes. Between the outer and inner membranes is a gel of glucose oxidase enzyme that reacts with glucose and oxygen to generate gluconic acid. The amount of glucose in the liquid under test is inversely proportional to the amount of oxygen detected (Fig. 11.41(c)).

Since these techniques were developed in the 1960s, the number of detectors has vastly increased and the size of the probes has been reduced to the point at which they can be inserted into veins.

11.6 The analog interface

We now look at analog systems and their interface to the digital computer. In an analog world, measurable quantities are not restricted to the binary values 0 and 1; they may take one of an infinite number of values within a given range. For example, the temperature of a room changes from one value to another by going through an infinite number of increments on its way. Similarly, air pressure, speed, sound intensity, weight, and time are all analog quantities. When computers start to control their environment, or generate speech or music, or process images, we have to understand the relationship between the analog and digital worlds.

Figure 11.42 Analog and digital signals. (The analog signal V(t) is continuous in time and value; the digital signal is discrete in time and value.)

We first examine analog signals and demonstrate how they are captured and processed by a digital computer. Then we look at the hardware that converts analog signals into digital form, and digital values into analog signals.

A full appreciation of the relationship between analog and digital signals and the transformation between them requires a knowledge of electronics; this is particularly true when we examine analog-to-digital and digital-to-analog converters. Readers without an elementary knowledge of electronics may wish to skip these sections.

11.6.1 Analog signals

In Chapter 2 we said that a signal is said to be analog if it falls between two arbitrary levels, Vx and Vy, and can assume any one of an infinite number of values between Vx and Vy. If the analog signal, V(t), is time dependent, it is a continuous function of time, so that its slope, dV/dt, is never infinite, which would imply an instantaneous change of value. Figure 11.42 illustrates how both an analog voltage and a digital voltage vary with time.
Analog signals are processed by analog circuits. The principal feature of an analog circuit is its ability to process an analog signal faithfully, without distorting it—hence the expression hi-fidelity. A typical analog signal is produced at the output terminals of a microphone as someone speaks into it. The voltage varies continuously over some finite range, depending only on the loudness of the speech and on the physical characteristics of the microphone. An amplifier is used to increase the amplitude of this time-varying signal to a level suitable for driving a loudspeaker. If the voltage gain of the amplifier is A, and the voltage from the microphone V(t), the output of the amplifier is equal to A × V(t). The output signal from the amplifier, like the input, has an infinite range of values, but within a range A times that of the signal from the microphone.

Because digital signals in computers fall into two ranges (e.g. 0 to 0.4 V for logical 0 and 2.4 to 5.0 V for logical 1 levels in LS TTL logic systems), small amounts of noise and cross-talk have no effect on digital signals as long as the noise is less than about 0.4 V. Life is much more difficult for the analog systems designer. Even small amounts of noise in the millivolt or even microvolt region can seriously affect the accuracy of analog signals. In particular, the analog designer has to worry about power-line noise and digital noise picked up by analog circuits from adjacent digital circuits.

11.6.2 Signal acquisition

At first sight it might appear that the analog and digital worlds are mutually incompatible. Fortunately a gateway exists between the analog and digital worlds called quantization. The fact that an analog quantity can have an infinite range of values is irrelevant. If somebody says they will arrive at 9.0 a.m., they are not being literal—9.0 a.m. exists for an infinitesimally short period. Of course, what they really mean is that they will arrive at approximately 9.0 a.m. In other words, if we measure an analog quantity and specify it to a precision sufficient for our purposes (i.e. quantization), the error between the actual analog value and its measured value is unimportant. Once the analog value has been measured, it exists in a numeric form that can be processed by a computer.

The conversion of an analog quantity into a digital value requires two separate operations: the extraction of a sample value of the signal to be processed and the actual conversion of that sample value into a binary form. Figure 11.43 gives the block diagram of an analog signal acquisition module. As the analog-to-digital converter (ADC) at the heart of this module may be rather expensive, it is not unusual to provide a number of different analog channels, all using the same ADC. The cost of an ADC also depends on its speed of conversion.

Each analog channel in Fig. 11.43 begins with a transducer that converts an analog quantity into an electrical value. Transducers are almost invariably separate from the signal acquisition module proper. Sometimes the transducer is a linear device, so that a change in the physical input produces a proportional change in the electrical output. All too often, the transducer is highly non-linear and the relationship between the physical input and the voltage from the transducer is very complex; for example, the output of a transducer that measures temperature might be V = V0e^(t/kT). In such cases it is usual to perform the linearization of the input in the digital computer after the signal has been digitized. It is possible to perform the linearization within the signal acquisition module by means of purely analog techniques.

The electrical signal from the transducer is frequently very tiny (sometimes only a few microvolts) and must be amplified before further processing in order to bring it to a level well above the noise voltages present in later circuits.
Figure 11.43 An analog signal acquisition module. (Each of n channels consists of a transducer, an amplifier, and a filter; an n-channel analog multiplexer selects one channel under a channel-select input; the selected signal passes through a sample and hold circuit to a shared analog-to-digital converter that produces the m-bit digital output d0 to dm−1; system control logic provides the START, STOP, and SAMPLE signals.)
Amplification is performed by an analog circuit called an op-amp (operational amplifier). Some transducers have an internal amplifier.

After amplification comes filtering, a process designed to restrict the passage of certain signals through the circuit. Filtering blocks signals with a frequency above or below a cut-off point; for example, if the signal from the transducer contains useful frequency components only in the range 0 to 20 Hz (as one might expect from, say, an electrocardiogram), it is beneficial to filter out all signals of a higher frequency. These out-of-band signals represent unwanted noise and have no useful effect on the interpretation of the electrocardiogram. Moreover, it is necessary for the filter to cut out all frequencies above one-half the rate at which the analog signal is sampled. The reasons for this are explained later.

The outputs of the filters are fed to an electronic switch called a multiplexer, which selects one of the analog input channels for processing. The multiplexer is controlled by the digital system to which the signal acquisition module is connected. The only purpose of the multiplexer is to allow one analog-to-digital converter to be connected to several inputs.

The analog output of the multiplexer is applied to the input of the last analog circuit in the acquisition module, the sample and hold (S/H) circuit. The sample and hold circuit takes an almost instantaneous sample of the incoming analog signal and holds it constant while the analog-to-digital converter (ADC) is busy determining the digital value of the signal.

The analog-to-digital converter (ADC) transforms the voltage at its input into an m-bit digital value, where m varies from typically 4 to 16 or more. Several types of analog-to-digital converter are discussed at the end of this section. We now look at the relationship between the analog signal and the analog-to-digital conversion process.

Signal quantization

Two fundamental questions have to be asked when considering any analog-to-digital converter. Into how many levels or values should the input signal be divided and how often should the conversion process be carried out? The precise answer to both these questions requires much mathematics. Fortunately, they both have simple conceptual answers and in many real situations a rule-of-thumb can easily be applied. We look at how analog signals are quantized in value and then how they are quantized or sampled in time.

When asked how much sugar you want in a cup of coffee, you might reply: none, half a spoon, one spoon, one-and-a-half spoons, etc. Although a measure of sugar can be quantized right down to the size of a single grain, the practical unit chosen by those who add sugar to coffee is the half-spoon. This unit is both easy to measure out and offers reasonable discrimination between the quanta (i.e. half-spoons). Most drinkers could not discriminate between, say, 13/27 and 14/27 of a spoon of sugar. As it is with sugar, so it is with signals. The level of quantization is chosen to be the minimum interval between successive values that carries meaningful information. You may ask, 'Why doesn't everyone use an ADC with the greatest possible resolution?'
The answer is perfectly simple. The cost of an ADC rises steeply with resolution. A 16-bit ADC is very much more expensive than an 8-bit ADC (assuming all other parameters to be equal). Therefore, engineers select the ADC with a resolution compatible with the requirements of the job for which it is intended.

Let's look at an ideal 3-bit analog-to-digital converter that converts a voltage into a binary code. As the analog input to this ADC varies in the range 0 V to 7.5 V, its digital output varies from 000 to 111. Figure 11.44 provides a transfer function for this ADC.

Figure 11.44 The transfer function of an ideal 3-bit A/D converter. (Output codes 000 to 111 plotted against an analog input of 0.0 to 7.5 V; the maximum change of input required to produce a 1-bit change in the output is 1.0 V; initially the input is 0 V and the output code 000, and when the input reaches 0.5 V the output code jumps to 001.)

Consider the application of a linear voltage ramp input from 0.0 V to 7.5 V to this ADC (a ramp is a signal that increases at a constant rate). Initially the analog input is 0.0 V and the digital output 000. As the input voltage rises, the output remains at 000 until the input passes 0.5 V, at which point the output code jumps from 000 to 001. The output code remains at 001 until the input rises above 1.5 V. Clearly, for each 1.0 V change in the input, the output code changes by one unit.
Figure 11.44 shows that the input can change in value by up to 1 V without any change taking place in the output code. The resolution of an ADC, Q, is the largest change in its input required to guarantee a change in the output code and is 1.0 V in this example. The resolution of an ADC is expressed indirectly by the number of bits in its output code, where resolution = Vmaximum/(2^n − 1). For example, an 8-bit ADC with an input in the range 0 V to 8.0 V has a resolution of 8.0 V/255 = 0.03137 V ≈ 31.37 mV. Table 11.3 gives the basic characteristics of ADCs with digital outputs ranging from 4 to 16 bits. The figures in Table 11.3 represent the optimum values for perfect ADCs. In practice, real ADCs suffer from imperfections such as non-linearity, drift, offset error, and missing codes, which are described later. Some ADCs are unipolar and handle a voltage in the range 0 to V and some are bipolar and handle a voltage in the range −V/2 to +V/2.

The column labeled 'value of Q for 10 V FS' in Table 11.3 indicates the size of the step (i.e. Q) if the maximum input of the ADC is 10 V. The abbreviation 'FS' means full-scale.

Figure 11.45 provides a graph of the difference or error between the analog input of a 3-bit ADC and its digital output. Suppose that the analog input is 5.63 V. The corresponding digital output is 110, which represents 6.0 V; that is, the digital output corresponds to the quantized input, rather than the actual input. The difference between the actual input and the idealized input corresponds to an error of 0.37 V. Figure 11.45 shows that the maximum error between the input and output is equal to Q/2. This error is called the quantization error.

The output from a real ADC can be represented by the output from a perfect ADC whose input is equal to the applied signal plus a noise component. The difference between the input and the quantized output (expressed as an analog value) is a time-varying signal between −Q/2 and +Q/2 and is called the quantization noise of the ADC.

Because the quantization noise is a random value, engineers characterize it by its RMS (root mean square)—the RMS value expresses the power of the signal. The RMS value of a signal is obtained by squaring it, taking the average, and then taking the square root of the average. The RMS of the quantization noise of an analog-to-digital converter is equal to Q/√12. Increasing the resolution of the converter reduces the amplitude of the quantization noise as Table 11.3 demonstrates.

A figure-of-merit of any analog system is its signal-to-noise ratio, which measures the ratio of the wanted signal to the unwanted signal (i.e. noise). The signal-to-noise ratio (SNR) of a system is expressed in units called decibels, named after Alexander Graham Bell, a pioneer of the telephone. The SNR of two signals is defined as 20log(Vsignal/Vnoise). The signal-to-noise ratio of an ideal n-bit ADC is given by

SNR (in dB) = 20log(2^n·Q/(Q/√12))
            = 20log(2^n) + 10log(12)
            = 6.02n + 10.8

This expression demonstrates that the signal-to-noise ratio of the ADC increases by 6.02 dB for each additional bit of precision. Table 11.3 gives the signal-to-noise ratio of ADCs from 4 to 16 bits. An 8-bit ADC has a signal-to-noise ratio similar to that of some low-quality audio equipment, whereas a 10-bit ADC approaches the S/N ratio of high-fidelity equipment.

Another figure-of-merit of an analog system is its dynamic range. The dynamic range of an ADC is given by the ratio of its full-scale range (FSR) to its resolution, Q, and is expressed in decibels as 20log(2^n) = 20n·log 2 ≈ 6.02n. Table 11.3 also gives the dynamic range of the various ADCs. Once again you can see that a 10- to 12-bit ADC is suitable for moderately high-quality audio signal processing. Because of other impairments in the system and the actual behavior of a real ADC, high-quality audio signal processing is normally done with a 16-bit ADC.
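The figures of merit can be tabulated straight from these expressions; a sketch that reproduces the form of Table 11.3 (Q is computed for a 10 V full-scale input, using the Vmax/(2^n − 1) definition of resolution given earlier):

```python
for n in (4, 8, 10, 12, 16):
    states = 2**n
    q = 10.0 / (states - 1)     # value of Q for 10 V FS
    snr = 6.02 * n + 10.8       # SNR of an ideal n-bit ADC, dB
    dynamic_range = 6.02 * n    # 20log(2^n), dB
    print(n, states, round(q, 4), round(snr, 1), round(dynamic_range, 1))
```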
Table 11.3 The characteristics of ideal ADCs from 4 to 16 bits. (Columns: resolution (bits), discrete states, binary weight, value of Q for 10 V FS, SNR (dB), and dynamic range (dB).)
Figure 11.45 The quantization error of an ideal 3-bit A/D converter. (Top: the staircase transfer function, output codes 000 to 111 against an analog input of 0.0 to 7.5 V; bottom: the quantization error, a sawtooth that swings between +Q/2 and −Q/2.)

Sampling a time-varying signal

What is the minimum rate at which a signal should be sampled to produce an accurate digital representation of it?
We need to know the minimum rate at which a signal must be sampled, because we want to use the slowest and cheapest ADC that does the job we require.

Intuitively, we would expect the rate at which a signal must be sampled to be related to the rate at which it is changing; for example, a computer controlling the temperature of a swimming pool might need to sample the temperature of the water once every 10 minutes. The thermal inertia of such a large body of water doesn't permit sudden changes in temperature. Similarly, if a microcomputer is employed to analyze human speech with an upper frequency limit of 3000 Hz, it is reasonable to expect that the input from a microphone must be sampled at a much greater rate than 3000 times a second, simply because in the space of 1/3000 second the signal can execute a complete sine wave.

A simple relationship exists between the rate at which a signal changes and the rate at which it must be sampled if it is to be reconstituted from the samples without any loss of information content. The Sampling Theorem states 'If a continuous signal containing no frequency components higher than fc is sampled at a rate of at least 2fc, then the original signal can be completely recovered from the sampled values without distortion'. This minimum sampling rate is called the Nyquist rate.

The highest frequency component in the signal means just that and includes any noise or unwanted signals present together with the desired signal. For example, if a signal contains speech in the range 300 to 3000 Hz and noise in the range 300 to 5000 Hz, it must be sampled at least 10 000 times a second. One of the purposes of filtering a signal before sampling it is to remove components whose frequencies are higher than the signals of interest, but whose presence would nevertheless determine the lower limit of the sampling rate.

If a signal whose maximum frequency component is fc is sampled at less than 2fc times a second, some of the high-frequency components in it are folded back into the spectrum of the wanted signal.
Figure 11.46 Sampling a signal at greater than the Nyquist rate. ((a) Spectrum of the input signal, extending from 0 to fc; (b) spectrum of the sampled signal, with components around fs − fc, fs, and fs + fc that do not overlap the input band.)

Figure 11.47 Sampling a signal at slightly less than the Nyquist rate. ((a) Spectrum of the input signal; (b) spectrum of the sampled signal; the shaded region indicates the overlap in spectra.)
In other words, sampling a speech signal in the range 300 to 3000 Hz containing noise components up to 5000 Hz at only 6000 times a second would result in some of this noise appearing within the speech band. This effect is called frequency folding and, once it has occurred, there is no way in which the original, wanted, signal can be recovered.

Figures 11.46 and 11.47 illustrate the effect of sampling an analog signal at both below and above the Nyquist rate. In Fig. 11.46 the input signal consists of a band of frequencies from zero to fc, sampled at a rate equal to fs times a second, where fs is greater than 2fc. The spectrum of the sampled signal contains components in the frequency range fs − fc to fs + fc that do not fall within the range of the input signal. Consequently, you can recover the original signal from the sampled signal.

In Fig. 11.47 the input signal has a maximum frequency component of fc and is sampled at fs, where fs < 2fc. Some energy in the region fs − fc to fc falls in the range of the input frequency and is represented by the gray region in Fig. 11.47. This situation results in frequency folding and a loss of information; that is, you cannot recover the original information from the sampled signal.

The classic example of sampling at too low a rate is the wagon wheel effect seen in movies. A cine film runs at 24 frames/s and each frame samples the image. If the spokes of a rotating wheel are sampled (i.e. photographed) at too low a rate, the wheel appears to move backward. Why? Suppose a wheel rotates 10° clockwise between each frame. The eye perceives this as a clockwise rotation. Now suppose the wagon is moving rapidly and the wheel rotates 350° between each frame. The eye perceives this as a 10° counterclockwise rotation.

It is difficult to appreciate the full implications of the sampling theorem without an understanding of the mathematics of sampling and modulation. However, all we need say here is that the overlap in spectra caused by sampling at too low a frequency results in unwanted noise in the sampled signal.

Another way of looking at the relationship between a signal and its sampling rate is illustrated by Figs 11.48 and 11.49. Figure 11.48(a) gives the continuous input waveform of an analog signal and Fig. 11.48(b) its sampled form. These sampled amplitudes are, of course, stored in a digital computer numerically. Figure 11.48(c) shows the output of a circuit, called a filter, fed from the digital inputs of Fig. 11.48(b). The simplest way of describing this circuit is to say that it joins up the dots of the sampled signal to produce a smooth output. As you can see, the reconstituted analog signal is virtually a copy of the original analog signal.

Figure 11.49 is similar to Fig. 11.48, except that the input signal is sampled at less than 2fc. A glance at the sampled values of Fig. 11.49(b) is enough to show that much of the detail in the input waveform has been lost. When this sampled signal is reconstituted into a continuous signal (Fig. 11.49(c)) its frequency is not the same as that of the input signal. The erroneous signal is an alias of the original input.
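Frequency folding is easy to demonstrate numerically; a minimal sketch (a single 5000 Hz tone sampled 6000 times a second, as in the speech example above):

```python
import math

fs, f_in = 6000.0, 5000.0   # sampling rate and input tone, fs < 2*f_in

# The samples of the 5000 Hz tone are indistinguishable from samples
# of a tone folded down to |fs - f_in| = 1000 Hz (phase inverted).
for k in range(8):
    original = math.sin(2 * math.pi * f_in * k / fs)
    folded = -math.sin(2 * math.pi * (fs - f_in) * k / fs)
    print(round(original, 6), round(folded, 6))   # identical pairs
```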
Figures 11.48 and 11.49 Sampling a signal above and below the Nyquist rate. ((a) Input signal; (b) sampled signal; (c) reconstituted signal.)

Figure 11.50 The effect of a finite measurement time on the A/D conversion process. (The aperture time ta defines the period required to measure the signal; during the time that the input is sampled, it changes by δV.)
That is, the input changes by 0.05 V during the period that the A/D conversion is taking place. Consequently, there's little point in using an ADC with a resolution of better than 0.05 V. This resolution corresponds to 1 in 100, and a 7-bit ADC would be suitable for this application.

In order to get a feeling for the importance of aperture time, let's consider a data acquisition system used in processing human speech. Suppose a system with an 8-bit analog-to-digital converter is required to digitize an input with an upper frequency limit of 4000 Hz. We need to know the maximum aperture time necessary to yield an accuracy of one least significant bit in the digitized output. Assuming a sinusoidal input, V(t) = V·sin ωt, the amplitude uncertainty is given by

δV = ta · d(V sin ωt)/dt = ta · ω · V · cos ωt

The differential of sin ωt is ω cos ωt, where ω is defined as 2πf. The maximum rate-of-change of V(t) occurs at the zero-crossing of the waveform when t = 0 (i.e. the maximum value of cos ωt is 1). Therefore, the worst-case value of δV is

δV = ta · ω · V

and

δV/V = ta · ω = ta · 2π · f

We can substitute 1/256 for δV/V and 4000 Hz for f in the above equation to calculate the desired aperture time as follows:

δV/V = 1/256 = ta · 2πf = ta × 2 × 3.142 × 4000

ta = 1/(256 × 2 × 3.142 × 4000) s ≈ 0.155 µs
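The same result, as a one-line check (a sketch; only the resolution and the top frequency matter):

```python
import math

resolution = 1 / 256   # one LSB of an 8-bit converter (delta_V / V)
f_max = 4000.0         # upper frequency limit of the speech input, Hz

t_a = resolution / (2 * math.pi * f_max)   # t_a = (dV/V) / (2*pi*f)
print(t_a * 1e9)                           # ~155 ns
```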
An aperture time of 0.155 µs (i.e. 155 ns) is very small, although not too small to be achieved by some ADCs. Fortunately, we can use a sample and hold circuit to capture a sample of the input and hold it constant while a relatively slow and cheap ADC performs the conversion. Of course, even a sample and hold circuit is itself subject to the effects of aperture uncertainty. Although an aperture time of 1 µs is relatively small for an analog-to-digital converter, a sample and hold circuit can achieve an aperture time of 50 ns with little effort. We look at the sample and hold circuit in more detail later.

11.6.3 Digital-to-analog conversion

Beginning with digital-to-analog converters (DACs) may seem strange. It's more logical to discuss analog-to-digital (ADC) conversion first and then deal with the inverse process. There are two reasons for disregarding this natural sequence. The first is that the DAC is less complex than the corresponding ADC, and the second is that some analog-to-digital converters, paradoxically, have a digital-to-analog converter at their heart.

Conceptually, the DAC is a simple device. To convert a binary value into analog form, all we have to do is to generate an analog value proportional to each bit of the digital word and then add these values to give a composite analog sum. Figure 11.51 illustrates this process. An m-bit digital signal is latched by m D flip-flops and held constant until the next value is ready for conversion. The flip-flops constitute a digital sample and hold circuit. Each of the m bits operates an electronic switch that passes either zero or Vi volts to an analog adder, where Vi is the output of the ith switch. The output of this adder is

V = d0·V0 + d1·V1 + … + dm−1·Vm−1

The m digits {di} in this equation represent binary values 0 or 1 and the {Vi} represent binary powers of the form (1, 1/2, 1/4, 1/8, …).
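The adder's behavior is a weighted sum and nothing more; a minimal sketch (bit d0 carries the weight 1 from the sequence above, so it is the most significant bit in this numbering):

```python
def dac_output(bits, v_ref=1.0):
    # Output = d0*V0 + d1*V1 + ... with Vi = v_ref / 2**i,
    # i.e. the binary weights 1, 1/2, 1/4, 1/8, ...
    return sum(d * v_ref / 2**i for i, d in enumerate(bits))

# 4-bit example: 1011 -> 1 + 1/4 + 1/8 = 1.375 times v_ref.
print(dac_output([1, 0, 1, 1]))
```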
Figure 11.51 The digital-to-analog converter. (The bits d0 to dm−1 of a parallel digital input are captured in an m-bit latch of D flip-flops; each bit operates an electronic switch that passes 0 or Vi to an analog adder, whose output is the analog output.)
Figure 11.52 A weighted-resistor digital-to-analog converter. (m digital switches connect resistors R, 2R, 4R, …, 2^(m−1)·R either to ground or to the reference voltage Vref; the resistor currents sum at the inverting input of an operational amplifier with feedback resistor Rf, which produces the analog output Vo.)

Figure 11.52 gives a possible (but not practical) implementation of a digital-to-analog converter. The total current flowing into the inverting terminal of the operational amplifier is equal to the linear sum of the currents flowing through the individual resistors (the panel describes how the operational amplifier works). As each of the resistors in Fig. 11.52 can be connected to ground or to a precisely maintained reference voltage, Vref, the current flowing through each resistor is either zero or Vref/(2^i·R), where i = 0, 1, 2, …, m−1. The total current flowing into the operational amplifier is given by

I = (Vref/R) Σ (i = 0 to m−1) dm−i−1/2^i

where di represents the state of the ith switch. The voltage at the output terminal of the operational amplifier is given by

Vo = −(2Vref·Rf/R)·[dm−1·2^−1 + dm−2·2^−2 + … + d0·2^−m]
Figure 11.53 A digital-to-analog converter based on the R–2R ladder. (A chain of series resistors R terminated in 2R, with a 2R shunt resistor at each node; switches d0 to dm−1 steer each shunt current either to ground or to the inverting input of an operational amplifier with feedback resistor Rf, which produces the analog output Vo from the reference Vref.)
Real digital-to-analog converters implement the m switches, typically, by field-effect transistors (a field-effect transistor behaves as a fast electronic switch—the voltage at its gate determines whether the path between the other two terminals is open or closed). By switching the control gate of these transistors between two logic levels, the resistance between their source and drain terminals is likewise switched between a very high value (the off or open state) and a very low value (the on or closed state). A perfect field-effect transistor switch has off and on values of infinity and zero, respectively. Practical transistor switches have small but finite on-resistances that degrade the accuracy of the DAC.

Although the circuit of Fig. 11.52 is perfectly reasonable for values of m below six, larger values create manufacturing difficulties associated with the resistor chain. Suppose a 10-bit DAC is required. The ratio between the largest and smallest resistor is 2^10:1 or 1024:1. If the device is to be accurate to one LSB, the precision of the largest resistor must be at least one-half part in 1024, or approximately 0.05%. Manufacturing resistors to this absolute level of precision is difficult and costly with thin-film technology, and virtually impossible with integrated circuit technology.

The R–2R ladder

An alternative form of digital-to-analog converter is given in Fig. 11.53, where the DAC relies on the R–2R ladder (pronounced R two R). This DAC is so called because all resistors in the ladder have either the value R or 2R. Although it's difficult to produce highly accurate resistors over a wide range of values, it is much easier to produce pairs of resistors with a precise 2:1 ratio in resistance.

As the current from the reference source, Vref, flows down the ladder (from left to right in Fig. 11.53), it is divided at each junction (i.e. the node between the left R, right R, and 2R resistors) into two equal parts, one flowing along the ladder to the right and one flowing down the 2R shunt resistor. The network forms a linear circuit and we can apply the Superposition Theorem. This theorem states that, in a linear system, the effect is the sum of all the causes. Consequently, the total current flowing into the inverting terminal of the operational amplifier is equal to the sum of all the currents from the shunt (i.e. 2R) resistors, weighted by the appropriate binary value.

A digital-to-analog converter based on the R–2R ladder has three advantages over the weighted-resistor type described in Fig. 11.52.

1. All resistors have a value of either R or 2R, making it easy to match resistors and to provide a good measure of temperature tracking between resistors. Furthermore, the residual on-resistance of the transistor switches can readily be compensated for.

2. By selecting relatively low values for R in the range 2.5 kΩ to 10 kΩ, it is both easy to manufacture the DAC and to achieve a good response time because of the low impedance of the network.

3. Due to the nature of the R–2R ladder, the operational amplifier always sees a constant impedance at its input, regardless of the state of the switches in the ladder, which improves the accuracy of the operational amplifier circuit.

The R–2R ladder forms the basis of many commercially available DACs. Real circuits are arranged slightly differently to that of Fig. 11.53 to reduce still further the practical problems associated with a DAC.

DACs based on the potentiometric network

Another form of digital-to-analog converter is called the potentiometric or tree network. Figure 11.54 describes a 3-bit arrangement of such a network where a chain of n resistors is placed in series between the reference supply and ground. The value of n is given by 2^m, where m is the resolution of the DAC. In the example of Fig. 11.54, m = 3 and n = 8. An 8-bit DAC requires 256 resistors in series. The voltage between ground and the lower end of the ith resistor is given by

V = Vref·iR/nR = Vref·i/n  for i = 0 to n − 1.
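Superposition lets us predict the ladder's output without simulating the network; a sketch (the constant scale factor contributed by the operational amplifier stage is omitted, and dm−1 is taken as the most significant bit):

```python
def r2r_output(bits, v_ref=1.0):
    # bits = (d_{m-1}, ..., d1, d0); each selected 2R shunt contributes
    # half the weight of the one before it, by the current-splitting rule.
    return sum(d * v_ref / 2**(i + 1) for i, d in enumerate(bits))

# The eight 3-bit codes give equally spaced, monotonic output levels.
for code in range(8):
    bits = [(code >> (2 - i)) & 1 for i in range(3)]
    print(bits, r2r_output(bits))   # steps of v_ref / 8
```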
Figure 11.54 A DAC based on the potentiometric (tree) network. (A chain of eight equal resistors R between Vref and ground provides taps S00 to S07; a tree of switches (S00–S07, S10–S13, S20–S21) routes the selected tap to the input of an operational amplifier that produces the analog output Vo.)
The value of the resistors, R, does not appear in this equation. All that matters is that the resistors are of equal value. Because the flow of current through the resistors is constant, the effects of resistor heating found in some forms of R–2R ladder are eliminated.

The switch tree serves only to connect the input terminal of the operational amplifier to the appropriate tap (i.e. node) in the resistor network. In fact, this switching network is nothing but an n:1 demultiplexer. Moreover, because the switches do not switch a current (as in the case of the R–2R network), the values of their on and off resistances are rather less critical.

A DAC based on a switch tree is also inherently monotonic. That is, as the digital input increases from 00 … 0 to 11 … 1, the analog output always increases for each increment in the input.

Before we look at analog-to-digital converters, we need to say something about errors in digital-to-analog converters.

Errors in DACs

Real DACs differ from the ideal DACs described above. Differences between input code and output voltages are caused by errors that originate in the DAC's analog circuits. Figures 11.55 to 11.59 give five examples of errors in DACs. We have drawn the outputs of Figs 11.55 to 11.59 as straight lines for convenience—in practice they are composed of steps because the input is a binary code.

In Fig. 11.55, the DAC's output voltage differs from its ideal value by a constant offset. If the input is a binary value X, the output is equivalent to that of a perfect DAC plus a constant error signal e; that is, Vout = KX + e. A constant error is easy to deal with because it can be trimmed out by adding a compensating voltage of equal magnitude but of opposite sign to the error.

Figure 11.55 The constant offset error. (For any given code the output voltage has a constant offset error with respect to the ideal output.)
In Fig. 11.56 the error in the output voltage is proportional to the input code; that is, the DAC exhibits a gain error. A gain error of k can be corrected by passing the DAC's output through an amplifier with a gain factor of 1/k.

Figure 11.56 The gain error. (For any given input code the error in the output voltage is proportional to the input code.)

Real DACs suffer from both offset and gain errors as illustrated in Fig. 11.57. The combined offset and gain errors can both be removed separately by injecting a negative offset and passing the output of the DAC through a compensating amplifier as we've just described.

Figure 11.57 The combined effect of offset and gain errors. (For any given input code the error in the output voltage is proportional to the input code plus a constant offset.)

A more serious error is the non-linear response illustrated in Fig. 11.58, where the change in the output, Q, for each step in the input code is not constant. The error between the input code and the output voltage is a random value. Non-linear errors cannot easily be corrected by simple circuitry. Many DACs are guaranteed to have a maximum non-linearity less than one-half Q, the quantization error; that is, the DAC's output error is always less than Q/2 for any input.

Figure 11.58 The non-linear error. (The error between the input code and the output voltage is non-linear.)

Figure 11.59 illustrates a non-monotonic response, a form of non-linearity in which the output voltage does not always increase with increasing input code. In this example, the analog output for the code 011 is less than that for the code 010. Non-monotonic errors can be dangerous in systems using feedback. For example, if an increasing input produces a decreasing output, the computer controlling the DAC may move the input in the wrong direction.

Figure 11.59 Non-monotonicity. (The error between the input code and the output voltage is non-linear and non-monotonic; input code 011 produces a lower output than code 010.)

Analog-to-digital converters suffer from similar errors to DACs—only the axes of the graphs in Figs 11.55 to 11.59 are changed. An interesting form of ADC error is called the missing code, where the ADC steps from code X to code X + 2 without going through code X + 1. Code X + 1 is said to be a missing code, because there is no input voltage that will generate this code. Figure 11.60 demonstrates the transfer function of an ADC with a missing code. As the input voltage to the ADC is linearly increased, the output steps through its codes one by one in sequence. In Fig. 11.60 the output jumps from 010 to 100 without passing through 011.

Figure 11.60 The missing code. (As the input voltage increases, the output steps through the codes 000 to 111; however, the step from 010 to 100 misses the code 011—there is no input voltage that generates the code 011.)
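Because offset and gain errors are linear, the trimming described above is exact; a sketch (the error values are invented for illustration—non-linear and non-monotonic errors cannot be removed this way):

```python
K, E = 1.05, 0.2   # hypothetical gain and offset errors: Vout = K*X + E

def imperfect_dac(x):
    return K * x + E

def corrected(x):
    # Inject an equal and opposite offset, then pass the result through
    # a compensating amplifier with a gain factor of 1/K.
    return (imperfect_dac(x) - E) / K

for x in (0.0, 1.0, 4.0, 7.0):
    print(x, imperfect_dac(x), corrected(x))   # corrected output equals x
```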
Figure 11.61 The sample and hold circuit. (An operational amplifier drives the output through a diode bridge switched by Vhold; the output follows the input when Vhold is 0 and holds (freezes) the input when Vhold = 1.)
The comparator outputs are fed to a priority encoder that generates a 3-bit output corresponding to the number of logical 1s in the input.

The parallel A/D converter is very fast and can digitize analog signals at over 30 million samples per second. High conversion rates are required in real-time signal processing in applications such as radar data processing and image processing. As an illustration of the speeds involved, consider digitizing a television picture. The total number of samples required to digitize a TV signal with 500 pixels/line in real time is

samples = pixels per line × lines per field × fields per second
        = 500 × 312½ × 50 = 7 812 500 samples per second (UK)
        = 500 × 262½ × 60 = 7 875 000 samples per second (USA)

Because the flash converter requires so many comparators, it is difficult to produce with greater than about 8-bit precision. Even 6-bit flash ADCs are relatively expensive.

The feedback analog-to-digital converter

The feedback analog-to-digital converter, paradoxically, uses a digital-to-analog converter to perform the required conversion. Figure 11.66 illustrates the basic principle behind this class of converter. A local digital-to-analog converter transforms an m-bit digital value, D = d0, d1, …, dm−1, into an analog voltage, Vout. The value of the m-bit digital word D is determined by the block labeled control logic in one of the ways to be described later.

Figure 11.66 The feedback ADC. (The analog input Vin and the local DAC's output Vout drive a comparator; the control logic uses the error signal Ve to generate the m-bit digital output d0 to dm−1, which in turn drives the local digital-to-analog converter.)

Vout from the DAC is applied to the inverting input of an operational amplifier and the analog input to be converted is applied to its non-inverting input. The output of the operational amplifier corresponds to an error signal, Ve, and is equal to A times (Vin − Vout), where A is the gain of the amplifier. This error signal is used by the control logic to generate the m-bit digital value.
been set and then retained or cleared. After the LSB has been Figure 11.71 takes the form of a decision tree that shows every
dealt with in this way, the process is at an end and the final possible sequence of events that can take place when an ana-
digital output may be read by the host microprocessor. log signal is converted into a 4-bit digital value. The path
Figure 11.70 illustrates the operation of a 4-bit successive taken through the decision tree when 0.6400 V is converted
approximation A/D converter whose full-scale input is nom- into digital form is shown by a heavy line.
inally 1.000 V. The analog input to be converted into digital Figure 11.72 illustrates the structure of a 68K-controlled
form is 0.6400 V. As you can see, a conversion is complete successive approximation A/D converter. The microprocessor
after four cycles. is connected to a memory mapped D/A converter that
Figure 11.71 provides another way of looking at the suc- responds only to a write access to the lower byte of the base
cessive approximation process described in Fig. 11.70. address chosen by the address decoder. The analog output
484 Chapter 11 Computer peripherals
Figure 11.72 illustrates the structure of a 68K-controlled successive approximation A/D converter. The microprocessor is connected to a memory-mapped D/A converter that responds only to a write access to the lower byte of the base address chosen by the address decoder. The analog output of the converter is compared with the unknown analog input in a comparator, whose output is gated onto data line D15 whenever a read access is made to the upper byte of the base address. The software to operate the A/D converter of Fig. 11.72 is a loop that tests each bit of the result in turn, from the most significant bit down to the least significant.
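The following C sketch shows the successive approximation algorithm that such software implements; the primitives dac_write() (a write to the memory-mapped DAC) and comparator_high() (a read of the comparator bit on D15) are our own hypothetical names, not taken from the original listing.

    #include <stdint.h>

    extern void dac_write(uint16_t code);   /* load the memory-mapped DAC        */
    extern int  comparator_high(void);      /* 1 if Vin >= DAC output (D15 bit)  */

    /* m-bit successive approximation: test each bit from the MSB down,
       keeping it only if the trial value does not overshoot the input. */
    uint16_t successive_approximation(unsigned m)
    {
        uint16_t code = 0;

        for (int bit = (int)m - 1; bit >= 0; bit--) {
            code |= (uint16_t)(1u << bit);        /* tentatively set this bit */
            dac_write(code);
            if (!comparator_high())               /* overshoot: clear the bit */
                code &= (uint16_t)~(1u << bit);
        }
        return code;    /* conversion complete after m compare cycles */
    }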
The integrating analog-to-digital converter

The integrating, or more specifically the dual-slope integrating, analog-to-digital converter transforms the problem of measuring an analog voltage into the more tractable problem of measuring another analog quantity—time. An integrating operational amplifier circuit converts the analog input into a charge stored on a capacitor, and then evaluates the charge by measuring the time it takes to discharge the capacitor. The block diagram of a dual-slope integrating A/D converter is given in Fig. 11.73 and its timing diagram in Fig. 11.74.

A typical integrating converter operates in three phases: auto-zero, integrate the unknown analog signal, and integrate the reference voltage. The first phase, auto-zero, is a feature of many commercial dual-slope converters, which reduces any offset error in the system. As it isn't a basic feature of the dual-slope process, we won't deal with it here. During the second phase of the conversion, the unknown analog input linearly charges the integrating capacitor C. In this phase, the electronic switch connects the integrator's input to the voltage to be converted, Vin.

Figure 11.74 shows how the output from the integrator, Vout, ramps upward linearly during phase 2 of the conversion process. At the start of phase 2, a counter is triggered that counts upwards from 0 to its maximum value 2^n − 1. After a fixed period T1 = 2^n/fc, where fc is the frequency of the converter's clock, the counter overflows (i.e. passes its maximum count). The electronic switch connected to the integrator then connects the integrator's input to Vref, the negative reference supply. The output of the integrator now ramps downwards to 0, while the counter runs up from 0. Eventually, the output of the integrator reaches zero and the conversion process stops—we'll assume that the counter contains M at the end of this phase.

Readers without a knowledge of basic electronics may skip the following analysis of the dual-slope integrating ADC. At the end of phase 2 the capacitor is charged up to a level

\[ \frac{1}{CR}\int V_{in}\,dt \]

The voltage rise during the second phase is equal to the fall in the third phase because the output of the integrator begins at zero volts and ends up at zero volts. Therefore, the following equation holds:

\[ \frac{1}{CR}\int_{t_1}^{t_2} V_{in}\,dt = \frac{1}{CR}\int_{t_2}^{t_3} V_{ref}\,dt \]
Assuming that t1 = 0, t2 = 2^n/fc, and t3 = t2 + M/fc, we can write

\[ \left[\frac{1}{CR}V_{in}\,t\right]_0^{2^n/f_c} = \left[\frac{1}{CR}V_{ref}\,t\right]_{2^n/f_c}^{2^n/f_c + M/f_c} \]

or

\[ \frac{V_{in}\,2^n}{f_c} = \frac{V_{ref}\,M}{f_c}, \quad\text{that is,}\quad V_{in} = \frac{V_{ref}\,M}{2^n} \]

This remarkable result is dependent only on the reference voltage and two integers, 2^n and M. The values of C and R and the clock frequency, fc, do not appear in the equation. Implicit in the equation is the condition that fc is constant throughout the conversion process. Fortunately, this is a reasonable assumption even for the simplest of clock generators.

The dual-slope integrating A/D converter is popular because of its very low cost and inherent simplicity. Moreover, it is exceedingly accurate and can provide 12 or more bits of precision at a cost below that of 8-bit ADCs. Because this converter requires no absolute reference other than Vref, it is easy to fabricate the entire device in a single integrated circuit.

[Figure 11.72 The circuit of a successive approximation A/D converter.]
The conversion time is variable and takes 2^n + M clock periods in total. A 12-bit converter with a 1 μs clock has a maximum conversion time of 2 × 2^n/fc seconds, because the maximum value of M is 2^n. Using these figures, the maximum conversion time is equal to 2 × 4096 × 1 μs, or 8.192 ms, which is very much slower than most forms of feedback A/D converter.

Because the analog input is integrated over a period of 2^n/fc seconds, noise on the input is attenuated. Sinusoidal input signals, whose periods are submultiples of the integration period, do not affect the output of the integrator and hence the measured value of the input. Many high-precision converters exploit this property to remove any noise at the power line frequency. Integrating converters are largely used in instrumentation such as digital voltmeters.

[Figure 11.73 The integrating A/D converter.]
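These two results are easy to check numerically. The C fragment below, with invented names, evaluates Vin = Vref·M/2^n and the worst-case conversion time for the 12-bit, 1 μs example in the text.

    #include <stdio.h>

    /* Dual-slope result: Vin = Vref * M / 2^n (independent of C, R, and fc). */
    double dual_slope_vin(double vref, unsigned long M, unsigned n)
    {
        return vref * (double)M / (double)(1UL << n);
    }

    int main(void)
    {
        unsigned n  = 12;          /* 12-bit converter          */
        double   fc = 1.0e6;       /* 1 us clock, so fc = 1 MHz */

        /* Worst case: M = 2^n, so conversion takes 2 * 2^n clock periods. */
        double t_max = 2.0 * (double)(1UL << n) / fc;

        printf("Vin for M = 2048: %.4f V\n", dual_slope_vin(1.0, 2048, n));
        printf("Maximum conversion time: %.3f ms\n", t_max * 1e3); /* 8.192 ms */
        return 0;
    }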
the automatic pilot of an aircraft. At any instant the location and altitude of an aircraft are measured, together with its performance (heading, speed, rate of climb, rate of turn, and engine power). All these values are converted into digital form and fed into a computer that determines the best position for the throttle, elevator, aileron, and rudder controls. The digital output from the computer is applied to digital-to-analog converters, whose analog outputs operate actuators that directly move the appropriate control surfaces.

Figure 11.76 describes a primitive control system. The input is an analog value that is digitized and processed by the computer. Real control systems are often much more sophisticated than that of Fig. 11.76—consider the problem of overshoot. Suppose you apply a new demand input to a system, such as banking an aircraft's wings. The aircraft rolls into the bank and attempts to attain the angle requested. However, the mechanical inertia of the aircraft might cause it to roll past (i.e. overshoot) the point it was aiming for. A practical control system should also take account of rapidly changing conditions.

Let's look at how control systems have evolved from the simplest possible mechanisms. The crudest control mechanism is found in central heating systems where the desired temperature or setpoint is obtained from a control unit on the wall. The demand input is compared with the actual temperature measured by a sensor. If it is colder than the setpoint, the heater is turned on. Otherwise the heater is turned off.

Figure 11.77 demonstrates the operation of such a system. The temperature of the room rises and eventually the heater is turned off. Because of the thermal inertia of the heater and the room, the temperature continues to rise after the current has been cut off. Eventually, the room begins to cool and the heater is turned on and the temperature starts rising again. This type of on–off control system is also called a bang–bang control system to indicate its crude approach—bang the system goes on and bang it goes off. There is no intermediate point between on and off, and the room is never at the correct temperature because it's either slightly too hot or slightly too cold.

A better method of controlling the temperature of a room is to measure the difference between the desired temperature and the actual temperature and use this value to determine how much power is to be fed to the heater. The colder the room, the more power sent to the heater. If the room is close to its desired temperature, less power is fed to the heater. This is an example of a proportional control system. As the room temperature approaches its desired setpoint value, the power fed to the heater is progressively reduced; that is, the current supplied to the heater is K(t_setpoint − t_room).

The proportional control system can be improved further by taking into account changes in the variable you are trying to control. Suppose you're designing a camera with an automatic focusing mechanism for use at sporting events. The camera measures the distance of the subject from the camera, using the difference between the current point of focus and the desired point of focus to drive the motor that performs the focusing.
Suppose the subject suddenly changes direction, speeds up, or slows down. A proportional control system can't deal with this situation well. If the subject is in focus and then begins accelerating away, a proportional control signal can't apply a large correction until the target is out of focus. What we need is a control signal that doesn't depend on the magnitude of the error but on the rate at which the error is changing.

A differential control system uses the rate of change of the error as a control signal; for example, a camera with auto-focusing can use any rapid change in the subject's position to control the focusing motor—even if the subject is approximately in focus and there's no proportional error. A differential control system must also incorporate proportional control because if the subject were out of focus but not moving there would be no differential feedback signal.

If we call the error between the setpoint in a control system and its output e, the control input in a proportional plus derivative (i.e. differential) control system is given by

\[ y = K_1 e + K_2\,\frac{de}{dt} \]

[Figure 11.76 The control system: the error e between the command input x and the fed-back output y drives the system through a gain K.]

[Figure 11.77: temperature against time for the on–off controller; the room temperature oscillates about the set point as the heater switches on and off.]
... control and derivative control to minimize the difference between their trajectories. However, once their trajectories are closely (but not exactly) matched, there is neither a proportional error signal nor a derivative error signal to force exact tracking. What we need is a mechanism that takes account of a persistent small error.

An integral control system adds up the error signal over a period of time. The integral correction term is \(K_3\int e\,dt\). Even the smallest error eventually generates a control signal to further reduce the error. Integral control ensures that any drift over time is corrected.

A high-performance controller might combine proportional control, rate-of-change control, and integral control as Fig. 11.78 demonstrates. This system is called a PID (proportional, integral, and derivative) controller. In Fig. 11.78 the box marked differentiator calculates the rate of change of the system output being controlled.

[Figure 11.78 The derivative and integral control system.]

The equation for a PID can be expressed in the form

\[ y = K_1 e + K_2\,\frac{de}{dt} + K_3\int e\,dt \]

The control signal y now depends on the size of the error between the desired and actual outputs from the controller, the rate at which the error is changing, and the accumulated error over a period.

We can't go into control theory here but we should mention several important points. Designing a PID system is not easy. You have to choose the amounts of proportional, derivative, and integral feedback as well as the time constant of the integrator. If the system is not correctly designed it can become unstable and oscillate.
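Although we can't pursue the theory, the PID equation maps directly onto a few lines of code. The sketch below is a discrete-time approximation in C, assuming a fixed sampling interval dt; the structure and names are our own, and the crude clamp on the integral term merely hints at the stability issues just mentioned.

    /* Discrete-time PID controller: y = K1*e + K2*de/dt + K3*integral(e dt). */
    typedef struct {
        double k1, k2, k3;     /* proportional, derivative, integral gains */
        double integral;       /* running sum approximating the integral   */
        double prev_error;     /* previous error, for the derivative term  */
    } pid_state;

    double pid_update(pid_state *c, double setpoint, double measured, double dt)
    {
        double e  = setpoint - measured;
        double de = (e - c->prev_error) / dt;      /* rate of change of error */

        c->integral += e * dt;
        /* Crude anti-windup clamp: an unbounded integral term is one way a
           badly designed PID system becomes unstable. */
        if (c->integral >  100.0) c->integral =  100.0;
        if (c->integral < -100.0) c->integral = -100.0;

        c->prev_error = e;
        return c->k1 * e + c->k2 * de + c->k3 * c->integral;
    }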
In the final part of this section we look at how digital signals are processed by the computer.

11.7.2 Digital signal processing

Let's begin with a simple example of signal processing. Suppose music from a microphone is quantized, converted into a sequence of digital values by an ADC, fed into a computer, and stored in an array, M. We can read consecutive digital values from the array and use a DAC to convert them into an analog signal that is fed to a loudspeaker. Consider the following algorithm.

FOR i = 1 TO k DO
  Output = Mi
ENDFOR

The digitally stored music is reconverted into analog form by sending it to the output port connected to a DAC. This algorithm does nothing other than retrieve the stored music. In the next example, the samples from the array are amplified by a scalar factor A. By changing the value of A, the amplitude (i.e. the loudness) of the music can be altered. Now we have a digital volume control with no moving parts that can be programmed to change the sound level at any desired rate.

FOR i = 1 TO k DO
  Output = A * Mi
ENDFOR

We can average consecutive samples to calculate the loudness of the signal and use it to choose a value for A. The following expression shows how we might average the loudness over a period of k samples.

\[ \text{Loudness} = \sqrt{\frac{1}{k}\sum_{i=0}^{k-1} M_i^2} \]

Suppose we choose the scale factor A to make the average power of the signal approximately constant. When the music is soft, the volume is increased, and when it is loud, the volume is decreased. This process is called compressing the music and is particularly useful for listeners with impaired hearing who cannot hear soft passages without turning the volume up so far that loud passages are distorted.

In the next example, the signal fed to the loudspeaker is composed of two parts: Mi represents the current value, and B·Mi−j the value of the signal j samples earlier, scaled by a factor B. Normally the factor B is less than unity. Where do we get a signal plus a delayed, attenuated value? These features are found in an echo and are of interest to the makers of electronic music. By very simple processing, we are able to generate echoes entirely by digital techniques. Analog signal processing requires complex and inflexible techniques. Synthesizing an echo by analog techniques requires you to first convert the sound into vibration by a transducer. A spring is connected to the transducer and the acoustic signal travels down it to a microphone at the other end. The output of the microphone represents a delayed version of the original signal—the echo. The length of the delay is increased by using a longer spring. In the digital version, simply modifying the value of j changes the delay.

FOR i = j+1 TO k DO
  Output = Mi + B * Mi-j
ENDFOR
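For concreteness, the volume control and echo loops above might be combined in C as follows; the dac_output() routine and the guard for the first j samples are our own additions to the pseudocode.

    /* Sketch: digital volume control plus echo.  M[1]..M[k] holds the
       digitized music, as in the pseudocode above. */
    extern void dac_output(double sample);

    void play(const short M[], int k, double A, double B, int j)
    {
        for (int i = 1; i <= k; i++) {
            double s = A * M[i];          /* scale: digital volume control      */
            if (i > j)
                s += B * M[i - j];        /* add delayed, attenuated copy: echo */
            dac_output(s);
        }
    }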
The final example of signal processing represents the linear transversal equalizer that implements a general-purpose digital filter. In audio terms, a digital filter acts as tone controls or an equalizer. We are going to look at this topic in a little more detail next.

FOR i = 1 TO k DO
  a = K4 * Mi-4
  b = K3 * Mi-3
  c = K2 * Mi-2
  d = K1 * Mi-1
  e = K0 * Mi
  Output = a + b + c + d + e
ENDFOR

The output is a fraction of the current sample plus weighted fractions of the previous four samples. Let's look at this operation in a little more detail.

Digital filters

An important application of digital signal processing is the digital filter. A digital filter behaves like an analog filter—it can pass or stop signals whose frequencies fall within certain ranges. Consider an analog signal, X, that has been digitized and whose successive values are

x0, x1, x2, x3, . . . , xi−1, xi, xi+1, . . .

Now suppose we generate a new sequence of digital values, Y, whose values are y0, y1, y2, . . . , where

\[ y_i = C_0 x_i + C_1 x_{i-1} \]

An element in the output series, yi, is given by a fraction of the current element of the input series, C0xi, plus a fraction of the previous element of the input series, C1xi−1. Figure 11.79 illustrates this operation. The symbol Z−1 is used to indicate a 1-unit delay (i.e. the time between two successive samples of a signal). In other words the operation xi·Z−1 is equivalent to delaying signal xi by one time unit—similarly Z−2 delays xi by two time units. This notation belongs to a branch of mathematics called Z transforms.

[Figure 11.79 The digital filter: the input sequence xi passes through a delay element Z−1; the samples weighted by coefficients C0 and C1 are combined by a summer to give the output sequence yi.]

Let's see what happens when we give the filter coefficient C0 the value 0.6 and C1 the value 0.4, and make the input series X = 0, 0, 1, 1, 1, 1, . . . , 1, which corresponds to a step function. The output sequence is given by

y0 = 0.6x0 + 0.4x−1 = 0.6 × 0 + 0.4 × 0.0 = 0.0
y1 = 0.6x1 + 0.4x0 = 0.6 × 0 + 0.4 × 0.0 = 0.0
y2 = 0.6x2 + 0.4x1 = 0.6 × 1 + 0.4 × 0.0 = 0.6
y3 = 0.6x3 + 0.4x2 = 0.6 × 1 + 0.4 × 1.0 = 1.0
y4 = 0.6x4 + 0.4x3 = 0.6 × 1 + 0.4 × 1.0 = 1.0

The output sequence is a rounded or smoothed step function (i.e. when the input goes from 0 to 1 in one step, the output goes 0.0, 0.6, 1.0). This type of circuit is called a low-pass filter because sudden changes in the input sequence are diminished by averaging consecutive values. Real digital filters have many more delays and coefficients. Consider the output of a filter with four delay units given by

\[ y_i = C_0 x_i + C_1 x_{i-1} + C_2 x_{i-2} + C_3 x_{i-3} + C_4 x_{i-4} \]

If we use this filter with coefficients 0.4, 0.3, 0.2, 0.1 and subject it to a step input, we get

y0 = 0.4x0 + 0.3x−1 + 0.2x−2 + 0.1x−3 = 0.4 × 0 + 0.3 × 0 + 0.2 × 0 + 0.1 × 0 = 0.0
y1 = 0.4x1 + 0.3x0 + 0.2x−1 + 0.1x−2 = 0.4 × 1 + 0.3 × 0 + 0.2 × 0 + 0.1 × 0 = 0.4
y2 = 0.4x2 + 0.3x1 + 0.2x0 + 0.1x−1 = 0.4 × 1 + 0.3 × 1 + 0.2 × 0 + 0.1 × 0 = 0.7
y3 = 0.4x3 + 0.3x2 + 0.2x1 + 0.1x0 = 0.4 × 1 + 0.3 × 1 + 0.2 × 1 + 0.1 × 0 = 0.9
y4 = 0.4x4 + 0.3x3 + 0.2x2 + 0.1x1 = 0.4 × 1 + 0.3 × 1 + 0.2 × 1 + 0.1 × 1 = 1.0

In this case, the output is even more rounded (i.e. 0.0, 0.4, 0.7, 0.9, 1.0).
A more interesting type of filter is called a recursive filter because the output is expressed as a fraction of the current input and a fraction of the previous output. In this case, the output sequence for a recursive filter with a single delay unit is given by

\[ y_i = C_0 x_i + C_1 y_{i-1} \]

Figure 11.80 shows the structure of a recursive filter. Suppose we apply the same step function to this filter that we used in the previous examples. The output sequence is given by

y0 = 0.6x0 + 0.4y−1 = 0.6 × 0 + 0.4 × 0 = 0.0
y1 = 0.6x1 + 0.4y0 = 0.6 × 0 + 0.4 × 0 = 0.0
y2 = 0.6x2 + 0.4y1 = 0.6 × 1 + 0.4 × 0 = 0.6
y3 = 0.6x3 + 0.4y2 = 0.6 × 1 + 0.4 × 0.6 = 0.84
y4 = 0.6x4 + 0.4y3 = 0.6 × 1 + 0.4 × 0.84 = 0.936
y5 = 0.6x5 + 0.4y4 = 0.6 × 1 + 0.4 × 0.936 = 0.9744
y6 = 0.6x6 + 0.4y5 = 0.6 × 1 + 0.4 × 0.9744 = 0.98976
y7 = 0.6x7 + 0.4y6 = 0.6 × 1 + 0.4 × 0.98976 = 0.995904
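The table above can be verified with a few lines of C (our own check, not part of the original text):

    #include <stdio.h>

    int main(void)
    {
        double c0 = 0.6, c1 = 0.4;   /* filter coefficients */
        double y_prev = 0.0;         /* y[-1] = 0           */

        /* Step input x = 0, 0, 1, 1, 1, ...  y[i] = c0*x[i] + c1*y[i-1] */
        for (int i = 0; i <= 7; i++) {
            double x = (i >= 2) ? 1.0 : 0.0;
            double y = c0 * x + c1 * y_prev;
            printf("y%d = %.6f\n", i, y);   /* 0, 0, 0.6, 0.84, 0.936, ... */
            y_prev = y;
        }
        return 0;
    }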
Figure 11.81 plots the input and output series for the recursive filter of Fig. 11.80. As you can see, the output series (i.e. the yi) rises exponentially to 1. The effect of the operation C0xi + C1yi−1 on a digital sequence is the same as that of a low-pass analog filter on a step signal. You can see that the recursive digital filter is more powerful than a linear digital filter. By changing the constants in the digital equation we can change the characteristics of the digital filter. Digital filters are used to process analog signals and to remove noise.

[Figure 11.81 Response of the filter of Fig. 11.80 to a step input.]

The opposite of a low-pass filter is a high-pass filter, which passes rapid changes in the input sequence and rejects slow changes (or a constant level). Consider the recursive digital filter defined by

\[ y_i = C_0 x_i - C_1 y_{i-1} \]

All we have done is change the sign of the constant C1 and subtracted a fraction of the old output from a fraction of the new input. In this case, a constant or slowly changing signal is subtracted from the output. Consider the previous example with a step input and coefficients C0 = 0.6 and C1 = 0.4:
y0 = 0.6x0 − 0.4y−1 = 0.6 × 0 − 0.4 × 0 = 0.0
y1 = 0.6x1 − 0.4y0 = 0.6 × 0 − 0.4 × 0 = 0.0
y2 = 0.6x2 − 0.4y1 = 0.6 × 1 − 0.4 × 0 = 0.60
y3 = 0.6x3 − 0.4y2 = 0.6 × 1 − 0.4 × 0.60 = 0.36
y4 = 0.6x4 − 0.4y3 = 0.6 × 1 − 0.4 × 0.36 = 0.456
y5 = 0.6x5 − 0.4y4 = 0.6 × 1 − 0.4 × 0.456 = 0.4176
y6 = 0.6x6 − 0.4y5 = 0.6 × 1 − 0.4 × 0.4176 = 0.43296
y7 = 0.6x7 − 0.4y6 = 0.6 × 1 − 0.4 × 0.43296 = 0.426816

In this case the step function dies away as Fig. 11.82 demonstrates.

[Figure 11.82 Response of a high-pass filter to a step input.]

Correlation

One of the most important applications of digital signal processing is the recovery of very weak signals that have been corrupted by noise. Signals received from satellites and deep space vehicles are often so weak that there is considerably more noise than signal—anyone listening to such a signal on a loudspeaker would hear nothing more than the hiss of white noise. Modern signal processing techniques enable you to extract signals from noise when the signal level is thousands of times weaker than the noise.

The technique used to recover signals from noise is called correlation. We met correlation when we discussed the waveforms used to record data on disks—the more unalike the waveforms used to record 1s and 0s, the better. Correlation is a measure of how similar two waveforms or binary sequences are. Correlation varies from −1 through 0 to +1. If the correlation is 1, the signals are identical. If the correlation is 0, the two signals are unrelated. If the correlation is −1, one signal is the inverse of the other.

Two signals can be correlated by taking successive samples from each of the series, multiplying the pairs of samples, and then averaging the sum of the products. Consider the correlation of two series X = x0, x1, x2, x3, x4 and Y = y0, y1, y2, y3, y4. The correlation between X and Y is given by

\[ \frac{1}{5}(x_0 y_0 + x_1 y_1 + x_2 y_2 + x_3 y_3 + x_4 y_4) \]

An example of the use of correlation is the effect of rainfall in the mountains on crop growth in the plain. Simply correlating the sequence of rainfall measurements with crop growth doesn't help because there's a delay between rainfall and plant growth. We can generate several correlation functions by correlating one sequence with a delayed version of the other sequence. Now we have a sequence of correlation functions that depend on the delay between the sequences, and we can express the kth correlation value as

\[ C_k = \sum_i x_i \cdot y_{i+k} \]

Suppose that X = 1, 2, 3, −1, 4, 2, 0, 1 and Y = 0, 0, 1, −1, 0, 1, 1, 0, 0, 0:

C0 = 1×0 + 2×0 + 3×1 + (−1)×(−1) + 4×0 + 2×1 + 0×1 + 1×0 = 6
C1 = 1×0 + 2×1 + 3×(−1) + (−1)×0 + 4×1 + 2×1 + 0×0 + 1×0 = 5
C2 = 1×1 + 2×(−1) + 3×0 + (−1)×1 + 4×1 + 2×0 + 0×0 + 1×0 = 2
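A direct C implementation of Ck reproduces these values; the assumption that samples beyond the end of Y are treated as zero is ours, chosen to be consistent with the worked products above.

    #include <stdio.h>

    /* C_k = sum over i of x[i] * y[i + k], treating out-of-range y as 0. */
    double correlate(const double *x, int nx, const double *y, int ny, int k)
    {
        double sum = 0.0;
        for (int i = 0; i < nx; i++) {
            int j = i + k;
            if (j >= 0 && j < ny)
                sum += x[i] * y[j];
        }
        return sum;
    }

    int main(void)
    {
        double x[] = {1, 2, 3, -1, 4, 2, 0, 1};
        double y[] = {0, 0, 1, -1, 0, 1, 1, 0, 0, 0};

        for (int k = 0; k <= 2; k++)
            printf("C%d = %g\n", k, correlate(x, 8, y, 10, k));  /* 6, 5, 2 */
        return 0;
    }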
These results don’t seem very interesting until we apply this the signals that represent images allows us to, for example,
technique to a real situation. Suppose a transmitter uses the sharpen blurred images or to remove noise from them, or to
sequence 0.25, 0.5, 1.0, 0.5, 0.25 to represent a logical 1; emphasize their edges.
that is a 1 is transmitted as the sequence of values 0.25, 0.5,
1.0, 0.5, 0.25. Suppose we receive this signal without noise ■ SUMMARY
and correlate it with the sequence representing a 1. That is,
In this chapter we’ve looked at some of the many
C0 0.25 0.25 0.5 0.50 1 1.0 peripherals that can be connected to a computer. We began
0.5 0.5 0.25 0.25 with the two most important peripherals from the point of
1.625 view of the average PC user, the input device (keyboard and
C1 0.25 0.00 0.5 0.25 mouse) and the output device (CRT, LCD, and plasma display).
1 0.5 0.5 1.0 0.25 0.5 0.0 0.25 We have looked at the construction of input/output devices
1.25 and have described how printers work. In particular, we have
C2 0.25 0.00 0.5 0.00 1 0.25 0.5 0.5 demonstrated how computers handle color displays.
Some devices receive or generate analog signals. We have
0.25 1.0 0.0 0.5 0.0 0.25
examined how analog signals from sensors are processed by the
0.75
computer. We have provided a brief discussion of how analog
As you can see, the greatest correlation factor occurs signals can be converted into digital form and vice versa and the
when the sequence is correlated with itself. If the sequence problems of sampling a time-varying signal.
We have also briefly introduced some of the devices that enable us to control the world around us: temperature, pressure, and even rotation sensors.

■ PROBLEMS

11.1 Why are mechanical switches unreliable?

11.2 Imagine that keyboards did not exist (i.e. you are free from all conventional design and layout constraints) and you were asked to design a keyboard for today's computers. Describe the layout and functionality of your keyboard.

11.3 Why are optical mice so much better than mechanical mice?

11.4 A visual display has a resolution of 1600 × 1200 pixels. If the display is updated at 60 frames a second, what is the average rate at which a pixel must be read from memory?

11.5 Most displays are two-dimensional. How do you think three-dimensional displays can be constructed? Use the Internet to carry out your research.

11.6 How do dot printers (for example, the inkjet printer) increase the quality of an image without increasing the number of dots?

11.7 What is the difference between additive and subtractive color models?

11.8 Use the Internet or a current computer magazine to calculate the ratio of the cost of a 17-inch monitor to a basic color printer. What was the value of this ratio 12 months ago?

11.9 Why does an interlaced CRT monitor perform so badly when used as a computer monitor?

11.10 Describe, with the aid of diagrams, how an integrating analog-to-digital converter operates. Explain also why the accuracy of an integrating converter depends only on the reference voltage and the clock.

11.11 What is a tree network (when applied to DACs) and what is its advantage over other types of DAC (e.g. the R–2R ladder)?

11.12 A triangular-wave generator produces a signal with a peak-to-peak amplitude of 5 V and a period of 200 μs. This analog signal is applied to the input of a 14-bit A to D converter.
(a) What is the signal-to-noise ratio that can be achieved by the converter?
(b) What is the minimum input change that can be reliably resolved?
(c) For this signal explain how you would go about calculating the minimum rate at which it must be sampled.

11.13 What is a sample and hold circuit and how is it used in ADC systems?

11.14 One of the most important applications of microprocessors in everyday systems is the controller. Describe the structure of a three-term PID (proportional, integral, derivative) control system and explain how, for example, it can be used in tracking systems.

11.15 Find further examples of the use of digital signal processing.

11.16 What is the meaning of x0, x1, x2, x3, x4, . . . , xi, . . . , xn in the context of digital filters?

11.17 A digital filter is defined by the equation yn = 0.2xn + 0.1xn−1 + 0.4yn−1 + 0.3yn−2, where yn is the nth output and xn is the nth input.
(a) What is the meaning of this equation in plain English?
(b) What is the difference between yn and yn−1?
(c) How is the above equation represented diagrammatically?
(d) Does the above equation represent a recursive filter?
(e) Describe the circuit elements in the diagram.
(f) If the input sequence x0, x1, x2, x3, x4, . . . is 0.0, 0.1, 0.2, 0.3, 0.4, . . . , what is the output sequence?
(g) What does this filter do?

11.18 A recursive digital filter is described by the expression

yn = c0·xn + c1·xn−1 + c2·yn−1

where the output of the filter is the sequence y0, y1, y2, . . . , yn−1, yn and the input is the sequence x0, x1, . . . , xn−1, xn. The terms c0, c1, and c2 are filter coefficients with the values c0 = 0.4, c1 = 0.1, and c2 = 0.3.
(a) What is the meaning of a recursive filter?
(b) Draw a block diagram of the structure of this filter (note that the delay element is represented by Z−1).
(c) Draw a graph of the output sequence from this filter corresponding to the input sequence given by the step function 0, 0, 1, 1, 1, 1, . . . , 1.
(d) In plain English, what is the effect of this filter on the step input?

11.19 Why is speech not used more widely as a form of computer input?

11.20 Suppose that speech were used as a form of computer input. Do you think that all languages would have the same degree of accuracy (i.e. the number of errors in the input stream) or would some languages work better with speech recognition software than others?
12 Computer memory
INTRODUCTION
Memory systems are divided into two classes: immediate access memory and secondary
storage. We begin with the high-speed immediate access main store based on semiconductor
technology and demonstrate how memory components are interfaced to the CPU. Then we
look at magnetic and optical secondary stores that hold data not currently being processed by
the CPU. Secondary stores have gigantic capacities but are much slower than immediate access
stores.
Over the years, memory systems have been subject to three trends: a reduction in their cost, an increase in their capacity, and an increase in their speed. Figure 12.1 (from IBM) demonstrates just how remarkably memory costs have declined over a decade for both semiconductor and magnetic memory. Fifteen years has witnessed a reduction of costs by three orders of magnitude.
[Figure 12.1: price per Mbyte in dollars, falling from about $100 to below $0.01, for DRAM, flash, 1-inch HDD, and mobile/server HDD storage.]
[Figure 12.2 The memory hierarchy, ordered by speed and capacity: on-chip registers and cache (about 5 ns, about 512 kbytes) and main store in the computer (about 50 ns) form the random access memory; the serial access memory comprises the magnetic hard disk (about 10 ms, about 500 Gbytes), CD-ROM and DVD (about 100 ms), and external magnetic tape (about 100 s, 1000 Gbytes, i.e. 1 Tbyte). Optical storage systems are smaller than magnetic storage (600 Mbytes to 14 Gbytes).]
High speed A memory’s access time should be very low, high levels of vibration—the military are very keen on this
preferably 0.1 ns, or less. aspect of systems design.
Small size Memory should be physically small. Five hundred Low cost Memory should cost nothing and, ideally, should be
thousand megabytes (i.e. 500 Gbytes) per cubic centimeter given away free with software (e.g. buy Windows 2015® and
would be nice. get the 500 Gbytes of RAM needed to run it free).
Low power The entire memory system should run off a watch Figure 12.2 illustrates the memory hierarchy found in
battery for 100 years. many computers. Memory devices at the top of the hierarchy
Highly robust The memory should not be prone to errors; are expensive and fast and have small capacities. Devices at
a logical one should never spontaneously turn into a logical the bottom of the hierarchy are cheap and store vast amounts
zero or vice versa. It should also be able to work at tempera- of data, but are abysmally slow. This diagram isn’t exact
tures of 60C to 200C in dusty environments and tolerate because, for example, the CD-ROM has a capacity of
This diagram isn't exact because, for example, the CD-ROM has a capacity of 600 Mbytes and (from the standpoint of capacity) should appear above hard disks in the figure.

Internal CPU memory lies at the tip of the memory hierarchy in Fig. 12.2. Registers have very low access times and are built with the same technology as the CPU. They are expensive in terms of the silicon resources they take up, limiting the number of internal registers and the amount of scratchpad memory within the CPU itself. The number of registers that can be included on a chip has increased dramatically in recent years.

Immediate access store holds programs and data during their execution and is relatively fast (10 ns to 50 ns). Main store is implemented as semiconductor static or dynamic memory. Up to the 1970s ferrite core stores and plated wire memories were found in main stores. Random access magnetic memory systems are now obsolete because they are slow, costly, consume relatively high power, and are physically bulky. Figure 12.2 shows the two types of random access memory, cache and main store.

The magnetic disk stores large quantities of data in a small space and has a very low cost per bit. Accessing data at a particular point on the surface is a serial process, and a disk's access time, although fast in human terms, is orders of magnitude slower than immediate access store. A disk drive can store 400 Gbytes (i.e. 2^38 bytes) and has an access time of 5 ms. In the late 1990s an explosive growth in disk technology took place and low-cost hard disks became available with greater storage capacities than CD-ROMs and tape systems.

The CD-ROM was developed by the music industry to store sound on thin plastic disks called CDs (compact disks). CD-ROM technology uses a laser beam to read tiny dots embedded on a layer within the disk. Unlike hard disks, CD-ROMs use interchangeable media, are inexpensive, and store about 600 Mbytes, but have longer access times than conventional hard disks. The CD-ROM is used to distribute software and data. Writable CD drives and their media are more expensive and are used to back up data or to distribute data. The CD-ROM was developed into the higher capacity DVD in the 1990s.

Magnetic tape provides an exceedingly cheap serial access medium that can store 1000 Gbytes on a tape costing a few dollars. The average access time of tape drives is very long in comparison with other storage technologies, and they are used only for archival purposes. Writable CDs and DVDs have now replaced tapes in many applications.

By combining all these types of memory in a single computer system, the computer engineer can get the best of all worlds. You can construct a low-cost memory system with a performance only a few percent lower than that of a memory constructed entirely from expensive high-speed RAM. The key to computer memory design is having the right data in the right place at the right time. A large computer system may have thousands of programs and millions of data files. Fortunately, the CPU requires few programs and files at any one time. By designing an operating system that moves data from disk into the main store so that the CPU always (or nearly always) finds the data it wants in the main store, the system has the speed of a giant high-speed store at a tiny fraction of the cost. Such an arrangement is called a virtual memory because the memory appears to the user as, say, a 400-Gbyte main store, when in reality there may be a real main memory of only 512 Mbytes and 400 Gbytes of disk storage. Figure 12.3 summarizes the various types of memory currently available.

Before we begin our discussion of storage devices proper, we define memory and introduce some of the terminology and underlying concepts associated with memory systems.
[Figure 12.3 The classification of memory: primary (random access) and secondary (sequential access) storage.]
12.3.5 Magnetism

The most common low-cost, high-capacity storage mechanism uses magnetism. An atom consists of a nucleus around which electrons orbit. The electrons themselves have a spin that can take one of two values, called up and down. Electron spin generates a magnetic field and the atom can be in one of two possible magnetic states. Atoms themselves are continually vibrating due to thermal motion. In most substances, the spin axes of the electrons are randomly oriented because the thermal vibrations of the atoms dominate, and there is no overall magnetic effect. A class of materials exhibits ferromagnetism, in which adjacent electrons align their spin axes in parallel. When all the atoms in the bulk material are oriented with their spins in the same direction, the material is magnetized. Because we can magnetize material with its electron spins in one of two states and then detect these states, magnetic materials are used to implement memory.

Up to the 1960s, immediate access memories stored data in tiny ferromagnetic rings called ferrite cores (hence the term core stores). Ferrite core stores are virtually obsolete today and the most common magnetic storage device is the hard disk.

12.3.6 Optical

The oldest mechanism used to store data is optical technology. Printed text is an optical memory because ink modifies the optical (i.e. reflective) properties of the paper. The same mechanism stores digital information in barcodes. More recently, two technologies have been combined to create high-density optical storage devices. The laser creates a tiny beam of light that illuminates a correspondingly tiny dot that has been produced by semiconductor fabrication techniques. These dots store information rather like the holes in punched cards and paper tape.

12.4 Semiconductor memory

Semiconductor random access memory is fabricated on silicon chips by the same process used to manufacture microprocessors. Without low-cost semiconductor memory, the microprocessor revolution would have been delayed: microprocessors would have had to use the slow, bulky, and expensive ferrite core memory of 1960s and 1970s mainframes. The principal features of semiconductor memory are its high density and ease of use.

12.4.1 Static semiconductor memory

Static semiconductor memory is created by fabricating an array of latches on a single silicon chip. It has a very low access time, but is about four times more expensive than dynamic memory because it requires four transistors per bit, unlike the DRAM cell, which uses one transistor. Static RAM is easy to use from the engineer's point of view and is found in small memories. Some memory systems use static memory devices because of their greater reliability compared with dynamic memory. Large memories are constructed with dynamic memory because of its lower cost.
Figure 12.4 illustrates a 4M CMOS semiconductor memory. The acronym CMOS means complementary metal oxide semiconductor and indicates the semiconductor technology used to manufacture the chip. The 4M denotes the memory's capacity in bits; that is, 2^22 bits. Power is fed to the memory via its Vss and Vcc pins.

The chip in Fig. 12.4 is interfaced to the computer via its 32 pins, of which 19 are the address inputs needed to select one of 2^19 = 524 288 (i.e. 512K) unique locations. Eight data lines transfer data from the memory during a read cycle and receive data from the processor during a write cycle. Electrical power is fed to the chip via two pins.

[Figure 12.4 The 512K × 8 static RAM: 19 address lines select one of 2^19 locations; data is fed into or received from the RAM on its 8-bit data bus; CS enables a read or write access, OE enables the data bus drivers, and R/W selects a read or a write cycle.]
MEMORY ORGANIZATION

Memory components are organized as n words by m bits (the total capacity is m × n bits). Bit-organized memory components have a 1-bit width; for example, a bit-organized 256K chip is arranged as 256K locations each of one bit. Some devices are nibble organized; for example, a 256K chip can be arranged as 64K locations, each containing 4 bits. The device in Fig. 12.4 is byte organized as 512K words of 8 bits and is suited to small memories in microprocessor systems in which one or two chips may be sufficient for all the processor's read/write memory requirements.
The three control pins, CS, OE, and R/W, determine the operation of the memory component as follows.

Pin | Name           | Function
CS  | Chip select    | When low, CS selects the chip for a memory access
R/W | Read/not write | When high, R/W indicates a read cycle; when low, it indicates a write cycle
OE  | Output enable  | When low in a read cycle, OE allows data to be read from the chip and placed on the data bus

Suppose the memory device is connected to a CPU with a 32-bit address bus that can access 2^32 locations. This RAM has 19 address inputs and provides only a fraction of the address space that the CPU can access (i.e. 512 kbytes out of 4 Gbytes). Extra logic is required to map this block of RAM onto an appropriate part of the processor's address space. The high-order address lines from the CPU (in this case, A19 to A31) are connected to a control logic block that uses these address lines to perform the mapping operation. Essentially, there are 4G/512K = 2^32/2^19 = 2^13 slots into which the RAM can be mapped. We'll explain how this is done later.
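As a preview, the mapping amounts to comparing the high-order address bits with a slot number; the block responds only when those bits match the slot assigned to it. A sketch in C, with names of our own choosing:

    #include <stdint.h>

    /* 512-kbyte RAM in a 4-Gbyte (32-bit) address space: A19-A31 select
       one of 2^13 slots; A0-A18 address a location inside the block.   */
    #define BLOCK_BITS  19                  /* 2^19 = 512K locations */

    int ram_selected(uint32_t addr, uint32_t slot)
    {
        return (addr >> BLOCK_BITS) == slot;      /* compare A19-A31 */
    }

    uint32_t ram_offset(uint32_t addr)
    {
        return addr & ((1u << BLOCK_BITS) - 1);   /* keep A0-A18     */
    }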
In order for the chip to take part in a read or write operation, its CS pin must be in a low state. Whenever CS is inactive-high, the memory component ignores all signals at its other pins. Disabling the memory by turning off its internal tri-state bus drivers permits several memories to share the same data bus, as long as only one device is enabled at a time. The R/W input determines whether the chip is storing the data at its eight data pins (R/W = 0) or is transferring data to these pins (R/W = 1). The output enable pin, OE, turns on the memory's tri-state bus drivers during a read cycle and turns them off at all other times. Some chips combine OE with CS and R/W so that the output data buffers are automatically enabled when CS = 0 and R/W = 1.

Address decoding and read/write electronics are located on the chip, simplifying the design of modern memory systems. Figure 12.5 demonstrates how this device can be connected to a CPU. Because the chip is 8 bits wide (i.e. it provides 8 bits at a time), two chips would be connected in parallel in a system with a 16-bit data bus, and four chips in a system with a 32-bit data bus.

The CPU's data bus is connected to the memory's data pins and the CPU's address bus is connected to the memory's address pins. The memory's CS, R/W, and OE control inputs are connected to signals from the block labeled 'Control logic'. This block takes control signals from the CPU and generates the signals required to control a read or a write cycle.

12.4.2 Accessing memory—timing diagrams

The computer designer is interested in the relationship between the memory and the CPU. In particular, the memory must provide data when the CPU wants it, and the CPU must provide data when the memory wants it. The engineer's most important design tool is the timing diagram. A timing diagram is a cause-and-effect diagram that illustrates the sequence of actions taking place during a read or write cycle. The designer is concerned with the relationship between information on the address and data buses and the memory's control inputs. Figure 12.6 shows the simplified timing diagram of a static RAM memory chip during a read cycle.

The timing diagram illustrates the state of the signals involved in a memory access as a function of time. Each signal may be in a 0 or a 1 state, and sloping edges indicate a change of signal level. The timing diagram of the address bus appears as two parallel lines crossing over at points A and B. The two parallel lines mean that some of the address lines may be high and some low; it's not the actual logical values of the address lines that interest us, but the time at which the contents of the address bus become stable for the duration of the current memory access cycle. We haven't drawn the R/W line because it must be in its electrically high state for the duration of the entire read cycle.

Let's walk through this diagram from left to right. At point A in Figure 12.6, the contents of the address bus have changed from their previous value and are now stable; that is, the old ...
[Figure 12.5: connecting memory blocks to the CPU; the low-order address lines and the data bus run to each memory block, while the high-order address lines go to the logic that selects a block.]

[Artwork: DRAM address multiplexing; a multiplexer feeds either the row address or the column address to the DRAM array, under RAS, CAS, MPLX, and R/W timing and control signals derived from the CPU's control outputs.]
[Figure 12.11 The write-cycle timing diagram of a dynamic RAM: the row address and then the column address are presented to the DRAM; R/W must be low and the data valid before CAS goes active-low.]

DRAM reliability

Dynamic memory suffers from two peculiar weaknesses. When a memory cell is accessed and the inter-electrode capacitor charged, the dynamic memory draws a very heavy current from the power supply, causing a voltage drop along the power supply lines. This voltage drop can be reduced by careful layout of the circuit of the memory system.

Another weakness of the dynamic memory is its sensitivity to alpha particles. Semiconductor chips are encapsulated in plastic or ceramic materials that contain tiny amounts of radioactive material. One of the products of radioactive decay is the alpha particle (helium nucleus), which is highly ionizing and corrupts data in cells through which it passes.
DRAM FAMILIES

Over the years, the access time of DRAMs has declined, but their performance has improved less than that of the CPU. Manufacturers have attempted to hide the DRAM's relatively poor access time by introducing enhanced DRAM devices.

Fast page mode DRAM (FPD) This variation lets you provide a row address and then access several data elements in the same row just by changing the column address. Access time is reduced for sequential addresses.

Extended data out DRAM (EDO) An EDO DRAM provides a small improvement by starting the next memory access early and thereby reducing the overall access time by about 15%.

Synchronous DRAM (SDRAM) The SDRAM is operated in a burst mode and several consecutive locations are accessed sequentially; for example, 5-1-1-1 SDRAM provides the first data element in five clock cycles but the next three elements one clock cycle after each other.

Double data rate synchronous DRAM (DDR DRAM) This is a version of SDRAM in which the data is clocked out on both the rising and falling edges of its clock to provide twice the data transfer rate.
When an alpha particle passes through a DRAM cell, a soft error occurs. An error is called soft if it is not repeatable (i.e. the cell fails on one occasion but has not been permanently damaged). The quantity of alpha particles can be reduced by careful quality control in selecting the encapsulating material, but never reduced to zero. By the way, all semiconductor memory is prone to alpha-particle errors—it's just that DRAM cells have a low stored energy per bit and are more prone to these errors than other devices.

A random soft error that corrupts a bit once a year in a PC is an irritation. In professional and safety-critical systems the consequences of such errors might be more severe. The practical solution to this problem lies in the type of error-correcting codes we met in Chapter 3. For example, five check bits can be appended to a 16-bit data word to create a 21-bit code word. If one of the bits in the code word read back from the DRAM is in error, you can calculate which bit it was and correct the error.
12.4.4 Read-only semiconductor memory devices

As much as any other component, the ROM (read-only memory) was responsible for the growth of low-cost PCs in the 1980s, when secondary storage mechanisms such as disk drives were still very expensive. In those days a typical operating system and BASIC interpreter could fit into an 8- to 64-kbyte ROM. Although PCs now have hard disks, ROMs are still found in diskless palm-top computers and personal organizers. All computers require read-only memory to store the so-called bootstrap program that loads the operating system from disk when the computer is switched on (called the BIOS, basic input/output system).

ROMs are used in dedicated microprocessor-based controllers. When a microcomputer is assigned to a specific task, such as the ignition control system in an automobile, the software is fixed for the lifetime of the device. A ROM provides the most cost-effective way of storing this type of software.

Semiconductor technology is well suited to the production of high-density, low-cost, read-only memories. We now describe the characteristics of some of the read-only memories in common use: mask-programmed ROM, PROM, EPROM, flash EPROM, and EEPROM.

Mask-programmed ROM

Mask-programmed ROM is so called because its contents (i.e. data) are permanently written during the manufacturing process. A mask (i.e. stencil) projects the pattern of connections required to define the data contents when the ROM is fabricated. A mask-programmed ROM cannot be altered because the data is built into its physical structure. It is the cheapest type of read-only semiconductor memory when manufactured in bulk. These devices are used only when large numbers of ROMs are required, because the cost of setting up the mask is high. The other read-only memories we describe next are all user programmable and some are reprogrammable.

PROM

A PROM (programmable read-only memory) can be programmed once by the user in a special machine. A transistor is a switch that can pass or block the passage of current through it. Each memory cell in a PROM contains a transistor that can be turned on or off to store a 1 or a 0. The transistor's state (on or off) is determined by the condition of a tiny metallic link that connects one of the transistor's inputs to a fixed voltage. When you buy a PROM, it is filled with all 1s because each link forces the corresponding transistor into a 1 state. A PROM is programmed by passing a current pulse through a link to melt it and change the state of the transistor from a 1 to a 0. For obvious reasons, these links are often referred to as fuses. A PROM cannot be reprogrammed because, if you fuse a link, it stays that way. The PROM has a low access time (5 to 50 ns) and is largely used as a logic element rather than as a means of storing programs.

EPROM

The EPROM is an erasable programmable read-only memory that is programmed in a special machine. Essentially, an EPROM is a dynamic memory with a refresh period of tens of years. Data is stored in an EPROM memory cell as an electrostatic charge on a highly insulated conductor. The charge can remain for periods in excess of 10 years without leaking away.

We don't cover semiconductor technology, but it's worth looking at how EPROMs operate. All we need to state is that semiconductors are constructed from pure silicon and that the addition of tiny amounts of impurities (called dopants) changes the electrical characteristics of silicon. Silicon doped with an impurity is called n-type or p-type silicon depending on how the impurity affects the electrical properties of the silicon.

Figure 12.12 illustrates an EPROM memory cell consisting of a single NMOS field effect transistor. A current flows in the N channel between the transistor's positive and negative terminals, Vdd and Vss. By applying a negative charge to a gate electrode, the negatively charged electrons flowing through the channel are repelled and the current is turned off. The transistor has two states: a state with no charge on the gate and a current flowing through the channel, and a state with a charge on the gate that cuts off the current in the channel.

A special feature of the EPROM is the floating gate, which is insulated from any conductor by means of a thin layer of silicon dioxide—an almost perfect insulator. By placing or not placing a charge on the floating gate, the transistor can be turned on or off to store a one or a zero in the memory cell.

[Figure 12.12 The structure of an EPROM memory cell: a silicon select gate (Vgg) above a silicon floating gate, separated by silicon dioxide insulators, over n+ implants in a p-type silicon substrate between Vss and Vdd.]
If the floating gate is entirely insulated, how do we put a charge on it in order to program the EPROM? The solution is to place a second gate close to the floating gate but insulated from it. By applying typically 12 to 25 V to this second gate, some electrons cross the insulator and travel to the floating gate (in the same way that lightning crosses the normally non-conducting atmosphere).

You can program an EPROM, erase it, and reprogram it many times. Illuminating the silicon chip with ultra-violet light erases the data stored in it. Photons of ultra-violet light hit the floating gate and cause the stored charge to drain away through the insulator. The silicon chip is located under a quartz window that is transparent to ultra-violet light.

EPROMs are suitable for small-scale projects and for development work in laboratories because they can be programmed, erased, and reprogrammed by the user. The disadvantage of EPROMs is that they have to be removed from a computer, placed under an ultra-violet light to erase them, and then placed in a special-purpose programmer to reprogram them. Finally, they have to be re-inserted in the computer. EPROMs have largely been replaced by flash EPROMs.

Flash EPROM

The most popular read-only memory is the flash EPROM, which can be erased and reprogrammed electronically. Until recently, typical applications of the flash EPROM were personal organizers and system software in personal computers (e.g. the BIOS in PCs). Today, the flash EPROM is used to store images in digital cameras and audio in MP3 players. When flash memories first appeared, typical capacities were 8 Mbytes. By 2005 you could buy 12-Gbyte flash memories.

The structure of an EPROM memory cell and a flash EPROM cell are very similar. The difference lies in the thickness of the insulating layer (silicon oxynitride) between the floating gate and the surface of the transistor. The insulating layer of a conventional EPROM is about 300 Å thick, whereas a flash EPROM's insulating layer is only 100 Å thick. Note that 1 Å = 1 × 10^−10 m (or 0.1 nanometers).

When an EPROM is programmed, the charge is transferred to the floating gate by the avalanche effect. The voltage difference between the gate and the surface of the transistor causes electrons to burst through the oxynitride insulating layer in the same way that lightning bursts through the atmosphere. These electrons are called hot electrons because of their high levels of kinetic energy (i.e. speed). The charge on the floating gate is removed during exposure to ultra-violet light, which gives the electrons enough energy to cross the insulating layer.

A flash EPROM is programmed in exactly the same way as an EPROM (i.e. by hot electrons crashing through the insulator). However, the insulating layer in a flash EPROM is so thin that a new mechanism is used to transport electrons across it when the chip is erased. This mechanism is known as Fowler–Nordheim tunnelling and is a quantum mechanical effect. When a voltage in the range 12 to 20 V is applied across the insulating layer, electrons on the floating gate are able to tunnel through the layer, even though they don't have enough energy to cross the barrier.

A flash EPROM is divided into sectors with a capacity of typically 1024 bytes. Some devices let you erase a sector or the whole memory; others permit only a full chip erase. Flash EPROMs can't be programmed, erased, and reprogrammed without limit. Repeated write and erase cycles eventually damage the thin insulating layer. Some first-generation flash EPROMs are guaranteed to perform only 100 erase/write cycles, although devices are now available with lifetimes of at least 10 000 cycles.

EEPROM

The electrically erasable and reprogrammable ROM (EEPROM or E²PROM) is similar to the flash EPROM and can be programmed and erased electrically. The difference between the EEPROM and the flash EPROM is that the flash EPROM uses Fowler–Nordheim tunneling to erase data and hot electron injection to write data, whereas pure EEPROMs use the tunneling mechanism both to write and to erase data. Table 12.1 illustrates the difference between the EPROM, flash EPROM, and EEPROM.

EEPROMs are more expensive than flash EPROMs and generally have smaller capacities. The size of the largest state-of-the-art flash memory is usually four times that of the corresponding EEPROM. Modern EEPROMs operate from single 5 V supplies and are rather more versatile than flash EPROMs. Like the flash memory, they are read-mostly devices, with a lifetime of 10 000 erase/write cycles. EEPROMs have access times as low as 35 ns but still have long write cycle times (e.g. 5 ms).

The differences between a read/write RAM and an EEPROM are subtle. The EEPROM is non-volatile, unlike the typical semiconductor RAM. Second, the EEPROM takes much longer to write data than to read it. Third, the EEPROM can be written to only a finite number of times. Successive erase and write operations put a strain on its internal structure and eventually destroy it. Finally, EEPROM is much more expensive than semiconductor RAM. The EEPROM is found in special applications where data must be retained when the power is off. A typical application is in a radio receiver that can store a number of different frequencies and recall them when the power is re-applied.

12.5 Interfacing memory to a CPU

We now look at how semiconductor memory components are interfaced to the microprocessor. Readers who are not interested in microprocessor systems design may skip this section.
[Figure 12.13 CPU, bus, and memory organization: a 32-bit bus transfers 32 bits (i.e. 4 bytes) of data at a time, so the memory must be able to supply 32 bits of data.]

12.5.1 Memory organization

A microprocessor operates on a word of width w bits and communicates with memory over a bus of width b bits. Memory components of width m bits are connected to the microprocessor via this bus. In the best of all possible worlds, the values of w, b, and m are all the same. This was often true of 8-bit microprocessors, but is rarely true of today's high-performance processors. Consider the 68K microprocessor, which has an internal 32-bit architecture and a 16-bit data bus interface. When you read a 32-bit value in memory, the processor automatically performs two 16-bit read cycles. The programmer doesn't have to worry about this, because the memory accesses are carried out automatically. Memory components are normally 1, 4, or 8 bits wide. If you use 4-bit-wide memory devices in a 68K system, you have to arrange them in groups of four because a memory block must provide the bus with 16 bits of data. Figure 12.13 shows the organization of 8-bit, 16-bit, and 32-bit systems.

A memory system must be as wide as the data bus. That is, the memory system must be able to provide an 8-bit bus with 8 bits of data, a 16-bit bus with 16 bits of data, a 32-bit bus with 32 bits of data, and so on. Consider the following examples.
[Figure 12.14 16-bit memory organization: four 4K × 4 chips provide 4K locations of 16 bits (8 kbytes); two 1M × 8 chips provide 1M locations of 16 bits (2 Mbytes); a single 64K × 16 chip provides 64K locations of 16 bits (128 kbytes).]
Example 1 An 8-bit computer with an 8-bit bus uses memory components that are 4 bits wide. Two of these devices are required to supply 8 bits of data; each chip supplies 4 bits.

Example 2 The amount of data in a block of memory, in bytes, is equal to the width of the data bus (in bytes) multiplied by the number of locations in the block of memory. A 16-bit computer with a 16-bit bus uses memory components that are 1 bit wide. Sixteen of these devices are required to supply 16 bits of data at a time.

Example 3 An 8-bit computer uses memory components organized as 64K × 4 bits; that is, there are 64K = 2^16 different addressable locations in the chip. Two of these chips are required to provide the CPU with 8 data bits. The total size of the memory is 64 kbytes.

Example 4 A 16-bit computer uses memory components that are 64K × 4 bits. Four of these chips must be used to provide the CPU with 16 bits of data. Therefore, each of the 64K locations provides 16 bits of data or 2 bytes (i.e. each of the 4 chips provides 4 of the 16 bits). The total size of the memory is 2 bytes × 64K = 128 kbytes.

Example 5 A 16-bit computer uses 64K × 16-bit memory components. Only one of these chips is required to provide 16 bits of data (2 bytes). Therefore, each chip provides 2 × 64K = 128 kbytes.

Figure 12.14 demonstrates memory organization by showing how three 16-bit-wide blocks of memory can be constructed from 4-bit-wide, 8-bit-wide, and 16-bit-wide memory components.

12.5.2 Address decoders

If the memory in a microprocessor system were constructed from memory components with the same number of uniquely addressable locations as the processor, the problem of address decoding would not exist. For example, an 8-bit CPU with address lines A00 to A31 would simply be connected to the corresponding address input lines of the memory component. Microprocessor systems often have memory components that are smaller than the addressable memory space. Moreover, there are different types of memory: read/write memory, read-only memory, and memory-mapped peripherals. We now look at some of the ways in which memory components are interfaced to a microprocessor.
In order to simplify the design of address decoders we will assume an 8-bit microcontroller with a 16-bit address bus spanned by address lines A0 to A15. We are not going to use the 68K because it has a 23-bit address bus, a 16-bit data bus, and special byte selection logic. These features of the 68K make it more powerful than earlier 8-bit processors, but they do get in the way of illustrating the basic principles. We provide several 68K-based examples later in this chapter.
Consider the situation illustrated by Fig. 12.15, in which two 1K × 8 memory components are connected to the address bus of an 8-bit microprocessor. This processor has 16 address lines, A0 to A15. Ten address lines, A0 to A9, from the CPU are connected to the corresponding address inputs of the two memory components, M1 and M2. Whenever a location (one of 2^10 = 1K) is addressed in M1, the corresponding location is addressed in M2. The data outputs of M1 and M2 are connected to the system data bus. Because the data outputs of both memory devices M1 and M2 are connected together, the data bus drivers in the memory components must have tri-state outputs. That is, only one of the memory components may put data onto the system data bus at a time.
Both memory devices in Fig. 12.15 have a chip-select input (CS1 for block 1 and CS2 for block 2). Whenever the chip-select input of a memory is active-low, that device takes part in a memory access and puts data on the data bus if R/W = 1. When CS1 or CS2 is inactive (i.e. in a high state) the appropriate data bus drivers are turned off, and no data is put on the data bus by that chip.
Let CS1 be made a function of the address lines A10 to A15, so that CS1 = f1(A15, A14, A13, A12, A11, A10). Similarly, let CS2 be a function of the same address lines, so that CS2 = f2(A15, A14, A13, A12, A11, A10). Suppose we choose functions f1 and f2 subject to the constraint that there are no values of A15, A14, A13, A12, A11, A10 that cause both CS1 and CS2 to be low simultaneously. Under these circumstances, the conflict between M1 and M2 is resolved, and the memory map of the system now contains two disjoint 1K blocks of memory. There are several different strategies for decoding A10 to A15 (i.e. choosing functions f1 and f2). These strategies may be divided into three groups: partial address decoding, full address decoding, and block address decoding.
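The arithmetic in Examples 1 to 5 is mechanical enough to capture in a few lines of code. The following C sketch is ours, not the book's; the function names are invented for illustration.

#include <stdio.h>

/* How many chips of width m bits are needed to fill a bus of width b bits,
   and how many bytes a block of 'locations' addresses then holds. */
unsigned chips_per_block(unsigned b, unsigned m) {
    return b / m;                         /* chips placed side by side */
}

unsigned long block_bytes(unsigned long locations, unsigned b) {
    return locations * (b / 8);           /* bytes = locations x bus width in bytes */
}

int main(void) {
    /* Example 4: a 16-bit computer built from 64K x 4-bit chips. */
    printf("%u chips\n", chips_per_block(16, 4));       /* 4 chips */
    printf("%lu bytes\n", block_bytes(65536UL, 16));    /* 131 072 = 128 kbytes */
    return 0;
}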
[Figure 12.15 Two 1K × 8 memory blocks, M1 and M2, connected to the 16-bit address bus of an 8-bit CPU. Address lines A0 to A9 select one of 2^10 = 1024 possible locations in each memory block; the blocks are enabled by their chip-select inputs CS1 and CS2.]
EXAMPLE 1
An 8-bit microprocessor with a 16-bit address bus accesses addresses in the range 101xxxxxxxxxxxxx_2 (where bits A15, A14, A13 marked 101 are selected by the address decoder and the xs refer to locations within the memory block). What range of addresses does this block correspond to? How big is this block?

The lowest address is 1010000000000000_2 and the highest address is 1011111111111111_2. This corresponds to the range A000_16 to BFFF_16.
Three address lines are decoded to divide the address space spanned by A0 to A15 into eight blocks. The size of one block is 64K/8 = 8K. You could also calculate the size of the block because you know it is spanned by 13 address lines and 2^13 = 8K.
EXAMPLE 2
An 8-bit microprocessor with a 16-bit address bus addresses a block of 32 kbytes of ROM.
(a) How many memory components are required if the memory is composed of 8 kbyte chips?
(b) What address lines from the processor select a location in the 32 kbyte ROM?
(c) What address lines have to be decoded to select the ROM?
(d) What is the range of memory locations provided by each of the chips (assuming that the memory blocks are mapped contiguously in the region of memory space starting at address 0000_16)?

(a) The number of chips required is (memory block)/(chip size) = 32K/8K = 4.
(b) Each chip has 8K = 2^13 locations, which are accessed by the 13 address lines A0 to A12 from the processor.
(c) Address lines A0 to A12 from the CPU select a location in the chip, leaving A13 to A15 to be decoded.
(d) The memory blocks are 0000_16 to 1FFF_16, 2000_16 to 3FFF_16, 4000_16 to 5FFF_16, and 6000_16 to 7FFF_16.
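As a quick check on Example 2, here is a sketch (ours, not the book's) of how the decoded lines A13 to A15 pick one of the 8-kbyte chips:

/* Each chip spans 8K = 2^13 bytes, so the chip number is simply the
   address shifted right by 13 places; only values 0 to 3 are populated. */
unsigned rom_chip(unsigned address) {
    return (address >> 13) & 0x7;
}
/* rom_chip(0x0000) = 0, rom_chip(0x3FFF) = 1, rom_chip(0x7FFF) = 3. */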
Partial address decoding

Figure 12.16 demonstrates how two 1 kbyte blocks of memory are connected to the address bus in such a way that both blocks of memory are never accessed simultaneously. The conflict between M1 and M2 is resolved by connecting CS1 directly to A15 of the system address bus and by connecting CS2 to A15 via an inverter. M1 is selected whenever A15 = 0, and M2 is selected whenever A15 = 1. Although we have distinguished between M1 and M2 for the cost of a single inverter, a heavy price has been paid. Because A15 = 0 selects M1 and A15 = 1 selects M2, it follows that either M1 or M2 will always be selected. Although the system address bus can specify 2^16 = 64K unique addresses, only 2K different locations can be accessed. Address lines A10 to A14 take no part in the address-decoding process and consequently have no effect on the selection of a location within either M1 or M2.
Figure 12.17 gives the memory map of the system corresponding to Fig. 12.16. Memory block M1 is repeated 32 times in the lower half of the memory space and M2 is repeated 32 times in the upper half of the memory space.
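The aliasing that partial address decoding produces is easy to demonstrate in code. This sketch is ours, written under the assumption of Fig. 12.16's arrangement (CS1 driven by A15 and CS2 by its complement):

/* With only A15 decoded, lines A10-A14 are ignored, so each 1K block
   reappears 32 times (2^5 images) in its half of the memory map. */
unsigned selected_block(unsigned address) {
    return (address & 0x8000) ? 2 : 1;    /* A15 = 0 -> M1, A15 = 1 -> M2 */
}

unsigned location_in_block(unsigned address) {
    return address & 0x03FF;              /* A0-A9 select one of 1024 bytes */
}
/* Addresses 0x0000, 0x0400, and 0x7C00 all select M1, location 0. */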
The following truth table describes the 3-line-to-8-line decoder used in the address decoders of this section. The device is enabled only when E1 = 0, E2 = 0, and E3 = 1; an X indicates don't care.

E1 E2 E3 | C B A | Y0 Y1 Y2 Y3 Y4 Y5 Y6 Y7
 1  1  0 | X X X |  1  1  1  1  1  1  1  1
 1  1  1 | X X X |  1  1  1  1  1  1  1  1
 1  0  0 | X X X |  1  1  1  1  1  1  1  1
 1  0  1 | X X X |  1  1  1  1  1  1  1  1
 0  1  0 | X X X |  1  1  1  1  1  1  1  1
 0  1  1 | X X X |  1  1  1  1  1  1  1  1
 0  0  0 | X X X |  1  1  1  1  1  1  1  1
 0  0  1 | 0 0 0 |  0  1  1  1  1  1  1  1
 0  0  1 | 0 0 1 |  1  0  1  1  1  1  1  1
 0  0  1 | 0 1 0 |  1  1  0  1  1  1  1  1
 0  0  1 | 0 1 1 |  1  1  1  0  1  1  1  1
 0  0  1 | 1 0 0 |  1  1  1  1  0  1  1  1
 0  0  1 | 1 0 1 |  1  1  1  1  1  0  1  1
 0  0  1 | 1 1 0 |  1  1  1  1  1  1  0  1
 0  0  1 | 1 1 1 |  1  1  1  1  1  1  1  0
Table 12.3 Address decoding table for a system with four 4K ROMs, an 8K RAM, and eight 32-byte peripherals.

Device  Size  Address range  A15 A14 A13 A12 A11 A10 A9 A8 A7 A6 A5 A4 A3 A2 A1 A0
ROM1    4K    0000-0FFF       0   0   0   0   x   x  x  x  x  x  x  x  x  x  x  x
ROM2    4K    1000-1FFF       0   0   0   1   x   x  x  x  x  x  x  x  x  x  x  x
ROM3    4K    2000-2FFF       0   0   1   0   x   x  x  x  x  x  x  x  x  x  x  x
ROM4    4K    3000-3FFF       0   0   1   1   x   x  x  x  x  x  x  x  x  x  x  x
RAM     8K    4000-5FFF       0   1   0   x   x   x  x  x  x  x  x  x  x  x  x  x
P1      32    6000-601F       0   1   1   0   0   0  0  0  0  0  0  x  x  x  x  x
P2      32    6020-603F       0   1   1   0   0   0  0  0  0  0  1  x  x  x  x  x
...
P8      32    60E0-60FF       0   1   1   0   0   0  0  0  1  1  1  x  x  x  x  x
Because the peripherals don't occupy a 4K block, we have used address lines A8 to A11 to select a second 3-line to 8-line decoder that decodes the peripheral address space.

Address decoding with the PROM

Address decoding is the art of generating a memory component's chip-select signal from the high-order address lines. An alternative to logic synthesis techniques is the programmable read-only memory (PROM) look-up table. Instead of calculating whether the current address selects this or that device, you just read the result from a table. The PROM was a popular address decoder because of its low access time and its ability to perform most of the address decoding with a single chip. The PROM address decoder saves valuable space on the microprocessor board and makes the debugging or modification of the system easier. Because PROMs consume more power than modern devices, they've largely been replaced by CMOS programmable array logic devices.
The PROM's n address inputs select one of 2^n unique locations. When accessed, each of these locations puts a word on the PROM's m data outputs. This word is the value of the various chip-select signals themselves; that is, the processor's higher-order address lines directly look up a location in the PROM containing the values of the chip selects.
Let's look at a very simple example of a PROM-based address decoder. Table 12.4 describes a 16-location PROM that decodes address lines A12 to A15 in an 8-bit microcomputer. Address lines A12 to A15 are connected to the PROM's A0 to A3 address inputs. Whenever the CPU accesses its 64K memory space, the contents of one (and only one) of the locations in the PROM are read. Suppose that the processor reads the contents of memory location E124. The binary address of this location is 1110000100100100_2, whose four higher-order bits are 1110. Memory location 1110 in the PROM is accessed and its contents applied to the PROM's data pins D0 to D7 to give the values of the eight chip selects CS0 to CS7. In this case, the device connected to D5 (i.e. CS5) is selected. Figure 12.22 demonstrates how the PROM-based address decoder is used. This is a simplified diagram; in practice we would have to ensure that the PROM was enabled only during a valid memory access (for example, by using the processor's data strobe to enable the decoder).
Table 12.4 divides the CPU's memory space into 16 equal-sized blocks. Because the processor has a 64 kbyte memory space, each of these blocks is 64K/16 = 4 kbytes. Consequently, this address decoder can select 4-kbyte devices. If we wanted to select devices as small as 1 kbyte, we would require a PROM with 64 locations (and six address inputs). If you examine the D4 (CS4) output column, you find that there are two adjacent 0s in this column. If the processor accesses either the 4K range 6000 to 6FFF or 7000 to 7FFF, CS4 goes low. We have selected an 8K block by putting a 0 in two adjacent entries. Similarly, there are four 0s in the CS5 column to select a 4 × 4K = 16K block.
As we have just observed, the PROM can select blocks of memory of differing size. In a system with a 16-bit address bus, a PROM with n address inputs (i.e. 2^n bytes) can fully decode a block of memory with a minimum size of 2^16/2^n = 2^(16-n) bytes. Larger blocks of memory can be decoded by increasing the number of active entries (in our case, 0s) in the data column of the PROM's address/data table. The size of the block of memory decoded by a data output is equal to the minimum block size multiplied by the number of active entries in the appropriate data column.

[Figure 12.20 Circuit of an address decoder for Table 12.3. A 3-line-to-8-line decoder driven by A12 (A), A13 (B), and A14 (C), and enabled when A15 = 0, divides the lower 32 kbytes of memory into eight 4K blocks: Y0 = 0000 (ROM1), Y1 = 1000 (ROM2), Y2 = 2000 (ROM3), Y3 = 3000 (ROM4), Y4 = 4000 and Y5 = 5000 (RAM), Y6 = 6000, Y7 = 7000. A second 3-line-to-8-line decoder, driven by A5 (A), A6 (B), and A7 (C) and enabled by A8 to A11, divides the peripheral region 6000 to 60FF into eight 32-byte blocks: Y0 = 6000 (Peripheral 1), Y1 = 6020 (Peripheral 2), and so on to Y7 = 60E0 (Peripheral 8).]

[Figure 12.21 Memory map for the system of Table 12.3 and Fig. 12.20. In the address space for which A15 = 0: ROM1 at 0000 (selected by Y0), ROM2 at 1000 (Y1), ROM3 at 2000 (Y2), ROM4 at 3000 (Y3), the 8K RAM at 4000 to 5FFF (Y4 and Y5), the peripheral space at 6000 to 6FFF (Y6, with peripherals 1 to 8 at 6000 to 60FF and 6100 to 6FFF unused), and an unused region from 7000. The 32 kbytes for which A15 = 1 are unused.]
[Figure 12.22 Simplified circuit of a PROM-based decoder corresponding to Table 12.4. The high-order lines of the address bus drive the PROM's address inputs; the PROM's data outputs D0 to D7 directly provide the chip-select signals CS0 to CS7 to the memory devices on the data bus.]
Table 12.4 The address decoding PROM. The CPU's address lines A15 to A12 drive the PROM's address inputs A3 to A0; the PROM's data outputs D0 to D7 provide the chip selects CS0 to CS7.

A15 A14 A13 A12   Range           CS0 CS1 CS2 CS3 CS4 CS5 CS6 CS7
 0   0   0   0    0000 to 0FFF     0   1   1   1   1   1   1   1
 0   0   0   1    1000 to 1FFF     1   0   1   1   1   1   1   1
 0   0   1   0    2000 to 2FFF     1   1   0   1   1   1   1   1
 0   0   1   1    3000 to 3FFF     1   1   1   0   1   1   1   1
 0   1   0   0    4000 to 4FFF     1   1   1   1   1   1   1   1
 0   1   0   1    5000 to 5FFF     1   1   1   1   1   1   1   1
 0   1   1   0    6000 to 6FFF     1   1   1   1   0   1   1   1
 0   1   1   1    7000 to 7FFF     1   1   1   1   0   1   1   1
 1   0   0   0    8000 to 8FFF     1   1   1   1   1   1   1   1
 1   0   0   1    9000 to 9FFF     1   1   1   1   1   1   1   1
 1   0   1   0    A000 to AFFF     1   1   1   1   1   1   1   1
 1   0   1   1    B000 to BFFF     1   1   1   1   1   1   1   1
 1   1   0   0    C000 to CFFF     1   1   1   1   1   0   1   1
 1   1   0   1    D000 to DFFF     1   1   1   1   1   0   1   1
 1   1   1   0    E000 to EFFF     1   1   1   1   1   0   1   1
 1   1   1   1    F000 to FFFF     1   1   1   1   1   0   1   1
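Table 12.4 is literally a look-up table, so it drops straight into code. The sketch below is ours, not the book's; each byte packs CS7 (bit 7) down to CS0 (bit 0), active-low, exactly as in the table.

#include <stdio.h>

static const unsigned char prom[16] = {
    0xFE, 0xFD, 0xFB, 0xF7,    /* 0000-3FFF: CS0-CS3, one 4K ROM each      */
    0xFF, 0xFF,                /* 4000-5FFF: no chip select in Table 12.4  */
    0xEF, 0xEF,                /* 6000-7FFF: CS4 low twice -> an 8K block  */
    0xFF, 0xFF, 0xFF, 0xFF,    /* 8000-BFFF: unused                        */
    0xDF, 0xDF, 0xDF, 0xDF     /* C000-FFFF: CS5 low four times -> 16K     */
};

int main(void) {
    unsigned address = 0xE124;              /* the example used in the text */
    unsigned char cs = prom[address >> 12]; /* A15-A12 index the PROM       */
    for (int i = 0; i < 8; i++)
        if (!((cs >> i) & 1))
            printf("CS%d selected\n", i);   /* prints: CS5 selected         */
    return 0;
}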
Today, the systems designer can also use programmable logic elements such as PALs and PLAs to implement address decoders. Moreover, modern microprocessors now include sufficient RAM, flash EPROM, and peripherals on-chip to make address decoding unnecessary.

The structure of 68K-based memory systems

To conclude this section on memory organization, we look at how memory components are connected to a 68K microprocessor with its 16-Mbyte memory space and 16-bit data bus. Because the 68K has 16 data lines, d00 to d15, memory must be organized in 16-bit-wide blocks; byte-wide chips are therefore used in pairs, one supplying d08 to d15 and the other d00 to d07.

[Figure: the address and data buses of a 68K system; one memory component of each pair is connected to data lines d08 to d15 and its partner to d00 to d07.]
EXAMPLE 3
Draw an address decoding table to satisfy the following 68K memory map:
RAM1 00 0000 to 00 FFFF
RAM2 01 0000 to 01 FFFF
I/O_1 E0 0000 to E0 001F
I/O_2 E0 0020 to E0 003F
Address lines
Device Range 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
RAM1 00 0000 to 00 FFFF 0 0 0 0 0 0 0 0 x x x x x x x x x x x x x x x x
RAM2 01 0000 to 01 FFFF 0 0 0 0 0 0 0 1 x x x x x x x x x x x x x x x x
I/O_1 E0 0000 to E0 001F 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 x x x x x
I/O_2 E0 0020 to E0 003F 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 x x x x x
EXAMPLE 4
A 68K microprocessor system implements the following memory blocks:
(a) 1 Mbyte of ROM using 256K × 16-bit chips
(b) 8 Mbytes of DRAM using 2M × 4-bit chips
Construct a suitable address-decoding table and design an address decoder for this system.

A 16-bit-wide chip provides 2 bytes of data per location. Therefore, a single 256K × 16-bit ROM provides 512 kbytes of data. We need two of these chips to provide 1 Mbyte. A 1-Mbyte block of data contains 2^20 bytes and is spanned by address lines A00 to A19. In a 68K-based system address lines A20 to A23 must be decoded to select this block. Assume that the block of ROM is located at address 00 0000 and that A23, A22, A21, A20 = 0, 0, 0, 0. This 1-Mbyte block is composed of two 512-kbyte sub-blocks. Therefore one of these sub-blocks is selected when A19 = 0 and the other when A19 = 1.
The 8 Mbytes of DRAM are spanned by A00 to A22 (i.e. 2^23 bytes). This block of memory must be on an 8-Mbyte boundary (i.e. 00 0000 or 80 0000 in a 68K-based system). Because 00 0000 is occupied by ROM, we'll put the DRAM at 80 0000, for which A23 = 1. This block is composed of 2M-location by 4-bit-wide devices. Four 4-bit-wide chips are required to provide 16 bits (2 bytes) of data. The amount of data provided by these four chips is 2M locations × 2 bytes = 4 Mbytes. We need two of these sub-blocks to get 8 Mbytes. The first sub-block is selected by A22 = 0 and the second by A22 = 1.
[Diagram: the 16-bit-wide, 8-Mbyte block of DRAM occupying 80 0000 to FF FFFF is built from two sub-blocks, each a row of four 2M × 4 chips providing 4 Mbytes; the ROM occupies the memory space from 00 0000.]
The next step is to construct an address-decoding table. A memory block must be on a boundary equal to its own size. The following address decoding table shows address lines A23 to A00. Although the 68K lacks an A00 line, it's easier to add an A00 line to the table so that we can operate in bytes.

Device   Range                 A23 A22 A21 A20 A19 A18 ... A00
ROM 1    00 0000 to 07 FFFF     0   0   0   0   0   x  ...  x
ROM 2    08 0000 to 0F FFFF     0   0   0   0   1   x  ...  x
DRAM 1   80 0000 to BF FFFF     1   0   x   x   x   x  ...  x
DRAM 2   C0 0000 to FF FFFF     1   1   x   x   x   x  ...  x

If you didn't treat the 8-Mbyte block of DRAM as a single block, but as two separate 4-Mbyte blocks, you could put each of these 4-Mbyte sub-blocks on any 4-Mbyte boundary. The following address decoding table is also a legal solution.

Device   Range                 A23 A22 A21 A20 A19 A18 ... A00
ROM 1    00 0000 to 07 FFFF     0   0   0   0   0   x  ...  x
ROM 2    08 0000 to 0F FFFF     0   0   0   0   1   x  ...  x
DRAM 1   40 0000 to 7F FFFF     0   1   x   x   x   x  ...  x
DRAM 2   80 0000 to BF FFFF     1   0   x   x   x   x  ...  x
EXAMPLE 5
Design an address decoder using a PROM to implement the following 68K memory map: 4 Mbytes of ROM at 00 0000 (as two 2-Mbyte blocks, ROM1 and ROM2), 1 Mbyte of RAM (RAM2) at 60 0000, and 8 Mbytes of RAM (RAM1) at 80 0000. (The original map diagram is not reproduced here; the blocks can be read from the decoding table below.)

We begin by working out the sub-blocks of memory required from the size of the specified memory components.
(a) A pair of 1M × 8-bit chips gives 2 Mbytes. We need two sub-blocks to get 4 Mbytes.
(b) Four 4M × 4-bit chips give 8 Mbytes. This provides all our needs.
(c) A pair of 512K × 8-bit chips gives 1 Mbyte. This provides all our needs.

Each line in the PROM must select a block equal to the smallest block to be decoded; that is, 1 Mbyte. The PROM must decode A23 to A20. In the following table, D0 from the PROM selects ROM1, D1 selects ROM2, D2 selects RAM2, and D3 selects RAM1.

Device   Range                 A23 A22 A21 A20   D0 (ROM1)  D1 (ROM2)  D2 (RAM2)  D3 (RAM1)
ROM 1    00 0000 to 0F FFFF     0   0   0   0        0          1          1          1
ROM 1    10 0000 to 1F FFFF     0   0   0   1        0          1          1          1
ROM 2    20 0000 to 2F FFFF     0   0   1   0        1          0          1          1
ROM 2    30 0000 to 3F FFFF     0   0   1   1        1          0          1          1
unused   40 0000 to 4F FFFF     0   1   0   0        1          1          1          1
unused   50 0000 to 5F FFFF     0   1   0   1        1          1          1          1
RAM 2    60 0000 to 6F FFFF     0   1   1   0        1          1          0          1
unused   70 0000 to 7F FFFF     0   1   1   1        1          1          1          1
RAM 1    80 0000 to 8F FFFF     1   0   0   0        1          1          1          0
RAM 1    90 0000 to 9F FFFF     1   0   0   1        1          1          1          0
RAM 1    A0 0000 to AF FFFF     1   0   1   0        1          1          1          0
RAM 1    B0 0000 to BF FFFF     1   0   1   1        1          1          1          0
RAM 1    C0 0000 to CF FFFF     1   1   0   0        1          1          1          0
RAM 1    D0 0000 to DF FFFF     1   1   0   1        1          1          1          0
RAM 1    E0 0000 to EF FFFF     1   1   1   0        1          1          1          0
RAM 1    F0 0000 to FF FFFF     1   1   1   1        1          1          1          0
magnet. In most matter the magnetic effects of electron spin are overcome by the stronger force generated by the thermal vibration of the atoms that prevents magnetic interaction between adjacent atoms.
In ferromagnetic materials such as iron there is a stronger interaction between electron spins, which results in the alignment of electrons over a region called a domain. Domains range from 1 µm to several centimeters in size. Because the electron spins are aligned within a domain, the domain exhibits a strong spontaneous magnetization and behaves like a tiny magnet with a North Pole at one end and a South Pole at the other end. Within a large piece of ferromagnetic material, the magnetic axes of individual domains are arranged at random and there is no overall magnetic field in the bulk material.
Suppose we thread a wire through a hole in a ring (called a toroid) of a ferromagnetic material and pass a current, i, through the wire. The current generates a vector magnetic
EXAMPLE 6
A memory board in a 68K-based system with a 16-bit data bus has 1 Mbyte of RAM composed of 128K × 8 RAM chips located at address C0 0000 onward. The board also has a block of 256 kbytes of ROM composed of 128K × 8 chips located at address D8 0000. Design an address decoder for this board.

Two byte-wide RAM chips span the 16-bit data bus. The minimum block of memory is 2 × 128K = 256 kbytes, accessed by address lines A17 to A00. We require 1 Mbyte of RAM, or four 256-kbyte blocks. Address lines A19 and A18 select a block, and A23 to A20 select a 1-Mbyte block out of the 16 possible 1-Mbyte blocks (A23 to A20 = 1100). The ROM is implemented as a single 256-kbyte block using two 128-kbyte chips. The following table can be used to construct a suitable decoder.

Device  A23 A22 A21 A20 A19 A18 A17 ... A01 A00   Address range
RAM1     1   1   0   0   0   0   x  ...  x   x    C0 0000 to C3 FFFF
RAM2     1   1   0   0   0   1   x  ...  x   x    C4 0000 to C7 FFFF
RAM3     1   1   0   0   1   0   x  ...  x   x    C8 0000 to CB FFFF
RAM4     1   1   0   0   1   1   x  ...  x   x    CC 0000 to CF FFFF
ROM      1   1   0   1   1   0   x  ...  x   x    D8 0000 to DB FFFF
EXAMPLE 7
Design an address decoder that locates three blocks of memory in the following ranges: 00 0000 to 7F FFFF, A0 8000 to A0 8FFF, and F0 0000 to FF FFFF.

Address range                          A23-A20  A19-A16  A15-A12  A11-A8  A7-A4  A3-A0   Block size
00 0000 to 7F FFFF   First location     0000     0000     0000     0000    0000   0000   8 Mbytes,
                     Last location      0111     1111     1111     1111    1111   1111   spanned by 23 lines
A0 8000 to A0 8FFF   First location     1010     0000     1000     0000    0000   0000   4 kbytes,
                     Last location      1010     0000     1000     1111    1111   1111   spanned by 12 lines
F0 0000 to FF FFFF   First location     1111     0000     0000     0000    0000   0000   1 Mbyte,
                     Last location      1111     1111     1111     1111    1111   1111   spanned by 20 lines

From the table, you can see that the first block is selected by address line A23, the second block by address lines A23 to A12, and the third block by address lines A23 to A20.
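The "how many lines span a block" step in Example 7 can be automated. This small C sketch is ours; it derives block size and span from a range:

#include <stdio.h>

int main(void) {
    unsigned long first = 0xA08000UL, last = 0xA08FFFUL;  /* the 4K block */
    unsigned long size = last - first + 1UL;
    int lines = 0;
    while ((1UL << lines) < size)
        lines++;                     /* find n such that 2^n = size */
    printf("%lu bytes, spanned by %d address lines\n", size, lines);
    return 0;                        /* 4096 bytes, 12 address lines */
}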
EXAMPLE 8
The following address decoding PROM selects three blocks of memory in a 68K-based system. How large is each block and what address range does it occupy?

CPU address line    A23 A22 A21   CS2 CS1 CS0
PROM address line   A2  A1  A0    D2  D1  D0
                     0   0   0     0   1   1
                     0   0   1     1   1   1
                     0   1   0     1   0   1
                     0   1   1     1   0   1
                     1   0   0     1   1   0
                     1   0   1     1   1   0
                     1   1   0     1   1   0
                     1   1   1     1   1   0

The PROM decodes the 68K's three highest-order address lines A23 to A21. These address lines partition the 68K's 16-Mbyte address space into eight 2-Mbyte blocks. CS2 selects the 2-Mbyte block for which A23, A22, A21 = 0, 0, 0. This is the address space 00 0000 to 1F FFFF. CS1 selects the two 2-Mbyte blocks for which A23, A22 = 0, 1. This is the 4-Mbyte address space 40 0000 to 7F FFFF. CS0 selects the four 2-Mbyte blocks for which A23 = 1. This is the 8-Mbyte address space 80 0000 to FF FFFF.
[Figure 12.24 The increase in disk capacity by availability year, 1994 to 2006, for 3.5-inch form-factor drives (from 1.2 Gbytes up to 25, 36 (15K RPM server), and 60 Gbytes), 2.5-inch and 1.0-inch consumer-based drives, and the 0.34, 1, and 4-Gbyte Microdrives (from Hitachi Global Storage Technologies, San Jose Research Centre).]

field, H, in the surrounding space, where H is proportional to i. A magnetic field, B, is produced inside the ring by the combined effects of the external field, H, and the internal magnetization of the core material. A graph of the relationship between the internal magnetic field B and the external magnetic field H for a ferromagnetic material is given in Figure 12.25. This curve is called a hysteresis loop.
Suppose that the external field round the wire is initially zero; that is, H = 0 because the current flowing through the wire, i, is zero. Figure 12.25 demonstrates that there are two possible values of B when H = 0: +Br and -Br. These two states represent a logical one and a logical zero. The suffix r in Br stands for remnant and refers to the magnetism remaining in the ring when the external field is zero. Like the flip-flop, the toroid has two stable states and can therefore store a single bit of information.

[Figure 12.25 The hysteresis curve. (a) Magnetic core: a wire carrying current i threads a toroid of magnetic material; the current produces a magnetic field H round the wire and a field B in the material. (b) Hysteresis curve relating the internal field B to the external field H, showing the remnant values +Br and -Br, the saturation values +Bm and -Bm, the switching thresholds +Hm and -Hm, and the points P, Q, R, and S on the loop.]

Assume that initially the ferromagnetic material is magnetized in a logical zero state and has an internal field -Br. If a negative external field is applied (i.e. negative i, therefore negative H), the value of the internal magnetization B goes slightly more negative than -Br and we move towards point P in Fig. 12.25. If H is now reduced to zero, the remnant magnetization returns to -Br. In other words, there is no net change in the state of the ferromagnetic material.
Now consider applying a small positive external field H. The internal magnetization is slightly increased from -Br and we move along the curve towards point Q. If the external magnetization is reduced we move back to -Br. However, if H is increased beyond the value +Hm, the magnetization of the material flips over at Q, and we end up at point R. When we reduce the external field H to zero, we return to +Br and not to -Br. If the material is initially in a negative state, increasing the external magnetization beyond +Hm causes it to assume a positive state. A magnetic field of less than +Hm is insufficient to change the material's state.
Similarly, if the ferromagnetic material is in a one state (+Br), a positive value of H has little effect, but a more negative value of H than -Hm will switch the material to a zero state (-Br).
The switching of a ferromagnetic material from one state to another is done by applying a pulse with a magnitude greater than Im to the wire. A pulse of +Im always forces the material into a logical one state, and a pulse of -Im forces it into a logical zero state.
The hysteresis curve can readily be explained in terms of the behavior of domains. Figure 12.26 shows a region of a ferromagnetic material at three stages. At stage (a) the magnetic material is said to be in its virgin state with the domains oriented at random and has no net magnetization. This corresponds to the origin of the hysteresis curve, where H = 0 and B = 0.
[Figure 12.29 The magnetized layer: regions of the coating on the substrate are magnetized alternately S-N, N-S, S-N along the track.]
medium used to store data. The size of the particles has been reduced and their magnetic properties improved. Some tapes employ a thin metallic film, rather than individual particles. Metal oxide coatings are about 800 µm thick with oxide particles approximately 25 µm by 600 µm with an ellipsoidal shape. A thin film coating is typically only 100 µm thick.

Reading data

A first-generation read head was essentially the same as a write head (sometimes a single head serves as both a read and a write head). When the magnetized material moves past the gap in the read head, a magnetic flux is induced in the head. The flux, in turn, induces a voltage across the terminals of the coil that is proportional to the rate of change of the flux, rather than the absolute value of the magnetic flux itself. Figure 12.30 shows the waveforms associated with writing and reading data on a magnetic surface. The voltage from the read head is given by

v(t) = K dF/dt,

where K is a constant depending on the physical parameters of the system and F is the flux produced by the moving magnetic medium. Because the differential of a constant is zero, only transitions of magnetic flux can be detected. The output from a region of the surface with a constant magnetization is zero, making it difficult to record digital data directly on tape or disk.

[Figure 12.30 Read/write waveforms: the write current (+I, -I), the recorded flux (+Br, -Br), and the read voltage (+V, -V) as functions of time.]

12.6.2 Data encoding techniques

Now that we've described the basic process by which information is recorded on a magnetic medium, we are going to look at some of the ways in which digital data is encoded before it is recorded. Magnetic secondary stores record data serially, a bit at a time, along the path described by the motion of the magnetic medium under the write head. Tape transports have multiple parallel read/write heads and record several parallel tracks simultaneously across the width of the tape.
You can't transmit the sequence of logical 1s and 0s to be recorded directly to the write head. If you were to record a long string of 0s or 1s by simply saturating the surface at +Br or -Br, no signal would be received during playback. Why? Because only a change in flux creates an output signal. A process of encoding or modulation must first be used to transform the data pattern into a suitable code so that the recorded data is always changing even if the source is all 1s or 0s. Similarly, when the information is read back from the tape it must be decoded or demodulated to extract the original digital data. The actual encoding/decoding process chosen is a compromise between the desire to pack as many bits of data as possible into a given surface area while preserving the reliability of the system and keeping its complexity within reasonable bounds.
Let's look at some of the possible recording codes (beginning with a code that illustrates the problem of recording long strings of 1s and 0s).
ENCODING CRITERIA

Efficiency A code's storage efficiency is defined as the number of stored bits per flux reversal and is expressed in percent. A 100% efficiency corresponds to 1 bit per flux reversal.

Self-clocking The encoded data must be separated into individual bits. A code that provides a method of splitting the bits off from one another is called self-clocking and is highly desirable. A non-self-clocking code provides no timing information and makes it difficult to separate the data stream into individual bits.

Noise immunity An ideal code should have the largest immunity to noise and extraneous signals. Noise is caused by imperfections in the magnetic coating leading to drop-outs and drop-ins. A drop-out is a loss of signal caused by missing magnetic material and a drop-in is a noise pulse. Another source of noise is cross-talk, which is the signal picked up by the head from adjacent tracks. Cross-talk is introduced because the read/write head might not be perfectly aligned with the track on the surface of the recording medium. Noise can also be caused by imperfect erasure. Suppose a track is recorded and later erased. If the erase head didn't pass exactly over the center of the track, it's possible that the edge of the track might not have been fully erased. When the track is rerecorded and later played back, a spurious signal from the unerased portion of the track will be added to the wanted signal.
[Figure 12.31 Return-to-bias recording (current in the write head). Figure 12.32 Non-return to zero one recording, NRZ1 (current in the write head).]
However, before we can compare various encoding techniques we need to describe some of the parameters or properties of a code. In what follows, the term flux reversal indicates a change of state in the recorded magnetic field in the coating of the tape or disk. Simply reversing the direction of the current in the write head causes a flux reversal. Some of the criteria by which a recording code may be judged are described in the box.

Return-to-zero encoding

Return-to-zero (RZ) recording requires that the surface be unmagnetized to store a zero and magnetized by a short pulse to store a 1. Because no signal is applied to the write head when recording a 0, any 1s already written on the tape or disk are not erased or overwritten. Return-to-bias recording (RB) is a modification of RZ recording in which a 0 is recorded by saturating the magnetic coating in one direction and a 1 by saturating it in the opposite direction by a short pulse of the opposite polarity.
Figure 12.31 illustrates the principles of return-to-bias recording and playback. A negative current in the write head saturates the surface to -Br. A positive pulse saturates the surface to +Br to write a 1. The pulse width used depends on the characteristics of the head and the magnetic medium. A wide pulse reduces the maximum packing density of the recorded data and is wasteful of tape or disk surface but is easy to detect, whereas a very narrow pulse is harder to detect.
Data is read from the disk/tape by first generating a data window, which is a time slot during which the signal from the read head is to be sampled. The signal from the read head is sampled at the center of this window. A sequence of 0s generates no output from the read head and there is no simple way of making sure that the data window falls exactly in the middle of a data cell. For this reason return-to-bias is said to be non-self-clocking. The worst-case efficiency of RB recording is 50% (when the data is a string of 1s) and its noise sensitivity is poor. RB recording is not used in magnetic recording.

Non-return to zero encoding

One of the first widely used data encoding techniques was modified non-return to zero or NRZ1. Each time a 1 is to be recorded, the current flowing in the head is reversed. When reading data, each change in flux is interpreted as a 1. Figure 12.32 illustrates NRZ1 recording, which requires a maximum of one flux transition per bit of stored data, giving an efficiency of 100% in the worst case.
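A sketch of NRZ1 write-current generation (ours, not the book's) makes the "reverse on every 1" rule concrete:

#include <stdio.h>

int main(void) {
    const char *data = "01010011";      /* arbitrary example bits */
    int current = -1;                   /* write-head current: -1 or +1 */
    for (const char *p = data; *p; p++) {
        if (*p == '1')
            current = -current;         /* each 1 causes a flux reversal */
        printf("%c -> %+d\n", *p, current);
    }
    return 0;
}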
[Figure: the serial data 0 1 0 1 0 0 1 1 to be recorded, the corresponding current in the write head (a 1 is recorded as a positive transition and a 0 as a negative transition), and the voltage in the read head during playback.]

Table 12.5 ANSI X3.54 4/5 group code.

Input code   Output code
0000         11001
0001         11011
0010         10010
0011         10011
0100         11101
0101         10101
0110         10110
0111         10111
1000         11010
1001         01001
1010         01010
1011         01011
1100         11110
1101         01101
1110         01110
1111         01111

of data. The algorithm that maps the 4 bits of data onto the 5-bit group code to be recorded avoids the occurrence of more than two 0s in succession. This group code and a self-clocking modification of NRZ1 guarantee at least one flux transition per three recorded bits.
Another class of recording codes are the RLL or run-length limited codes. Instead of inserting clock pulses to provide timing information as in MFM recording, RLL codes limit the longest sequence of 0s that can be recorded in a burst. Because the maximum number of 0s in succession is fixed, timing circuits can be designed to reliably locate the center of each bit cell. A run-length limited code is expressed as Rm,n, where m defines the minimum number of 0s and n the maximum number of 0s between two 1s.
A typical RLL code is RLL 2,7, which means that each 1 is separated from the next 1 by two to seven 0s. In RLL 2,7 a maximum of four 0s may precede a 1 and three 0s may follow a 1. Because RLL records only certain bit patterns, the source data must be encoded before it can be passed to the RLL coder; for example, the source pattern 0011 would be converted to 00001000.
Figure 12.35 illustrates the RLL 2,7 encoding algorithm. You take the source code and use its bits to locate a terminal node on the tree. Suppose the source string is 0010110 . . . The first bit is zero and we move down the zero branch from Start. The second bit is 0 and we move down the 0 branch to the next junction. The third bit is 1 and we move to the next junction. The fourth bit is 0 and we move along the 0 branch. This is a terminal node with the value 00100100; that is, the encoded value of the input sequence 0010.
The next bit in the input sequence is 1 and we move from Start along the 1 branch. The second bit is 1 and that leads us to a terminal node whose output code is 1000. This process continues until we reach the end of the input code and each group of 2, 3, or 4 input bits has been replaced by a terminal value.

[Figure 12.35 The RLL 2,7 encoding algorithm as a tree whose terminal nodes give the code words: 10 -> 0100, 11 -> 1000, 000 -> 000100, 010 -> 100100, 011 -> 001000, 0010 -> 00100100, 0011 -> 00001000. Example: the input code 110010011 is re-arranged as 11 0010 011 and the output is 100000100100001000.]
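The tree of Figure 12.35 can equally be written as a table-driven encoder. The following C sketch is ours, not the book's; it uses the seven code words listed above and reproduces the worked example (trailing bits that do not complete a group are simply left unencoded here).

#include <stdio.h>
#include <string.h>

static const char *group[7]    = { "10", "11", "000", "010", "011",
                                   "0010", "0011" };
static const char *codeword[7] = { "0100", "1000", "000100", "100100",
                                   "001000", "00100100", "00001000" };

static void rll27_encode(const char *src, char *dst) {
    dst[0] = '\0';
    while (*src) {
        int matched = 0;
        for (int i = 0; i < 7; i++) {
            size_t n = strlen(group[i]);
            if (strncmp(src, group[i], n) == 0) {
                strcat(dst, codeword[i]);   /* each code word is twice as long */
                src += n;
                matched = 1;
                break;
            }
        }
        if (!matched)
            break;                          /* incomplete trailing group */
    }
}

int main(void) {
    char out[256];
    rll27_encode("110010011", out);         /* the example of Figure 12.35 */
    printf("%s\n", out);                    /* prints 100000100100001000   */
    return 0;
}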
12.7 Disk drive principles

We now look at the construction and characteristics of the disk drive. The hard disk stores data on the surface of a flat, circular, rigid platter of aluminum coated with a thin layer of magnetic material.1 Hard disks vary in size from 8 inches (obsolete) to 3 1/2 and 5 1/4 inches (PCs) to 1.3 to 2 1/2 inches (laptops and portable devices). The platter rotates continually about its central axis in much the same way as a black vinyl disk on the turntable of a gramophone (for readers old enough to remember the days before the CD). The rotational speed of disks in PCs was 3600 rpm, although 7200 rpm is now common and some disks rotate at 15 000 rpm.
The read/write head is positioned at the end of an arm above the surface of the disk. As the disk rotates, the read/write head traces a circular path called a track around the disk. Digital information is stored along the concentric tracks (Fig. 12.36). Data is written in blocks called sectors along the track. Track spacing is of the order of 120 000 tracks/inch. As time passes, track spacing will continue to improve, whereas the speed of rotation will not grow at anything like the same rate.
Figure 12.37 illustrates the structure of a disk drive.

1 Some modern platters are made of glass because of its superior mechanical properties such as a low coefficient of thermal expansion.

[Figure 12.36 Tracks and sectors on the surface of a disk. Figure 12.37 The structure of a disk drive, showing the rotating disk and the actuator that positions the read/write head.]
A significant difference between the vinyl record and the magnetic disk is that the groove on the audio disk is physically cut into its surface, whereas the tracks on a magnetic disk are simply the circular paths traced out by the motion of the disk under the read/write head. Passing a current through the head magnetizes the moving surface of the disk and writes data along the track. Similarly, when reading data, the head is moved to the required track and the motion of the magnetized surface induces a tiny voltage in the coil of the read head.
A precision servomechanism called an actuator moves or steps the arm holding the head horizontally along a radius from track to track. An actuator is an electromechanical device that converts an electronic signal into mechanical motion. Remember the difference between the magnetic disk and the gramophone record. In the former the tracks are concentric and the head steps from track to track, whereas in the latter a continuous spiral groove is cut into the surface of the disk and the stylus gradually moves towards the center as the disk rotates. The actuator in Fig. 12.37 is a linear actuator and is no longer used in hard disks.
Modern disk drives use a rotary head positioner to move the read/write heads rather than the linear (in and out) positioners found on earlier hard disk drives. Figure 12.38 shows how a rotary head positioner called a voice coil actuator rotates an arm about a pivot, causing the head assembly to track over the surface of the disks. A voice coil is so called because it works like a loudspeaker. A current is passed through a coil
The height at which the head flies above the surface of the disk is related to the surface finish or roughness of the magnetic coating. If the magnetic material is polished, the surface to head gap can be reduced by 50% in comparison with an unpolished surface.
Occasionally, the head hits the surface and is said to crash. A crash can damage part of the track and this track must be labeled bad and the lost data rewritten from a back-up copy of the file.
The disk controller (i.e. the electronic system that controls the operation of a disk drive) specifies a track and sector and either reads its contents into a buffer (i.e. temporary store) or writes the contents of the buffer to the disk. Some call a disk drive a random access device because you can step to a given track without first having to read the contents of each track. Disk drives are sequential access devices because it is necessary to wait until the desired sector moves under the head before it can be read.

12.7.1 Disk drive operational parameters

Disk drive users are interested in three parameters: the total capacity of the system, the rate at which data is written to or read from the disk, and its average access time. In the late 1990s typical storage capacities were 14 Gbytes, data rates were several Mbytes/s and average access times from 8 ms to 12 ms. By the end of the century, data densities had reached 10 Gbits/in^2 and track widths of the order of 1 µm. In 2004 data densities had reached 100 Gbits/in^2 and it was thought that densities would increase by a factor of 10 to yield 1 Tbits/in^2 within a decade.

Access time

A disk drive's average access time is composed of the time required to step to the desired track (seek time), the time taken for the disk to rotate so that the sector to be read is under the head (latency), the time for the head to stop vibrating when it reaches a track (settle time), and the time taken to read the data from a sector (read time). We can represent access time as

t_access = t_seek + t_latency + t_settle + t_read

The average time to step from track to track is difficult to determine because the modern voice coil actuated head doesn't move at constant velocity and considerations such as head settling time need to be taken into account. Each seek consists of four distinct phases:
● acceleration (the arm is accelerated until it reaches approximately half way to its destination track)
● coasting (after acceleration on long seeks the arm moves at its maximum velocity)
● deceleration (the head must slow down and stop at its destination)
● settling (the head has to be exactly positioned over the desired track and any vibrations die out).
Designing head-positioning mechanisms isn't easy. If you make the arm on which the head is mounted very light to
improve the head assembly's acceleration, the arm will be too flimsy and twist. If you make the arm stiffer and heavier, it will require more power to accelerate it.
The average number of steps per access depends on the arrangement of the data on the disk and on what happens to the head between successive accesses. If the head is parked at the periphery of the disk, it must move further on average than if it is parked at the center of the tracks. Figure 12.40 shows a file composed of six sectors arranged at random over the surface of the disk. Consequently, the head must move from track to track at random when the file is read sector by sector.
A crude estimate of the average stepping time is one-third the number of tracks multiplied by the time taken to step from one track to the adjacent track. This figure is based on the assumption that the head moves a random distance from its current track to its next track each time a seek operation is carried out. If the head were to be retracted to track 0 after each seek, the average access time would be half the total number of tracks multiplied by the track-to-track stepping time. If the head were to be parked in the middle of the tracks after each seek, the average access time would be 1/4 of the number of tracks multiplied by the track-to-track stepping time. These figures are valid only for older forms of actuators.
Very short seeks (1 to 4 tracks) are dominated by head settling time. Seeks in the range 200 to 400 tracks are dominated by the constant acceleration phase and the seek time is proportional to the square root of the number of tracks to step plus the settle time. Long seeks are dominated by the constant velocity or coasting phase and the seek time is proportional to the number of tracks.
A hard disk manufacturer specifies seek times as minimum (e.g. 1.5 ms to step one track), average (8.2 ms averaged over all possible seeks), and maximum (17.7 ms for a full stroke end-to-end seek). These figures are for a 250 Gbyte Hitachi Deskstar.
The access time of a disk is made up of its seek time and the time to access a given sector once a track has been reached (the latency). The latency is easy to calculate. If you assume that the head has just stepped to a given track, the minimum latency is zero (the sector is just arriving under the head). The worst case latency is one revolution (the head has just missed the sector and has to wait for it to go round). On average, the latency is 1/2 t_rev, where t_rev is the time for a single revolution of the platter. If a disk rotates at 7200 rpm, its latency is given by

1/2 × 1/(7200/60) = 0.00417 s = 4.17 ms

An important parameter is the rate at which data is transferred to and from the disk. If a disk rotates at R revolutions per minute and has s sectors per track, and each sector contains B bits, the capacity of a track is B × s bits. These B × s bits are read in 60/R seconds, giving a data rate of B × s/(60/R) = B × s × R/60 bits/s. This is, of course, the actual rate at which data is read from the disk. Buffering the data in the drive's electronics allows it to be transmitted to the host computer at a different rate.
The length of a track close to the center of a disk is less than that of a track near to the outer edge of the disk. In order to maximize the storage capacity, some systems use zoning in which the outer tracks have more sectors than the inner tracks.
Modern disk drives must be tolerant to shock (i.e. acceleration caused by movement such as a knock or jolt). This requirement is particularly important for disk drives in portable equipment such as laptop computers. Two shock parameters are normally quoted. One refers to the tolerance to shock when the disk is inoperative and the other to shock while the disk is running. Shock can cause two problems. One is physical damage to the surface of the disk if the head crashes into it (this is called head slap). The other is damage to data structures if the head is moved to another track during a write operation. Shock sensors can be incorporated in the disk drive to detect the beginning of a shock event and disable any write operation in progress.
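The latency and data-rate formulas above lend themselves to a small calculator. This C sketch is ours; R is taken from the text's 7200 rpm example, while s and B are invented values for illustration only:

#include <stdio.h>

int main(void) {
    double R = 7200.0;                 /* rotation rate in rpm (from the text) */
    double s = 400.0;                  /* sectors per track: assumed value     */
    double B = 512.0 * 8.0;            /* bits per sector: assumed 512 bytes   */
    double latency = 0.5 * 60.0 / R;           /* half a revolution            */
    double rate = B * s * R / 60.0;            /* B*s bits every 60/R seconds  */
    printf("average latency = %.2f ms\n", latency * 1000.0);  /* 4.17 ms      */
    printf("raw data rate   = %.0f bits/s\n", rate);
    return 0;
}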
An important parameter of the disk drive is its mean time between failure (MTBF), which is the average time between failures. The MTBF ranges from over 1 000 000 hours for large drives to 100 000 hours for smaller and older drives. A 100 000-hour MTBF indicates that the drive can be expected to operate for about 11 1/2 years continually without failure, a value that is longer than the average working life of a PC. A disk with a MTBF of 1 000 000 hours can be expected to run for over 100 years.

12.7.2 High-performance drives

Several technologies have been used to dramatically increase the performance of disk drives. Here we discuss two of them:
PROGRESS

In 1980 IBM introduced the world's first 1 Gbyte disk drive, the IBM 3380, which was the size of a refrigerator, weighed 550 pounds, and cost $40 000. In 2000 IBM introduced a 1-Gbyte microdrive, the world's smallest hard disk drive, with a platter that's about the size of an American quarter. In 2002 IBM dropped out of the disk drive market and merged its disk drive division with Hitachi to create HGST (Hitachi Global Storage Technologies).
magnetoresistive head technology and partial response maximum likelihood data demodulation (PRML).2 Figure 12.41 shows the increase in areal densities for IBM disk drives since 1980 and the recent growth rate made possible largely through the use of magnetoresistive heads.

[Figure 12.41 Data density as a function of time, 1960 to 2010 (from HGST): from the IBM RAMAC (the first hard disk drive) through the first thin film head (3375) and the first MR head (Corsair) to HGST disk drives and demonstrations with AFC, a 35-million-times increase, with the compound growth rate rising from 25% to 60%.]

The magnetoresistive head

The ultimate performance of a disk drive using the traditional read head we described earlier is limited because the recording head has to perform the conflicting tasks of writing data on the disk and retrieving previously written data. As the bit patterns recorded on the surface of disks have grown smaller, the amplitude of the signal from the read head has been reduced, making it difficult for the drive's electronics to identify the recorded bit patterns. You can increase the read signal enough to determine the magnetic pattern recorded on the disk by adding turns around the magnetic core of the head, because the read signal is proportional to the number of turns. However, increasing turns also increases the head's inductance, the resistance of a circuit to a change in the current flowing through it. [. . .] The element used for writing information is integrated with a magnetoresistive structure optimized for reading. Each of the two elements can be optimized to perform its particular function, reading or writing data.
A magnetoresistive head operates in a different way to conventional read heads. In a conventional head, a change in magnetic flux from the disk induces a voltage in a coil. In a magnetoresistive head, the flux modifies the electrical resistance of a conductor (i.e. more current flows through the conductor when you apply a voltage across it). Lord Kelvin discovered this phenomenon, called anisotropic magnetoresistance, in 1857. The read element of an MR head consists of a minute stripe of a permalloy material (a nickel-iron compound, NiFe) placed next to one of the write element's magnetic pole pieces. The electrical resistance of the permalloy changes by a few percent when it is placed in a magnetic field. This change in the material's resistance allows the MR head to detect the magnetic flux transitions associated with recorded bit patterns. During a read operation, a small current is passed through the stripe of resistive material. As the MR stripe is exposed to the magnetic field from the disk, an amplifier measures the resulting voltage drop across the stripe.
In the 1980s a phenomenon called the giant magnetoresistive effect was discovered, which provided a much greater sensitivity than the conventional magnetoresistivity. By 1991

2 A description of PRML is beyond the scope of this book. PRML encoding places pulses so close together that the data from one pulse contains interference from adjacent pulses. Digital signal processing algorithms are used to reconstruct the original digital data.
SUPERPARAMAGNETISM

Recording density increased by several orders of magnitude over a few years. However, such increases cannot continue because of the physical limitations of magnetic materials. Suppose we decide to scale magnetic media down and make the magnetic particles half their previous size. Halving the size of particles increases the areal density by a factor of 4 because you halve both length and width. Halving the size reduces the volume of a particle by a factor of 8; in turn this reduces the magnetic energy per particle by a factor of 8.
In a magnetic material, particles are aligned with the internal field. However, thermal vibrations cause the magnetic orientation of the particles to oscillate and some particles can spontaneously reverse direction. Reducing a particle's size can dramatically increase its tendency to spontaneously change state. According to an IBM paper, halving the size of particles can change the average spontaneous flux reversal time from 100 years to 100 ns! When particle sizes are so small that they spontaneously lose their magnetization almost instantaneously, the effect is called superparamagnetism. The limit imposed by superparamagnetism is of the order of 100 Gbits/in^2. Fortunately, techniques involving the use of complex magnetic structures have been devised to delay the onset of superparamagnetism by at least an order of magnitude of areal density.
[Figure 12.42 The structure of a track formatted according to the IBM System 34 double-density format: an index address mark at the start of the track, followed by records; each record consists of an ID address mark, an ID field (track address, side number, sector number, sector length, and a 2-byte CRC), a gap, a data address mark, the user data (256 bytes in this example), and a further 2-byte CRC.]
12.7.5 Organization of data on disks

Having described the principles of magnetic recording systems, we now briefly explain how data can be arranged on a disk. This section provides an overview but doesn't describe a complete system in detail. Although there is an almost infinite number of ways in which digital data may be organized or formatted on a disk, two systems developed by IBM have become standard: the IBM 3740-compatible single-density recording and the IBM System 34-compatible double-density recording.
A disk must be formatted before it can be used by writing sectors along the tracks in order to let the controller know when to start reading or writing information. Formatting involves writing a series of sector headers followed by empty data fields that can later be filled with data as required. Figure 12.42 describes the structure of a track formatted according to the IBM System 34 double-density format. Gaps are required between data structures to allow for variations in the disk's speed and time to switch between read and write operations. The disk drive is a mechanical device and doesn't rotate at an exactly constant speed. Consequently, the exact size of a sector will be slightly different each time you write it. Second, the drive electronics needs a means of locating the beginning of each sector.
A track consists of an index gap followed by a sequence of sectors. The number and size of sectors varies from operating system to operating system. Each sector includes an identity field (ID field) and a data field. The various information units on the disk are separated by gaps. A string of null bytes is written at the start of the track followed by an index address mark to denote the start of the current track. The address mark is a special byte, unlike any other. We've already seen that the MFM recording process uses a particular algorithm to encode data. That is, only certain recorded bit patterns are valid. By deliberately violating the recording algorithm and recording a bit pattern that does not conform to the set of valid patterns, uniquely identifiable bit patterns can be created to act as special markers. Such special bit patterns are created by omitting certain clock pulses.
The sectors following the index gap are made up of an ID (identification) address mark, an ID field, a gap, a data field, and a further gap. The ID field is 7 bytes long including the ID address mark. The other 6 bytes of the address field are the track number, the side number (0 or 1), the sector address, the sector length code, and a 2-byte cyclic redundancy check (CRC) code. The 16-bit CRC provides a powerful method of detecting an error in the sector's ID field and is the 16-bit remainder obtained by dividing the polynomial representing the field to be protected by a standard generator polynomial.
The beginning of the data field itself is denoted by one of two special markers: a data address mark or a deleted data address mark (these distinguish between data that is active and data that is no longer required). Following the data address mark comes a block of user data (typically 128 to 1024 bytes) terminated by a 16-bit CRC to protect the data field from error. The data field is bracketed by two gaps to provide time for the write circuits in the disk to turn on to write a new data field and then turn off before the next sector is encountered. Gap 2 must have an exact size for correct operation with a floppy disk controller, whereas gaps 1, 3, and 4 are simply delimiters and must only be greater than some specified minimum.
PROBLEM
A 3½-inch floppy disk drive uses two-sided disks and records data on 80 tracks per side. A track has nine sectors and each holds 512 bytes of data. The disk rotates at 360 rpm, the seek time is 10 ms track to track, and the head settling time is 10 ms. From the above information calculate the following.
(a) The total capacity of the floppy disk in bytes.
(b) The average rotational latency.
(c) The average time to locate a given sector assuming that the head is initially parked at track 0.
(d) The time taken to read a single sector once it has been located.
(e) The average rate at which data is moved from the disk to the processor during the reading of a sector. This should be expressed in bits per second.
(f) The packing density of the disk in terms of bits per inch around a track located at 3 inches from the center.

(a) Total capacity = sides × tracks × sectors × bytes/sector = 2 × 80 × 9 × 512 = 737 280 bytes (called 720 Kbytes).
(b) Average rotational latency = ½ × period of revolution. 360 rpm corresponds to 360/60 = 6 revolutions per second, so one revolution takes 1/6 second. The average latency is therefore 1/12 second = 83.3 ms.
(c) Average time to locate a sector = latency + head settling time + seek time = 83.3 ms + 10 ms + (80/2) × 10 ms = 493.3 ms.
(d) In one revolution (1/6 second), nine sectors pass under the head. Therefore, the time to read one sector is 1/6 × 1/9 = 18.52 ms.
(e) During the reading of a sector, 512 bytes are read in 18.52 ms. The average data rate is the number of bits read divided by the time taken = (512 × 8)/0.01852 = 221 166 bits/s.
(f) Packing density = total number of bits in a track divided by the track length = (9 × 512 × 8)/(2 × 3.142 × 3) = 1955.4 bits/in.
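The arithmetic in this problem is easy to check mechanically. A short Python rendering of the same calculations follows (computing with the unrounded sector-read time gives 221 184 bits/s; the 221 166 bits/s above comes from using the rounded 18.52 ms):

    sides, tracks, sectors, bytes_per_sector = 2, 80, 9, 512
    rpm, step_ms, settle_ms = 360, 10, 10

    capacity = sides * tracks * sectors * bytes_per_sector        # (a) 737 280 bytes
    revolution_s = 60 / rpm                                       # one revolution takes 1/6 s
    latency_ms = revolution_s / 2 * 1000                          # (b) 83.3 ms
    locate_ms = latency_ms + settle_ms + (tracks / 2) * step_ms   # (c) 493.3 ms
    read_ms = revolution_s / sectors * 1000                       # (d) 18.52 ms
    rate_bps = bytes_per_sector * 8 / (read_ms / 1000)            # (e) about 221 000 bits/s

    print(capacity, latency_ms, locate_ms, read_ms, rate_bps)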
The free sector list
A simple method of dealing with the allocation of sectors to files is to provide a bit-map (usually in track 0, sector 1). Each bit in the bit-map represents one of the sectors on the disk and is clear to indicate a free sector and set to indicate an allocated sector. Free means that the sector can be given to a new file and allocated means that the sector already belongs to a file. If all bits of the bit-map are set, there are no more free sectors and the disk is full. Figure 12.43 illustrates the free sector list.

Figure 12.43 Free sector list (a bit-map held in track 0, sector 1; each bit in the list determines whether the corresponding sector is free or has been allocated).

Suppose the disk file manager creates a file. It first searches the bit-map for free sectors, and then allocates the appropriate number of free sectors to the new file. When a file is deleted, the disk file manager returns the file's sectors to the pool of free sectors simply by clearing the corresponding bits in the bit-map. The sectors comprising the deleted file are not overwritten when the file is deleted by the operating system. You can recover so-called deleted files as long as they haven't been overwritten since they were removed from the directory and their sectors returned to the pool of free sectors.
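The following Python sketch shows the idea, using the 26-sector bit-map of Fig. 12.43 (a real file manager works on a sector image held in track 0, sector 1, but the logic is the same):

    # Free sector bit-map: 1 = allocated, 0 = free (the pattern of Fig. 12.43)
    bitmap = [1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0,
              0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

    def allocate(n):
        """Find n free sectors, mark them allocated, and return their numbers."""
        free = [i for i, bit in enumerate(bitmap) if bit == 0][:n]
        if len(free) < n:
            raise OSError('disk full')            # no more free sectors
        for i in free:
            bitmap[i] = 1
        return free

    def release(sectors):
        """Return a deleted file's sectors to the pool by clearing their bits."""
        for i in sectors:
            bitmap[i] = 0

    file_sectors = allocate(4)
    print(file_sectors)        # [4, 8, 12, 13]
    release(file_sectors)      # delete the file; its data is not overwritten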
There's little point in storing data on a disk unless it can be easily accessed. To achieve this objective, a data structure called a directory holds information about the nature of each file and where the file can be found. Information in directories varies from the file name plus the location of the first sector of the file to an extensive description of the file including attributes such as file ownership, access rights, date of creation, and date of last access.

The sectors of a file can be arranged as a linked list in which each sector contains a pointer to the next sector in the list as Fig. 12.44 demonstrates. The final sector contains a null pointer because it has no next sector to point to. Two bytes are required for each pointer; one for the track number and one for the sector number. The advantage of a linked list is that the sectors can be randomly organized on the disk (randomization occurs because new files are continually being created and old files deleted).

Linked lists create sequential access files rather than random access files. The only way of accessing a particular sector in the file is by reading all sectors of the list until the desired sector is located. Such a sequential access is, of course, highly inefficient. Sequential access files are easy to set up and a sequential file system is much easier to design than one that caters for random access files.
Figure 12.45 The file allocation table: (a) FAT entries 4 to 9, forming the cluster chain 4 → 5 → 6 → 8 → 7 → 9 (entry 9 contains FFFF, marking the end of the file); (b) the physical arrangement of the clusters on the disk; (c) the logical arrangement of the clusters as seen by the operating system.
As time passes and files are created, modified, and deleted, files on a disk may become very fragmented (i.e. the locations of their sectors are, effectively, random). Once the sectors of a file are located at almost entirely random points on the disk, disk accesses become very long because of the amount of head movement required. Defragmentation programs are used to clean up the disk by reorganizing files to make consecutive logical sectors have consecutive addresses. We now briefly describe the structure of the filing system used by MS-DOS.

The MS-DOS file structure
MS-DOS extends the simple bit-map of Fig. 12.43 to a linked list of clusters, where each cluster represents a group of sectors. DOS associates each entry in a file allocation table (FAT) with a cluster of two to eight sectors (the size of the clusters is related to the size of the disk drive). Using a cluster-map rather than a bit-map reduces both the size of the map and the number of times that the operating system has to search the map for new sectors. However, the cluster-map increases the granularity of files because files are forced to grow in minimum increments of a whole cluster. If sectors hold 1024 bytes, four-sector clusters mean that the minimum increment for a file is 4 × 1024 bytes = 4 Kbytes. If the disk holds many files, the total wasted space can be quite large.

Each entry in the FAT corresponds to an actual cluster of sectors on the disk. Figure 12.45 illustrates the structure of a FAT with entries 4 to 9 corresponding to clusters 4 to 9. Assume that a file starts with cluster number 4 and each cluster points to the next cluster in a file. The FAT entry corresponding to the first cluster contains the value 5, which indicates that the next cluster is 5. Note how entry 6 contains the value 8, indicating that the cluster after 6 is 8. Clusters aren't allocated sequentially, which leads to the fragmentation we described earlier. Figure 12.45(b) shows the physical sequence of clusters on the disk corresponding to this FAT. We have used lines with arrows to show how clusters are connected to each other. Figure 12.45(c) shows how the operating system sees the file as a logical sequence of clusters.

Cluster 9 in the FAT belonging to this file contains the value FFFF₁₆, which indicates that this is the last cluster in a file. Another special code used by DOS, FFF7₁₆, indicates that the corresponding cluster is unavailable because it is damaged (i.e. the magnetic medium is defective). The FAT is set up when the disk is formatted and defective sectors are noted. The first two entries in a file allocation table provide a media descriptor that describes the characteristics of the disk.
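A minimal sketch of how a file's clusters are recovered by chasing FAT entries, using the chain of Fig. 12.45 (the FAT is represented here as a Python dictionary purely for illustration):

    # FFFF marks the last cluster of a file; FFF7 marks a damaged cluster
    END_OF_FILE, DAMAGED = 0xFFFF, 0xFFF7
    fat = {4: 5, 5: 6, 6: 8, 7: 9, 8: 7, 9: END_OF_FILE}

    def clusters_of(first_cluster):
        """Yield a file's clusters in logical order by chasing FAT entries."""
        cluster = first_cluster
        while True:
            yield cluster
            next_cluster = fat[cluster]       # a KeyError means a corrupt chain
            if next_cluster == END_OF_FILE:
                return
            cluster = next_cluster

    print(list(clusters_of(4)))               # [4, 5, 6, 8, 7, 9]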
When MS-DOS was designed, 12-bit FAT entries were sufficient for disks up to 10 Mbytes. A 16-bit FAT, called FAT16, can handle entries for disks above 10 Mbytes. The maximum size of disk that can be supported is the number of clusters multiplied by the number of sectors per cluster multiplied by the number of bytes per sector. Because the FAT16 system supports only 65 525 clusters, the maximum disk size is limited (assuming a limit on the size of sectors and clusters).
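For example, assuming the largest cluster size DOS allowed (64 sectors of 512 bytes, i.e. 32 Kbyte clusters; an assumption, as the text quotes only the cluster count), the ceiling works out at about 2 Gbytes:

    clusters = 65_525                 # the FAT16 limit quoted above
    sectors_per_cluster = 64          # assumed: the largest cluster size
    bytes_per_sector = 512
    print(clusters * sectors_per_cluster * bytes_per_sector)   # 2 147 123 200 bytes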
These figures demonstrate how rapidly the face of computing changed in the 1990s—a 10 Mbyte hard disk was once considered large, whereas today it's difficult to find a disk less than about 80 Gbytes (apart from in some portable systems). The rapid growth in disk capacity forced Microsoft to adopt a 32-bit FAT with Windows NT and later releases of Windows 95. FAT32 allows the operating system to handle disks up to 2 terabytes.

DOS storage media hold four types of data element. The first element is called the boot record, which identifies the operating system and the structure of the disk (number of sectors and clusters, size of clusters) and can provide a boot program used when the system is first powered up. Following the boot sector are two FATs (one is a copy of the other provided for security). After the FATs a root directory provides the details of the files. These details include the file name, the file characteristics (when created etc.), and the address of the file's first cluster in the FAT. The remainder of the disk is allocated to the files themselves.

12.8 Optical memory technology

Optical storage is the oldest method of storing information known to humanity. Early systems employed indentations in stone or pottery that were eventually rendered obsolete by flexible optical storage media such as papyrus and later paper.

Up to the 1980s, general-purpose digital computers used magnetic devices to store information. Optical storage systems were not widely used until the 1990s, because it was difficult to perform all the actions required to store and to retrieve data economically until improvements had been made in a wide range of technologies.

The optical disk or CD-ROM dramatically changed secondary storage technology and made it possible to store large quantities of information on a transportable medium at a low cost. A CD-ROM can store over 500 Mbytes of user data on one side of a single 120 mm (4.72 in) disk, which is equivalent to around 200 000 pages of text. The optical disk is a rigid plastic disk (Fig. 12.46) whose surface contains a long spiral track. The track is laid down on a clear polycarbonate plastic substrate inside the disk and is covered with a transparent plastic protective layer. Like the magnetic disk, information is stored along a track in binary form.
Figure 12.46 Cross-section of a CD: a 1.2 mm polycarbonate disk carrying a 125 nm aluminum reflecting layer, an acrylic protective layer, and the label on the back of the disk.
Figure 12.49 Light reflected from a CD (a laser beam enters through the polycarbonate and is returned to a detector; light falling on the land returns in phase, whereas light falling partly on a pit returns out of phase).

Figure 12.50 Organization of the track with land/pits (tracks are 1.6 microns apart and the pit height is λ/4).

If the two reflected beams are in phase, the resulting spot will be four times as bright (not twice as bright because the beam's energy is the square of its amplitude). However, if the beams are 180° out of phase with one wave going up as the other goes down, the waves will cancel and the spot will disappear.

When light from the laser hits the land (i.e. the area between the pits), the light is reflected back and can be detected by a sensor. When the light from the laser hits a pit, about half falls on the pit and the other half on the land around the pit (see Fig. 12.49). The height of the pit is approximately 0.13 μm above the surrounding land so that light that hits the land has to travel an extra 2 × 0.13 μm further to get to the detector. However, 0.13 μm corresponds to 1/4 of the wavelength of the light in the plastic medium and the light reflected back from around a pit travels 1/2 wavelength further than the light reflected from the top of a pit. The light from the pit and light reflected from the surrounding land destructively interfere and the light waves cancel each other out. A change in the level of light intensity reflected from the surface of the disk represents a change from land to pit or from pit to land. Figure 12.50 shows the structure of the pits and land in more detail.

The spot of laser light that follows a track should be as small as possible in order to pack as many pits, and therefore as much data, onto the disk as possible. The minimum size of the spot is determined by a number of practical engineering considerations. The resolution (i.e. the smallest element that can be seen) of the optical system is determined by the wavelength of the laser light (780 nm) and the numerical aperture of the objective lens (0.45). Numerical aperture (NA) is defined as lens diameter divided by the focal length and is a measure of a lens's light-gathering power. The value of 0.45 is a compromise between resolution and depth of focus. Increasing the resolution and hence storage capacity makes it harder to focus the beam on the disk. These values of wavelength and NA provide a minimum resolution of 1 μm. Note that there is sometimes confusion about the wavelength of the laser light. The wavelength is 780 nm in air, but when the laser beam travels through the plastic material of the disk its wavelength is reduced to 500 nm.

The sizes of the pits are such that half the energy of the spot falls on a pit and half falls onto the land. The reflected energy is ideally zero if the light from the pits and land interfere destructively. The optimum separation of the pits is determined by the wavelength of the light used by the laser.

The data stored on the CD-ROM has to be encoded to achieve both maximum storage density and freedom from errors. Moreover, the encoding technique must be self-clocking to simplify the data recovery circuits. Figure 12.51 illustrates the basic encoding scheme chosen by the designers of the CD-ROM. The length of the pits themselves is modulated and the transition of a pit to land (or from land to pit) represents a one bit.

The source data is encoded so that each 8-bit byte is transformed into a 14-bit code. Although there are 2¹⁴ = 16 384 possible 14-bit patterns, only 2⁸ = 256 of these patterns are actually used. The encoding algorithm chooses 14-bit code words that do not have two consecutive 1s separated by less than two 0s. Moreover, the longest permitted run of 0s is 10. These two restrictions mean that the 14-bit code has 267 legal values, of which 256 are actually used. The 14-bit codes corresponding to the first 10 8-bit codes are given in Table 12.9.

The groups of 14-bit code words are not simply joined end to end, but are separated by three so-called merging bits. The function of the merging bits is to ensure that the encoding rules are not violated when the end of one group is taken in conjunction with the start of the next. These merging bits carry no useful data and are simply separators. The following example demonstrates the need for merging bits.

Source data: . . . 0010 1000 . . .

These two patterns generate the sequence . . . 00101000 . . . Note how the end of the first group and the start of the second group create the forbidden pattern 101 that has a 1 separated from another 1 by less than two 0s. We can solve the problem by inserting the separator 000 between the groups to get

Encoded data: . . . 00100001000 . . .
Three 0s (i.e. merging bits) have been inserted between the two code words to eliminate the possibility of a forbidden sequence of bits.

Figure 12.51 The basic CD-ROM encoding scheme: a data stream consisting of a 24-bit sync pattern followed by 14-bit groups separated by merging bits (which ensure that the coding rules are not violated at group ends), NRZ encoded and recorded as pits on the disk surface.

Another factor in the choice of the pattern of bits to be used as the merging bits is the need to keep the average lengths of pit and land along the surface of the track equal. This restriction is necessary because the CD drive's focusing and tracking mechanism uses the average energy reflected from the surface and, therefore, it is necessary to avoid changes in average energy due to data dependency.

The channel clock derived from the signal recovered from the pits and land is 4.3218 MHz because this is the maximum rate of change of signal from the pits and land at the standard CD scanning speed of 1.3 m/s. The bit density is 1.66 bits/μm or 42 kbits/inch. At a track pitch of 1.6 μm this corresponds to 6 × 10⁸ bits/in² or 10⁶ bits/mm².
Value   Source data bits   Encoded bits
0       00000000           01001000100000
1       00000001           10000100000000
2       00000010           10010000100000
3       00000011           10001000100000
4       00000100           01000100000000
5       00000101           00000100010000
6       00000110           00010000100000
7       00000111           00100100000000
8       00001000           01001001000000
9       00001001           10000001000000
10      00001010           10010001000000

Table 12.9 Converting 8-bit values to a 14-bit code.
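The claim above that exactly 267 of the 16 384 possible 14-bit words are legal can be verified by brute force. The sketch below treats the leading and trailing runs of 0s as subject only to the maximum-run rule, which is an assumption about how the restrictions apply at the word boundaries:

    def legal(word):
        """True if a binary word obeys the CD's run-length rules."""
        runs = word.split('1')                    # the runs of 0s in the word
        if any(len(run) > 10 for run in runs):
            return False                          # longest permitted run of 0s is 10
        return all(len(run) >= 2 for run in runs[1:-1])   # at least two 0s between 1s

    print(sum(legal(format(i, '014b')) for i in range(2 ** 14)))   # 267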
Because of the way in which pits are laid down to the very high mechanical precision required by the system, it's impossible to avoid large numbers of errors. We therefore have to employ powerful error-correcting codes to nullify the effect of the errors. Due to the complexity of these codes, all we can do here is to describe their characteristics. Note that the encoding mechanism for data is different to that for audio information because data requires more protection. Consequently, a CD stores fewer bytes of user data than audio data.

The Cross Interleaved Reed–Solomon code (CIRC) takes groups of 24 bytes of data and encodes them into groups of 32 bytes. Information is interleaved (spread out over the surface of a track) so that a burst of errors at one physical location affects several code groups. The following hypothetical example should clarify this concept. Suppose data is recorded in groups of 4 bytes a1a2a3a4 b1b2b3b4 c1c2c3c4 d1d2d3d4 and that a group is not corrupted unless 2 bytes are lost in a group (because of some form of error-correcting mechanism). Because errors tend to occur in groups (because of, say, a scratch), large amounts of data will be lost. If we interleave the bytes, we might get a1b1c1d1 a2b2c2d2 a3b3c3d3 a4b4c4d4. In this case, if we lose 2 consecutive bytes, we will be able to correct the error because the bytes are from different groups.
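The effect of interleaving is easy to demonstrate. The toy sketch below uses the hypothetical 4-byte groups from the example above (not the real CIRC interleaving schedule):

    groups = [['a1', 'a2', 'a3', 'a4'],
              ['b1', 'b2', 'b3', 'b4'],
              ['c1', 'c2', 'c3', 'c4'],
              ['d1', 'd2', 'd3', 'd4']]

    # Interleave: consecutive recorded bytes now come from different groups
    recorded = [byte for column in zip(*groups) for byte in column]
    print(recorded)                      # ['a1', 'b1', 'c1', 'd1', 'a2', 'b2', ...]

    # A burst wipes out two consecutive recorded bytes ...
    recorded[4] = recorded[5] = None     # the burst hits 'a2' and 'b2'

    # ... but after de-interleaving, each group has lost only one byte,
    # which that group's error-correcting code can repair
    restored = [recorded[i::4] for i in range(4)]
    print(restored)                      # [['a1', None, 'a3', 'a4'], ['b1', None, ...], ...]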
One of the differences between the CD used to store audio information and the CD-ROM used by computers is that the latter employs an extra layer of encoding to reduce further the undetected error rate to one in 10¹³ bits. Moreover, the sophisticated CIRC encoding makes it possible to correct an error burst of up to 450 bytes (which would take up 2 mm of track length). The capacity of a CD-ROM is 553 Mbytes of user data (an audio CD can store 640 Mbytes of sound).

The spiral track of the CD-ROM is divided into individually addressable sectors.
CD SPEED
The speed at which a CD rotates was determined by the conflicting requirements of the technology at the time the CD was first manufactured and the desire to store as much data as possible. Because the CD was originally devised to store audio information, its duration was set as 74 minutes to allow von Karajan's recording of Beethoven's Ninth Symphony to go on a single CD.

First-generation CDs operated in real time; if it took 74 minutes to play a symphony, the disk took 74 minutes to read. When CDs began to be used to store data, it was not convenient to wait up to about 74 minutes to load a program. Advances in technology allowed the disks to spin faster to read data in less time. A CD reader described as 8X can read data eight times faster than a standard drive. Modern CDs have read speeds of over 50× that of an audio disk, although such a sustained increase in speed is rarely achieved in practice.

Write speeds and rewrite speeds have also increased; for example, in 2004 a writable and re-writable CD drive might be described as 52 × 32 × 52; that is, it has a 52X read speed, a 52X write speed, and a 32X rewrite speed.
The address of a sector is expressed absolutely with respect to the start of the track and is in the form of minutes, seconds, and blocks from the start (this format is the same as that of the audio CD). A sector or block is composed of 12 synchronizing bytes (for clock recovery), a 4-byte header that identifies the sector, a block of 2048 bytes of user data, and 288 auxiliary bytes largely made up of the error-correcting code.
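With 2048 user bytes per block, the capacity figures quoted earlier follow directly. Audio CDs deliver 75 blocks per second (a standard rate, although it isn't quoted in the text), so a 60-minute disk provides the 553 Mbytes mentioned above:

    BLOCKS_PER_SECOND = 75          # standard CD rate (assumed, not given in the text)
    USER_BYTES_PER_BLOCK = 2048     # from the sector format above

    def cd_capacity_mbytes(minutes):
        """User capacity of a CD-ROM addressed as minutes:seconds:blocks."""
        blocks = minutes * 60 * BLOCKS_PER_SECOND
        return blocks * USER_BYTES_PER_BLOCK / 1_000_000

    print(cd_capacity_mbytes(60))   # 552.96, i.e. the '553 Mbytes' quoted earlier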
Because the size of the pits is constant and they are recorded along a spiral on a disk, the number of pits per revolution must vary between the inner and outer tracks. Contrast this with the magnetic disk, in which the bit density changes between inner and outer tracks because the bits must be smaller on inner tracks if there are to be the same number as in outer tracks.

A consequence of constant-size pits is that the speed of the disk depends on the location of the sector being read (i.e. the disk moves with a constant linear velocity, rather than a constant angular velocity). If the pits have a constant length, there are more pits around an outer track and therefore the disk must rotate more slowly to read them at a constant rate. As the read head moves in towards the center, the disk must speed up because there are fewer pits around the circumference. First-generation CD-ROMs (and audio CDs) spin at between about 200 and 500 rpm.
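These rotational speeds follow directly from the constant linear velocity. A quick sketch, assuming the program area of a CD runs from roughly a 25 mm to a 58 mm radius (the 1.3 m/s scanning speed is the figure quoted earlier):

    import math

    SCANNING_SPEED = 1.3   # m/s, the standard CD scanning speed

    def rpm(radius_mm):
        """Rotational speed needed to keep the track moving at constant
        linear velocity under the read head."""
        circumference = 2 * math.pi * (radius_mm / 1000)   # metres
        return 60 * SCANNING_SPEED / circumference

    print(round(rpm(25)), round(rpm(58)))   # about 497 and 214 rpm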
As you might imagine, this arrangement severely restricts the access time of the system. Moreover, the relatively heavy read head assembly also limits the maximum track-to-track stepping rate. These factors together limit the average access time of a CD-ROM to the region of 100 to 200 ms (an order of magnitude worse than hard disks). We used the expression track-to-track stepping, even though the track is really a continuous spiral. When in the seek mode, the head steps across the spiral and reads an address block to determine whether it has reached the correct part of the spiral. As the technology used to manufacture CD drives improved through the 1990s, drive speeds were increased. Speeds went up from twice to 32 times the nominal CD rotation speed by the end of the 1990s and average access times dropped to 80 ms.

12.8.2 Writable CDs

When the CD drive first appeared, it was a read-only mechanism. Today, CD drives are available that can write data to a CD once only (CD-R), or write to CDs that can be erased and rewritten (CD-RW).

Some CD write mechanisms simply ablate (i.e. blast away) the surface of a non-reflecting layer of material above a reflecting background to create a pit. Others employ a powerful laser to melt a region of a coating of tellurium to create a pit. Another writable disk uses an organic dye within a layer in the disk. When the dye is hit by a laser during the write operation, the dye's optical properties are modified. The write laser has a power of 30 mW, which is about six times more powerful than the laser used to read data from a CD.

You can create true read/write optical storage systems that write data onto the disk, read it, and then erase it in order to write over it again. Clearly, any laser technology that burns or ablates a surface cannot be used in an erasable system. Erasable CDs exploit two fundamental properties of matter, its optical properties and its magnetic properties.

Figure 12.52 illustrates the principle of an erasable CD. The CD substrate is prestamped with the track structure and the track or groove is coated with a number of layers (some are for the protection of the active layer). The active layer uses a material like terbium iron cobalt (TbFeCo), which changes the polarization of the reflected laser light. The magnetization of the TbFeCo film determines the direction of the reflected light's polarization.

Initially the film in Fig. 12.52(a) is subjected to a uniform magnetic field to align the TbFeCo molecules and therefore provide a base direction for the polarization of the reflected light. This base can be thought of as a continuous stream of zero bits.
Figure 12.52 The principle of the magneto-optical erasable CD: (a) all molecules aligned by an applied field; (b) reverse field applied and the spot heated by a laser beam; (c) laser removed and the direction of magnetization of the spot reversed; (d) a weaker polarized laser beam reads the direction of magnetization of the spot.

Transfer rate
  Burst transfer rate       3.0 Mbytes/s (async., max.)
                            10.0 Mbytes/s (sync., max.)
  Sustained transfer rate   6.14 Mbytes/s to 3.07 Mbytes/s (9.1 Gbyte media)
                            5.84 Mbytes/s to 2.87 Mbytes/s (8.6 Gbyte media)
Speed
  Access time               25 ms (avg.)
  Latency                   8.3 ms (avg.)
  Rotational speed          3600 rpm
  Buffer memory             8 Mbytes
Reliability
  MTBF                      100 000 POH
  MSBF                      750 000 cycles
  MTTR                      30 minutes
  Bit error rate            1 in 10¹⁷ bits

Table 12.10 Characteristics of a magneto-optical disk drive.
During the write phase (Fig. 12.52(b)) a short pulse of laser light hits the surface and heats the film, changing its magnetic properties. When the surface is heated up to 300°C, the surface reaches its Curie point and loses its magnetization. By simultaneously activating an electromagnet under the surface of the disk, the direction of the film's magnetization can be reversed with respect to the base direction when the laser is switched off and the material cools. This action creates a 1 state. As the spot cools down (Fig. 12.52(c)), the drop in temperature fixes the new direction of magnetization.

The disk is read by focusing a weaker polarized beam on the disk and then detecting whether the reflected beam was rotated clockwise or counterclockwise (Fig. 12.52(d)). The same laser can be used for both reading and writing; the read power is much less than the write power.

To erase a bit, the area that was written to is pulsed once again with the high-power laser and the direction of the magnetic field from the electromagnet reversed to write a zero.

High-capacity magneto-optical disks use a rugged polycarbonate disk platter mounted inside an enclosed cartridge that can store over 9 Gbytes of data. Table 12.10 lists the characteristics of such a drive.

High-capacity optical storage and the DVD
Not very long ago, the 600 Mbyte 5¼ inch CD-ROM was the state of the art. Progress in everything from laser technology to head-positioning to optical technology soon meant that the CD-ROM was no longer at the cutting edge of technology. Like all the other parts of the computer, the CD-ROM has evolved. In the late 1990s a new technology called the DVD-ROM (digital versatile disk) appeared. The DVD-ROM has a minimum capacity six times that of a CD-ROM and a potential capacity much more than that. Part of the driving force behind the development of the DVD-ROM was to put feature-length movies in digital video form on disk.

The DVD-ROM looks like a conventional CD-ROM and the underlying technology is exactly the same. Only the parameters have changed. Improvements in optical tracking have allowed the track spacing to be reduced and hence the length of the track to be considerably increased. DVD tracks are 0.74 μm apart (conventional CD-ROMs use 1.6 μm spacing). Lasers with shorter wavelengths (635 nm) have permitted the use of smaller pits.
Feature                  CD          DVD         Improvement
Bit length               0.277 μm    0.13 μm     2.1
Track pitch              1.6 μm      0.74 μm     2.16
Data area                8605 mm²    8759 mm²    1.02
Modulation efficiency    17/8        16/8        1.06
Error correction loss    34%         13%         1.32
Sector overhead          8.2%        2.6%        1.07
Total increase                                   7

Table 12.12 Improvements in the efficiency of DVD in terms of data density.
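Multiplying the individual gains in Table 12.12 reproduces the total increase of roughly 7:

    # The individual improvement factors of DVD over CD from Table 12.12
    gains = {'bit length': 2.1, 'track pitch': 2.16, 'data area': 1.02,
             'modulation efficiency': 1.06, 'error correction': 1.32,
             'sector overhead': 1.07}
    total = 1.0
    for gain in gains.values():
        total *= gain
    print(round(total, 1))   # 6.9, i.e. roughly the factor of 7 in the table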
The DVD-ROM can be double-sided, which instantly doubles its data capacity. Moreover, by using semitransparent layers, it is possible to have several optical layers within the disk. Focusing the laser on a particular layer accesses data in that layer. Other layers are out of focus.

Just as writable CDs have been developed, it is possible to buy writable DVDs. Unfortunately, several mutually incompatible technologies were developed nearly simultaneously, forcing consumers to select a particular system (a similar situation existed in the early days of the VCR until the VHS format swept away its competitors). By 2004 DVD manufacturers were selling drives that were compatible with most of the available formats. Table 12.11 describes the differences between these formats.

Writable DVDs
In principle, writable DVDs are implemented in the same way as writable CDs. You modify the pit/land structure by a laser beam that either modifies the disk's magnetic or optical properties. Unfortunately, the DVD industry has not settled on a single writable disk standard due to competing economic, technical, and political (industrial) pressures. There are five basic standards for recordable DVDs: DVD-R, DVD-RW, DVD+R, DVD+RW, and DVD-RAM.

The write-once standards are DVD-R and DVD+R. These use a laser to change the optical properties of a material by burning pits on a dye layer within the disk. The rewritable standards, DVD-RW and DVD+RW, use a laser to reversibly change the optical properties of a material by changing, for example, its state between crystalline (reflective) and amorphous (dull).

The DVD-R and DVD+R formats compete against each other; the differences between them lie principally in the structure of the data on the disk. The most widely compatible format is DVD-R (in particular, it's compatible with DVD players and with old DVD drives in PCs). In recent years, many DVD drives have become able to handle a range of different formats by changing the laser properties to suit the actual media and the read/write software to suit the required data structures.

Another rewritable DVD is DVD-RAM, which uses DVDs with a pattern pressed onto the surface. DVD-RAM was originally designed for random access data storage by computers.
Sector address information is molded into the side of the track. DVD-RAM is the most incompatible of the DVD recordable formats (i.e. fewer drives read DVD-RAM than other recordable formats).

All types of DVD reserve a control area at the inside edge of the track that contains the disk's identification. This mechanism allows the drive to identify the type of medium currently loaded.

■ SUMMARY

The von Neumann machine needs memory to store programs and data. Lots of memory. As computer technology has advanced, the size of user programs and operating systems has more than kept up. In the early 1980s a PC with a 10 Mbyte hard disk was state of the art. Today, even a modest utility might require over 10 Mbytes and a high-resolution digital camera is quite happy to create 5 Mbyte files when operating in its RAW (uncompressed) mode.

In this chapter we have looked at the computer's memory system. We began with a description of the characteristics of fast semiconductor memory and then moved on to characteristics of slower but much cheaper secondary storage. Today, there is a bewildering number of memory technologies. We have briefly covered some of them: from semiconductor dynamic memory to devices based on magnetism to optical storage technology. Memory technology is important because, to a great extent, it determines the way in which we use computers. Faster CPUs make it possible to process data rapidly, enabling us to tackle problems like high-speed, real-time graphics. Faster, denser, and cheaper memories make it possible to store and process large volumes of data. For example, the optical disk makes it possible to implement very large on-line databases. Low-cost high-capacity hard disks now enable people to carry more than 80 Gbytes of data in a portable computer or over 400 Gbytes in a desktop machine. In just two decades the capacity of hard disks in personal PCs has increased by a factor of 40 000.

■ PROBLEMS

12.1 Why is memory required in a computer system?

12.2 Briefly define the meaning of the following terms associated with memory technology:
(a) random access
(b) non-volatile
(c) dynamic memory
(d) access time
(e) EPROM
(f) serial access

12.3 What properties of matter are used to store data?

12.4 A computer has a 64-bit data bus and 64-bit-wide memory blocks. If a memory access takes 10 ns, what is the bandwidth of the memory system?

12.5 A computer has a 64-bit data bus and 64-bit-wide memory blocks. The memory devices have an access time of 35 ns. A clock running at 200 MHz controls the computer and all operations take an integral (i.e. whole number) of clock cycles. What is the effective bandwidth of the memory system?

12.6 What is the purpose of a semiconductor memory's CS (chip select) input?

12.7 A dynamic RAM chip is organized as 64M × 4 bits. A memory composed of 1024 Mbytes is to be built with these chips. If each word of the memory is 64 bits wide, how many chips are required?

12.8 What are the principal characteristics of random access and serial access memory?

12.9 Why is all semiconductor ROM RAM but not all semiconductor RAM ROM?

12.10 If content addressable memory (CAM) could be manufactured as cheaply as current semiconductor memories, what impact do you think it would have on computers? We haven't covered CAM in this text—you'll have to look it up.

12.11 What is flash memory and why is it widely used to store a PC's BIOS (basic input/output system)?

12.12 Use a copy of a current magazine devoted to personal computing to work out the cost of memory today (price per megabyte for RAM, hard disk, flash memory, CD ROM, and DVD).

12.13 Give the size (i.e. the number of addressable locations) of each of the following memory blocks as a power of 2. The blocks are measured in bytes.
(a) 4K    (b) 16K
(c) 2M    (d) 64K
(e) 16M   (f) 256K

12.14 What address lines are required to span (i.e. address) each of the memory blocks in the previous problem? Assume that the processor is byte addressable and has 24 address lines A00 to A23. What address lines must be decoded to select each of these blocks?

12.15 What is an address decoder and what role does it carry out in a computer?

12.16 A computer's memory can be constructed from memory components of various capacities (i.e. total number of bits) and organizations (i.e. locations × width of each location). For each of the following memory blocks, calculate how many of the specified memory chips are required to implement it.

Memory block     Chip organization
(a) 64 kbytes    8K × 8
(b) 1 Mbyte      32K × 4
(c) 16 Mbytes    256K × 8

12.17 What is partial address decoding and what are its advantages and disadvantages over full address decoding?
12.18 An address decoder in an 8-bit microprocessor with 16 address lines selects a memory device when address lines A15, A14, A13, A11 = 1, 1, 0, 1. What is the size of the memory block decoded and what range of addresses does it span (i.e. what are the first and last addresses in this block)?

12.19 An address decoder in a 68K-based microprocessor selects a memory device when address lines A23, A22, A21, A20 = 1, 1, 0, 1. What is the size of the memory block decoded and what range of addresses does it span (i.e. what are the first and last addresses in this block)?

12.20 Design address decoders to implement each of the following 68K address maps. In each case, the blocks of memory are to start from address $00 0000.
(a) 4 blocks of 64 kbytes using 32K × 8-bit chips
(b) 8 blocks of 1 Mbyte using 512K × 8-bit chips
(c) 4 blocks of 128 kbytes using 64K × 8-bit chips

12.21 A memory system in a 68K-based computer includes blocks of ROM, static RAM, and DRAM. The sizes of these three blocks are

ROM: 4 Mbytes
SRAM: 2 Mbytes
DRAM: 8 Mbytes

These memory blocks are implemented with the following memory components:

ROM: 1M × 16-bit chips
SRAM: 512K × 8-bit chips
DRAM: 4M × 4-bit chips

(a) Show how the blocks of memory are organized in terms of the memory devices used to implement them.
(b) Draw a memory map for this system and indicate the start and end addresses of all blocks.
(c) Draw an address decoding table for this arrangement.
(d) Design an address decoder for this system using simple logic gates.
(e) Construct an address decoder using a PROM for this system and design a decoding table to show its contents.

12.22 A computer's memory system is invariably non-homogeneous. That is, it is made up of various types of storage mechanism, each with its own characteristics. Collectively, these storage mechanisms are said to form a memory hierarchy. Explain why such a memory hierarchy is necessary, and discuss the characteristics of the memory mechanisms that you would find in a modern high-performance PC.

12.23 In the context of memory systems, what is the meaning of hysteresis?

12.24 Can you think of any examples of the effects of hysteresis in everyday life?

12.25 Why does data have to be encoded before it can be recorded on a magnetic medium?

12.26 Explain how data is recorded using PE encoding and draw a graph of the current in the write head generated by the data stream 10101110.

12.27 A disk is a serial (sequential) access device that can implement random access files. Explain this apparent contradiction of terminology.

12.28 How do the following elements of a track-seek time affect the optimum arrangement of data on a disk: acceleration, coasting, deceleration, and settling?

12.29 What is an audio-visual drive and how does it differ from a conventional hard drive?

12.30 What are the advantages of the SCSI interface over the IDE interface?

12.31 What are the limits on the ultimate performance of the following?
(a) The hard disk.
(b) The floppy disk.
(c) The CD-ROM.

12.32 What are the operational characteristics of the serial access devices found in a PC? Use one or more of the magazines devoted to the PC to answer this question.

12.33 An image consists of 64 columns by 64 rows of pixels. Each pixel is a 4-bit 16-level gray-scale value. A sequence of these images is stored on a hard disk. This hard disk rotates at 7200 rpm and has 64 1024-byte sectors per track.
(a) Assuming that the images are stored sequentially, how fast can they be transferred from disk to screen?
(b) If the images are stored randomly throughout the disk, what is the longest delay between two consecutive images if the disk has 1500 tracks and the head can step in or out at a rate of one track per millisecond?

12.34 A hard disk drive has 10 disks and 18 surfaces available for recording. Each surface is composed of 200 concentric tracks and the disks rotate at 7200 rpm. Each track is divided into 8 blocks of 256 32-bit words. There is one read/write head per surface and it is possible to read the 18 tracks of a given cylinder simultaneously. The time to step from track to track is 1 ms (10⁻³ s). Between data transfers the head is parked at the outermost track of the disk. Calculate the following.
(a) The total capacity in bits of the disk drive.
(b) The maximum data rate in bits/second.
(c) The average access time in milliseconds.
(d) The average transfer rate when reading 256-word blocks located randomly on the disk.
(e) If the disk has a 3-inch diameter and the outermost track comes to 1 inch from the edge of the disk, calculate the recording density (bits/in) of the
innermost and the outermost tracks. The track density is 200 tracks/in.

12.35 Derive an expression for the average distance moved by a head from one cylinder to another (in terms of the number of head movements). Movements are made at random and the disk has N concentric cylinders numbered from 0 to N−1 with the innermost cylinder numbered 0. Assume that when seeking the next cylinder, all cylinders have an equal probability of being selected. Show that the average movement approaches N/3 for large values of N. Hint: Consider the Kth cylinder and calculate the number of steps needed to move to the Jth cylinder where J varies from 0 to N−1.

12.36 A floppy disk drive has the following parameters:
sides: 2
tracks: 80
sectors/track: 9
bytes/sector: 1024
rotational speed: 360 rpm
track-to-track step time: 1 ms
Using the above data, calculate the following.
(a) total capacity of the disk.
(b) average time to locate a sector.
(c) time to read a sector once it has been located.
(d) data transfer rate during the reading of a sector.

12.37 Why does a floppy disk have to be formatted before data can be written to it? How do you think that sector size affects the performance of a disk system?

12.38 What is a CRC?

12.39 Several books state that if you get the interleave factor of a disk wrong, the operating system's performance will be dramatically degraded. Why?

12.40 What are the advantages of MS-DOS's file allocation table (FAT) over the free-sector bit-map and linked list of sectors?

12.41 Interpret the meaning of the following extract from a FAT.

Entry   Contents
1       2
2       4
3       7
4       FFFF
5       6
6       8
7       5
8       FFFF
9       FFF7

12.42 Why are gaps required when a data structure is set up on a floppy disk during formatting?

12.43 Why are error-detecting systems so important in secondary storage systems (in comparison with primary storage systems)?

12.44 What are the advantages of a magnetoresistive head over a thin-film head?

12.45 Use the Internet to find the properties of today's large hard disk drives.

12.46 SMART technology is used to predict the failure of a hard disk. To what extent can this technology be applied to other components and subsystems in a computer?

12.47 The speed (access time) of semiconductor devices has increased dramatically over the past few decades. However, the access time of hard discs has failed to improve at the same rate. Why is this so?

12.48 A magnetic tape has a packing density of 800 characters per inch and an interblock gap of ½ inch and is filled with records. Each record contains 400 characters. Calculate the fraction of the tape containing useful data if the records are written as
(a) single record blocks
(b) blocks containing four records

12.49 Data is recorded on magnetic tape at 9600 bpi along each track of nine-track tape. Information is organized as blocks of 20 000 bytes and an interblock gap of 0.75 in is left between blocks. No information is recorded in the interblock gaps. What is the efficiency of the storage system?

12.50 An engineer proposes to use a video recorder (VCR) to store digital data. Assume that the useful portion of each line can be used to store 256 bits. What is the storage capacity of a 1-hour tape (in bits), and at what rate is data transferred? A TV picture is transmitted as 525 lines, repeated 30 times per second in the USA and 625 lines, repeated 25 times per second in the UK.

12.51 Do standards in memory technology help or hinder progress?

12.52 Does magnetic tape have a future as a secondary storage medium?

12.53 What are the relative advantages and disadvantages of magnetic and optical storage systems?

12.54 Why is a laser needed to read the data on a CD-ROM?

12.55 Why is it relatively harder to write data on a CD than to read it?

12.56 Discuss the ethics of this argument: Copying software ultimately benefits the manufacturer of the copied software, because it creates a larger user base for the software and, in turn, creates new users that do pay for the software.

12.57 Data is recorded along a continuous spiral on a CD-ROM. Data is read from a CD-ROM at a constant bit rate (i.e. the number of bits/s read from the CD-ROM is constant). What implications do you think that this statement has for both the designer and the user of a CD-ROM?

12.58 A disk platter has a bit density of 1000 bits/mm². Its innermost track is at a radius of 2 cm, its outermost
track at a radius of 5 cm. What is the total capacity of the disk if we assume a uniform bit density and no data overhead?

12.59 How fast does a hard disk have to rotate in order for its stored energy to be equivalent to its own weight in TNT? Assume a 3½-inch aluminum platter. Note: the energy density of TNT is 2175 J/g and the energy of rotation of a disk is ½Iω², where I = mr², m is the disk's mass, r its radius, and ω its rotational velocity in radians per second.

12.60 When a '1' is recorded on a disk drive and the analog signal is read back from the read head, the resulting sampled signal is 0.0, 0.4, 1.0, 0.4, 0.0, where the signal is sampled every T seconds. If a '0' is recorded, the sampled signal is 0.0, −0.4, −1.0, −0.4, 0.0. Suppose the binary string 011010 is written to the disk and each bit is transmitted at T-second intervals. What signal would be read back from the disk corresponding to 011010 if the signal were sampled every T seconds?
13 The operating system
INTRODUCTION
We now look at one of the most important components of a modern computer, the operating
system. Operating systems can be very large programs indeed (e.g. 100 Mbytes). Some might argue
that a section on operating systems is out of place in an introductory course on computer hardware.
We include this topic here for two reasons. First, the operating system is intimately connected with
the hardware that it controls and allocates to user programs. Second, some students may not
encounter the formal treatment of operating systems until later in their studies.
We begin with an overview of operating systems and then concentrate on three areas in which
hardware and software overlap: multitasking, exception handling, and memory management.
Multitasking permits a computer to run several programs at the same time. Exception handling
is concerned with the way in which the operating system communicates with user applications
and external hardware. Memory management translates the addresses generated by a program into the actual addresses of data within the computer's memory system.
Before continuing, we need to make a comment about terminology. The terms program and job are used synonymously in texts on operating systems; similarly, the terms task and process are equivalent. A process (i.e. task) is an instance of a program that
includes the code, data, and volatile data values in registers. The ability of a computer to execute
several processes concurrently is called multitasking or multiprogramming. However, the term
multiprocessing describes a system with several processors (CPUs) that run parts of a process in
parallel.
13.1 The operating system

The relationship between an operating system and a computer is similar to the relationship between a conductor and an orchestra. The great conductor is a celebrity who gets invited to take part in talk shows on television and is showered with society's highest awards. And yet the conductor doesn't add a single note to a concert. The importance of conductors is well
known—they co-ordinate the players. A good conductor knows the individual strengths and weaknesses of players and can apply them in such a way as to optimize their collective performance.

An operating system is the most important piece of software in a computer system. Its role is to co-ordinate the functional parts of the computer to maximize the efficiency of the system. We can define efficiency as the fraction of time for which the CPU is executing user programs. It would be more accurate if we were to say that the operating system is designed to remove inefficiency from the system. Suppose a program prints a document. While the printer is busy printing the document, the CPU is idling with nothing to do. The operating system would normally intervene to give the CPU something else to do while it's waiting for the printer to finish.

A second and equally important role of the operating system is to act as the interface between the user and the computer. Programmers communicated with first-generation operating systems via a job control language (JCL), which looked rather like any other conventional computer language. Today's operating systems such as Microsoft's Windows provide an interface that makes use of a WIMP (windows, icons, mouse, and pointer) and GUI (graphical user interface) environment.

From the user's point of view an operating system should behave like the perfect bureaucrat; it should be efficient, helpful, and, like all the best bureaucrats, should remain in the background. A poorly designed operating system, when asked to edit a file, might reply 'ERROR 53'. A really good operating system would have replied, 'Hi. Sorry, but my disk is full. I've noticed you've got a lot of backup copies, so if you delete a couple I think we'll be able to find room for your file. Have a nice day'. Raphael A. Finkel, in his book An Operating Systems Vade Mecum (Prentice-Hall, 1988), calls this aspect of an operating system the beautification principle, which he sums up by '. . . an operating system is a collection of algorithms that hides the details of the hardware and provides a more pleasant environment'.

Figure 13.1 Hierarchical model of an operating system (a series of concentric rings with the scheduler at the center, surrounded by the disk file manager and the operating system interface, and with user applications in the outermost ring).

Figure 13.1 shows how the components of the operating system relate to each other and to the other programs that run under the operating system's supervision. The diagram is depicted as a series of concentric circles for a good reason—programs in the outer rings use facilities provided by programs in the inner rings. At the center of the circle lies the scheduler, which switches from one task to another in a multitasking environment. The scheduler is smaller than programs in the outer ring such as database managers and word processors. A scheduler is often said to be tightly coded because it uses a small amount of code optimized for speed.

Sophisticated operating systems employ hardware and software mechanisms to protect the important inner rings from accidental or illegal access by other components. If a user task corrupts part of the kernel, the operating system may crash and the system halts.

Not all computers have an operating system. A computer used as a controller in, for example, a digital camera may not need an operating system (although complex embedded controllers do have operating systems). Whenever functions normally performed by an operating system are required, they are incorporated into the program itself.

13.1.1 Types of operating system

Operating systems can be divided into categories: single-user, batch mode, demand mode, real-time, and client–server. The distinction between operating system classes can be vague and a real operating system may have attributes common to several classes. We now briefly describe the various types of operating system (although the modern high-performance PC and the local area network have rendered some of them obsolete).

The single-user operating system (e.g. MS-DOS) allows only one user or process to access the system at a time. First-generation mainframe operating systems worked in a batch mode. Jobs to be executed were fed into the computer, originally in the form of punched cards. Each user's program began with job control cards telling the operating system which of its facilities were required. The operating system scheduled the jobs according to the resources they required and their priority, and eventually generated an output.

Batch mode operation is analogous to a dry cleaning service. Clothes are handed in and are picked up when they've been cleaned. Batch-mode operating systems accepted jobs on punched card (or magnetic tape). The disadvantage of batch mode systems is their lengthy turnaround time. It was frustrating in the 1970s to wait 5 hours for a printout only to discover that the job didn't run because of a simple mistake in one of the cards.
MS-DOS
When IBM was creating the PC in 1980, Bill Gates was approached to supply a simple operating system. Bill Gates came up with MS-DOS (Microsoft Disk Operating System), which was loosely based on a first-generation microprocessor operating system called CP/M. MS-DOS allowed you to create, list, delete, and manipulate files on disk.

MS-DOS was released as MS-DOS 1.0 in 1981 and developed by stages to become MS-DOS 6.22 in 1994. Future developments were not necessary because the market for a command line operating system dried up when graphical operating systems like Windows became available. The final version of DOS, 7.0, was released in 1995 when it was incorporated in Windows 95.
Demand mode operating systems allow you to access the computer from a terminal, which was a great improvement over batch mode operation because you could complete each step before going on to the next one. Such an arrangement is also called interactive because the operating system and the user are engaged in a dialogue. Each time the user correctly completes an operation, they are informed of its success and invited to continue by some form of prompt message. If a particular command results in an error, the user is informed of this by the operating system and can therefore take the necessary corrective action.

Real-time operating systems belong to the world of industrial process control. The primary characteristic of a real-time operating system is that it must respond to an event within a well-defined time. Consider a computer-controlled petrochemical plant. The conditions at many parts of the plant are measured and reported to the computer on a regular basis. Control actions must be taken as conditions in the plant change; for example, a sudden build-up of pressure in a reaction vessel cannot be ignored. The computer running the plant invariably has a real-time operating system that responds to interrupts generated by external events.

Real-time operating systems are found wherever the response time of the computer must closely match that of the system it is controlling. Real-time operating systems are so called because the computer is synchronized with what people call clock time. Other operating systems operate in computer time. A job is submitted and its results delivered after some elapsed time. There is no particular relationship between the elapsed time and the time of day. The actual elapsed time is a function of the loading of the computer and the particular mix of jobs it is running. In a real-time system the response time of the computer to any stimulus is guaranteed.

Modern multimedia systems using sound and video are also real-time systems—not least because a pause in a video clip while the computer is carrying out another process is most disconcerting. Real-time operating system technology has had a strong influence on the way in which processors have developed; for example, Intel's multimedia extensions (MMX) added special-purpose instructions to the Pentium's instruction set to handle video and sound applications.
Some modern operating systems are called client–server and run on distributed systems. A client–server system may be found in a university where each user has their own computer with a CPU, memory, and a hard disk drive, linked to a server by a local area network. Processes running on one of the terminals are called client processes and are able to make requests to the server. The operating system is distributed between the client and the server. A client on one host can use the resources of a server on another host.

13.2 Multitasking

Multitasking is the ability of a computer to give the impression that it can handle more than one job at once. A computer cannot really execute two or more programs simultaneously, but it can give the impression that it is running several programs concurrently. The following example demonstrates how such an illusion is possible.

Consider a game of simultaneous chess where a first-class player is pitted against several weaker opponents by stepping from board to board making a move at a time. As the master player is so much better than their opponents, one of the master's moves takes but a fraction of the time the opponents take. The players share the illusion that they have a single opponent of their own.

The organization of a game of simultaneous chess can readily be applied to the digital computer. All we need is a periodic signal to force the CPU to switch from one job to another and a mechanism to tell the computer where it was up to when it last executed a particular job. The jobs are referred to as tasks or processes and the concept of executing several processes together is called multiprogramming or multitasking. A process is a program together with its associated program counter, stack, registers, and any resources it's using.
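A minimal sketch of this idea in Python is given below. Generators stand in for the saved state of each process (its 'program counter'), and the loop plays the role of a round-robin scheduler driven by the periodic signal:

    def process(name, steps):
        """A toy process that runs for a number of steps, giving up
        the CPU at each 'clock tick' (the yield)."""
        for step in range(steps):
            print(f'{name} runs step {step}')
            yield

    ready_queue = [process('A', 3), process('B', 2), process('C', 3)]
    while ready_queue:                 # round-robin scheduling
        task = ready_queue.pop(0)
        try:
            next(task)                 # run the task for one time slice
            ready_queue.append(task)   # still runnable: back of the queue
        except StopIteration:
            pass                       # the task has finished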
Before we look at how multitasking is implemented we discuss some of its advantages. If each process required only CPU time, multitasking would have little advantage over running processes consecutively (at least in terms of the efficient use of resources). If we re-examine simultaneous chess, we find that its success is based on the great speed of the master player when compared with that of their opponents. While each player is laboriously pondering their next move, the master player is busy making many moves.

A similar situation exists in the case of computers. While one task is busy reading information from a disk drive and loading it into memory or is busy printing text on a printer, another task can take control of the CPU. A further advantage
HISTORY OF WINDOWS

Microsoft's first version of Windows, version 1.0, was released in 1985. This was a graphical version of Microsoft's command-line MS-DOS operating system. Version 2.0 appeared in 1987 and provided better windows management facilities.

Windows 3.0 was released in 1990 and was Microsoft's first really successful GUI-based operating system. This version made better use of the processor's memory management mechanisms.

Windows 95 and 98 (released in 1995 and 1998 respectively) continued the development of Microsoft's GUI technology. Changes were incremental rather than revolutionary; for example, Windows 95 provided support for long file names (rather than the old '8.3' DOS format, which restricted names to eight characters). Windows 98 provided a better integration of operating system and Internet browser as well as the beginning of support for peripherals such as the universal serial bus, USB.

Microsoft released Windows ME (Millennium Edition) in 2000. This was the end of the line for Microsoft's operating systems that began with Windows 3.0. ME provided further modest improvements to Windows 98 such as the incorporation of a media player and a system restore mechanism. ME was regarded as unstable and bug ridden and was not a significant success.

Microsoft launched a separate range of graphical operating systems; first NT (New Technology) in 1993 and then Windows 2000. These were regarded as professional operating systems, targeted at corporate users. NT (and later Windows 2000) used underlying 32-bit binary code rather than the 16-bit code of DOS and earlier versions of Windows. Windows NT also introduced the NTFS file system, which was far more sophisticated and reliable than the FAT system used by MS-DOS and early versions of Windows.

Windows XP was launched in 2001 and brought together Microsoft's previous two lines (Windows 98 aimed at the PC user and Windows 2000 aimed at the corporate user). Windows XP may be thought of as a mature version of Windows in the sense that it used true 32-bit code, supported the NTFS file management mechanism, and provided extensive support for multimedia applications, new high-speed interfaces, and local area networks. There are two versions of XP. The standard version, XP Home, is intended for the small-scale user, and XP Professional is aimed at the corporate and high-performance user. XP Professional supports remote processing (using the system via an Internet connection) and symmetrical multiprocessing (systems with more than one CPU).

The Windows operating system became the target of many malware writers (malware includes computer viruses, worms, Trojan horses, and spyware). Microsoft has had to keep up with malware writers by continually releasing updates to close the loopholes exploited by, for example, virus writers. In late 2004 Microsoft released its Service Pack 2 to update XP. This service pack included a firewall to prevent illegal access from the Internet.
A further advantage of multiprogramming is that it enables several users to gain access to a computer at the same time.

Consider two processes, A and B, each of which requires several different activities to be performed during the course of its execution (e.g. video display controller, code execution, disk access, etc.). The sequence of activities carried out by each of these two processes as they are executed is given in Fig. 13.2. Note that VDT1 and VDT2 are two displays.

[Figures 13.2 and 13.3: The sequence of activities (VDT1, VDT2, CPU, disk) required by processes A and B over time, and a schedule that interleaves the two processes to make better use of the resources.]

If process A were allowed to run to completion before process B were started, valuable processing time would be wasted while activities not involving the CPU were carried out. Figure 13.3 shows how the processes may be scheduled to make more efficient use of resources. The boxes indicate the period of time for which a given resource is allocated to a particular process. For example, after process A has first used the CPU, it accesses the disk. While the disk is being accessed by process A, process B can use the processor.

The fine details of multiprogramming operating systems are beyond the scope of an introductory book. However, the following principles are involved.

1. The operating system schedules a process in the most efficient way and makes best use of the facilities available. The algorithm may adapt to the type of jobs that are running, or the operator may feed system parameters into the computer to maximize efficiency.

2. Operating systems perform memory management. If several processes run concurrently, the operating system must allocate memory space to each of them. Moreover, the operating system should locate the processes in memory in such a way as to make best possible use of the memory. The operating system must also protect one task from unauthorized access by another.

3. If the CPU is to be available to one process while another is accessing a disk or using a printer, these devices must be capable of autonomous operation. That is, they must either be able to take part in DMA (i.e. direct memory access) operations without the active intervention of the CPU, or they must be able to receive a chunk of high-speed data from the CPU and process it at their leisure.

One of the principal problems a complex multitasking operating system has to overcome is that of deadlock. Suppose process A and process B both require CPU time and a printer to complete their activity. If process A has been allocated the CPU and the printer by the operating system, all is well and process B can proceed once process A has been completed. Now imagine the situation that occurs when process A requests both CPU time and the printer but receives only the CPU, and process B makes a similar request and receives the printer but not the CPU. In this situation both processes have one resource and await the other. As neither process will give up its resource, the system is deadlocked and hangs up indefinitely. Much work has been done on operating system resource allocation algorithms to deal with this problem.

13.2.1 What is a process?

A task or process is a piece of executable code. Each process runs in an environment made up of the contents of the processor's registers, its program counter, its status register (SR), and the state of the memory allocated to this process. The environment defines the current state of the process and tells the computer where it's up to in the execution of a process.

At any instant a process is in one of three states: running, runnable, or blocked. Figure 13.4 provides a state diagram for a process in a multitasking system. When a process is created, it is in a runnable state waiting its turn for execution. When the scheduler passes control to the process, it is running (i.e. being executed). If the process has to wait for a system resource such as a printer before it can continue, it enters the blocked state. The difference between runnable and blocked is simple—a runnable process can be executed when its turn comes; a blocked process cannot enter the runnable state until the resources it requires become free.
At the end of the interrupt handling routine an RTE (return from exception) instruction is executed and the program then continues from the point at which it was interrupted.

The 68K's RTE instruction is similar to its RTS (return from subroutine) instruction. When a subroutine is called, the return address is pushed on the stack. When an exception (i.e. interrupt) is generated, both the return address and the current value of the processor status word (containing the CCR) are pushed on the stack. The RTE instruction restores both the program counter and the status word. Consequently, an exception doesn't affect the status of the processor.

Suppose now that the interrupt handling routine modifies the stack pointer before the return from exception is executed. That is, the stack pointer is changed to point at another process's volatile portion. When the RTE is executed, the value of the program counter retrieved from the stack isn't that belonging to the program being executed just before the interrupt. The value of the PC loaded by the return from exception belongs to a different process that was saved earlier when another program was interrupted—that process will now be executed.

[Figure 13.4: State diagram of a process in a multitasking system. A runnable process runs when its turn comes; it returns to the runnable state when its time allocation is complete, enters the blocked state while waiting for a resource, and becomes runnable again when the resource becomes available.]

Figure 13.5 demonstrates the sequence of events taking place during process switching. Initially process A at the top of Fig. 13.5 is running. At time T, the program is interrupted by a real-time clock and control passed to the scheduler in the operating system. The arrow from the program to the scheduler shows the flow of control from process A to the operating system. The scheduler stores process A's registers, program counter, and status in memory.
[Figure 13.5: Switching processes. Process A runs until an interrupt at time T passes control to the scheduler, which saves the registers of the current process, selects the next process, and restores its registers; process B then runs until the interrupt at time T + t, and so on at T + 2t. The scheduler saves the current process's context (i.e. volatile portion) and invokes a new process.]
Process switching is also called context switching because it involves switching from the volatile portion of one process to the volatile portion of another process. The scheduler component of the operating system responsible for switching processes is called the first-level interrupt handler.

In Fig. 13.5 an interrupt occurs at T, T + t, T + 2t, . . . , and every t seconds switching takes place between processes A and B. We have ignored the time required to process the interrupt. In some real-time systems, the process-switching overhead is very important.

Figure 13.6 demonstrates how process switching works. Two processes, A and B, are located in memory. To keep things simple, we will assume that the regions of memory allocated to these processes do not change during the course of their execution. Each process has its own stack, and at any instant the stack pointer may be pointing to either A's stack or B's stack.

In Fig. 13.6(a) process A is running and process A's stack pointer SPA is pointing at the top of the stack. In Fig. 13.6(b) a process-switching interrupt has occurred and the contents of the program counter and machine status have been pushed onto the stack (i.e. A's stack). For the sake of simplicity Fig. 13.6 assumes that all items on the stack occupy a single location.

In Fig. 13.6(c) the operating system has changed the contents of the system stack pointer so that it is now pointing at process B's stack (i.e. the stack pointer is SPB). Finally, in Fig. 13.6(d) the operating system executes an RTE and process B's program counter is loaded from its stack, which causes process B to be executed. Thus, at each interrupt, the operating system swaps the stack pointer before executing an RTE and a new process is run.
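The stack-swapping mechanism of Fig. 13.6 can be sketched in a few lines of 68K code. Assume, purely for illustration, that the variables SP_A and SP_B hold the saved stack pointers of processes A and B, and that the interrupt has already pushed the PC and SR of the running process (A) onto A's stack.

    Switch  MOVEM.L D0-D7/A0-A6,-(A7)   save A's registers on A's own stack
            MOVE.L  A7,SP_A             remember where A's stack now ends
            MOVEA.L SP_B,A7             swap: the stack pointer now addresses B's stack
            MOVEM.L (A7)+,D0-D7/A0-A6   restore B's registers from B's stack
            RTE                         pull B's PC and SR off B's stack; B resumes

A real handler would also exchange the roles of SP_A and SP_B at each interrupt so that the next switch returns control to process A.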
A more realistic operating system maintains a table of processes to be executed. Each entry in the table is a task control block (TCB), which contains all the information the operating system needs to know about the process. The TCB includes details about the process's priority, its maximum run time, and whether or not it is currently runnable (as well as its registers).

Figure 13.7 illustrates the structure of a possible task control block. In addition to the process's environment, the TCB contains a pointer to the next TCB in the chain of TCBs; that is, the TCBs are arranged as a linked list. A new process is created by inserting its TCB into the linked list.

Some operating systems allow processes to be prioritized so that a process with a high priority will always be executed in preference to a process with a lower priority. A runnable process is executed when its turn arrives (subject to the limitations of priority). If the process is not runnable (i.e. blocked), it remains in the computer but is bypassed each time its turn comes. When the process is to be run, its run flag is set and it will be executed next time round.
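In 68K assembly language one entry in such a chain might be declared as follows (a sketch; the field ordering and sizes are assumptions):

    TCB1    DC.L    TCB2        pointer to the next TCB in the linked list
            DC.W    0           process status (e.g. runnable or blocked)
            DC.W    0           process priority
            DC.L    0           saved program counter
            DC.W    0           saved status register
            DS.L    16          saved registers D0-D7 and A0-A6, plus the USP
            DC.L    0           memory requirement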
[Figure 13.7: The structure of a task control block. Each TCB holds the process name, process status, process priority, program counter, registers, memory requirement, and a pointer to the next process in the TCB chain; the task control blocks are arranged as a linked list.]

PRE-EMPTIVE MULTITASKING

There are two versions of multitasking. The simplest version is called non-pre-emptive multitasking or co-operative multitasking. When a task runs, it executes code until it decides to pass control to another task. Co-operative multitasking can lead to system crashes when a task does not relinquish control.
In pre-emptive multitasking, the operating system forces a task to relinquish control after a given period.

13.3 Operating system support from the CPU

We now describe how a processor supports operating system functions. It's possible to design processors that are protected from certain types of error or that provide hardware support for multitasking. First-generation 8-bit microprocessors didn't provide the operating systems designer with any special help. Here, we concentrate on the 68K family because it provides particularly strong support to operating systems.

At any instant a processor can be in one of several states or levels of privilege; for example, members of the 68K family provide two levels of privilege. One of the 68K's states is called the supervisor state and the other the user state.
The operating system runs in the supervisor state and applications programs running under the control of the operating system run in the user state. We will soon see that separating the operating system from user applications makes the system very robust and difficult to crash. When an applications program crashes (e.g. due to a bug), the crash doesn't affect the operating system running in its protected supervisor environment.

13.3.1 Switching states

Let's start from the assumption that the supervisor state used by the operating system confers first-class privileges on the operating system—we'll find out what these privileges are shortly. When the processor is running in its user state, any interrupt or exception forces it into its supervisor state. That is, an exception causes a transition to the supervisor state and, therefore, calls the operating system.

Figure 13.8 illustrates two possible courses of action that may take place in a 68K system when an exception is generated. Both these diagrams are read from the top down. In each case, the left-hand side represents user or applications programs running in the user state and the right-hand side represents the operating system running in the supervisor state.

In Fig. 13.8(a) a user program is running and an exception occurs (e.g. a disk drive may request a data transfer). A jump is made to the exception handler that forms part of the operating system. The exception handler deals with the request and a return is made to the user program. However, the exception might have been generated by a fatal error condition that arises during the execution of a program. Figure 13.8(b) shows the situation in which an exception caused by a fatal error occurs. In this case, the operating system terminates the faulted user program and then runs another user program.

Figures 13.8(a) and (b) show user programs and the operating system existing in separate compartments or environments. We now explain why user programs and the operating system sometimes really do live in different universes. In simple 68K-based systems, the processor's supervisor and user state mechanisms aren't exploited, and all code is executed in the supervisor state. More sophisticated systems with an operating system do make good use of the 68K's user and supervisor state mechanisms.

When power is first applied to the 68K, it automatically enters its supervisor state. This action makes sense, because you would expect the operating system to initially take control of the computer while it sets everything up and loads the user processes that it's going to run.

The three questions we've now got to answer are the following.
● How does the 68K know which state it's in?
● How is a transition made from one state to another?
● What does it matter anyway?

The answer to the first question is easy—the 68K uses a flag bit, called an S-bit, in its status register to indicate what state it's currently operating in. If S = 1, the processor is in its supervisor state and if S = 0, the processor is in its user state. The S-bit is located in bit 13 of the 16-bit status register (SR). The lower-order byte of the status register is the condition code register (CCR). The upper byte of the status register containing the S-bit is called the system byte and defines the operating state of the processor.

The second question we asked was 'How is a transition made from one state to another?' The state diagram in Fig. 13.9 describes the relationship between the 68K's user and supervisor states.
[Figure 13.8: Two courses of action following an exception. In (a) the exception handler deals with the exception and control returns to the interrupted task; in (b) the exception is caused by a fatal error, the task is killed, and a new task is started.]
Lines with arrows indicate transitions between states (text against a line explains the action that causes the transition). Figure 13.9 shows that a transition from the supervisor state to the user state is made by clearing the S-bit in the status register. Executing a MOVE #0,SR instruction clears the S-bit (and the other bits) of the status byte and puts the 68K in the user state. You could clear only the S-bit with the instruction ANDI #$DFFF,SR.

[Figure 13.9: Switching between user and supervisor states. Any exception takes the processor from the user state to the supervisor state; clearing the S-bit (S = 0) returns it to the user state.]

When the operating system wishes to execute an applications program in the user state, it clears the S-bit and executes a jump to the appropriate program; that is, the operating system invokes the less privileged user state by executing an instruction that clears the S-bit to 0.
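For example, the operating system might launch an application in the user state like this (a sketch; UserProg and UserStack are assumed labels):

            LEA     UserStack,A1
            MOVE    A1,USP          give the application its own user stack
            LEA     UserProg,A0     the entry point of the applications program
            ANDI    #$DFFF,SR       clear the S-bit; the 68K drops into the user state
            JMP     (A0)            execute the application in the user state

Note that the user stack pointer must be set up before the S-bit is cleared, because MOVE An,USP is itself a privileged instruction.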
Figure 13.9 demonstrates that once the 68K is running in its user state, the only way in which a transition can be made to the supervisor state is by means of an exception—any exception. A return can't be made to the supervisor state by using an instruction to set the S-bit to 1. If you could do this, anyone would be able to access the supervisor state's privileged features and the security mechanism it provides would be worthless. Let's say that again—a program running in the user state cannot deliberately invoke the supervisor state directly.

Suppose a user program running in the user state tries to enter the privileged supervisor state by executing MOVE #$2000,SR to set the S-bit. Any attempt by the user-state programmer to modify the S-bit results in a privilege violation exception. This exception forces the 68K into its supervisor state, where the exception handler deals with the problem.

We can now answer the third question—what's the benefit of the 68K's two-state mechanism? Some instructions such as STOP and RESET can be executed only in the supervisor state and are said to be privileged. The STOP instruction brings the processor to a halt and the RESET acts on external hardware such as disk drives. You might not want the applications programmer to employ these powerful instructions, which may cause the entire system to crash if used inappropriately. Other privileged instructions are those that operate on the system byte (including the S-bit) in the status register. If the applications programmer were permitted to access the S-bit, they could change it from 0 (user state) to 1 (supervisor state) and bypass the processor's security mechanism.

If the 68K's user/supervisor mode mechanism were limited to preventing the user-state programmer executing certain instructions, it would be a nice feature of the processor, but of no earth-shattering importance. The user/supervisor state mechanism has two important benefits: the provision of dual stack pointers and the support for memory protection. These two features protect the operating system's memory from either accidental or deliberate modification by a user application. We now describe how the 68K's supervisor state protects its most vital region of memory—the stack.

13.3.2 The 68K's two stacks

Most computers manage subroutine return addresses by means of a stack. The processor's stack pointer points to the top of the stack and the stack pointer is automatically updated as items are pushed onto the stack or pulled off it. When a subroutine is called by an instruction like BSR XYZ, the address immediately after the subroutine call (i.e. the return address) is pushed on the stack. The final instruction of the subroutine, RTS (return from subroutine), pulls the return address off the stack and loads it in the program counter.

If you corrupt the contents of the stack by overwriting the return address or if you corrupt the stack pointer itself, the RTS will load an undefined address into the program counter. Instead of making a return to a subroutine's calling point, the processor will jump to a random point in memory and start executing code at that point. The result might lead to an illegal instruction error or to an attempt to access non-existent memory. Whatever happens, the program will crash.

Consider the following fragment of very badly written code that contains a serious error. Don't worry about the fine details—it's the underlying principles that matter. Remember that the 68K's stack pointer is address register A7.
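A sketch of the fragment, reconstructed from the description that follows (only the label Sub_X and the individual instructions come from the text):

            MOVE.W  D3,-(A7)      push the 16-bit parameter in D3 onto the stack
            BSR     Sub_X         call the subroutine (pushes a 32-bit return address)
    *       ...execution should resume here after the RTS

    Sub_X   ADDA.L  #4,A7         step past the 4-byte return address (bad practice)
            MOVE.L  (A7)+,D0      read the parameter; the bug is the .L, not .W
            SUBA.L  #6,A7         attempt to restore the stack pointer
            RTS                   pulls a corrupted return address into the PC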
[Figure 13.10: The state of the stack (a) after the subroutine call; (b) after adjusting the stack pointer; (c) after incorrectly reading the parameter; (d) after adjusting the stack pointer. The return address sits at the top of the stack with the parameter buried beneath it.]
The programmer first pushes the 16-bit parameter in data register D3 onto the stack by means of MOVE.W D3,-(A7), and then calls a subroutine at location Sub_X. Figure 13.10(a) illustrates the state of the stack at this point. As you can see, the stack contains the 32-bit return address (two words) on top of the buried 16-bit parameter (one word).

When the subroutine is executed, the programmer attempts to retrieve the parameter from the stack by first stepping past the 4-byte return address on the top of the stack. The instruction ADDA.L #4,A7 adds 4 to the stack pointer to leave it pointing at the required parameter (Fig. 13.10(b)). This is a terrible way of accessing the parameter because you should never move the stack pointer down the stack when there are valid items on the stack above the stack pointer—do remember that we're providing an example of how not to do things.

The programmer then reads the parameter from the stack by means of the operation MOVE.L (A7)+,D0. This instruction pulls a longword off the stack and increments the stack pointer by the size of the operand (four for a longword) (Fig. 13.10(c)). Because the stack pointer has been moved down by first stepping past the return address and then pulling the parameter off the stack, it must be adjusted by six to point to the subroutine's return address once more (i.e. a 4-byte return address plus a 2-byte parameter) (Fig. 13.10(d)). Finally, the return from subroutine instruction RTS pulls the 32-bit return address off the stack and loads it in the program counter.

This fragment of code fails because it contains a serious error. The parameter initially pushed on the stack was a 16-bit value, but the parameter read from the stack in the subroutine was a 32-bit value. The programmer really intended to write the instruction MOVE.W (A7)+,D0 rather than MOVE.L (A7)+,D0; the error in the code is just a single letter. The effect of this error is to leave the stack pointer pointing at the second word of the 32-bit return address, rather than the first word. The SUBA.L #6,A7 instruction was intended to restore the stack pointer to its original value. However, because the stack pointer is pointing 2 bytes above the correct return address, the RTS instruction loads the program counter with an erroneous return address, resulting in a jump to a random region of memory. We have demonstrated that this blunder not only gives the wrong result, but also generates a fatal error. We now demonstrate how the user/supervisor mechanism helps us to deal with such a situation.

The 68K's user and supervisor stack pointers

There's very little the computer designer can do to prevent programming errors that corrupt either the stack or the stack pointer. What the computer designer can do is to limit the effects of possible errors. Members of the 68K family approach the problem of stack security by providing two identical stack pointers—each of which is called address register A7 (see Fig. 13.11). However, both stack pointers can't be active at the same time because either one or the other is in use (it's a bit like Clark Kent and Superman—you never see them together).

[Figure 13.11: The 68K's two stack pointers. The user stack pointer (USP) points at the user stack and is selected in the user state when S = 0; the supervisor stack pointer (SSP) points at the supervisor stack and is selected in the supervisor state when S = 1.]

One of the 68K's two stack pointers is called the supervisor stack pointer (SSP) and is active whenever the processor is in the supervisor state. The other stack pointer, the user stack pointer (USP), is active when the processor is in the user state.
Because the 68K is always in either the user state or the supervisor state, only one stack pointer is available at any instant. The supervisor stack pointer is invisible to the user programmer—there's no way in which the user programmer can access the supervisor stack pointer. However, the operating system in the supervisor state can use the privileged instructions MOVE USP,Ai and MOVE Ai,USP to access the user stack pointer.

Let's summarize what we've just said. When the 68K is operating in its supervisor state, its S-bit is 1 and the supervisor stack pointer is active. The supervisor stack pointer points at the stack used by the operating system to handle its subroutine and exception return addresses. Because an exception sets the S-bit to 1, the return address is always pushed on the supervisor stack even if the 68K was running in the user mode at the time of the exception. When the 68K is operating in its user state, its S-bit is 0 and the user stack pointer is active. The user stack pointer points at the stack used by the current applications program to store subroutine return addresses.

Consider the previous example of the faulty applications program running in the user state (see Fig. 13.11). When the return from subroutine instruction is executed, an incorrect return address is pulled off the stack and a jump to a random location made. An illegal instruction exception will eventually occur when the processor tries to execute a data pattern that doesn't correspond to a legal op-code. An illegal instruction exception forces a change of state from user to supervisor mode. The illegal instruction exception handler runs in the supervisor state, whose own stack pointer has not been corrupted. That is, the applications programmer can corrupt their own stack pointer and crash their program, but the operating system's own stack pointer will not be affected by the error. When a user program crashes, the operating system mounts a rescue attempt.

You may wonder what protects the supervisor stack pointer. Nothing. It is assumed that a well constructed and debugged operating system rarely corrupts its stack and crashes (at least in comparison with user programs and programs under development).

The 68K's two-stack architecture doesn't directly prevent the user programmer from corrupting the contents of the operating system's stack. Instead, it separates the stack used by the operating system and all exception-processing software from the stack used by the applications programmer by implementing two stack pointers. Whatever the user does in their own environment cannot prevent the supervisor stepping in and dealing with the problem.

Use of two stacks in process switching

Earlier in this chapter we described the notion of multitasking. The 68K's two stack pointer mechanism is particularly useful in implementing multitasking. Each user program has its own private stack. When the process is running, it uses the USP to point to its stack. When the process is waiting or blocked, its own stack pointer is saved alongside the other elements of its volatile portion (i.e. environment) in its task control block.

The supervisor stack pointer is used by the operating system to manage process switching and other operating system functions. In this way, each application can have its own user stack pointer and the operating system's stack can be separated from the user processes.

Suppose an applications program (i.e. process) is running and a process-switching interrupt occurs. A jump is made to the scheduler, the S-bit is set, the supervisor stack pointer becomes active, and the return address and status word are saved on the supervisor stack.

The CPU's address and data registers plus its PC and status register hold information required by the interrupted process. These registers constitute the process's volatile portion. The scheduler saves these registers on the supervisor stack. You can use MOVEM.L D0-D7/A0-A6,-(A7) to push registers D0 to D7 and A0 to A6 onto the stack pointed at by A7.¹ We don't save A7 because that's the supervisor stack pointer. We do need to save the user stack pointer because that belongs to the process. We can access the USP by MOVE USP,A0 and then save A0 on the supervisor stack with the other 15 registers (the PC and SR are already on the stack).

¹ The 68K instruction MOVEM.L A0-A3/D2-D4,-(A7) pushes registers A0 to A3 and D2 to D4 on the stack pointed at by A7. The mnemonic MOVEM means move multiple and lets you copy a group of registers onto the stack in one operation. MOVEM.L (A7)+,A0-A3/D2-D4 performs the inverse operation and pulls seven registers off the stack and restores them to A0 to A3 and D2 to D4.
tion exception forces a change of state from user to super- Having saved the last process’s volatile portion, the sched-
visor mode. The illegal instruction exception handler runs in uler can go about its job of switching processes. The next step
the supervisor state, whose own stack pointer has not been would be to copy these registers from the stack to the
corrupted. That is, the applications programmer can corrupt process’s entry in its task control block. Typically, the sched-
their own stack pointer and crash their program, but the uler might remove the process’s volatile environment from
operating system’s own stack pointer will not be affected by the top of the supervisor stack and copy these registers to the
the error. When a user program crashes, the operating system process’s task control block.
mounts a rescue attempt. The scheduler can now locate the next process to run
You may wonder what protects the supervisor stack according to an appropriate algorithm (e.g. first-come-first-
pointer. Nothing. It is assumed that a well constructed and served, highest priority first, smallest process first, etc.). Once
debugged operating system rarely corrupts its stack and the next process has been located it can be restarted by copy-
crashes (at least in comparison with user programs and pro- ing the process’s registers from the TCB to the supervisor
grams under development). stack and then pulling the registers off the stack immediately
The 68K’s two-stack architecture doesn’t directly prevent before executing an RTE instruction. Note that restoring a
the user programmer from corrupting the contents of the process’s volatile environment is the mirror image of saving a
operating system’s stack. Instead, it separates the stack used process’s volatile environment.
by the operating system and all exception-processing soft- The behavior of the task-switching mechanism can be
ware from the stack used by the applications programmer by expressed as pseudocode.
implementing two stack pointers. Whatever the user does in
their own environment cannot prevent the supervisor step-
ping in and dealing with the problem.
1
The 68K instruction MOVEM.L A0-A3/D2-D4 -(A7) pushes regis-
ters A0 to A3 and D2 to D4 on the stack pointed at by A7. The mnemonic
Use of two stacks in process switching MOVEM means move multiple and lets you copy a group of registers onto
the stack in one operation. MOVEM.L (A7) ,A0-A3/D2-D4 performs the
Earlier in this chapter we described the notion of multitask- inverse operation and pulls seven registers off the stack and restores them
ing. The 68K’s two stack pointer mechanism is particularly to A0 to A3 and D2 to D4.
We represent this algorithm in the following 68K program. In order to test the task-switching mechanism, we've created a dummy environment with two processes. Process 1 prints the number 1 on the screen whenever it is executed. Process 2 prints the sequence 2, 3, 4, . . . , 9, 2, 3, . . . when it is called. If we allow each process to complete one print cycle before the next process is called, the output should be 12131415 . . . 18191213 . . .

In a real system, a real-time clock might be used to periodically switch tasks. In our system we use a TRAP #0 instruction to call the task switcher. This instruction acts like a hardware interrupt that is generated internally by an instruction in the program (i.e. the program counter and status register are pushed on the supervisor stack and a jump is made to the TRAP #0 exception handling routine whose address is in memory location $00 0080).

The program is entered at $400, where the supervisor stack pointer is initialized and dummy values are loaded into A6 and A0 for testing purposes (because much of the program involves transferring data between registers, the stack, and task control blocks, it's nice to have visible markers when you are debugging a program by single-stepping it).

We have highlighted the body of the task switcher. The subroutine NEW selects the next process to run. In this case, there are only two processes and the code is as follows.
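A minimal sketch of NEW, assuming two task control blocks, TCB1 and TCB2, and a pointer variable CURRENT that addresses the active TCB (all three names are illustrative):

    NEW     CMPI.L  #TCB1,CURRENT     is process 1 the current process?
            BNE.S   SEL1              no, so make process 1 current
            MOVE.L  #TCB2,CURRENT     yes, so make process 2 current
            RTS
    SEL1    MOVE.L  #TCB1,CURRENT
            RTS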
The next step is to copy all these registers to the task control block pointed at by CURRENT (the variable that points to the active TCB). This operation saves the current task's volatile portion.

The task control block is changed by calling NEW to find the next task. The registers saved in the TCB are then copied to the stack and then restored from the stack to invoke the new process.

Now that we've described the 68K's user and supervisor modes and the role of exceptions in process switching, we can introduce one of the most important aspects of an operating system, memory management.

13.4 Memory management

We've assumed that the computer's central processing unit generates the address of an instruction or data and that this address corresponds to the actual location of the data in memory. For example, if a computer executes MOVE $1234,D0, the source operand is found in location number 1234₁₆ in the computer's random access memory. Although this statement is true of simple microprocessor systems, it's not true of computers with operating systems such as UNIX and Windows. An address generated by the CPU doesn't necessarily correspond to the actual location of the data in memory. Why this is so is the subject of this section.

Memory management is a general term that covers all the various techniques by which an address generated by a CPU is translated into the actual address of the data in memory. Memory management plays several roles in a computer system. First, memory management permits computers with small main stores to execute programs that are far larger than the main store.² Second, memory management is used in multitasking operating systems to make it look as if each process has sole control of the CPU. Third, memory management can be employed to protect one process from being corrupted by another process. Finally, memory management, in conjunction with the operating system, deals with the allocation of memory to variables.

² Running programs larger than the actual immediate access memory was once very important when memory cost a fortune and computers had tiny memory systems.

If all computers had an infinite amount of random access memory, life would be much easier for the operating system designer. When a new program is loaded from disk, you can place it immediately after the last program you loaded into memory. Moreover, with an infinitely large memory you never have to worry about loading programs that are too large for the available memory. In practice, real computers may have too little memory. In this section we are going to look at how the operating system manages the available memory.

Figure 13.13(a) demonstrates multitasking where three processes, A, B, and C, are initially in memory. This diagram shows the location of programs in the main store. In Fig. 13.13(b) process B has been executed to completion and deleted from memory to leave a hole in the memory. In Fig. 13.13(c) a new process, process D, is loaded in part of the unused memory and process A deleted. Finally, in Fig. 13.13(d) a new process, process E, is loaded in memory in two parts because it can't fit in any single free block of memory space.

[Figure 13.13: Memory allocation under multitasking. As tasks are deleted and new tasks loaded, the free space becomes fragmented; task E has to be split into two parts because no single unused block is large enough.]

A multitasking system rapidly runs into the memory allocation and memory fragmentation problems described by Fig. 13.13. Operating systems use memory management to map the computer's programs onto the available memory space. Memory management is carried out by means of special-purpose hardware called a memory management unit (MMU) (see Fig. 13.14). Today's sophisticated microprocessors like the Pentium include an MMU on the same chip as the CPU. Earlier microprocessors often used external MMUs.

The CPU generates the address of an operand or an instruction and places it on its address bus. This address is called a logical address—it's the address that the programmer sees. The MMU translates the logical address into the location or physical address of the operand in memory. Figure 13.14 shows how the logical address 12345678₁₆ from the CPU gets mapped onto the physical address ABC678₁₆.
The logical address consists of two parts, a page address and a word address. In the previous example, page 12345₁₆ gets translated into page ABC₁₆ and the word address 678₁₆ remains unchanged. Figure 13.15 illustrates the relationship between word address and page address for a very simple computer system with four pages of eight words (i.e. 4 × 8 = 32 locations).

The logical address from the CPU in Fig. 13.15 consists of a 2-bit page address that selects one of 2² = 4 pages, and a 3-bit word address that provides an offset (or index) into the currently selected page. A 3-bit offset can access 2³ = 8 words within a page. If, for example, the CPU generates the address 10110₂, location 6 on logical page 2 is accessed.

In a system with memory management the 3-bit word address from the CPU goes directly to the memory, but the 2-bit page address is sent to the memory management unit (see Fig. 13.16). The logical page address from the CPU selects an entry in a table of pages in the MMU as Fig. 13.16 demonstrates. Suppose the processor accesses logical page 2 and the corresponding page table entry contains the value 3. This value (i.e. 3) corresponds to the physical page address of the location being accessed in memory; that is, the MMU has translated logical page 2 into physical page 3.
[Figure 13.15: The structure of paged memory. A logical address is split into a logical page number and a word address within the page. Figure 13.16: Mapping logical onto physical pages. The MMU translates the logical page address from the CPU into a physical page address, while the word address passes unchanged to the physical memory.]
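The translation of Fig. 13.16 is easy to mimic in software to show what the MMU's hardware does. The following sketch assumes a page table PTABLE holding one byte per logical page; the table contents and labels are illustrative.

    * 5-bit logical address in D0 (2-bit page, 3-bit word);
    * the physical address is returned in D0.
    TRANS   MOVE.L  D0,D1           copy the logical address
            LSR.L   #3,D1           D1 = 2-bit logical page number
            ANDI.L  #7,D0           D0 = 3-bit word address (unchanged by the MMU)
            LEA     PTABLE,A0
            MOVE.B  (A0,D1.L),D1    D1 = physical page number from the page table
            LSL.L   #3,D1           move the physical page into bits 3 and 4
            OR.L    D1,D0           physical address = physical page:word
            RTS

    PTABLE  DC.B    1,0,3,2         logical pages 0-3 map onto these physical pages

With the input 10110₂ (logical page 2, word 6) and the table above, TRANS returns 11110₂, which is location 6 on physical page 3, exactly as in the text.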
The physical address corresponds to the location of the actual operand in memory. If you compare Figs 13.15 and 13.16 you can see that the same logical address has been used to access two different physical addresses.

Why should the operating system take an address from the processor and convert it into a new address to access physical memory? To answer this question we have to look at how programs are arranged in memory. Figure 13.17 shows the structure of both logical memory and physical memory during the execution of processes A, B, C, and D. As far as the processor is concerned, the processes all occupy single blocks of address space that are located consecutively in logical memory (Fig. 13.17(a)).

If you examine the physical memory (Fig. 13.17(b)), the actual processes are distributed in real memory in an almost random fashion. Processes B and C are split into non-consecutive regions and two regions of physical memory are unallocated. The logical address space seen by the processor is larger than the physical address space—process D is currently located on the hard disk and is not in the computer's RAM. This mechanism is called virtual memory.

[Figure 13.17: Logical and physical address space. (a) The logical address space seen by the CPU holds tasks A, B, C, and D in consecutive blocks; (b) in the actual memory the tasks are scattered, tasks B and C are split, some regions are unused, and task D is not in RAM—it's on disk.]

A processor's logical address space is composed of all the addresses that the processor can specify. If the processor has a 32-bit address, its logical address space consists of 2³² bytes. The physical address space is the memory and its size depends on how much memory the computer user can afford. We will soon see how the operating system deals with situations in which the processor wishes to run programs that are larger than the available physical address space. The function of the MMU is to map the addresses generated by the CPU onto the actual memory and to keep track of where data is stored as new processes are created and old ones removed. With an MMU, the CPU doesn't have to worry about where programs and data are actually located.

Consider a system with 4-kbyte logical and physical pages and suppose the processor generates the logical address 881234₁₆. This 24-bit address is made up of a 12-bit logical page address 881₁₆ and a 12-bit word address 234₁₆. The 12 low-order bits (234₁₆) define the same relative location within both logical and physical address pages. The logical page address is sent to the MMU, which looks up the corresponding physical page address in entry number 881₁₆ in the page table. The physical page address found in this location is passed to memory.

Let's look at the way in which the MMU performs mapping. Figure 13.18 demonstrates how the pages or frames of logical address space are mapped onto the frames of physical address space. The corresponding address mapping table is described in Table 13.1. Notice that logical page 3 and logical page 8 are both mapped onto physical page 6. This situation might arise when two programs share a common resource (e.g. a compiler or an editor). Each program thinks that it has a unique copy of the resource, although both programs access a shared copy of the resource.

13.4.1 Virtual memory

We've already said that a computer can execute programs larger than its physical memory. In a virtual memory system the programmer sees a large array of memory (the virtual memory), which appears to be entirely composed of high-speed main store. In reality, the physical memory is composed of a relatively small high-speed RAM and a much larger but slower disk store. Virtual memory has two advantages. It allows the execution of programs larger than the physical memory would normally permit and frees the programmer from worrying about choosing logical addresses falling within the range of available physical addresses. Programmers may choose any logical address they desire for their program and its variables. The actual addresses selected by a programmer don't matter, because the logical addresses are automatically mapped into the available physical memory space as the operating system sees fit.

The means of accomplishing such an apparently impossible task is called virtual memory and was first used in the Atlas computer at the University of Manchester, England in 1960. Figure 13.19 illustrates a system with 10 logical address pages but only five physical address pages. Consequently, only 50% of the logical address space can be mapped onto physical address space at any instant. Table 13.2 provides a logical page to physical page mapping table for this situation. Each entry in the logical page table has two fields: one is the present bit, which indicates whether the corresponding page is available in physical memory, and the other is the logical page to physical page mapping.

Part of a program that's not being used resides on disk. When this code is to be executed, it is copied from disk to the main store.
[Figure 13.18: Mapping logical address space onto physical address space. Each of the ten logical pages (0 to 9) is mapped onto a physical page according to the page table.]
The principles governing the operation of virtual memory are, essentially, the same as those governing the operation of cache memory (described later).

When a page fault is detected, the operating system transfers a new page from disk to physical memory and overwrites a page in physical memory. If physical memory is full, it's necessary to discard an existing page. The most sensible way of selecting an old page for removal is to take the page that is not going to be required in the near future. Unfortunately, this scheme is impossible to implement. A simple page replacement algorithm is called the not-recently-used algorithm, which is not optimum but is very easy to implement.

When a new page replaces an old page, any data in the old page frame that has been modified since it was created must be written back to disk. A typical virtual memory system clears a dirty bit in the page table when the page is first created. Whenever the processor performs a write operation to an operand on this page, the dirty bit is set. When this page is swapped out (i.e. overwritten by a new page), the operating system looks at its dirty bit. If this bit is clear, nothing need be done; if it is set, the page must be copied to disk.

Virtual memory allows the programmer to write programs without having to know anything about the characteristics of real memory and where the program is to be located.

13.4.2 Virtual memory and the 68K family

Members of Motorola's 68K family are well suited to virtual memory technology. We've already stated that the 68K's architecture provides mechanisms to support operating systems. The 68K's protected state when S = 1 separates operating system and application level programs (aided by the dual stack pointer mechanism). 68K processors have a function control output that tells an external system such as a memory management unit whether the CPU is executing an instruction in the user or the supervisor state.

Figure 13.20 illustrates the dialogue that takes place between the CPU, the memory management unit (MMU), and the memory system during a read or a write cycle.
Logical page   Present bit   Physical page
     0              1              0
     1              1              3
     2              0
     3              1              1
     4              0
     5              0
     6              1              2
     7              1              4
     8              0
     9              0

Table 13.2 Logical to physical address mapping table corresponding to Fig. 13.19.

The MMU is configured by the operating system when the computer is first powered up. The operating system sets up logical address to physical address translation tables and defines the type of access that each page may take part in (we'll see the reason for this shortly).

At the start of a memory access the CPU generates a logical address and sends it to the MMU together with the control signals that define the type of the access (i.e. read or write, program or data, user or supervisor mode). If the location being accessed is not currently in the main store or the access is illegal, the MMU sends an error message to the CPU to abort the current access and to begin exception processing and error recovery. An illegal access occurs when a process attempts to write to a page that has been designated read-only, or when a user program is attempting to access a page assigned to supervisor space and the operating system.
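In software terms the MMU's test might be sketched like this, assuming one word per page-table entry with bit 15 used as the present bit and the low-order bits holding the physical page number (the layout and labels are assumptions):

    * D1 = logical page number; the page table is at PTABLE.
    LOOKUP  LEA     PTABLE,A0
            ADD.L   D1,D1            scale by two: one word per table entry
            MOVE.W  (A0,D1.L),D2     fetch the page-table entry
            BPL.S   FAULT            N flag clear means the present bit is 0
            ANDI.W  #$7FFF,D2        D2 = physical page number
            RTS
    FAULT   MOVEQ   #-1,D2           page absent: the OS must fetch it from disk
            RTS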
[Figure 13.20: Dialogue between the CPU, MMU, and memory. The CPU sends a logical address to the MMU; the MMU checks the address and either translates it into a physical address that is used to access the main store, or, if the translation is not possible, informs the CPU that the current access can't be completed.]
By dividing memory space into regions of different characteristics, you can provide a considerable measure of security. A user program cannot access memory space belonging to the operating system, because an attempt to access this memory space would result in the MMU generating an interrupt. Not only does the processor protect the supervisor stack pointer from illegal access by a user program, the 68K and MMU combination protects the supervisor stack (and any other address space allocated to the supervisor) from illegal access.

Figure 13.21 illustrates the structure of a memory management system in a 68K-based computer that checks whether the address space currently being accessed is legal. Each entry in the MMU's page translation table contains the details of the page's access rights. Whenever the 68K performs a memory access, it indicates the type of access on its function code output pins (e.g. user/supervisor, code/data). For example, the 68020 may say 'I'm operating in the user state performing a read access to data with a logical address 12345678₁₆'. The MMU compares the CPU's function code and the read/write signal with the information in the currently accessed page in its mapping table. If the access is legal, a memory access takes place. If either the corresponding physical page is not in memory or the access is illegal, a page fault is generated and a signal returned to the 68K's bus error input. In terms of the previous example, the logical address 12345678₁₆ might generate a page address 12345₁₆. If this page is in the MMU and it can be accessed by a user-mode write, a logical-to-physical page translation can take place.

A bus error is a special type of exception and the 68K calls the appropriate handler in the operating system to deal with it. A missing physical page results in the operating system copying a page from disk to main store and then updating the MMU. An illegal access would probably result in the offending process being suspended.

The 68K's user/supervisor modes, exception-handling facilities, and memory management make it a very robust processor. Errors in a user program that would otherwise bring the system to a halt force a switch to the 68K's supervisor state and allow the operating system either to repair the damage or to terminate the faulty program. The memory management mechanism protects the operating system from illegal access by applications programs and even protects one user program from access by another.

Memory management in real systems

In reality, memory management is a very complex mechanism, even though the underlying concepts are very simple. The picture we have just presented is very simplified because we've omitted the detail.

A real memory management system does not normally have a single page table; it would be too big. If we have a 32-bit virtual address and an 8-kbyte page, the number of bits used to specify a logical page is 32 − 13 = 19. This arrangement would require a page table with 2¹⁹ entries.

Figure 13.22 demonstrates how a real system solves the page table problem by means of a hierarchical table search. The 10 most-significant bits of the virtual address access a first-level table. The output of the first-level table is a pointer to a second-level table that is indexed by 9 bits from the virtual address. This table provides the actual physical page number. A multilevel table scheme allows us to use the first-level table to point to, for example, different processes, and the second-level table to point to the pages that make up each process. Performing a logical-to-physical address translation to locate a physical page address is called a table walk.
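The field extraction of Fig. 13.22 is straightforward to express in code (a sketch; the register usage is illustrative):

    * Split the 32-bit virtual address in D0 into the three fields
    * of Fig. 13.22.
            MOVE.L  D0,D1
            MOVE.L  D0,D2
            MOVEQ   #22,D3
            LSR.L   D3,D1            D1 = bits 22-31: 10-bit first-level index
            MOVEQ   #13,D3
            LSR.L   D3,D2
            ANDI.L  #$1FF,D2         D2 = bits 13-21: 9-bit second-level index
            ANDI.L  #$1FFF,D0        D0 = bits 0-12: 13-bit offset in the 8-kbyte page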
[Figure 13.21: A 68K memory management system. The CPU sends a logical address and control signals (read/write, program/data, user/supervisor) to the MMU; the MMU's mapping table checks the space type and access type, supplies the physical address to the main store, and returns an error (page fault) signal to the CPU if the access is illegal.]

[Figure 13.22: A hierarchical (two-level) page table. The page table base and the 10 most-significant bits of the virtual address select an entry in the first-level page table, which points to a second-level page table indexed by the next 9 bits; the second-level table supplies the 19-bit physical page number, which is combined with the 13-bit offset to address the target operand.]
MALWARE

Memory management is an important line of defense against errors that occur when an application accesses memory space not allocated to it. Some programs deliberately perform operations that are unintended by the computer user; these are collectively called malware and include viruses, worms, Trojan horses, and spyware.

People who would never dream of going into the street and assaulting a passer-by will quite cheerfully release programs via the Internet that create havoc on people's computers; they will destroy a child's homework, modify patient records in a hospital, or delete a photographer's images. They use the excuse that they are testing people's computer security (even Bonnie and Clyde never claimed that they were testing bank security) or they say that they are leading an attack on Microsoft's evil empire.

A virus is a program that has strong analogies with biological viruses because it replicates itself, spreads autonomously, mutates, and can damage its host. A virus is introduced into a host via the Internet or via an infected program on a floppy disk, flash memory, or CD/DVD. A virus must be an executable program in order for it to run and to replicate itself. In the PC world, a virus may have the extension .exe or .pif. However, one of the strengths of modern computer applications is the use of scripting languages and macros that allow a user program to respond to its environment. These facilities are employed by virus writers to embed viruses in data used by applications programs such as e-mails.

Viruses can be injected by ingenious techniques such as buffer overflow. A buffer is a region of memory used to store data. Buffer overflow occurs when the data takes more space than that allocated to the buffer. By exploiting buffer overflow you can fill a region of memory with code (rather than data) and then transfer control to that code to activate the virus. Some processors now contain hardware mechanisms to prevent the execution of such code.

Antivirus programs are widely available to scan memories for the signature of a virus (a signature is the binary sequence left behind when the code of a virus is compressed, rather like a cyclic redundancy code). Some viruses are polymorphic and mutate as they spread, making it difficult to detect their signature.

A Trojan horse is a program that appears harmless but which carries out a task unknown to the user. A worm is a program that exploits the Internet and spreads from computer to computer, generating so much traffic that the Internet can be dramatically slowed.

Spyware is a class of program that may spread like a virus or may be introduced as part of another program. Spyware monitors your surfing habits (or even accesses personal data) and sends this information to a third party.
process. Performing a logical-to-physical address translation to locate a physical page address is called a table walk. The arrangement of Fig. 13.22 requires 2^10 level-one pages and 2^9 level-two pages; that is, 3 × 2^9 pages in all. A single-level page table would require 2^19 pages.

The price paid for a memory management system (especially one with multilevel tables) is the time it takes to perform an address translation. Practical memory mapping is possible only because very few table accesses take place. Once a sequence of logical-to-physical address mappings has been performed, the address translation is cached in a translation lookaside buffer (TLB). The next time the same logical page address appears, the corresponding page address is read from the TLB to avoid a table walk. Because of the way data and programs are structured, address translations mainly take place using the TLB.
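As a concrete illustration, the following C sketch performs a two-level table walk of the kind described above, caching translations in a small direct-mapped TLB. The address split (a 10-bit level-one index, a 9-bit level-two index, and a 13-bit offset) and all the structures are assumptions made for this example; they are not the format of any real MMU.

    #include <stdint.h>

    #define TLB_ENTRIES 64

    struct tlb_entry { uint32_t logical_page; uint32_t physical_page; int valid; };
    static struct tlb_entry tlb[TLB_ENTRIES];

    /* Hypothetical page tables: 2^10 level-one entries, each pointing to a
       level-two table of 2^9 physical page numbers (set up by the OS). */
    static uint32_t *level1[1 << 10];

    uint32_t translate(uint32_t logical)
    {
        uint32_t page   = logical >> 13;          /* 10 + 9 = 19-bit page number */
        uint32_t offset = logical & 0x1FFF;       /* 13-bit offset within a page */
        struct tlb_entry *e = &tlb[page % TLB_ENTRIES];

        if (e->valid && e->logical_page == page)  /* TLB hit: no table walk */
            return (e->physical_page << 13) | offset;

        /* TLB miss: perform the two-level table walk. */
        uint32_t l1 = page >> 9;                  /* level-one index (10 bits) */
        uint32_t l2 = page & 0x1FF;               /* level-two index (9 bits)  */
        uint32_t physical_page = level1[l1][l2];

        e->logical_page  = page;                  /* cache the translation */
        e->physical_page = physical_page;
        e->valid = 1;
        return (physical_page << 13) | offset;
    }

The fast path is the first if statement; the expensive table walk runs only on a TLB miss, which is why address translations mainly take place using the TLB.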
■ SUMMARY
The operating system is a unique topic in computer science because nowhere else do hardware and software so closely meet. Although most computer users today see the operating system as the GUI and the file manager, there is another part of the operating system that lies hidden from the user. This is the kernel, which performs process switching in a multitasking system and allocates logical address space to the available memory.

In this chapter we have shown how multitasking can be implemented by saving one process's volatile portion and then restoring another task by loading its volatile portion into the processor's registers.

One of the most important functions carried out by operating systems is the management of the memory. We have shown how logical addresses in the program can be mapped onto locations in the immediate access memory. We have also looked at the 68K's user/supervisor mode facility and described how it can be used to create secure operating systems.

■ PROBLEMS
13.1 What is an operating system?

13.2 What is the difference between a modern operating system and a typical operating system from the 1970s?

13.3 What is the difference between operating systems on large and small computers?

13.4 WIMP-based operating systems have largely replaced JCL-based operating systems on PCs. Do JCL-based operating systems such as Microsoft's DOS 6 and UNIX have any advantages over WIMP-based systems?

13.5 Is it necessary for a CPU to support interrupts in order to construct an operating system?

13.6 A process in a multitasking system can be in one of three states: running, runnable, or blocked. What does this statement mean and what are the differences between the three states?

13.7 What is a process control block and what is the minimum amount of information that it must store?

13.8 What are the 68K's user and supervisor states and why have they been implemented?

13.9 Explain why the stack is such an important data structure and how stack errors can cause the system to crash.

13.10 The 68K provides a greater degree of protection from user (applications) errors by implementing two stack pointers. Explain how this protection mechanism works.

13.11 If two stack pointers are a good thing (i.e. the 68K's user and supervisor stack pointers), can you see advantages in having two PCs or two sets of data registers, and so on?

13.12 What is the difference between a physical address and a logical address?

13.13 When a new physical page is swapped into memory, one of the existing pages has to be rejected. How is the decision to reject an existing page made?

13.14 What is the difference between virtual memory and cache memory?

13.15 Write a program in 68K assembly language that periodically switches between two processes (assume these are fixed processes permanently stored in memory).

13.16 What is the difference between pre-emptive and non-pre-emptive operating systems? Are the various Windows operating systems pre-emptive?

13.17 What is malware? How has it developed over the last few years?

13.18 What hardware facilities in a computer can be used to defeat the spread of malware?
14 Computer communications
INTRODUCTION
Two of the greatest technologies of our age are telecommunications and computer engineering.
Telecommunications is concerned with moving information from one point to another. We take
the telecommunications industry for granted. If you were to ask someone what the greatest
technological feat of 1969 was, they might reply, 'The first manned landing on the moon.' You could say that a more magnificent achievement was the ability of millions of people, half a million kilometers away, to watch events on the moon in their own homes.
It’s not surprising that telecommunications and computer engineering merged to allow
computers to communicate and share resources. Until the 1990s developments in
telecommunications didn’t greatly affect the average person in the same way that computer
technology had revolutionized every facet of life. Better communications meant lower telephone
bills and the cell phone.
Computer networks began as part of a trend towards distributed computing with
multicomputer systems and distributed databases. From the 1970s onward computer networks
were implemented to allow organizations such as the military, the business world, and the
academic communities to share data. Easy access to the Internet and the invention of the browser
created a revolution almost as big as the microprocessor revolution of the 1970s. The success of
the Internet drove developments in communications equipment.
This chapter examines the way in which computers communicate with each other, concentrating
more on the hardware-related aspects of computer communication than the software.
We begin with a short history of communications, concentrating on the development of long-
distance signaling systems. We then introduce the idea of protocols and standards, which play a
vital role in any communications system. Simply moving data from one point to another isn’t the
whole story. Protocols are the mutually agreed rules or procedures enabling computers to
exchange data in an orderly fashion. By implementing a suitable protocol we ensure that the data
gets to its correct destination and we deal with the problems of lost or corrupted data.
The next step is to examine how digital data in serial form is physically moved from one point
to another. We look at two types of data path, the telephone network and the RS232C interface
that links together computers and peripherals. Two protocols for the transmission of serial data are
When the first edition of this text appeared, computer communications was very much a corporate affair. Only the rich communicated with each other. By the time the third edition appeared, the PC had become popular and the Internet and World Wide Web were used by people at home. Connections to the Internet were mainly via the public switched telephone network, although a lucky few had broadband connections via cable or the telephone using ADSL.

Today, high-speed connections to the Internet are commonplace and many homes have several PCs. Each member of the household may have their own PC and some may have laptop PCs with wireless networks. This means that many home users now have their own private local area networks; by 1999, 25% of US households had more than one PC and this figure was expected to reach 50% by 2005.

The growth in PC ownership is driven by several factors. More and more people are moving from being computer literate to being computer experts capable of maintaining complex systems. The cost of computing has declined in real terms and the performance of hardware has continued to increase. The market has been driven by computer games and domestic entertainment such as home theatre and the rise of the DVD, the camcorder, and the digital camera.
14.1 Background

It's expensive to construct data links between computers separated by distances ranging from the other side of town to the other side of the World. There is, however, one network that has spanned the globe for over 50 years, the public switched telephone network (PSTN). Some even refer to the PSTN by the acronym POTS (plain old telephone system). The telephone network doesn't provide an ideal solution to the linking of computers, because it was not originally designed to handle high-speed digital data.

During the 1980s a considerable change in the way computers were used took place. The flood of low-cost microcomputers generated a corresponding increase in the number of peripherals capable of being controlled by a computer. It is now commonplace to connect together many different computers and peripherals on one site (e.g. a factory), enabling data to be shared, control centralized, and efficiency improved. Such a network is called a local area network (LAN).

When the PC became popular, low-cost hardware and software were used to link PCs to the global network, the Internet. By the late 1990s networks were no longer the province of the factory or university—any school child with a PC at home could access NASA's database to see pictures of the latest space shots before they got on the evening news. Moreover, the child didn't need to know anything about computer science other than how to operate a mouse.

Figure 14.1 illustrates the concept of a computer network with two interconnected local area networks. A network performs the same function as a telephone exchange and routes data from one computer to another. The LANs in Fig. 14.1 might be used to share data in, for example, a university environment. The local area networks are themselves connected to the telephone system via hardware called a modem. Figure 14.1 also demonstrates that a single computer can be connected to the other networks via the PSTN.

A LAN lets you communicate with a mainframe on a distant site or with one of the many microprocessors and peripherals on your own site. The local area network has made possible the paperless office in which people pass memos to each other via the network.

Figure 14.2 describes the type of network that you might now see in a school, a small office, or a home. A wireless gateway is connected to a PC via a cable. The gateway has a connection to a cable modem that provides a high-speed link to the Internet via a cable network. The gateway uses wireless
COMMUNICATIONS HARDWARE
A few years ago, most computer users employed only one piece of communications equipment: the modem, which links computers to the telephone network. The modem itself was invariably an external device that connected to a PC via its serial RS232C interface. Today, modems are often internal devices that plug into a PC's motherboard. Indeed, many modern laptops come with an internal modem as standard.

Today's PCs are designed to connect to local area networks. The computer uses a network interface card (NIC) to connect to a bus called an Ethernet. Each NIC has its own unique fixed internal address created at the time of the card's manufacture. This is a physical address that identifies the computer within the network, but it is not the address by which the computer is known externally. Some modern network interface cards use wireless communications, Wi-Fi, to allow computers, laptops, and even printers to operate over a range of between 10 m and 100 m.

Large networks require devices to amplify signals on them. The repeater is a device that simply links two segments of a large network together. The repeater passes information unchanged from one segment to another.

Some organizations might have multiple networks. A bridge is a device that links two different networks. If a computer on a network sends data to another device on the same network, the bridge takes no part in the communication. If, however, the message is intended for a device on another network, the bridge passes the message between the networks. The address examined by a bridge is the unique media access address given to each physical node in a network. The bridge operates at the data link layer level of a network. A sketch of the bridge's forwarding decision follows this panel.

The router is an even more sophisticated network device because it can link different types of network that may be separated by a communications path. The bridge simply detects information whose destination is on another network and passes it on, whereas a router has to be able to communicate with different types of network with different protocols. A router operates at the network level and can connect networks with different data link level protocols. Routers are able to reformat packets of information before transmitting them to another network.
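As promised above, here is a minimal C sketch of the filtering decision a bridge makes. The table structure, function name, and return conventions are invented for the illustration; a real bridge also learns its table dynamically from the source addresses of the frames it sees.

    #include <stdint.h>
    #include <string.h>

    /* A learned table mapping 48-bit media access (MAC) addresses to the
       port (network segment) on which each station was last seen. */
    struct table_entry { uint8_t mac[6]; int port; };

    /* Return the port to forward a frame to, -1 to discard it (the
       destination is on the segment the frame arrived from, so the bridge
       takes no part), or -2 to flood it to all other ports (unknown
       destination). */
    int bridge_decision(const struct table_entry *table, int entries,
                        const uint8_t dest_mac[6], int arrival_port)
    {
        for (int i = 0; i < entries; i++) {
            if (memcmp(table[i].mac, dest_mac, 6) == 0)
                return (table[i].port == arrival_port) ? -1 : table[i].port;
        }
        return -2;
    }

Because the decision is based purely on media access addresses, this logic lives entirely at the data link layer; a router making its decision on network-layer addresses would sit one layer higher.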
Instead, each node must have its own road map and make a decision on which link the message is to be transmitted on the way to its destination. Calculating the best route through the network for each message has the computational overhead of working out routing algorithms. Furthermore, whenever a new link or node is added to the network, the routing information must be changed at each node. Figure 14.4 shows how a message may be routed through an unconstrained topology. We will return to the topic of routing.

Figure 14.5 The star topology.

The bus
The bus topology is illustrated in Fig. 14.6. Both the bus and the ring are attempts to minimize the complexity of a network by removing both a special-purpose central node and the need for individual nodes to make routing decisions. In a bus all nodes are connected to a common data highway. The bus may be a single path linking all nodes. A more general form of bus consists of several interlinked buses and is called an unrooted tree. When a message is put on the bus by a node, it flows outwards in all directions and eventually reaches every point in the network. The bus has one topological and one practical restriction. Only one path may exist between any two points, otherwise there would be nothing to stop a message flowing round a loop forever. The practical limitation is that the bus cannot exceed some maximum distance from end to end. The principal problem faced by the designers of a bus is how to deal with a number of nodes wanting to use the bus at the same time. This is called bus contention and is dealt with later.

Figure 14.6 The bus topology.

The ring
Figure 14.7 illustrates the ring topology, in which the nodes are connected together in the form of a ring. Like the bus, this topology provides a decentralized structure, because no central node is needed to control the ring. Each node simply receives a message from one neighbor and passes it on to its other neighbor. Messages flow in one direction round the ring. The only routing requirement placed on each node is that it must be able to recognize a message intended for itself. The ring does not suffer from contention like the bus topology.

Figure 14.7 The ring topology.
However, a node on the ring has the problem of how to inject a new message into the existing traffic flow. A ring is prone to failure because a broken link makes it impossible to pass messages all the way round the ring. Some networks employ a double ring structure with two links between each node. If one of the links is broken it is possible for the ring to reconfigure itself and bypass the failure.

14.1.3 History of computer communications

Before we describe computer networks, it's instructive to take a short look at the history of data transmission. Some think that electronics began in the 1960s or even later. Telecommunications predates the electronic digital computer by over a century and its history is just as exciting as the space race of the 1960s. Key players were engineers every bit as great as Newton or Einstein.

At the beginning of the nineteenth century, King Maximilian in Bavaria had seen how the French visual semaphore system helped Napoleon's military campaigns. In 1809 Maximilian asked the Bavarian Academy of Sciences to look for a way to communicate over long distances. As a result, Samuil T. von Sömmering designed a crude telegraph that used one conductor for each character and therefore required 35 parallel wires. How was information transmitted in a pre-electronic age? If you pass electricity through water containing a little acid, the electric current breaks down the water into oxygen and hydrogen. Sömmering's telegraph worked by detecting the bubbles that appeared in a glass tube containing acidified water when electricity was passed through it. Sömmering's telegraph wasn't exactly suited to high-speed transmission—but it was a start.

Hans C. Oersted made the greatest leap forward in electrical engineering in 1819 when he discovered that an electric current creates a magnetic field round a conductor. Conversely, a moving magnetic field induces an electric current in a conductor.

A major driving force behind early telecommunications systems was the growth of the rail network. A system was required to warn stations down the line that a train was arriving. Charles Wheatstone and Charles William Cooke invented a telegraph in 1828 that used the magnetic field round a wire to deflect a compass needle. By 1840 a 40-mile stretch between Slough and Paddington in London had been linked using the Wheatstone and Cooke telegraph.

Figure 14.8 illustrates the operation of a different type of telegraph that produces a sound rather than the deflection of a compass needle. When the key is depressed, a current flows in the circuit, magnetizes the iron core inside the coil, and energizes the solenoid. The magnetized core attracts a small iron plate that produces an audible click as it strikes the core. Information is transmitted to this type of telegraph in the form of the Morse code.

Samuel Morse constructed his code from four symbols: the dot, the dash (whose duration is equal to three dots), the space between dots and dashes, and the space between words. Unlike simple codes, the Morse code is a variable length code. The original Morse key didn't send a 'bleep'—a dot was the interval between two closely spaced clicks and a dash the interval between two more widely spaced clicks. In other words, the operator had to listen to the space between clicks.

In 1843 Morse sent his assistant Alfred Vail to the printer's to count the relative frequencies of the letters they were using to set up their press. Morse gave frequently occurring letters short codes and infrequently occurring letters long symbols; for example, the code for E is • and Q is — — • —. It's interesting to note that the Morse code is relatively close to the optimum Huffman code for the English language.

The very first long-distance telecommunications networks were designed to transmit digital information from point to point (i.e. on–off telegraph signals). Information was transmitted in binary form using two signal levels (current = mark, no current = space). The transmitter was the Morse key and the receiver was the Morse telegraph.

The first long-distance data links
We take wires and cables for granted. In the early nineteenth century, plastics hadn't been invented and the only materials available for insulation and waterproofing were things like asphaltum. In 1843 a form of rubber called gutta percha was discovered and was used to insulate the signal-carrying path in cables. The Atlantic Telegraph Company created an insulated cable for underwater use containing a single copper
conductor made of seven twisted strands, surrounded by gutta percha insulation. This cable was protected by 18 surrounding iron wires coated with hemp and tar. Submarine cable telegraphy began with a cable crossing the English Channel to France in 1850. Alas, the cable failed after only a few messages had been exchanged. A more successful attempt was made the following year.

Transatlantic cable laying from Ireland began in 1857 but was abandoned when the strain of the cable descending to the ocean bottom caused it to snap under its own weight. The Atlantic Telegraph Company tried again in 1858. Again, the cable broke after only 3 miles, but the two cable-laying ships managed to splice the two ends. After several more breaks and storm damage, the cable reached Newfoundland in August 1858.

It soon became clear that this cable wasn't going to be a commercial success because the signal was too weak to detect reliably (the receiver used the magnetic field from current in the cable to deflect a magnetized needle). The original voltage used to drive a current down the cable was approximately 600 V, so they raised the voltage to about 2000 V to drive more current along the cable. Such a high voltage burned through the primitive insulation, shorted the cable, and destroyed the first transatlantic telegraph link after about 700 messages had been transmitted in 3 months.

In England, the Telegraph Construction and Maintenance Company developed a new 2300-mile-long cable weighing 9000 tons, three times the diameter of the failed 1858 cable. Laying this cable required the largest ship in the world. After a failed attempt in 1865, a transatlantic link was established in 1866.

During the nineteenth century the length of cables increased as technology advanced. It soon became apparent that signals suffer distortion during transmission. The 1866 transatlantic telegraph cable could transmit only eight words per minute. By the way, it cost $100 in gold to transmit 20 words (including the address) across the first transatlantic cable.

A sharply rising pulse at the transmitter end of a cable is received at the far end as a highly distorted pulse with long rise and fall times. The sponsors of the transatlantic cable project were worried by the effect of this distortion and the problem was eventually handed to William Thomson at the University of Glasgow.

Thomson was one of the nineteenth century's greatest scientists, who published more than 600 papers. He developed the second law of thermodynamics and created the absolute temperature scale. The unit of temperature with absolute zero at 0 K is called the kelvin in his honor—Thomson later became Lord Kelvin. Thomson worked on the dynamical theory of heat and carried out fundamental work in hydrodynamics. His mathematical analysis of electricity and magnetism covered the basic ideas for the electromagnetic theory of light. I'm not certain what he did in his spare time. One of Thomson's most quoted statements, which still applies today, was:

I often say when you can measure what you are speaking about and express it in numbers, you know something about it, but when you cannot measure it, when you cannot express it in numbers, your knowledge of it is of a meager and unsatisfactory kind.

In 1855 Thomson presented a paper to the Royal Society analyzing the effect of pulse distortion, which became the cornerstone of what is now called transmission line theory. The cause of the problems investigated by Thomson lies in the physical properties of electrical conductors and insulators. At its simplest, the effect of a transmission line is to reduce the speed at which signals can change state. Thomson's theories enabled engineers to construct data links with much lower levels of distortion.

Origins of the telephone network
In 1872 Alexander Graham Bell, who had recently emigrated to the USA, started work on a method of transmitting several signals simultaneously over a single line. Bell's project was called the harmonic telegraph. This project failed, but it did lead to the development of the telephone in 1876. Note that the development of the telephone is a complex story and Bell is no longer recognized as the sole inventor of the telephone.

A network designed to transmit intelligible speech (as opposed to hi-fi) must transmit analog signals in the frequency range 300 to about 3300 Hz (i.e. the so-called voice-band). Consequently, the telephone network now linking millions of subscribers across the World can't be used to directly transmit digital data that requires a bandwidth extending to zero frequency (i.e. d.c.). If the computer had been invented before the telephone, we wouldn't have had this problem. Transmission paths that transmit or pass signals with frequency components from d.c. to some upper limit are called baseband channels. Transmission paths that transmit frequencies between a lower and an upper frequency are called bandpass channels.

Digital information from computers or peripherals must be converted into analog form before it is transmitted across a bandpass channel such as the PSTN. At the receiving end of the network, this analog signal is reconverted into digital form. The device that converts between digital and analog signals over a data link is called a modem (i.e. modulator–demodulator). Ironically enough, all the long-haul links on modern telephone networks now transmit digital data, which means that the analog signal derived from the digital data must be converted to digital form before transmission over these links. It is probable that the PSTN will become entirely digital and speech will be converted to digital form within the subscriber's own telephone. Indeed, the only analog link in many telephone systems is just the
connection between the subscriber and the local exchange. This link is sometimes called the last mile.

Although the first telegraph systems operated from point to point, the introduction of the telephone led to the development of switching centers, or telephone exchanges. The first generation of switches employed a telephone operator who manually plugged a subscriber's line into a line connected to the next switching center in the link. By the end of the nineteenth century, the infrastructure of the computer networks was already in place.

In 1897 an undertaker called Almon Strowger invented the automatic telephone exchange that used electromechanical devices to route calls between exchanges. When a number was dialed, a series of pulses were sent down the line to a rotary switch. If you dialed, for example, 5, the five pulses would move a switch five steps to connect you to line number five, which routed your call to the next switching center. Consequently, when you called someone the number you dialed depended on the route through the system. A system was developed where each user could be called with the same number from anywhere and the exchange would automatically translate this number to the specific numbers required to perform the routing. Mechanical switching was gradually replaced by electronic switching and the pulse dialing that actually operated the switches gave way to the use of tones (i.e. messages to the switching computers).

By the time the telegraph was well established, radio was being developed. James Clerk Maxwell predicted radio waves in 1864 following his study of light and electromagnetic waves. Heinrich Hertz demonstrated the existence of radio waves in 1887 and Guglielmo Marconi is credited with being the first to use radio to span the Atlantic in 1901. In 1906 Lee deForest invented the vacuum tube amplifier. Without a vacuum tube (or transistor) to amplify weak signals, modern electronics would have been impossible (although primitive computers using electromechanical devices could have been built without electronics).

The telegraph, telephone, and vacuum tube were all steps on the path to the development of computer networks. As each of these practical steps was taken, there was a corresponding development in the accompanying theory (in the case of radio, the theory came before the discovery). Table 14.1 provides a list of some of the most significant dates in the early development of long-distance communications systems.

Computer communications is a complex branch of computing because it covers so many areas. A programmer drags an icon from one place to another on a screen. This action causes the applications program to send a message to the operating system that might begin a sequence of transactions resulting in data being retrieved from a computer half way around the World. Data sent from one place to another has to be encapsulated, given an address, and sent on its way. Its progress has to be monitored and its receipt acknowledged. It has to be formatted in the way appropriate to the transmission path. All these actions have to take place over many different communications channels (telephone, radio, satellite, and fiber optic cable). Moreover, all the hardware and software components from different suppliers and constructed with different technologies have to communicate with each other.

The only way we can get such complex systems to work is to create rules or protocols that define how the various components communicate with each other. In the next section we look at these rules and the bodies that define them.

14.2 Protocols and computer communications

Communication between two computers is possible provided that they employ standard hardware and software conforming to agreed standards. Much of computer
communications is concerned with how computers go about exchanging data, rather than with just the mechanisms used to transmit data. Therefore, the standards used in computer communications relate not only to the hardware parts of a communication system (i.e. the plugs and sockets connecting a computer to a transmission path, the transmission path itself, the nature of the signals flowing along the transmission path), but to the procedures or protocols followed in transmitting the information.

Most readers will have some idea of what is meant by a standard, but they may not have come across the term protocol as it is used in computer communications. When any two parties communicate with each other (be they people or machines), they must both agree to abide by a set of unambiguous rules. For example, they must speak the same language and one may start speaking only when the other indicates a readiness to listen.

Suppose you have a bank overdraft and send a check to cover it. If after a few days you receive a threatening letter from the manager, what do you conclude? Was your check received after the manager's letter was sent? Has one of your debits reached your account and increased the overdraft? Was the check lost in the post? This confusion demonstrates that the blind transmission of information can lead to unclear and ill-defined situations. It is necessary for both parties to know exactly what messages each has, and has not, received. We need a set of rules to govern the interchange of letters. Such a set of rules is called a protocol and, in the case of people, is learned as a child. When computers communicate with each other, the protocol must be laid down more formally. If many different computers are to communicate with each other, it is necessary that they adhere to standard protocols that have been promulgated by national and international standards organizations, trade organizations, and other related bodies.

In the 1970s and 1980s the number of computers and the volume of data to be exchanged between computers increased dramatically. Manufacturers were slow to agree on and to adopt standard protocols for the exchange of data, which led to incompatibility between computers. To add insult to injury, it was often difficult to transfer data between computers that were nominally similar. Computers frequently employed different dialects of the same high-level language and formatted data in different ways, encoded it in different ways, and transmitted it in different ways. Even the builders of the Tower of Babel had only to contend with different languages. The development of standard protocols has much improved the situation.

The issue of standardization arises not only in the world of computer communications. Standardization is an important part of all aspects of information technology. For example, the lack of suitable standards or non-compliance with existing standards has a dampening effect on the progress of information technology. Independent manufacturers do not wish to enter a chaotic market that demands a large number of versions of each product or service produced to cater for all the various non-standard implementations. Similarly, users do not want to buy non-standard equipment or services that do not integrate with their existing systems.

Microsoft's Windows operating system is an example of an industrial standard. The success of Windows has encouraged its adoption as a standard by most PC manufacturers and software houses. The other type of standard is a national or international standard that has been promulgated by a recognized body. There are international standards for the binary representation of numbers. When the decimal number nine is transmitted over a network, it is represented by its universally agreed international standard, the binary pattern 00111001.

The world of standards involves lots of different parties with vested interests at local, national, and international levels. A standard begins life in a working party in a professional organization such as the Institute of Electrical and Electronic Engineers (IEEE) or the Electronic Industries Association (EIA). The standard generated by a professional body is forwarded to the appropriate national standards body (e.g. the American National Standards Institute (ANSI) in the USA, the British Standards Institute (BSI) in the UK, or DIN in Germany). The standard may reach the International Standards Organization (ISO), made up of members from the World's national standards organizations.

14.2.2 Open systems and standards

The International Standards Organization (ISO) has constructed a framework for the identification and design of protocols for existing or for future communications systems. This framework enables engineers to identify and to relate together different areas of standardization. The OSI framework doesn't imply any particular technology or method of implementing systems.

This framework is called the Reference Model for Open Systems Interconnection (ISO model for OSI) and refers to an open system, which, in the ISO context, is defined as

a set of one or more computers together with the software, peripherals, terminals, human operators, physical processes and means of data transfer that go with them, which make up a single information processing unit.
can devise standards for them. In this way, any manufacturer can produce equipment or software that performs a particular function. If designers use hardware and software conforming to well-defined standards, they can create an information transmission system by putting together all the necessary parts. These parts may be obtained from more than one source. As long as their functions are clearly defined and the way in which they interact with other parts is explicitly stated, they can be used as the building blocks of a system.

Figure 14.9 illustrates the structure of the ISO reference model for OSI, where two parties, A and B, are in communication with each other. The ISO model divides the task of communicating between two points among seven layers of protocol. Each layer carries out an action or service required by the layer above it. The actions performed by any given layer of the reference model are precisely defined by the service for that layer and require an appropriate protocol for the layer between the two points that are communicating. This view conforms to current thinking about software and is strongly related to the concept of modularity.

In everyday terms, consider an engineer in one factory who wishes to communicate with an engineer in another factory. The engineer in the first factory describes to an assistant the nature of some work that is to be done. The assistant then dictates a letter to a secretary who, in turn, types the letter and hands it to a courier. Here, the original task (i.e. communicating the needs of one engineer to another) is broken down into subtasks, each of which is performed by a different person. The engineer doesn't have to know about the actions carried out by other people involved in the exchange of data. Indeed, it does not matter to the engineer how the information is conveyed to their counterpart.

In the ISO model, communication between layers within a system takes place between a layer and the layers immediately above and below it. Layer X in System A communicates only with layers X+1 and X−1 in System A (see Fig. 14.9). Layer 1 is an exception, because there's no layer below it. Layer 1 communicates only with layer 2 in A and with the corresponding layer 1 in B at the other end of the communications link. In terms of the previous analogy, the secretary who types the letter communicates only with the assistant who dictates it and with the courier who transports it. Fig. 14.10 illustrates this example in terms of ISO layers, although this rather simple example doesn't correspond exactly to the ISO model. In particular, layers 3 to 6 are represented by the single layer called assistant.
Another characteristic of the ISO model is the apparent or virtual link between corresponding layers at each end of the communication channel (this link is also called peer to peer). Two corresponding layers at two points in a network are called peer subsystems and communicate using layer protocols. Therefore, a message sent by layer X at one end of the link is in the form required by the corresponding layer X at the other end. It appears that these two layers are in direct communication with each other, as they are using identical protocols. In fact, layer X at one end of the link is using the layers below it to transmit the message across the link. At the other end, layer 1 and higher layers process the message until it reaches layer X in the form it left layer X at the other end of the link. Returning to our analogy, the secretary at one factory appears to communicate directly with the secretary at the other factory, because the language used in the letter is appropriate to the task being performed by the two secretaries.

Figure 14.9 The basic reference model for open systems interconnection. Stations A and B each contain seven layers (level 7, application; 6, presentation; 5, session; 4, transport; 3, network; 2, data link; 1, physical), with a virtual link between each pair of corresponding layers.

We now look at the functions performed by the seven layers of the ISO reference model for open systems interconnection, starting with the uppermost layer, the application layer.

The application layer
The highest layer of the ISO reference model is the application layer, which is concerned with protocols for applications programs (e.g. file transfer, electronic mail). This layer represents the interface with the end user. Strictly speaking, the OSI reference model is concerned only with communications and does not represent the way in which the end user employs the
information. The protocol observed by the two users in the application layer is determined entirely by the nature of the application. Consider the communication between two lawyers when they are using the telephone. The protocol used by the lawyers is concerned with the semantics of legal jargon. Although one lawyer appears to be speaking directly to another, they are using another medium involving other protocols to transport the data. In other words, there is no real person-to-person connection but a virtual person-to-person connection built upon the telephone network.

Another example of an application process is the operation of an automatic teller at a bank. The operator is in communication with the bank and is blissfully ignorant of all the technicalities involved in the transaction. The bank asks the user what transaction they wish to make and the user indicates the nature of the transaction by pushing the appropriate button. The bank may be 10 m or 1000 km away from the user. The details involved in the communication process are entirely hidden from the user; in the reference model the user is operating at the applications level.

Figure 14.10 Illustrating the concept of layered protocols. The engineers (level 7) communicate via assistants (levels 3–6), typists (level 2), and couriers (level 1); only the couriers have a real connection.

The presentation layer
The application layer in one system passes information to the presentation layer below it and receives information back from this layer. Recall that a layer at one end of a network can't communicate directly with the corresponding layer at the other end. Each layer except one communicates with only the layer above it and with the layer below it. At one end of the communications system the presentation layer translates data between the local format required by the application layer above it and the format used for transfer. At the other end, the format for transfer is translated into the local format of data for the application layer. By format we mean the way in which the computer represents information such as characters and numbers.

Consider another analogy. A Russian diplomat can phone a Chinese diplomat at the UN, even though neither speaks the other's language. Suppose the Russian diplomat speaks to a Russian-to-English interpreter who speaks to an English-to-Chinese interpreter at the other end of a telephone link, who, in turn, speaks to the Chinese diplomat. The diplomats represent the applications layer process and talk to each other about political problems. They don't speak to each other directly and use a presentation layer to format the data before it is transmitted between them. The Chinese-to-English and English-to-Russian translators represent the presentation layer.

This analogy illustrates an important characteristic of the OSI reference model. The English-to-Chinese translator may be a human or a machine. Replacing one with the other has no effect on the application layer above it or on the information transfer layers below it. All that is needed is a mechanism that translates English to Chinese, subject to specified performance criteria.

The presentation layer's principal function is the translation of data from one representation to another. This layer performs other important functions such as data encryption and text compression.

The session layer
Below the presentation layer sits the session layer. The session layer organizes the dialogue between two presentation layers. It establishes, manages, and synchronizes the channel between two application processes. This layer provides dialogue control of the type, 'Roger, over', in radio communications, and the mechanisms used to synchronize application communications (but synchronization actions must be initiated at the application layer). The session layer resolves collisions between synchronization requests. An example is '. . . did you follow that? . . .', '. . . then I'll go over it again.'

The transport layer
The four layers below the session layer are responsible for carrying the message between the two parties in communication. The transport layer isolates the session and higher layers from the network itself. It may seem surprising that four layers are needed to perform such an apparently simple task as moving data from one point in a network to another point. We are talking about establishing and maintaining connections across interlinked LANs and wide area networks with, possibly, major differences in technology and performance—not just communications over a simple wire. The reference model covers both LANs and WANs that may involve communication paths across continents and include several different communications systems. Figure 14.11 shows how the ISO model for OSI caters for communications systems with intermediate nodes.
The transport layer is responsible for the reliable transmission of messages between two application nodes of a network and for ensuring that the messages are received in the order in which they were sent. The transport layer isolates higher layers from the characteristics of the real networks by providing the reliable economic transmission required by an application independent of the characteristics of the underlying facilities (for example, error detection/correction, multiplexing to reduce cost, splitting to improve throughput, and message reordering). The transport layer doesn't have to know anything about how the network is organized.

Packet switching networks divide information into units called packets and then send them across a complex network of circuits. Some packets take one route through the network and others take another. Consequently, it is possible for packets to arrive at their destination out of sequence. The transport layer must assemble packets in the correct order, which involves storing the received out-of-sequence packets until the system is ready for them.

The network layer
The network layer serves the transport layer above it by conveying data between the local transport layer and the remote transport layer. The network layer is system dependent, unlike the layers above it. Complex communications systems may have many paths between two points. The network layer chooses the optimum path for a message to cross the network or for the establishment of a virtual connection. As an analogy, consider the postal system. Mail sent to a nearby sorting office might be directed to a more distant sorting office if the local office is congested and cannot cope with the volume of traffic. Similarly, in a data transmission network, transmission paths are chosen to minimize the transit time of packets and the cost of transmission.

The data link layer
The data link layer establishes an error-free (to a given probability) connection between two adjacent points in a network. Information may be transmitted from one end of a network to the other end directly or via intermediate nodes in a series of hops. The data link layer at one node receives a message from the network layer above it and sends it via the physical layer below it to the data link layer at the adjacent node.

The data link layer also detects faulty messages and automatically asks for their retransmission. Protocols for the data link layer and the physical layer below it were the first protocols to be developed and are now widely adopted.
Data link layer protocols cover many different technologies: LANs (for example Ethernet-type networks using CSMA/CD) and WANs (for example X.25). Systems often divide this layer into two parts, a higher-level logical link control (LLC) and a lower-level medium access control (MAC).

The physical layer
The lowest layer, the physical layer, is unique because it provides the only physical connection between any two points in a network. The physical layer is responsible for receiving the individual bits of a message from the data link layer and for transmitting them over some physical medium to the adjacent physical layer, which detects the bits and passes them to the data link layer above it. The physical layer ensures that bits are received in the order they are transmitted.

The physical layer handles the physical medium (e.g. wire, radio, and optical fiber) and ensures that a stream of bits gets from one place to another. The physical layer also implements the connection strategy. There are three fundamental connection strategies. Circuit switching establishes a permanent connection between two parties for the duration of the information transfer. Message switching stores a message temporarily at each node and then sends it on its way across the network. Circuit switching uses a single route through the network, whereas in message switching different messages may travel via different routes. Packet switching divides a message into units called packets and transmits them across the network. Packet switching doesn't maintain a permanent connection through the network and is similar to message switching.

Packet switching comes in two forms, the datagram and the virtual circuit. A datagram service transmits packets independently and they have to be reassembled at their destination (they may arrive out of order). A virtual circuit first establishes a route through the network and then sends all the packets, in order, via this route. The difference between circuit switching and a virtual circuit is that circuit switching requires a dedicated connection for the duration of the exchange, whereas the links along a virtual circuit can be used by other messages.

The service offered by the physical layer is a best effort service because it doesn't guarantee reliable delivery of messages. Information sent on the physical medium might be lost or corrupted in transit because of electrical noise interfering with the transmitted data. On radio or telephone channels the error rate may be very high (1 bit lost in 10^3 transmitted bits), whereas on fiber optic links it may be very low (1 bit lost in 10^12). Layers on top of the physical layer deal with imperfections in this layer. The physical communication path may be copper wires, optical fibers, microwave links, or satellite links.

Remember that the ISO reference model permits modifications to one layer without changing the whole of a network. For example, the physical layer between two nodes can be switched from a coaxial cable to a fiber optic link without any alterations whatsoever taking place at any other level. After all, the data link layer is interested only in giving bits to, or receiving them from, the physical layer. It's not interested in how the physical layer goes about its work.

Standards and the ISO reference model for OSI
Figure 14.12 shows how actual standards for the layers of the reference model have grown. This figure is hourglass shaped. The bottom is broad to cater for the many low-level protocols.
MESSAGE ENCAPSULATION
How do layered protocols deal with messages? In short, each layer wraps up the message it receives from the layer above in its own data structure.

The figure demonstrates how information is transported across a network by means of a system using layered protocols. In (a) we have the application-level data that is to be transmitted from one computer to another. For the sake of simplicity, we'll assume that there aren't any presentation or session layers. The applications layer passes the data to the transport layer, which puts a header in front of the data and a trailer after it. The data has now been encapsulated in the same way that we put a letter into an envelope. The header and trailer include the address of the sender and the receiver.

The packet from the transport layer is handed to the network layer which, in turn, adds its own header and trailer. This process continues all the way down to the physical layer.

Now look at the process in reverse. When a network layer receives a packet from the data link layer below it, the network layer strips off the network layer header and trailer and uses them to check for errors in transmission and to decide how to handle this packet. The network layer then hands the packet to the transport layer above it, and so on.

(a) Data at the applications layer:  | Data |
(b) Data at the transport layer:     | Transport layer header | Data | Trailer |
(c) Data at the network layer:       | Network layer header | Transport layer header | Data | Trailer |
Figure 14.12 Standards for the layers of the basic reference model. The examples range from high-level standards (videotex syntax, user-defined syntax, CCITT message formats, encryption) through connectionless mode, bridges, HDLC, BSC, LAP-B, SDLC, and LLC, and medium access schemes (CSMA/CD, token bus, token ring), down to physical layer standards (V.24, RS-499, X.24/X.21, RS-232).
Figure 14.15 (a) Time-division multiplexing: a high-speed switch (multiplexer) feeds transmitters 1 to 4 into a common communications channel in turn, and a matching switch (demultiplexer) at the far end routes the data to receivers 1 to 4. (b) A time-division multiplexed signal consists of a sequence of time slots.
Figure 14.15(a) demonstrates time division multiplexing (TDM), in which the outputs of several transmitters are fed to a communications channel sequentially. In this example, the channel carries a burst of data from transmitter 1 followed by a burst of data from transmitter 2, and so on. At the receiving end of the link, a switch routes the data to receiver 1, receiver 2, . . . , in order. If the capacity of the channel is at least four times that of each of the transmitters, all four transmitters can share the same channel. All that's needed is a means of synchronizing the switches at both ends of the data link.

A simple TDM system gives each transmitter (i.e. channel) the same amount of time whether it needs it or not. Such an arrangement leads to an inefficient use of the available bandwidth. Statistical time division multiplexing allocates time slots only to those channels that have data to transmit. Each time slot requires a channel number to identify it, because channels aren't transmitted sequentially. Statistical multiplexing is very effective. A sketch of both schemes appears at the end of this section.

Figure 14.15(c) demonstrates an alternative form of multiplexing called frequency division multiplexing, FDM. In this case the bandwidth of the channel is divided between the four transmitters. Unlike in TDM, each transmitter has continuous access to the channel but it has access to only one-quarter of the channel's bandwidth.

We're already familiar with frequency division multiplexing. All a radio station does is to change the frequency range of speech and music signals to a range that can be transmitted over the airwaves. A radio receiver filters out one range of frequencies from all the other frequencies and then converts them back to their original range.

Suppose that the bandwidth of the data from each transmitter extends from 0 to 20 kHz and the communications link has a bandwidth of 80 kHz. The output of the first transmitter is mapped onto 0 to 20 kHz (no change), the output of the second transmitter is mapped onto 20 to 40 kHz, the output of the third transmitter is mapped onto 40 to 60 kHz, and so on. A device that maps one range of frequencies onto another range of frequencies is called a modulator (we will have more to say about modulators when we introduce the modem later in this chapter).

At the receiver end of the link, filters separate the incoming signal into four bands and the signals in each of these bands are converted back to their original ranges of 0 to 20 kHz. In practice it is necessary to leave gaps between the frequency bands because filters aren't perfect. Moreover, a bandpass channel doesn't usually start from a zero frequency. A typical FDM channel might be from, say, 600 MHz to 620 MHz in 400 slices of 50 kHz each.
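The following C sketch, promised above, contrasts the two TDM schemes. It assumes four byte-wide channels and one byte per slot; the structures and function names are invented for the example.

    #define CHANNELS 4

    /* Plain TDM: slot i of every frame always belongs to transmitter i,
       so the receiving switch needs only a synchronized slot counter. */
    void tdm_frame(const unsigned char in[CHANNELS],
                   unsigned char frame[CHANNELS])
    {
        for (int i = 0; i < CHANNELS; i++)
            frame[i] = in[i];     /* one fixed time slot per channel */
    }

    /* Statistical TDM: a slot is transmitted only for channels that have
       data, so each slot must carry an explicit channel number. */
    struct slot { unsigned char channel, data; };

    int stat_tdm_frame(const unsigned char in[CHANNELS],
                       const int ready[CHANNELS],
                       struct slot frame[CHANNELS])
    {
        int n = 0;
        for (int i = 0; i < CHANNELS; i++)
            if (ready[i]) {
                frame[n].channel = (unsigned char)i;
                frame[n].data    = in[i];
                n++;
            }
        return n;                 /* slots actually used this frame */
    }

The price of statistical multiplexing is visible in the code: every slot grows by one channel-number byte, but idle channels consume no bandwidth at all.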
SYNCHRONIZING SIGNALS
Serial data transmission begs an obvious question. How is the stream of data divided up into individual bits and the bits divided into separate words? The division of the data stream into bits and words is handled in one of two ways: asynchronously and synchronously.

We met asynchronous serial systems when we described the ACIA. In an asynchronous serial transmission system the clocks at the transmitter and receiver responsible for dividing the data stream into bits are not synchronized. When the transmitter wishes to transmit a word, it places the line in a space state for one bit period. When the receiver sees this start bit, it knows that a character is about to follow. The incoming data stream can then be divided into seven bit periods and the data sampled at the center of each bit. The receiver's clock is not synchronized with the transmitter's clock and the bits are not sampled exactly in the center. If the receiver's clock is within approximately 4% or so of the transmitter's clock, the system works well.

If the duration of a single bit is T seconds, the length of a character is given by the start bit plus seven data bits plus the parity bit plus the stop bit = 10T. Asynchronous transmission is clearly inefficient, because it requires 10 transmitted bits to convey 7 bits of useful information. Several formats for asynchronous data transmission are in common use; for example, eight data bits, no parity, one stop bit.

Two problems face the designer of a synchronous serial system. One is how to divide the incoming data stream into individual bits and the other is how to divide the data bits into meaningful groups. We briefly look at the division of serial data into bits and return to the division of serial data into blocks when we introduce bit-oriented protocols.

If the data stream is phase encoded, a separate clock can be derived from the received signal and the data extracted. The diagram shows a phase-encoded signal in which the data signal changes state in the center of each bit cell. A low-to-high transition signifies a 1 and a high-to-low transition signifies a 0.
Diagram: the bit sequence 1 0 1 0 1 1 0 0 1 shown first as raw data and then as phase-encoded data, with a transition in the center of each bit cell.
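A minimal C sketch of the encoding convention in the diagram follows. Signal levels are represented as 0 and 1 and each bit cell is split into two half-cells; the function names are ours, not any standard's.

    /* Phase-encode a bit stream: a transition in the center of each bit
       cell, low-to-high for a 1 and high-to-low for a 0. Each input bit
       becomes two half-cell signal levels. */
    void phase_encode(const int *bits, int nbits,
                      int *halfcells /* array of 2*nbits levels */)
    {
        for (int i = 0; i < nbits; i++) {
            halfcells[2*i]     = bits[i] ? 0 : 1;  /* first half of cell  */
            halfcells[2*i + 1] = bits[i] ? 1 : 0;  /* second half: the
                                                      mid-cell transition */
        }
    }

    /* The receiver recovers each bit from the direction of the mid-cell
       transition; because every cell is guaranteed a transition, the same
       signal also yields the receiver's clock. */
    int phase_decode_bit(int first_half, int second_half)
    {
        return (first_half == 0 && second_half == 1);  /* 1 if low-to-high */
    }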
Figure 14.18 Limits of acceptance for attenuation–frequency distortion. The relative attenuation (dB) is plotted against frequency (Hz, marked at 250, 300, 800, 2400, 3400, and 3600), with levels of +1.7, +0.9, 0, and −0.9 dB indicated; the two shaded regions represent the forbidden zone, in which signals may not fall.

Because analog channels can transmit more than one signal simultaneously, the PTTs have allocated certain parts of the telephone channel's bandwidth to signaling purposes. Human speech doesn't contain appreciable energy within these signaling bands and a normal telephone conversation doesn't affect the switching and control equipment using these frequencies.

A consequence of the use of certain frequencies for signaling purposes is that data transmission systems mustn't generate signals falling within specified bands. Figure 14.19 shows the internationally agreed restriction on signals transmitted by equipment connected to the PSTN.
[Figure 14.19 Internationally agreed restrictions on transmitted signals: relative attenuation (dB) against frequency.]

Any signals transmitted in the ranges 500 to 800 Hz and 1800 to 2600 Hz must have levels 38 dB below the maximum in-band signal level.
High-speed modems

Modems operate over a wide range of bit rates. Until the mid-1990s most modems operated between 300 bps and 9600 bps. Low bit rates were associated with the switched telephone network, where some lines were very poor and signal impairments reduced the data rate to 2400 bps or below. The higher rates of 4800 bps and 9600 bps were generally found on privately leased lines where the telephone company offered a higher grade of service.

The growth of the Internet provided a mass market for high-speed modems. Improved modulation techniques and better signal-processing technology have had a massive impact on modem design. By the mid-1990s, low-cost modems operated at 14.4 kbps or 28.8 kbps. By 1998, modems capable of operating at 56 kbps over conventional telephone lines were available for the price of a 1200 bps modem only a decade earlier.

[Figure 14.23 Differential phase modulation: a digital data stream (dibits 11, 10, 01, 00) and the resulting modulated signal; each new level advances the phase of the carrier, e.g. 10 advances the phase by 180°, 00 by 0°, and 11 by 270°.]
NOISE

Noise is the generic term for unwanted signals that are added to the received signal. One source of noise, called thermal noise, is caused by the random motion of electrons in matter. Thermal noise appears as the background hiss on telephone, radio, and TV circuits, and is called Gaussian noise because of its statistical properties. The amount of thermal noise depends on the temperature of the system and its bandwidth. Only by cooling the system or by reducing its bandwidth can we reduce the effects of thermal noise. Receivers that pick up the weak signals from distant space vehicles are cooled in liquid nitrogen to minimize the effects of thermal noise. In general, though, thermal noise is not the limiting factor in terrestrial switched telephone networks.

Another source of noise is cross-talk picked up from other circuits due to electrical, capacitive, or magnetic coupling. We can think of cross-talk as crossed lines. Careful shielding of cables and isolation of circuits can reduce cross-talk. Impulsive noise produces the clicks and crackles on telephone circuits and is caused by transients when heavy loads such as elevator motors are switched near telephone circuits, by lightning, and by dirty and intermittent electrical connections. Impulsive noise accounts for the majority of transmission errors in telephone networks. The diagram illustrates impulsive noise.

[Diagram: amplitude against time for white noise (thermal noise) and for impulsive noise.]

The signal-to-noise ratio of a channel is defined as 10 log₁₀(S/N), where S is the signal power and N the noise power. Because the signal-to-noise ratio is a logarithmic value, adding 10 dB means that the ratio increases by a factor of 10. The signal-to-noise ratio determines the error rate over the channel.

These noises are additive because they are added to the received signal. Multiplicative noise is caused by multiplying the received signal by a noise signal. The most common multiplicative noise is phase jitter caused by random errors in the phase of the clock used to sample the received signal. All these sources of noise make it harder to distinguish between signal levels in a digital system.

When the transmitted signal reaches the receiver, some of its energy is echoed back to the transmitter. Echo cancellers at the ends of a telephone channel remove this unwanted signal. If they are poorly adjusted, the receiver gets the transmitted signal plus a time-delayed and distorted version of it.

CHANNEL CAPACITY

A channel has a finite bandwidth that limits its switching speed. The maximum data rate is given by 2B log₂L, where B is the channel's bandwidth and L is the number of signal levels. If the bandwidth is 3000 Hz and you are using a signal with 1024 discrete signal levels, the maximum data rate is 2 × 3000 × log₂1024 = 6000 × 10 = 60 kbps. This figure relates the capacity of a noiseless channel to its bandwidth. You can increase a channel's capacity by using more signal levels.

Claude Shannon investigated the theoretical capacity of a noisy channel in the late 1940s and showed that its capacity is limited by both its bandwidth and the noise level. Shannon proved that the theoretical capacity of a communications channel is given by B log₂(1 + S/N), where B is the bandwidth, S is the signal power, and N is the noise power. A telephone line with a bandwidth of 3000 Hz and a signal-to-noise ratio of 30 dB has a maximum capacity of 3000 × log₂(1 + 1000) ≈ 29 900 bps.

Shannon's theorem provides an absolute limit that can't be bettered. Modern modems can apparently do better than theory suggests by compressing data before transmission. Moreover, the noise on telephone lines tends to be impulsive or bursty, whereas the theoretical calculations relating channel capacity to noise assume that the noise is white noise (e.g. thermal noise). By requesting the retransmission of data blocks containing errors due to noise bursts, you can increase the average data rate.
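Both formulas are easy to check numerically. The short Python sketch below (ours, for illustration) reproduces the two calculations in the box: the Nyquist rate for a noiseless 3000 Hz channel with 1024 levels, and the Shannon capacity of a 3000 Hz line with a 30 dB signal-to-noise ratio.

import math

def nyquist_rate(bandwidth_hz, levels):
    # maximum data rate of a noiseless channel: 2B log2(L)
    return 2 * bandwidth_hz * math.log2(levels)

def shannon_capacity(bandwidth_hz, snr_db):
    # Shannon capacity: B log2(1 + S/N), with S/N given in dB
    snr_linear = 10 ** (snr_db / 10)        # convert dB to a power ratio
    return bandwidth_hz * math.log2(1 + snr_linear)

print(nyquist_rate(3000, 1024))     # 60000.0 bps, as in the text
print(shannon_capacity(3000, 30))   # about 29 900 bps for a 30 dB line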
High-speed modems operate by simultaneously changing the amplitude and phase of a signal. This modulation technique is called quadrature amplitude modulation (QAM). A QAM signal can be represented mathematically by the expression S·sin(ωt) + C·cos(ωt), where S and C are two constants. The term quadrature is used because a sine wave and a cosine wave of the same frequency and amplitude are almost identical. The only difference is that a sine wave and a cosine wave are 90° out of phase (90° represents ¼ of 360°, hence quadrature). Figure 14.24 demonstrates a 32-point QAM constellation in which each point represents one of 32 discrete signals. A signal element encodes a 5-bit value, which means a modem with a signaling speed of 2400 baud can transmit at 12 000 bps.

Figure 14.25 demonstrates that the points in a QAM constellation are spaced equally. Each circle includes the space that is closer to one of the signal elements than to any other element. When a signal element is received, the values of S and C are calculated and the value of the signal element determined.
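The receiver's decision rule amounts to finding the nearest constellation point to the measured (S, C) pair. The sketch below makes the idea concrete with a toy 4-point constellation rather than the 32-point constellation of Fig. 14.24; the names and values are ours, for illustration only.

CONSTELLATION = {(1, 1): 0b00, (-1, 1): 0b01, (-1, -1): 0b10, (1, -1): 0b11}

def decide(s, c):
    # Return the bit pattern of the constellation point nearest (s, c).
    return min(CONSTELLATION.items(),
               key=lambda p: (p[0][0] - s) ** 2 + (p[0][1] - c) ** 2)[1]

print(decide(0.9, -1.2))   # prints 3 (0b11): noise moved the received point,
                           # but the nearest-point decision is still correct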
14.4.3 High-speed transmission over the PSTN

[Figure 14.24 The 32-point QAM constellation.]

The backbone of the POTS (plain old telephone system) is anything but plain. Data can be transmitted across the world via satellite, terrestrial microwave links, and fiber optic links at very high rates. The limitation on the rate at which data can be transmitted is known as the last mile; that is, the connection between your phone and the global network at your local switching center.
ISDN

A technology called integrated services digital network (ISDN) was developed in the 1980s to help overcome the bandwidth limitations imposed by the last mile. ISDN was intended for professional and business applications and is now available to anyone with a PC. There are two variants of ISDN—basic rate services and primary rate services. The basic rate service is intended for small businesses and provides three fully duplex channels. Two of these so-called B channels can carry voice or data and the third D channel is used to carry control information. B channels operate at 64 kbps and the D channel at 16 kbps.

ISDN's early popularity was due to its relatively low cost and the high quality of service it offers over the telephone line. You can combine the two B channels to achieve a data rate of 128 kbps. You can even use the D channel simultaneously to provide an auxiliary channel at 9.6 kbps. Note that ISDN can handle both voice and data transmission simultaneously.

Several protocols have been designed to control ISDN systems. V.110 and V.120 are used to connect ISDN communications devices to high-speed ISDN lines. ISDN took a long time from its first implementation to its adoption by many businesses. However, newer technologies plus cable networks have been devised to overcome the last mile problem, and ISDN has largely been superseded.

[Figure 14.25 The packing of points in a QAM constellation.]

[Figure 14.26 Effect of errors on a QAM point: (a) the phase and amplitude difference between two signal elements; (b) the effect of phase and amplitude errors on a QAM signal.]

ADSL

If there's one thing you can guarantee in the computing world, it's that yesterday's state-of-the-art technology will become the current standard and a new state-of-the-art technology will emerge. Just as ISDN was becoming popular in the late 1990s,
MODEM STANDARDS

In the USA, modem standards were dominated by the Bell System's de facto standards. Outside the USA, modem standards were determined by the International Consultative Committee on Telegraphy and Telephony (CCITT). Over time, CCITT standards became dominant when high-speed modems were introduced.

Early modems operated at data rates of 75, 300, 600, 1200, 2400, 4800, and 9600 baud. Modern modem rates are 14 400, 19 200, 28 800, 33 600, and 56 000 bps. Modem standards define the following.

● Modulation method Low- and medium-speed modems use frequency modulation. High-speed modems employ phase modulation and QAM (quadrature amplitude modulation).

● Channel type Some modems operate in a half-duplex mode, permitting a communication path in only one direction at a time. Others support full-duplex operation with simultaneous, two-way communication. Some systems permit a high data rate in one direction and a low data rate in the other, or reverse, direction.

● Originate/answer The originating modem is at the end of the channel that carried out the dialing and set up the channel. The answer modem is at the end of the channel that receives the call. Many modems can both originate calls and answer calls, but some modems are answer-only and cannot originate a call. Originate and answer modems employ different frequencies to represent 1s and 0s.

● Asynchronous/synchronous An asynchronous data transmission system transmits information as, typically, 8-bit characters with periods of inactivity between characters. A synchronous system transmits a continuous stream of bits without pauses, even when the bits are carrying no user information.

Examples of modem standards

● CCITT V.32 2400 baud, 4800 or 9600 bps, QAM
● CCITT V.33 2400 baud, 14 400 bps, QAM
● CCITT V.34 2400 baud, 28 800 bps, QAM
● CCITT V.90 56 000 bps (this standard uses analog transmission in one direction and digital in the other)
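The relationship between the baud and bps columns in the examples above is bit rate = baud rate × bits per symbol, where a constellation of M points carries log₂M bits per symbol. A two-line check in Python (illustrative only):

import math

def bit_rate(baud, constellation_points):
    # bits per second = symbols per second x bits per symbol
    return baud * math.log2(constellation_points)

print(bit_rate(2400, 16))   # 9600.0 bps: 2400 baud with 4 bits per symbol
print(bit_rate(2400, 32))   # 12000.0 bps: the 32-point QAM case in the text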
The first universal standard for the physical layer was published in 1969 by the Electronic Industries Association (EIA) and is known as RS232C (Recommended Standard 232 version C). This standard was intended for links between modems and computers but was adapted to suit devices such as printers. RS232 specifies the plug and socket at the modem and the digital equipment (i.e. their mechanics), the nature of the transmission path, and the signals required to control the operation of the modem (i.e. the functionality of the data link). In the standard, the modem is known as data communications equipment (DCE) and the digital equipment to be connected to the modem is known as data terminal equipment (DTE).

[Diagram: a terminal (DTE) connects over an RS232 data link to a modem (DCE); non-digital signals cross the PSTN to a second modem (DCE), which connects to a computer (DTE).]

Because RS232 was intended for DTE to DCE links, its functions are very largely those needed to control a modem. The following control signals implement most of the important functions of an RS232 DTE to DCE link.

Request to send (RTS) is a signal from the DTE to the DCE. When asserted, RTS indicates to the DCE that the DTE wishes to transmit data to it.

Clear to send (CTS) is a signal from the DCE to the DTE and, when asserted, indicates that the DCE is ready to receive data from the DTE.

Data set ready (DSR) is a signal from the DCE to the DTE that indicates the readiness of the DCE. When this signal is asserted, the DCE is able to receive from the DTE. DSR indicates that the DCE (usually a modem) is switched on and is in its normal functioning mode (as opposed to its self-test mode).

Data terminal ready (DTR) is a signal from the DTE to the DCE. When asserted, DTR indicates that the DTE is ready to accept data from the DCE. In systems with a modem, it maintains the connection and keeps the channel open. If DTR is negated, the communication path is broken. In everyday terms, negating DTR is the same as hanging up a phone.
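The handshake just described can be modeled in a few lines of code. The toy classes below are our own illustration of the RTS/CTS/DSR protocol, not part of any real modem API.

class ToyModem:                        # a stand-in DCE
    def __init__(self):
        self.dsr = True                # switched on, normal mode (not self-test)
        self.cts = False
    def see_rts(self):                 # the DTE has asserted RTS
        self.cts = True                # the DCE replies by asserting CTS

def dte_send(dce, data):
    if not dce.dsr:                    # DSR negated: the modem isn't ready
        raise RuntimeError('DCE not ready')
    dce.see_rts()                      # assert RTS: we wish to transmit
    if dce.cts:                        # CTS asserted: clear to send
        return 'sent ' + repr(data)

print(dte_send(ToyModem(), 'hello'))   # sent 'hello'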
frame. Obviously, if one node is already in the process of sending a message, other nodes are not going to attempt to transmit. A collision will occur only if two nodes attempt to transmit at nearly the same instant. Once a node has started transmitting and its signal has propagated throughout the network, no other node can interrupt. For almost all systems this danger zone, the propagation time of a message from one end of the network to the other, is very small and is only a tiny fraction of the duration of a message.

The contention mechanism adopted by Ethernet is called Carrier Sense Multiple Access with Collision Detect (CSMA/CD). When an Ethernet station wishes to transmit a packet, it listens to the state of the bus. If the bus is in use, it waits for the bus to become free. In Ethernet terminology this is called deference. Once a station has started transmitting it acquires the channel, and after a delay equal to the end-to-end round-trip propagation time of the network, a successful transmission without collision is guaranteed.

14.6 Fiber optic links

The very first signaling systems used optical technology—the signal fire, the smoke signal, and later the semaphore. Such transmission systems were limited to line-of-sight operation and couldn't be used in fog. From the middle of the nineteenth century onward, electrical links have made it possible to communicate over long distances independently of weather conditions.
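The deference behavior described above can be sketched in a few lines. The Python model below is purely illustrative (the channel functions are stand-ins); the retry rule shown, truncated binary exponential backoff, is the one Ethernet adaptors actually use after a collision.

import random

def csma_cd_send(channel_busy, collided, max_attempts=16):
    # channel_busy() and collided() are stand-ins for the real medium.
    for attempt in range(max_attempts):
        while channel_busy():              # deference: wait for a free bus
            pass
        if not collided():                 # no collision: channel acquired
            return attempt + 1             # number of attempts needed
        slots = random.randrange(2 ** min(attempt + 1, 10))
        # a real adaptor would now wait 'slots' slot times before retrying;
        # the wait is elided in this sketch
    return None                            # excessive collisions: give up

# Toy medium: the bus is always free; each attempt collides with p = 0.5.
print(csma_cd_send(lambda: False, lambda: random.random() < 0.5))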
CABLE TERMINOLOGY

The physical dimensions, the electrical or optical characteristics, and the connectors of the cables used to implement the physical medium of an Ethernet connection have been standardized. Some of the common standards are as follows.

10Base2: 10 Mbps thin Ethernet cable (similar to TV antenna cable).
10BaseT: 10 Mbps switched Ethernet cable. Used with Ethernet routers and hubs. The cable is similar to telephone cable with the same RJ45 jack plug.
100BaseT: 100 Mbps Ethernet cable using twisted pair cable (similar to 10BaseT).
100BaseF: 100 Mbps fiber Ethernet cable.

Today, the confluence of different technologies has, once again, made it possible to use light to transmit messages. Semiconductor technology has given us the laser and the light-emitting diode (LED), which directly convert pulses of electricity into pulses of light in both the visible and infrared parts of the spectrum. Similarly, semiconductor electronics has created devices that can turn light directly into electricity so that we can detect the pulses of light from a laser or LED. The relatively new science of materials technology has given us the ability to create a fine thread of transparent material called an optical fiber. The optical fiber can pipe light from its source to its detector just as the coaxial cable pipes electronic signals from one point to another.
Seemingly, light must be transmitted in a straight line and therefore can't be used for transmission over paths that turn corners or go round bends. Fortunately, one of the properties of matter (i.e. the speed of light in a given medium) makes it possible to transmit light down a long thin cylinder of material like an optical fiber. Figures 14.30(a) and (b) demonstrate the effect of a light beam striking the surface of an optically dense material in a less dense medium such as air. Light rays striking the surface at nearly right angles to the surface pass from the material into the surrounding air after being bent or refracted, as Fig. 14.30(a) demonstrates. The relationship between the angle of incidence θ₂ and the angle of refraction θ₁ is cos(θ₂)/cos(θ₁) = index of refraction.

[Figure 14.30 Total internal reflection at the boundary between media of refractive indices n₁ and n₂: (a) θ₂ < θc, the incident ray is refracted and leaves the fiber; (b) θ₂ > θc, the incident ray experiences total internal reflection; (c) propagation of a ray along a fiber by repeated total internal reflection.]
Light rays striking the surface at a shallow angle suffer total internal reflection and are reflected just as if the surface (i.e. the boundary between the optically dense material and the air) were a mirror. The critical angle, θc, at which total internal reflection occurs, is a function of the refractive index of the material through which the light is propagated and that of the material at which the reflection occurs. The same phenomenon takes place when a diver looks upward. Total internal reflection at the surface of the water makes the surface look like a mirror. Figure 14.30(c) demonstrates how light is propagated along the fiber by internal reflections from the sides.

By drawing out a single long thread of a transparent material such as plastic or glass, we can create an optical fiber as illustrated in Fig. 14.31. The optical fiber consists of three parts:

● the core itself, which transmits the light
● a cladding that has a different index of refraction to the core and hence causes total internal reflection at its interface with the core
● a sheath that provides the optical fiber with protection and mechanical strength.

The diameter of the optical fiber is very small indeed—often less than 100 μm. Sometimes there is an abrupt junction between the core and cladding (a step-index fiber) and sometimes the refractive index of the material varies continuously from the core to the cladding (a graded-index fiber). Graded-index fibers are difficult to produce and therefore more expensive than step-index fibers, but they offer lower attenuation and a higher bandwidth.

Fiber optic links can be created from many materials, but a fiber drawn from high-quality fused quartz has the least attenuation and the greatest bandwidth (e.g. the attenuation can be less than 1 dB/km). The bandwidth of fiber optic links can range from 200 MHz to over 10 GHz, which represents very high data rates indeed.

There are several types of optical fiber, each with its own special properties (e.g. attenuation per km, bandwidth, and cost). Two generic classes of optical fiber are the multimode and single-mode fibers. Multimode fibers operate as described, by bouncing the light from side to side as it travels down the fiber. Because a light beam can take many paths down the cable, the transit time of the beam is spread out and a single pulse of light is received as a considerably broadened pulse. Consequently, a multimode fiber cannot be used at very high pulse rates.

A single-mode fiber has a diameter only a few times that of the wavelength of the light being transmitted (a typical diameter is only 5 μm). As a single-mode fiber does not support more than one optical path through the fiber, the transmitted pulse is not spread out in time and a very much greater bandwidth can be achieved.

The advantages of a fiber optic link (Fig. 14.32) over copper cable and radio technologies are as follows.

Bandwidth The bandwidth offered by the best fiber optic links is approximately 1000-fold greater than that offered by coaxial cable or microwave radio links.

Attenuation High-quality optical fibers have a lower attenuation than coaxial cables and therefore fewer repeaters are required over long links such as undersea cables.

Mechanics The optical fiber itself is truly tiny and therefore lightweight. All that is needed is a suitable sheath to protect it from mechanical damage or corrosion. It is therefore cheaper to lay fiber optic links than coaxial links.

Interference Fiber optic links are not affected by electromagnetic interference and therefore they do not suffer the effects of noise induced by anything from nearby lightning strikes to cross-talk from adjacent cables. Furthermore, because they do not use electronic signals to convey information, there's no signal leakage from an optical fiber and therefore it's much harder for unauthorized persons to eavesdrop.
[Figure 14.32 The fiber optic link: an input signal enters a transmitter module, which drives the optical cable through a source-to-fiber connector; fiber-to-fiber connectors join cable sections, and a fiber-to-detector connector feeds the receiver module, which delivers the output signal.]
14.7 Wireless links
Wireless links can be classified by the frequency of the radio signals used to transport data and by whether they are terrestrial or satellite links. Table 14.2 illustrates a portion of the electromagnetic spectrum used to transmit information. (Note: kHz = kilohertz = 10³ Hz, MHz = megahertz = 10⁶ Hz, GHz = gigahertz = 10⁹ Hz.)

Signals in the frequency range 100 kHz to about 1000 MHz (i.e. 1 GHz) are used for terrestrial radio and television broadcasting. Frequencies above 1 GHz are called microwaves and are used for applications ranging from radar to information transmission to heating. Microwaves have two important properties: they travel in straight lines and they can carry high data rates.

Because microwaves travel in straight lines, the Earth's curvature limits direct links to about 100 km or so (depending on the terrain and the height of the transmitter and receiver dishes). Longer communication paths require repeaters—microwaves are picked up by an antenna on a tower, amplified, and transmitted to the next tower in the chain. Few industrial cities are without some tall landmark festooned with microwave dishes.

Since the late 1960s satellite microwave links have become increasingly more important. A satellite placed in geostationary orbit 35 700 km above the equator takes 24 hours to orbit the Earth. Because the Earth itself rotates once every 24 hours, a satellite in a geostationary orbit appears to hang motionless in space and remain over the same spot. Such a satellite can be used to transmit messages from one point on the Earth's surface to another point up to approximately 12 000 km away, as illustrated in Fig. 14.33.

Theoretically, three satellites each separated by 120° could completely cover a band around the Earth. However, receivers at the extreme limits of reception would have their dishes pointing along the ground at a tangent to the surface of the Earth. As the minimum practical angle of elevation is about 5°, satellites should not be more than about 110° apart for reliable operation. Data is transmitted up to the satellite on the uplink frequency, regenerated, and transmitted down again at the downlink frequency (the uplink frequency is higher than the downlink frequency). Table 14.3 describes some of the frequency bands used by satellites. Suitable microwave or coaxial links transmit data from a local source to and from the national satellite terminals.

Satellites are used to transmit television signals, telephone traffic, and data signals. Data signals can be transmitted at rates greater than 50 Mbps, which is many times faster than that offered by the public switched telephone network but rather less than that offered by the fiber optic link (and much less than that offered by the super data highways). Satellite links can be replaced by fiber optic links. The advantage of the satellite is its ability to broadcast from one transmitter to many receivers.

Satellite systems are very reliable. The sheer size of the investment in the satellite and its transport vehicle means that engineers have spent much time and energy in designing reliable satellites. Unfortunately, a satellite doesn't have an infinite life span. Its solar power panels gradually degrade due to the effects of the powerful radiation fields experienced in space, and it eventually runs out of the fuel required by its rocket jets to keep it pointing accurately at the surface of the Earth.

Satellites operate mostly in the 1 to 10 GHz band. Frequencies below 1 GHz are subject to interference from terrestrial sources of noise, and the atmosphere attenuates frequencies above 10 GHz. Satellite users have to take account of a problem imposed by the length of the transmission path (about 70 000 km). Microwaves traveling at the speed of light (300 000 km/s) take approximately 250 ms to travel from the source to their destination. Consequently it is impossible to receive a reply from a transmission in under 0.5 s. Data transmission modes using half duplex become difficult to operate due to the long transit delay and the large turnaround time. Satellite data links are better suited to full-duplex operation.
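The 250 ms figure follows directly from the path length and the speed of light; in Python:

path_km = 70_000                     # up-and-down path via the satellite
speed_km_s = 300_000                 # speed of light
print(path_km / speed_km_s)          # about 0.23 s one way, roughly 250 ms
print(2 * path_km / speed_km_s)      # so a reply takes at least about 0.5 s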
[Figure 14.33 Satellite communication: (a) organization of a satellite link (a telephone and exchange at each end, connected via ground stations and the satellite); (b) the geostationary orbit, 35 700 km above the Earth (whose diameter is 6400 km), with an orbital period of 24 hours; (c) the time delay in sending a message between two ground stations A and B (transmission delay about 265 ms); (d) the satellite's field of view is approximately one-third of the Earth's surface.]
High geosynchronous orbits are not the only option available. Figure 14.34 shows that satellites can be placed in one of three types of orbit. Satellites in low and medium Earth orbits appear to move across the sky, which means that when your satellite drops below the horizon you have to switch the link to another satellite. Low Earth orbits require lots of satellites for reliable communications, but the latency is very low. Fewer satellites are required to cover the world from medium Earth orbits and the latency is about 0.05 to 0.14 s.

14.7.1 Spread spectrum technology

Although a radio signal is transmitted at a specific frequency, the signal does, in fact, occupy a range of frequencies because of the modulation process. For example, an AM signal transmitted at frequency fc occupies the frequency range fc − fm to fc + fm, where fm is the maximum modulating frequency.

A problem with transmitting on a single frequency is the vulnerability of the radio link; the signal can be easily observed and it can be jammed. In the Second World War attempts were made to control torpedoes by radio links. It was clear that using a single frequency would not be a good idea because it could be jammed by transmitting another signal at the same frequency. A solution to the problem of jamming was suggested by Hedy Lamarr and George Antheil; they proposed changing the frequency of the transmitter and receiver in synchrony to avoid transmitting on a single frequency. A clockwork-driven frequency selector could be used in both the transmitter and receiver to change frequency every few seconds.

Antheil and Lamarr's proposal was not put into practice until the early 1960s, when the US military implemented it to provide secure radio links.
[Figure 14.34 Satellite orbits: a geosynchronous Earth orbit lies 22 300 miles (35 700 km) above the Earth, with a latency of 0.24 s; medium and low Earth orbits are progressively lower, with correspondingly lower latencies.]
The frequency changes (or hops) were made in a random sequence using electronics rather than mechanical switching. Because the frequency of the transmitted signal rapidly varies over a finite range within a band, the signal energy is distributed throughout the band. Consequently, the system is often called spread spectrum technology.

An advantage of spread spectrum technology is that the wireless link is less susceptible to interference. If an interfering signal is at a constant frequency, it will affect the received signal only when the interfering and data signal frequencies coincide. Moreover, if several spread spectrum transmissions occupy the same band at the same time, interference will take place only when two or more of their frequencies are the same at the same time.

The frequency 2.4 GHz has now been allocated to spread spectrum signals² and IEEE standard 802.11 was developed to provide a short-range data communications facility for laptops and similar devices. The standard uses the same type of collision control mechanism as the Ethernet.

Fourteen channels in the 2.4 GHz band are reserved for 802.11 systems. Each channel is separated by 5 MHz. However, these channels indicate only the center frequency used by a transmitter–receiver pair. An actual wireless link uses a bandwidth of 30 MHz and, therefore, takes up five channels.

² The 2.4 GHz band is shared by other users such as Bluetooth, baby monitors, and cordless phones.

14.8 The data link layer

Now that we've looked at some of the ways in which bits are moved from one point to another by the physical layer, the next step is to show how the data link layer handles entire messages and overcomes imperfections in the physical layer. We are going to look at two popular protocols for the data link layer—a bit-oriented protocol and a protocol used by the Internet.

14.8.1 Bit-oriented protocols

A bit-oriented protocol handles pure binary data (i.e. strings of 1s and 0s of arbitrary length). Binary data can be a core dump, a jpeg image, a program in binary form, a floating point number, and so on. When the data is stored in a pure binary form it's apparently impossible to choose any particular data sequence as a reserved marker or flag, because that sequence may also appear as valid data. We explain how the high-level data link control protocol (HDLC) delivers any pattern of bits between two nodes in a data link by means of a technique called bit stuffing.

The key to understanding the HDLC protocol is the HDLC frame, the smallest unit of data that can be sent across a network by the data link layer. Frames are indivisible in the sense that they cannot be subdivided into smaller frames, just as an atom can't be divided into other atoms. However, a frame is composed of several distinct parts just as an atom is made up of neutrons, protons, and electrons. Figure 14.35 illustrates the HDLC format of a single frame.
HISTORY OF WI-FI

1997 IEEE Standard 802.11 specifies a wireless LAN using 2.4 GHz with data rates of 1 and 2 Mbits/s. Apple Computer provides the first operating system to support Wi-Fi (called AirPort).

1999 Standard 802.11b with a data rate of 11 Mbits/s is finalized. The maximum actual data rate is approximately 5 Mbits/s. This was the first Wi-Fi standard to become widely accepted and it paved the way for low-cost wireless networks.

1999 The 802.11a standard operates at 5 GHz and provides a maximum raw data rate of 54 Mbits/s, corresponding to a practical user data rate of about 20 Mbits/s. Radio waves at 5 GHz are more readily absorbed than those at 2.4 GHz and 802.11a-based systems have not achieved the same success as 802.11b.

2002 Intel's Centrino chipset had a remarkable effect on the wireless LAN market. Centrino consists of a low-power CPU, an interface chip, and an 802.11b chip. This chipset was used in countless laptops to provide portability with low power consumption and Wi-Fi LAN connectivity.

2003 Standard 802.11g combines the lower frequency advantage of 802.11b and the modulation rate of 802.11a to provide a raw bit rate of 54 Mbits/s in the 2.4 GHz band. Equally importantly, it is backward compatible with 802.11b. By the end of 2003, companies were producing tri-mode Wi-Fi adaptors capable of accessing 802.11a/b/g networks.
[Figure 14.35 The HDLC frame format: Flag 01111110 | Address | Control | Information | FCS | Flag 01111110. The information field is optional.]

Each frame begins and ends with a unique 8-bit flag, 01111110. Whenever a receiver detects the sequence 01111110, it knows that it has located the start or the end of a frame. An error in transmission may generate a spurious flag by converting (say) the sequence 01101110 into 01111110. In such cases, the receiver will lose the current frame. Due to the unique nature of the flag, the receiver will automatically resynchronize when the next opening flag is detected.

HDLC puts no restrictions whatsoever on the nature of the data carried across the link. Consequently, higher levels of the reference model can transmit any bit sequence they wish without affecting the operation of the data link layer. The only binary sequence that may not appear in a stream of HDLC data is the frame opening or closing flag, 01111110.

A simple scheme called zero insertion and deletion, or bit stuffing, ensures that HDLC data is transparent. Figure 14.36 shows how bit stuffing operates. Data from the block marked transmitter is passed to an encoder marked zero insertion that operates according to a simple algorithm. A bit at its input is passed unchanged to its output unless the five preceding bits have all been 1s. In the latter case, two bits are passed to the output: a 0 followed by the input bit. As an example, consider the sequence 010111111011, which contains the forbidden flag sequence. If the first bit is the leftmost bit, the output of the encoder is 0101111101011.

[Figure 14.36 Bit insertion and deletion: data from the transmitter passes through a zero-insertion block and then a flag-insertion block onto the data link; at the receiver, a flag-removal block (which detects the start of a frame) is followed by a zero-deletion block. The example in the figure shows a long data sequence in which a 0 has been inserted after each run of five 1s, the whole being bracketed by opening and closing flags.]

The bit-insertion mechanism guarantees that any binary sequence can appear in the input data but a flag sequence can't occur in the output data stream, because five 1s are always followed by a 0. Flags intended as frame delimiters are appended to the data stream after the encoding block.

At the receiving end of the link, opening and closing flags are detected and removed from the data stream by the flag removal circuit. The data stream is then passed to the block marked zero deletion for decoding, which operates in the reverse way to zero insertion: if five 1s are received in succession, the next bit (which must be a 0) is deleted. For example, the received sequence 0101111101011111000 is decoded as 01011111101111100.

Now that we've described how a data stream is divided into individual bits and the bits into frames, the next step is to look at the HDLC frame. Figure 14.35 demonstrates that the HDLC frame is divided into five logical fields: an address field, a control field, an optional information field, and a frame check sequence (FCS).
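The zero-insertion algorithm translates directly into code. This Python sketch is our own transcription of the rule above; it reproduces both worked examples from the text.

def stuff(bits):
    # Copy each bit; after any run of five consecutive 1s, insert a 0.
    out, run = [], 0
    for b in bits:
        out.append(b)
        run = run + 1 if b == '1' else 0
        if run == 5:
            out.append('0')
            run = 0
    return ''.join(out)

def unstuff(bits):
    # Reverse operation: after five received 1s, delete the next bit
    # (which must be the stuffed 0).
    out, run, skip = [], 0, False
    for b in bits:
        if skip:
            skip, run = False, 0
            continue
        out.append(b)
        run = run + 1 if b == '1' else 0
        if run == 5:
            skip = True
    return ''.join(out)

print(stuff('010111111011'))            # 0101111101011, as in the text
print(unstuff('0101111101011111000'))   # 01011111101111100, as in the text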
until it is invited to do so by the master. A control field with P/F = 1 sent by the master indicates such an invitation. When a control field is sent by a secondary station, the P/F bit is defined as a final bit and, when set, indicates that the current frame is the last frame of the series. In other words, a slave sets P/F to 1 when it has no more frames to send.

The state variables N(S) and N(R) in the control field are 3-bit numbers in the range 0 to 7 that define the state of the system at any instant. N(S) is called the send sequence number and N(R) is called the receive sequence number.

Only I-frames contain a send sequence number to label the current information frame; for example, if N(S) = 101₂ the frame is numbered 5. When this frame is received the value of N(S) is examined and compared with the previous value. If the previous value was 4, the message is received in sequence. But if the value was not 4, an error has occurred. The sequence count is modulo 8, so that it goes . . . 6 7 0 1 2 3 4 5 6 7 0 . . . Consequently, if eight messages are lost, the next value of N(S) will apparently be correct.

The receive sequence number, N(R), is available in both S and I control fields. N(R) indicates the number of the next I-frame that the receiver expects to see; that is, N(R) acknowledges I-frames up to and including N(R) − 1. Suppose station A is sending an I-frame to B with N(S) = 3 and N(R) = 6. This means that station A is sending frame number 3 and has safely received frames up to 5 from B. A expects to see an information frame from B with the value of N(S) equal to 6.

By means of the N(R) and N(S) state variables, it's impossible to lose a frame without noticing it, as long as there are not more than seven outstanding I-frames that have not been acknowledged. If eight or more frames are sent, it is impossible to tell whether a value of N(R) = i refers to frame i or to frame i + 8. It is up to the system designer to ensure that this situation never happens. We will soon look at how N(S) and N(R) are used in more detail.

FCS field

Recall that the data link layer is built on top of an imperfect physical layer. Bits transmitted across a physical medium may become corrupted by noise, with a 1 being transformed into a 0 or vice versa. The error rate over point-to-point links in a local area network may be of the order of 1 bit lost in every 10¹² bits. Error rates over other channels may be much worse than this.

HDLC provides a powerful error-detection mechanism. At the transmitter, the bits of the address field, control field, and I-field are treated as the coefficients of a long polynomial, which is divided by a polynomial called a generator. The HDLC protocol uses the CCITT generator 10001000000100001, or x¹⁶ + x¹² + x⁵ + 1. The result of the division yields a quotient (which is thrown away) and a 16-bit remainder, which is the 16-bit FCS appended to the frame.

At the receiver, the message bits forming the A-, C-, and I-fields are also divided by the generator polynomial to yield a locally calculated remainder. The calculated remainder is compared with the received remainder in the FCS field. If they match, the frame is assumed to be valid. Otherwise the frame is rejected.

You may wonder how the FCS is detected, because the I-field may be of any length and no information is sent to indicate its length directly. In fact, the FCS field cannot be detected directly. The receiver assembles data until the closing flag has been located and then works backward to identify the FCS and the I-field.
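The division can be carried out bit by bit with shifts and exclusive ORs. The sketch below illustrates the principle using the CCITT generator quoted above; it is a plain polynomial division (the frame bits are made up for the example), and it ignores the bit inversions and orderings that a real HDLC implementation also applies.

GENERATOR = 0b10001000000100001    # x^16 + x^12 + x^5 + 1, as in the text

def fcs_remainder(bits):
    # Remainder of (bits shifted left by 16 places) divided by the generator.
    reg = 0
    for b in bits + '0' * 16:              # the 16 appended 0s do the shift
        reg = (reg << 1) | int(b)
        if reg & (1 << 16):                # degree reached 16: reduce
            reg ^= GENERATOR
    return reg                             # the 16-bit remainder is the FCS

frame_bits = '101100111000'                # hypothetical A-, C-, and I-field bits
fcs = fcs_remainder(frame_bits)
# The receiver repeats the division over frame + FCS; remainder 0 means valid.
assert fcs_remainder(frame_bits + format(fcs, '016b')) == 0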
HDLC message exchange

The HDLC protocol supports several configurations. Here we consider only the unbalanced master–slave mode (NRM), where a slave may initiate transmission only as a result of receiving explicit permission from the master.

Before we continue, it's necessary to define the four messages associated with a supervisory frame. Table 14.5 shows how the four S-frames are encoded.

Bits 1–4    Bit 5    Bits 6–8    S-frame type
1 0 0 0     P/F      N(R)        RR   receiver ready
1 0 0 1     P/F      N(R)        REJ  reject
1 0 1 0     P/F      N(R)        RNR  receiver not ready
1 0 1 1     P/F      N(R)        SREJ selective reject

Table 14.5 The format of the S-frame.

The RR (receiver ready) frame indicates that the station sending it is ready to receive information frames and is equivalent to saying, 'I'm ready.' The REJ (reject) frame indicates an error condition and usually implies that one or more frames have been lost in transmission. The REJ frame rejects all frames, starting with the frame numbered N(R). Whenever a station receives an REJ frame, it must go back and retransmit all messages after N(R) − 1. Sending all these messages is sometimes inefficient, because not all frames in a sequence may have been lost.

The RNR (receiver not ready) frame indicates that the station is temporarily unable to receive information frames. RNR is normally used to indicate a busy condition (e.g. the receiver's buffers may all be full). The busy condition is cleared by the transmission of an RR, REJ, or SREJ frame. An I-frame sent with the P/F bit set also clears the busy condition.

The selective reject (SREJ) frame rejects the single frame numbered N(R) and is equivalent to 'Please retransmit frame number N(R)'. The use of SREJ is more efficient than REJ, because the latter requests the retransmission of all frames after N(R) as well as N(R) itself.
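Table 14.5 can be turned into a small decoder. The function below is our own illustration; it numbers the bits 1 to 8 exactly as the table does and ignores the bit ordering used on the wire.

S_TYPES = {0b00: 'RR', 0b01: 'REJ', 0b10: 'RNR', 0b11: 'SREJ'}

def decode_s_frame(bits):
    # bits is an 8-character string, bit 1 first, e.g. '10011101'.
    assert bits[0:2] == '10', 'not a supervisory frame'
    s_type = S_TYPES[int(bits[2:4], 2)]    # bits 3-4 select the frame type
    pf = int(bits[4])                      # bit 5 is the P/F bit
    nr = int(bits[5:8], 2)                 # bits 6-8 carry N(R)
    return s_type, pf, nr

print(decode_s_frame('10011101'))   # ('REJ', 1, 5): REJ,,5 with P/F set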
Figure 14.38 demonstrates a sequence of HDLC frame exchanges between A (the master) and B (the slave) in a half-duplex mode. Each frame is denoted by type, N(S), N(R), P/F, where the type is I, RR, REJ, RNR, or SREJ. Typical HDLC frames are:

I,5,0: an I-frame with N(S) = 5 and N(R) = 0
I,5,0,P: an I-frame with N(S) = 5 and N(R) = 0, with the poll bit set by the master
REJ,,4,F: an S-frame with N(R) = 4, reject, with the final bit set by the slave

Note that a double comma indicates the absence of an N(S) field.

Initially in Fig. 14.38, the master station sends three I-frames. The poll bit in the third frame is set to force a response from the slave. The slave replies by sending two I-frames that are terminated by setting the F bit of the C-field. If the slave had no I-frames to send, it would have responded with RR,,3,F. The values of N(S) and N(R) are determined by the sender of the frame.

The master sends two more I-frames, terminated by a poll bit. The first frame (I,3,2) is corrupted by noise and rejected by the receiver. When the slave responds to the poll from the master, it sends a supervisory frame, REJ,,3,F, rejecting the I-frame numbered 3 and all succeeding frames. This causes the master station to repeat the two frames numbered N(S) = 3 and N(S) = 4.

When the master station sends an I-frame numbered I,5,2,P, it too is corrupted in transmission and rejected by the receiver. The secondary station cannot respond to this polled request. When the master sends a message with P = 1, it starts a timer. If a response is not received within a certain period, the timeout, the master station takes action. In this case, it sends a supervisory frame (RR,,2,P) to force a response. The secondary station replies with another supervisory frame (REJ,,5,F) and the master then repeats the lost message.

A selective reject frame, SREJ,,N(R), rejects only the message whose send sequence count is N(R). Therefore, SREJ,,N(R) is equivalent to 'Please repeat your message with N(S) = N(R).' If a sequence of messages is lost, it is better to use REJ,,N(R) and have N(R) and all messages following N(R) repeated.

Figure 14.39 shows the operation of an HDLC system operating in full-duplex mode, permitting the simultaneous exchange of messages in both directions.

We have explained only part of the HDLC data link layer protocol. Unnumbered frames are used to perform operations related to the setting up or establishing of the data link layer channel and the eventual clearing down of the channel.

14.8.2 The Ethernet data link layer

Figure 14.40 describes an Ethernet packet, which consists of six fields. The 8-byte preamble is a synchronizing pattern used to detect the start of a frame and to derive a clock signal from it. The preamble consists of 7 bytes of alternating 1s and 0s followed by the pattern 10101011. Two address fields are provided, one for the source and one for the destination. A 6-byte (48-bit) address allows sufficient address space for each Ethernet node to have a unique address.
[Figure 14.38 An example of an HDLC message exchange sequence between Computer A (the master) and Computer B (the slave):
A to B: I,0,0; I,1,0; I,2,0,P
B to A: I,0,3; I,1,3,F
A to B: I,3,2 (lost to noise); I,4,2,P
B to A: REJ,,3,F
A to B: I,3,2; I,4,2; I,5,2,P (lost to noise)
A to B (after a timeout): RR,,2,P
B to A: REJ,,5,F
A to B: I,5,2
A message is denoted by type,N(S),N(R),P/F. For example, I,3,0,P indicates an information frame numbered 3, with an N(R) count of 0, and the poll bit set indicating that a response is required. Note that message 3 from A (i.e. I,3,2) is lost. Therefore, when A sends the message I,4,2,P with the poll bit set, B responds with REJ,,3,F. This indicates that B is rejecting all messages from A numbered 3 and above. The F bit is set, denoting that B has no more messages to send to A.]
[Figure 14.39 An HDLC message exchange in full-duplex mode between Computer A (the master) and Computer B (the slave). The annotations to the exchange read:
• A sends frame I,0,0 (information frame numbered 0; A is expecting a frame from B numbered 0).
• A sends frame I,1,0 (information frame numbered 1; A is still expecting a frame from B numbered 0).
• A sends frame I,2,0. This frame is corrupted by noise and is not correctly received by B.
• B sends frame I,0,2 (information frame numbered 0; B is expecting a frame from A numbered 2). Note that because A's frame I,2,0 has been lost, B is still expecting to see a frame from A labeled with N(S) = 2.
• A sends I,3,0,P (information frame numbered 3; A is expecting a frame numbered 0 from B). A is also polling B for a response. At this point A does not know that its previous message has been lost, and A has not yet received B's last message.
• B sends a reply to A's poll. This is REJ,,2,F, indicating that all A's messages numbered 2 and above have been rejected. The final bit, F, is set, indicating that B has nothing more to send at the moment.
• A now sends I,2,1 (information frame 2; A is expecting to see a frame from B numbered 1). This frame is a repeat of A's information frame numbered 2, which was lost earlier.
The exchange continues with further I-frames and RR acknowledgments in both directions, including a second frame lost to noise.]
[Figure 14.41 IEEE 802.3 packet format: Preamble (7 bytes) | STD (1 byte) | Destination address (6 bytes) | Source address (6 bytes) | Type (2 bytes) | Data (variable) | CRC (4 bytes).]
The type field is reserved for use by higher level layers to specify the protocol. The data field has a variable length, although the size of an Ethernet packet must be at least 64 bytes. The data field must be between 46 and 1500 bytes. The final field is a 4-byte cyclic redundancy checksum (CRC) that provides a very powerful error-detecting mechanism.

Figure 14.41 describes the format of a packet conforming to the IEEE's 802.3 standard, which is very similar to the original Ethernet packet. The preamble and start-of-frame delimiter are identical to the corresponding Ethernet preamble. The principal difference is that the 802.3 packet has a field that indicates the length of the data portion of the frame. The 802.3 protocol covers layer 1 of the OSI reference model (the physical layer) and part of the data link layer, called the medium access control (MAC). The IEEE 802 standards divide the data link layer into a medium access layer and a logical link control (LLC).

14.9 Routing techniques

How does a message get from one point in a network to its destination? Routing in a network is analogous to routing in everyday life. The analogy between everyday routing and computer routing is close in at least one sense—the shortest route isn't always the best. Drivers avoid highly congested highways. Similarly, a network strives to avoid sending packets along a link that is either congested or costly.
CHARACTER-ORIENTED PROTOCOLS

Character-oriented protocols belong to the early days of data communication. They transmit data as ASCII characters using special 7-bit characters for formatting and flow control. For example, the string 'Alan' is sent as the sequence of four 7-bit characters 1000001001101110000110111011. This string of bits is read from left to right, with the leftmost bit representing the least-significant bit of the 'A'. We need a method of identifying the beginning of a message. Once this has been done, the bits can be divided into groups of seven (or eight if a parity bit is used) for the duration of the message. The ASCII synchronous idle character SYN (0010110₂) denotes the beginning of a message. The receiver reads the incoming bits and ignores them until it sees a SYN character. The following demonstrates the use of the SYN character.

Character sequence               Bit sequence
Case 1: 0101100 0010110 0100111  010110000101100100111
Case 2: 0100010 1101101 0100111  010001011011010100111
Case 3: 0101100 0010110 0010110  010110000101100010110

On the left we have provided three consecutive characters with spaces between successive characters. On the right we've removed the spaces to show the bit stream. Case 1 shows how the SYN is detected. This simple scheme is flawed because the end of one character plus the start of the next may look like a SYN character. Case 2 shows how a spurious SYN might be detected. To avoid this problem, two SYN characters are transmitted sequentially. If the receiver does detect a SYN, it reads the next character. If this is also a SYN, the start of a message is assumed to have been located; otherwise a false synchronization is assumed and the search for a valid SYN character continues (case 3).

Character-oriented protocols provide point-to-point communication between two stations. Like all data link layer protocols, they both control the flow of information (message sequencing and error recovery) and set up and maintain the transmission path.

A consequence of reserving special characters for control functions is that the transmitted data stream must not contain certain combinations of bits, as these will be interpreted as control characters. Fortunately, there are ways of getting round this problem by using an escape character that modifies the meaning of following characters.

The diagram below shows the format of a BiSync frame, a protocol originally devised by IBM. The SOH, STX, and ETX characters denote start of header, start of text, and end of text, respectively.

SYN | SYN | SOH | Header | STX | Text | ETX | BCC

where the header comprises SOH | Address | Block sequence number | Control | Acknowledgement, and the text is bracketed as STX | Text | ETX.

A BiSync frame header keeps track of the data by giving it a sequence number and providing a means of sequencing and acknowledging frames.
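The double-SYN hunt is simple to express in code. The sketch below (ours, for illustration) slides a 7-bit window along the raw stream until two consecutive SYN characters are found.

SYN = '0010110'

def find_message_start(stream):
    # Index of the first data bit after a double SYN, or None.
    for i in range(len(stream) - 13):
        if stream[i:i + 7] == SYN and stream[i + 7:i + 14] == SYN:
            return i + 14
    return None

# Some line noise, then SYN SYN, then the character 'A' (LSB first).
bits = '110' + SYN + SYN + '1000001'
print(find_message_start(bits))   # 17: the message starts after the SYNs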
A ring network connects all nodes to each other in the form of a continuous loop. Unlike the nodes of a bus network, which listen passively to data on the bus unless it is meant for them, the nodes of a ring take an active part in all data transfers. When receiving incoming data, a node must test the packet and decide whether to keep it for itself or to pass it on to its next neighbor.

Token rings pass a special bit pattern (the token) round the ring from station to station. The station currently holding the token can transmit data if it so wishes. If it does not wish to take the opportunity to send data itself, it passes the token on round the ring. For example, suppose the token has the special pattern 11111111, with zero stuffing used to keep the pattern unique. A station on the ring wishing to transmit monitors its incoming traffic. When it has detected seven 1s it inverts the last bit of the token and passes it on. Thus, a pattern called a connector (11111110) passes on down the ring. The connector is created to avoid sending the eighth '1', thereby passing on the token. The station holding the token may now transmit its data. After it has transmitted its data, it sends a new token down the ring. As there is only one token, contention cannot arise on the ring unless, of course, a station becomes antisocial and sends out a second token. In practice, the system is rather more complex, because arrangements must be included for dealing with lost tokens.

The IEEE has created a standard for the token ring LAN called 802.5. Two types of frame are supported—a three-octet frame and a variable-length frame. Each frame begins and ends with a starting and ending delimiter, which mark the frame's boundaries. The second octet provides access control (i.e. a token bit, a monitor bit, and priority bits). The short three-octet frame format is used to pass the control token round the ring from one node to the next. The IEEE 802.5 standard provides for prioritization. When a station wishes to transmit data, it waits for a free token whose priority is less than its own.
[Token-ring frame formats: the three-octet token frame is Start delimiter (1 octet) | Access control (1 octet) | End delimiter (1 octet); the variable-length frame is Start delimiter (1 octet) | Access control (1 octet) | Frame control (1 octet) | Destination address (2 or 6 octets) | Source address (2 or 6 octets) | Data (0 to 5000 octets) | FCS (4 octets) | End delimiter (1 octet) | Frame status (1 octet).]
bandwidth. Flooding is not normally used by today's networks.

Suppose now we apply a cost to each of the routings. This cost is a figure-of-merit that might be determined by the reliability of a link, its latency (i.e. delay), or its actual cost (it might be rented). We have provided a number against each link in Fig. 14.42 to indicate its cost. If we now apply these costs to the routings, we get the figures shown in Table 14.6.

[Figure 14.42 Cost of routing in a network: six nodes A to F connected by links, each labeled with its cost (F–A = 4, F–E = 1, A–B = 9, A–C = 8, A–D = 2, B–C = 4, B–E = 2, C–D = 3, C–E = 7, D–E = 4).]

Route          Link costs          Total cost
F–A–C          4 + 8               12
F–A–B–C        4 + 9 + 4           17
F–A–D–C        4 + 2 + 3           9
F–E–C          1 + 7               8
F–E–D–C        1 + 4 + 3           8
F–E–D–A–C      1 + 4 + 2 + 8       15
F–E–D–A–B–C    1 + 4 + 2 + 9 + 4   20
F–E–B–C        1 + 2 + 4           7
F–E–B–A–C      1 + 2 + 9 + 8       20
F–E–B–A–D–C    1 + 2 + 9 + 2 + 3   17

Table 14.6 The cost of routing a message from node F to C in Fig. 14.42.

Table 14.6 indicates that the cheapest route is F to E to B to C, which is slightly cheaper than the more direct route F to E to C. How do you find the cheapest route through the network and what happens if the cost of a link changes (if every node
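One standard way to answer the first question is Dijkstra's shortest-path algorithm (our choice of illustration; the text itself doesn't prescribe an algorithm). Using the link costs of Fig. 14.42 as reconstructed from Table 14.6, it confirms that the cheapest route from F to C costs 7.

import heapq

COSTS = {('F', 'A'): 4, ('F', 'E'): 1, ('A', 'B'): 9, ('A', 'C'): 8,
         ('A', 'D'): 2, ('B', 'C'): 4, ('B', 'E'): 2, ('C', 'D'): 3,
         ('C', 'E'): 7, ('D', 'E'): 4}

def neighbours(node):
    for (a, b), w in COSTS.items():
        if a == node: yield b, w
        if b == node: yield a, w

def cheapest(src, dst):
    # Dijkstra: always extend the cheapest partial route found so far.
    queue, seen = [(0, src, [src])], set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, w in neighbours(node):
            heapq.heappush(queue, (cost + w, nxt, path + [nxt]))

print(cheapest('F', 'C'))    # (7, ['F', 'E', 'B', 'C']), matching Table 14.6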
14.9.3 IP (Internet protocol)

Although networks were originally developed for highly specialized applications such as reliable military communications systems and academic research tools, it's the Internet that's caught people's attention because of its impact on everyday life. The Internet began as a development of the US Defense Advanced Research Projects Agency (DARPA) in the 1960s. This project created and developed a small experimental network using packet switching called ARPANET (the 'D' for defense has been dropped from the acronym). Research into the ARPANET was carried out at many universities and this network gradually evolved into what we now call the Internet. The protocol used for ARPANET's transport layer forms the basis of the Internet's transmission control protocol (TCP).

The Internet links together millions of networks and individual users. In order to access the Internet, a node must use the TCP/IP protocol (transmission control protocol/Internet protocol), which corresponds to layers 4 and 3 of the OSI reference model, respectively. Some of the higher level protocols that make use of TCP/IP are TELNET (a remote login service that allows you to access a computer across the Internet), FTP (file transfer protocol), which allows you to
exchange files across the Internet, and SMTP (simple mail transfer protocol), which provides electronic mail facilities. Here we provide only an overview of the TCP/IP layers.

The Internet's network layer protocol, IP, routes a packet between nodes in a network. The packets used by the IP are datagrams and are handled by appropriate data link layer protocols—typically Ethernet protocols on LANs and X.25 protocols across public data networks (i.e. the telephone system). Figure 14.43 describes the format of an IP packet (or frame) that is received from the data link layer below it and passed to the TCP transport layer above it.

IP's version field defines the version of the Internet protocol that created the current packet. This facility allows room for growth because improvements can be added as the state of the art improves while still permitting older systems to access the network. The IP version widely used in the late 1990s was IPv4; IPv6 was developed to deal with some of the problems created by the Internet's increasing size and to provide for time-critical services such as real-time video and speech.

The header length defines the size of the header in multiples of 32-bit words (i.e. all fields preceding the data). The minimum length is five. Because the header must be a multiple of 32 bits, IP's padding field is used to supply 0 to 3 octets to force the header to fit a 32-bit boundary. The datagram length is a 16-bit value that specifies the length of the entire IP packet, which limits the maximum size of a packet to 64K octets. In practice, typical IP packets are below 1 kbyte.

The service type field tells the transport layer how the packet is to be handled; that is, its priority, delay, throughput, and reliability. The service request allows the transport layer to choose between, for example, a link with a low delay or a link that is known to be highly reliable.

The flags and fragment offset fields are used to deal with fragmentation. Suppose a higher level layer uses larger packets than the IP layer. A packet has to be split up (i.e. fragmented) and transmitted in chunks by the IP. The fragmentation flags indicate that an IP packet is part of a larger unit that has to be re-assembled and the fragment offset indicates where the current fragment fits (remember that IP packets can be received out of order).

The time-to-live field corresponds to the packet's best-before date and is used to specify the longest time that the packet can remain on the Internet. When a packet is created, it is given a finite life. Each time the packet passes a node, the time-to-live count is decremented. If the count reaches zero, the packet is discarded. This facility prevents packets circulating round the Internet endlessly.

The protocol field specifies the higher level protocol that is using the current packet; for example, the TCP protocol has the value 6. This facility enables the destination node to pass the IP packet to the appropriate service.

The header checksum detects errors in the header. Error checking in the data is performed by a higher level protocol. The checksum is the one's complement of the sum of all the 16-bit integers in the header. When a packet is received the checksum is calculated and compared with the transmitted value. A checksum is a very crude means of providing error protection (it's not in the same league as the FCS) but it is very fast to compute.
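The one's complement checksum rule quoted above takes only a few lines. The sketch below builds a toy 20-byte header (the field offsets follow the IPv4 layout, but the contents are made up) and verifies the receiver's check.

def ip_checksum(header):
    # One's complement of the one's complement sum of 16-bit words.
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry
    return ~total & 0xFFFF

hdr = bytearray(20)                  # toy header; checksum lives at bytes 10-11
hdr[0], hdr[9] = 0x45, 6             # version/header length; protocol 6 = TCP
csum = ip_checksum(hdr)
hdr[10], hdr[11] = csum >> 8, csum & 0xFF
print(ip_checksum(hdr))              # 0: a valid header checks out as zero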
The service type field tells the transport layer how the packet is to be handled; that is, priority, delay, throughput, and reliability. The service request allows the transport layer to choose between, for example, a link with a low delay or a link that is known to be highly reliable.

The flags and fragment offset fields are used to deal with fragmentation. Suppose a higher level layer uses larger packets than the IP layer. A packet has to be split up (i.e. fragmented) and transmitted in chunks by the IP. The fragmentation flags indicate that an IP packet is part of a larger unit that has to be re-assembled and the fragment offset indicates where the current fragment fits (remember that IP packets can be received out of order).
The time-to-live field corresponds to the packet's best-before date and is used to specify the longest time that the packet can remain on the Internet. When a packet is created, it is given a finite life. Each time the packet passes a node, the time-to-live count is decremented. If the count reaches zero, the packet is discarded. This facility prevents packets circulating round the Internet endlessly.
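The time-to-live rule is simple to express in code. The following C fragment is a minimal sketch of the test a node applies to each datagram it forwards; the type and function names are illustrative only, not part of any standard API.

#include <stdbool.h>
#include <stdint.h>

struct ip_header {
    uint8_t time_to_live;   /* remaining lifetime of the packet */
    /* ... the other IP header fields ... */
};

/* Returns true if the packet may be forwarded, false if its
   time-to-live has expired and the packet must be discarded. */
bool decrement_ttl(struct ip_header *h)
{
    if (h->time_to_live <= 1) {    /* the count would reach zero */
        return false;              /* discard the packet         */
    }
    h->time_to_live--;             /* one more node passed       */
    return true;                   /* forward the packet         */
}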
The protocol field specifies the higher level protocol that is using the current packet; for example, the TCP protocol has the value 6. This facility enables the destination node to pass the IP packet to the appropriate service.
The header checksum detects errors in the header. Error checking in the data is performed by a higher level protocol. The checksum is the one's complement of the sum of all 16-bit integers in the header. When a packet is received the checksum is calculated and compared with the transmitted value. A checksum is a very crude means of providing error protection (it's not in the same league as the FCS) but it is very fast to compute.
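To illustrate how cheap the computation is, here is a minimal C sketch of the one's complement checksum (the function name is ours, not the book's; it assumes the header's checksum field has been set to zero before the sum is formed).

#include <stddef.h>
#include <stdint.h>

/* One's complement sum of the 16-bit integers in an IP header.
   'words' points at the header and 'count' is the number of 16-bit
   words it holds (10 for a minimum-length header of five 32-bit words). */
uint16_t ip_header_checksum(const uint16_t *words, size_t count)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < count; i++) {
        sum += words[i];
        if (sum > 0xFFFF)                /* fold the carry back in   */
            sum = (sum & 0xFFFF) + 1;
    }
    return (uint16_t)~sum;               /* one's complement of sum  */
}

A receiver repeats the calculation over the whole header, transmitted checksum included; if the header is error free, the complemented sum comes out as zero.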
The source and destination IP address fields provide the address of where the packet is coming from and where it's going. We will return to IP addressing later. The options field is optional and allows the packet to request certain facilities. For example, you can request that the packet's route through the Internet be recorded or you can request a particular route through the network. Finally, the data field contains the information required by the next highest protocol.
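Taken together, the fields just described give the IP header the following shape. The C structure below is only an illustrative sketch of the layout of Figure 14.43 (real code must pack the shared octets with shifts or bitfields and must allow for byte order):

#include <stdint.h>

/* IPv4 header without options; the version and header length share
   one octet, and the flags share 16 bits with the fragment offset. */
struct ipv4_header {
    uint8_t  version_and_length;   /* 4-bit version, 4-bit header length */
    uint8_t  service_type;         /* priority, delay, throughput...     */
    uint16_t datagram_length;      /* total length in octets (max 64K)   */
    uint16_t identification;       /* groups the fragments of one packet */
    uint16_t flags_and_offset;     /* fragmentation flags + offset       */
    uint8_t  time_to_live;         /* decremented at each node           */
    uint8_t  protocol;             /* higher level protocol, e.g. 6=TCP  */
    uint16_t header_checksum;      /* one's complement header checksum   */
    uint32_t source_address;       /* where the packet comes from        */
    uint32_t destination_address;  /* where the packet is going          */
    /* options and 0 to 3 octets of padding follow when the header
       length exceeds the minimum of five 32-bit words */
};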
IP routing

Both the IP source and destination addresses are 32 bits in version 4 of the Internet protocol. Version 6 will provide 128-bit addresses (that's probably enough to give each of the Earth's molecules its own Internet address).
An IPv4 address is unique and permits 2³² (over 4000 million) different addresses. When specifying an Internet address it's usual to divide the 32 bits into four 8-bit fields and convert each 8-bit field into a decimal number delimited by a period; for example, the IP address 11000111 10000000 01100000 00000000 corresponds to 199.128.96.0.
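The conversion from a 32-bit value to dotted decimal is a matter of masking and shifting, as this short C sketch shows:

#include <stdint.h>
#include <stdio.h>

/* Print a 32-bit IP address as four period-delimited decimal fields. */
void print_dotted_decimal(uint32_t address)
{
    printf("%u.%u.%u.%u\n",
           (unsigned)((address >> 24) & 0xFF),
           (unsigned)((address >> 16) & 0xFF),
           (unsigned)((address >> 8) & 0xFF),
           (unsigned)(address & 0xFF));
}

int main(void)
{
    /* 11000111 10000000 01100000 00000000 = 199.128.96.0 */
    print_dotted_decimal(0xC7806000);
    return 0;
}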
Although an IP address provides 2³² unique values, it doesn't allow up to 4000 million nodes (or users) to exist on the Internet, because not all addresses are available. An IP address is a hierarchical structure designed to facilitate the routing of a packet through the Internet and is divided into four categories as Fig. 14.44 demonstrates.

Internet addresses have two fields—a network address and a node address. Class A Internet protocol addresses use a 7-bit network identifier and then divide each network into 2²⁴ different nodes. Class B addresses can access one of 2¹⁴ (16K) networks each with 64K nodes, and class C addresses select one of 2²¹ (about 2 million) networks with 254 nodes.
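Because the class is encoded in the leading bits, a node can classify an address with a few shifts. The following C sketch (ours, not the book's) mirrors the structure of Fig. 14.44:

#include <stdint.h>

/* Classify an IPv4 address by its most-significant bits. */
char ip_address_class(uint32_t address)
{
    if ((address >> 31) == 0x0) return 'A';   /* 0...              */
    if ((address >> 30) == 0x2) return 'B';   /* 10...             */
    if ((address >> 29) == 0x6) return 'C';   /* 110...            */
    if ((address >> 28) == 0xE) return 'D';   /* 1110... multicast */
    return 'E';                               /* 11110.. reserved  */
}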
Class A: 0 | Net ID (bits 1–7) | Host ID (bits 8–31)
Class B: 1 0 | Net ID (bits 2–15) | Host ID (bits 16–31)
Class C: 1 1 0 | Net ID (bits 3–23) | Host ID (bits 24–31)
Class D: 1 1 1 0 | Multicast address (bits 4–31)
Class E: 1 1 1 1 0 | Reserved for future use

Figure 14.44 Structure of an IP address.
You can easily see how inefficient this arrangement is. Although only 128 networks can use a class A address, each network gets 16 million node addresses whether they are needed or not. Class A and B addresses have long since been allocated (removing large numbers of unique addresses from the pool). This leaves only a rapidly diminishing pool of class C addresses (until the IPv6 protocol becomes more widely used).
The end user doesn't directly make use of a numeric Internet address. Logical Internet addresses are written in the familiar name@domain form. The way in which these logical addresses are mapped onto physical addresses is beyond the scope of this chapter.
Transmission control protocol

TCP performs a level-4 transport layer function by interfacing to the user and host's applications processes at each end of the net. The TCP is rather like an operating system because it carries out functions such as opening, maintaining, and closing the channel. The TCP takes data from the user at one end of the net and hands it to the IP layer below for transmission. At the other end of the net, the TCP takes data from the IP layer and passes it to the user. Figure 14.45 describes the transport header.
The source and destination port addresses provide application addresses. Each node (host) might have several application programs running on it and each application is associated with a port. This means you can run several applications, each using the Internet, on a computer at any instant. The sequence number ensures that messages can be assembled in sequence because it contains the byte number of the first byte in the data. The acknowledgement number indicates the byte sequence number the receiving TCP node expects to receive and, therefore, acknowledges the receipt of all previous bytes. This arrangement is analogous to the HDLC protocol used by layer two protocols.
The offset defines the size of the TCP header and, therefore, the start of the data field. The flags field contains 6 bits that control the operation of the TCP; for example, by indicating the last data segment or by breaking the link. The window field tells the receiving node how many data bytes the sending node can accept in return. The checksum provides basic error correction for the transport layer. The options field defines TCP options. The padding field ensures that the header fits into a 32-bit boundary.
The urgent pointer field is used in conjunction with the URG flag bit. If the URG bit is set, the urgent pointer provides a 16-bit offset from the sequence number in the current TCP header. This provides the sequence number of the last byte in urgent data (a facility used to provide a sort of interrupt facility across the Internet). The host receiving a message with its URG bit set should pass it to the higher layers ahead of any currently buffered data.
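The transport header fields described above can be summarized in the same way as the IP header. The C structure below is an illustrative sketch of the layout of Figure 14.45 (in the real header the 4-bit offset and the 6 flag bits share 16 bits, and byte order matters in practice):

#include <stdint.h>

/* TCP transport header without options. */
struct tcp_header {
    uint16_t source_port;             /* sending application's port      */
    uint16_t destination_port;        /* receiving application's port    */
    uint32_t sequence_number;         /* byte number of first data byte  */
    uint32_t acknowledgement_number;  /* next byte expected in return    */
    uint16_t offset_and_flags;        /* header size + 6 control bits    */
    uint16_t window;                  /* buffer space offered in return  */
    uint16_t checksum;                /* basic transport-layer check     */
    uint16_t urgent_pointer;          /* offset of the last urgent byte  */
    /* options and padding follow when the offset exceeds five words */
};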
Although the TCP protocol forms the backbone of the Internet, it is rather old and has its origin in the days of the ARPANET. In particular, the TCP's error-detecting checksum is almost worthless because it isn't as powerful as the data link layer's FCS error-detecting mechanism. TCP plus IP headers are 40 bytes or more and these add a significant overhead to short data segments.
■ SUMMARY

In this chapter we have provided an overview of some of the aspects of interest to those involved with computer communications networks. Computer networks is a subject that is advancing as rapidly as any other branch of computer science, because it increases the power of computer systems and exploits many of today's growing technologies. It is all too easy to think of computer communications as a hardware-oriented discipline centered almost exclusively on the transmission of signals from point A to point B. Modern computer communications networks have software components that far outweigh their hardware components in terms of complexity and sometimes even cost. In this chapter we have introduced the ideas behind the seven layers of the ISO basic reference model for open systems interconnection and have described protocols for the bottom two layers.
■ PROBLEMS
14.1 If the cost of a computer and all its peripherals is so low today, why is the field of computer communications expanding so rapidly?

14.2 What is the meaning of a protocol and why are protocols so important in the world of communications?

14.3 What is the difference between a WAN and a LAN?

Figure 14.46 Routing in a network.
14.4 What is an open system?

14.5 Why has the ISO model for OSI proved so important in the development of computer communications?

14.6 What are the differences between the transport and network layers of the ISO reference model?

14.7 Why is the physical layer of the OSI model different from all the other layers?

14.8 What is a virtual connection?

14.9 What are the differences between half-duplex and full-duplex transmission modes? How is it possible to make a half-duplex system look like a full-duplex system?

14.10 What is the difference between phase and frequency modulation?

14.11 What are the types of noise that affect a data link? Which types of noise are artificial and which are natural? If you were comparing a satellite link and a telephone link, what do you think are the effect, type, and consequences of noise on each link?

14.12 What determines the maximum rate at which information can be transmitted over a data link?

14.13 Why cannot users transmit any type of signal they wish (i.e. amplitude, frequency characteristics) over the PSTN?

14.14 What is the difference between DTE and DCE?

14.15 What are the advantages and disadvantages of the following communications media: fibre optic link, twisted pair, and satellite link?

14.16 Why is a SYN character required by a character-oriented data link, and why is a SYN character not required by a bit-oriented data link?

14.17 What is bit stuffing and how is it used to ensure transparency?

14.18 What are the advantages and disadvantages of LANs based on the ring and bus topologies?

14.19 What is the meaning of CSMA/CD in the context of a mechanism for handling collisions on a LAN?

14.20 The maximum range of a line-of-sight microwave link, d, is given by the formula d² = 2r·h + h², where r is the radius of the Earth and h is the height of the antenna above the Earth's surface. This formula assumes that one antenna is at surface level and the other at height h. Show that this formula is correct. Hint: it's a simple matter of trigonometry.

14.21 For each of the following bit rates determine the period of 1 bit in the units stated.
    Bit rate      Unit
(a) 100 bps       ms
(b) 1 kbps        ms
(c) 56 kbps       s
(d) 100 Mbps      ns

14.22 Each of the following time values represents 1 bit. For each value give the corresponding bit rate expressed in the units stated.
    Duration      Unit of bit rate
(a) 1 s           bps
(b) 10 s          kbps
(c) 10 s          Mbps
(d) 15 ns         Gbps

14.23 For each of the following systems calculate the bit rate.
(a) 300 baud 2-level signal
(b) 600 baud 4-level signal
(c) 9600 baud 256-level signal

14.24 The ISO reference model has seven layers. Is that too many, too few, or just right?

14.25 Define an open system and provide three examples of open systems.

14.26 What are the relative advantages and disadvantages of satellite links in comparison with fiber optic cables?

14.27 If a signal has a signal-to-noise ratio of 50 dB and the power of the signal is 1 mW, what is the power of the noise component?

14.28 For the network of Fig. 14.46 calculate the lowest cost route between any pairs of nodes.

14.29 Suppose the network of Fig. 14.46 used flooding to route its packets. Show what would happen if a packet were to be sent from node F to node C.

14.30 A network has a bandwidth of 3400 Hz and a signal-to-noise ratio of 40 dB. What is the maximum theoretical data rate that the channel can support?

14.31 Shannon's work on the capacity of a channel relates to so-called white Gaussian noise (e.g. thermal noise). Many telephone channels suffer from impulse noise (switching transients that appear as clicks). Do you think that (for the same noise power) such a channel would have a better information-carrying capacity than predicted by Shannon?

14.32 Why is a checksum error detector so much worse than a cyclic redundancy code?
ACKNOWLEDGEMENTS
Few books are entirely the result of one person's unaided efforts and this is no exception. I would like to thank all those who wrote the books about computers on which my own understanding is founded. Some of these writers conveyed the sheer fascination of computer architecture that was to change the direction of my own academic career. It really is amazing how a large number of gates (a circuit element whose operation is so simple as to be trivial) can be arranged in such a way as to perform all the feats we associate computers with today.

I am grateful for all the comments and feedback I've received from my wife, colleagues, students, and reviewers over the years. Their feedback has helped me to improve the text and eliminate some of the errors I'd missed in editing. More importantly, their help and enthusiasm has made the whole project worthwhile.

Although I owe a debt of gratitude to a lot of people, I would like to mention four people who have had a considerable impact. Alan Knowles of Manchester University read drafts of both the second and third editions with a precision well beyond that of the average reviewer. Paul Lambert, one of my colleagues at The University of Teesside, wrote the 68K cross-assembler and simulator that I use in my teaching. In this edition we have used a Windows-based graphical 68K simulator kindly provided by Charles Kelly.

Dave Barker, one of my former students and an excellent programmer, wrote the logic simulator called Digital Works that accompanies this book. I would particularly like to thank Dave for providing a tool that enables students to construct circuits and test them without having to connect wires together.

One of the major changes to the third edition was the chapter on the ARM processor. I would like to thank Steve Furber of Manchester University (one of the ARM's designers) for encouraging me to use this very interesting device.
BIBLIOGRAPHY
Morrison, T. P. (1997). The Art of Computerized Measurement. Oxford University Press, Oxford.
Schultz, Jerome S. (1991). Biosensors. Scientific American, August 1991, pp. 64–69.

Communications

Comer, Douglas E. (2003). Computer Networks and Internets (4th edition). Prentice Hall.
Halsall, Fred (1995). Data Communications, Computer Networks and Open Systems (4th edition). Addison-Wesley.
Shay, William A. (1995). Understanding Data Communications Systems. International Thomson Publishing.
Stallings, William (2003). Data and Computer Communications (7th edition). Prentice Hall.
Tanenbaum, Andrew S. (2002). Computer Networks (4th edition). Prentice Hall.
THE HISTORY OF THIS BOOK
Like people, books are born. Principles of Computer Hardware was conceived in December 1980. At the end of their first semester our freshmen were given tests to monitor their progress. The results of the test in my 'Principles of computer hardware' course were not as good as I'd hoped, so I decided to do something about it. I thought that detailed lecture notes written in a style accessible to the students would be the most effective solution.

Having volunteered to give a course on computer communications to the staff of the Computer Center during the Christmas vacation, I didn't have enough free time to produce the notes. By accident I found that the week before Christmas was the cheapest time of the year for vacations. So I went to one of the Canary Islands for a week, sat down by the pool, surrounded by folders full of reference material, with a bottle of Southern Comfort, and wrote the core of this book—number bases, gates, Boolean algebra, and binary arithmetic. Shortly afterwards I added the section on the structure of the CPU.

These notes produced the desired improvement in the end-of-semester exam results and were well received by the students. In the next academic year my notes were transferred from paper to a mainframe computer and edited to include new material and to clean up the existing text.

I decided to convert the notes into a book. The conversion process involved adding topics, not covered by our syllabus, to produce a more rounded text. While editing my notes, I discovered what might best be called the inkblot effect. Text stored in a computer tends to expand in all directions because it's so easy to add new material at any point; for example, you might write a section on disk drives. When you next edit the section on disks, you can add more depth or breadth.

The final form of this book took a breadth before depth approach. That is, I covered a large number of topics rather than treating fewer topics in greater depth. It was my intention to give students taking our introductory hardware/architecture course a reasonably complete picture of the computer system.

The first edition of Principles of Computer Hardware proved successful and I was asked to write a second edition, which was published in 1990. The major change between the first and second editions was the adoption of the 68K microprocessors as a vehicle to teach computer architecture. I have retained this processor in the current edition. Although members of the Intel family have become the standard processors in the PC world, Motorola's 68K family of microprocessors is much better suited to teaching computer architecture. In short, it supports most of the features that computer scientists wish to teach students, and just as importantly, it's much easier to understand. The 68K family and its derivatives are widely used in embedded systems.

By the mid-1990s the second edition was showing its age. The basic computer science and the underlying principles were still fine, but the actual hardware had changed dramatically over a very short time. The most spectacular progress was in the capacity of hard disks—by the late 1990s disk capacity was increasing by 60% per year.

The third edition included a 68K cross-assembler and simulator allowing students to create and run 68K programs on any PC. It also added details of an interesting microprocessor architecture, the ARM, which provides an interesting contrast to the 68K.

When I used the second edition to teach logic design to my students, they built simple circuits using logic trainers—boxes with power supplies and connectors that allow you to wire a handful of simple chips together. Dave Barker, one of my former students, constructed a logic simulator program called Digital Works as part of his senior year project, which runs under Windows on a PC. Digital Works allows you to place logic elements anywhere within a window and to wire the gates together. Inputs to the gates can be provided manually (via the mouse) or from clocks and sequence generators. You can observe the outputs of the gates on synthesized LEDs or as a waveform or table. Moreover, Digital Works permits you to encapsulate a circuit in a macro and then use this macro in other circuits. In other words, you can take gates and build simple circuits, and take the simple circuits and build complex circuits, and so on.

I began writing a fourth edition of this text in late 2003. The fundamental principles have changed little since the third edition, but processors have become faster by a factor of 10 and the capacity of hard disks has grown enormously. This new edition is necessary to incorporate some of the advances. After consultation with those who adopt this book, we have decided to continue to use the 68K family to introduce the computer instruction set because this processor still has one of the most sophisticated of all instruction set architectures.
The CD
The Software Contained on the CD

The enclosed CD contains four major items of software, all of which run on IBM PCs and their clones. I have tested the software on several PCs under Windows 98 for the third edition and under Windows XP for this fourth edition. One item runs only under DOS.
● A 68000 processor DOS-based cross-assembler and simulator
● A 68000 processor Windows-based editor, cross-assembler, and simulator
● A digital logic simulator
● A simulator for the ARM microprocessor
● Documentation for the 68000 processor family

These items are in separate directories and have appropriate readme files. You also need Adobe Acrobat Reader to view some of the information such as the Motorola and ARM user manuals. The CD also contains copies of the Adobe Acrobat Reader that you can install if you do not already have it.

IT IS IMPORTANT THAT YOU APPRECIATE THAT NONE OF THE SOFTWARE IS OWNED BY OXFORD UNIVERSITY PRESS. ALL THE SOFTWARE WAS KINDLY SUPPLIED BY THIRD PARTIES FOR USE BY THE READERS OF THIS BOOK.

THIS SOFTWARE IS SUBJECT TO THE INDIVIDUAL CONDITIONS STATED BY THE APPROPRIATE COPYRIGHT HOLDERS.

THE SOFTWARE HAS BEEN SUPPLIED TO OUP ON THE CONDITION THAT IT IS NOT SUPPORTED.

ONE ITEM OF SOFTWARE ON THE CD, WINZIP, IS SUPPLIED AS A DEMONSTRATION COPY AND MAY NOT BE USED FOR MORE THAN 21 DAYS WITHOUT PAYMENT. This software is required only if you cannot unzip the ARM development software.

The directories on the CD containing the above items are
● 68Ksim
● Digital
● ArmSim
● 68Kdocs
● Easy68K_4ed

I suggest that you copy 68Kdocs and 68Ksim to your hard disk. The DOS-based 68K simulator software simply has to be copied to your system and does not require any installation procedure. You simply run the appropriate X68K.EXE or E68K.EXE file from your DOS prompt. The Windows-based 68K simulator has to be installed.

NOTE When I tested the DOS-based 68K simulator I found that some of the demonstration files had become "read-only" in the transfer to the CD. This means that you will get an error message when you try to assemble them or run them. You can solve this problem by changing the attribute from read-only to read/write. This problem affected only the demonstration/test files.

NOTE The ARM software also includes a substantial amount of documentation including the ARM Reference Manual in the subdirectory PDF. Note also that I have already unzipped the ARM software on the CD and you will be able to find the documentation in ArmSim\ARM202U\PDF. The documentation goes well beyond the level of this text and has been included to allow readers to delve more deeply into the ARM's architecture.

The digital logic simulator, Digital Works, must be installed on your system. Similarly, you must unzip the ARM simulator files and install them on your hard disk.

The following is the testing schedule that was used to test this CD. Further information about the packages can be found in the CD's files and in the body of the text.

OUP have set up an online resource centre to support this book. Its URL is: www.oxfordtextbooks.co.uk/orc/clements4e

I can be contacted by email at [email protected]

CD Testing Schedule

This "testing schedule" has been devised to allow my "pre-release testers" to examine the software on this disk before it is released with Principles of Computer Hardware. It should also help other readers to get the software going. This software contains third-party utilities, simulators, and documentation (in Adobe's Portable Document Format).

1. Read the Readme.txt file in the root directory.

2. Install Adobe Acrobat Reader. The CD contains AdbeRdr70_enu_full.exe, which will install Version 7 on a PC with Windows XP. You can also install Version 4 of Adobe Acrobat Reader (for compatibility with the 3rd edition of the book) by using one of the two files ar40eng.exe or rs40eng.exe. The former is a Windows 95 version and the latter a Windows 98 version.

3. If you have Adobe Acrobat Reader already installed or have just installed it, open the 68Kdocs directory and click on the 68Kprm.pdf file. This should enable you to read Motorola's definitive document on the 68000 family.

4. Test the 68K simulator. Open directory 68Ksim and click on the pdf document sim.pdf. This will open the guide to the use of the simulator software in Adobe Acrobat Reader.

5. Examine the other .txt files in directory 68Ksim.

6. Use an ASCII text editor to create a file, for example, TEST.X68, that contains a minimal 68K assembly language program. (You can use one of the 'demo' files provided on the CD.) Go into the DOS command-line mode on your PC and assemble the program with the command line X68K TEST -L.

   Note that you MUST NOT provide the extension .X68 or the assembly will fail. The -L option is used to generate a listing file. That is, X68K TEST -L will generate TEST.BIN (if assembly is successful) and TEST.LIS.
   If assembly succeeds (i.e. there are no errors in your source code), invoke the simulator from the DOS command line with the command E68K TEST. You can test the simulator (if you have read the documentation) and then exit by using the Q (quit) command. This takes you back to the DOS command level. If you run a program that puts you in an infinite loop, you can get out by hitting the escape key.

   NOTE that this directory contains several test files (i.e. Demo1.bin and Demo2.x68). You can assemble Demo1.x68 with the command X68K DEMO1 -L. You can then run the binary file with E68K DEMO1. To execute a program in the simulator type GO followed by a carriage return (i.e. the 'enter' key).

7. Test Digital Works. Open the directory Digital and double click on dw20_95.exe to install Digital Works into the directory of your choice. If you change to the directory where Digital Works is located, double clicking on Digital.exe will run Digital Works. Note that Digital Works also puts a command on the Windows 98 Start/Programs menu.

   The simplest way of testing Digital Works is to select a gate by moving the cursor to it and then clicking on that gate's icon. Then move the cursor to the work area and click again. A copy of the gate should be moved to the work area.

8. Test the ARM simulator. This is the most complex software on the CD and, for the purposes of Principles of Computer Hardware, you will be using only a fraction of its capabilities. Note that the package includes considerable documentation in Adobe's PDF format.

   You must first install the ARM software. I have provided 202u_w32_v1.zip, which is the package I downloaded from ARM's university web site. The directory ARM202U was created by unzipping 202u_w32_v1.zip.

   When I tested this package, I first unzipped the files to C:\ which created the directory 'C:\ARM202U' containing the unzipped files and subdirectories. I then changed the name of the directory to 'C:\ARM200' to suit the software's initial default paths to its \BIN and \LIB directories.

   The following provides an introduction to testing this software:
   a. Put the file clements.s (the test program written by me and located in directory ARMsim) in C:\ARM200\BIN.
   b. Run the simulator package from Windows by clicking on Apm.exe in the \BIN directory.
   c. Use the Project pull-down menu and select 'New Project'.
   d. Give the project a name and save the project in the C:\ARM200\BIN directory. This will create an 'Edit Project' window that asks for files to include.
   e. If you have created a source file with the extension .S (e.g. CLEMENTS.S) add it to the project and click OK. I have created CLEMENTS.S for you to test. You should have copied this to the \BIN directory.
   f. Note that the system needs to know where the compiler, etc., is. Click on 'Options' and select 'Directories'. You will probably have to give the path of the compiler, etc. on your own system if you have not used the path C:\ARM200\BIN.
   g. From the Project pull-down menu select 'Build name.APJ', where 'name' is the name of the project. You should get a 'Build complete' message if your source code had no errors.
   h. From the Project pull-down menu select 'Debug name.APJ' to enter the debugger/simulator mode.
   i. In the debugger you can use the 'View' pull-down menu to see registers, etc. Select the 'User registers' menu. This system loads the program at 8080 hexadecimal. Change the PC to 8080 by clicking on it.
   j. You can now run the code line-by-line with the step into command (one of the icons on the debugger toolbar).
   k. Note—from the 'Project' pull-down menu you can edit your source code.

9. Test the 68K Windows-based simulator. This is a system created by a team led by Chuck Kelly. The software is available in the public domain and I would suggest that you obtain the latest version from the Internet at www.monroeccc.edu/ckelly/easy68k.htm. The version on this CD has been included to ensure that all readers have a copy of this software.
   a. Click on SetupEASy68K.exe to install EASy68K. Installation puts the software in a sub-directory EASy68k (we've created this directory on the CD).
   b. The sub-directory EASy68k contains several files including EDIT68k.exe and SIM68k.exe. If you double-click on EDIT68k.exe, you will invoke a text editor that uses a template for a 68K assembly language program. You can type your 68K assembly language into this template and save it. The EDIT68k program is intuitive to use and has a 'Help' function.
   c. You can assemble a program from within the editor. Select the 'Project' tab in the editor window to get the 'Assemble source' option. Left-click on this and your program will be assembled. If you make any errors, you will have to re-edit the source. If there are no errors, you can select the 'Close' button and exit, or the 'Execute' button to enter the simulator.
   d. If you select the 'Execute' button, the 68K simulator is invoked. Now you can run the code to its completion or execute it line-by-line. The simulator displays the 68K's registers and you can also open memory or stack windows. The F7 function key can be used to execute code an instruction at a time.

Last modified on 14 July 2005
CD-ROM conditions of use and copyrights

Please read these terms before proceeding with the CD installation. By installing the CD you agree to be bound by these terms, including the terms applicable to the software described below.

The enclosed CD contains four major items of software, all of which run on IBM PCs and their clones. One item runs only under DOS.
● A 68000 cross-assembler and simulator
● A digital logic simulator
● A simulator for the ARM microprocessor
● Documentation for the 68000 family

These items are in separate directories and have appropriate "readme" files. You also need Adobe Acrobat Reader to view some of the information such as Motorola and ARM's user manuals. The CD also contains a copy of the Adobe Acrobat Reader that you can install if you do not already have it.

The materials contained on this CD-ROM have been supplied by the author of the book. Whilst every effort has been made to check the software routines and the text, there is always the possibility of error and users are advised to confirm the information in this product through independent sources.

Alan Clements and/or his licensors grant you a non-exclusive licence to use this CD to search, view and display the contents of this CD on a single computer at a single location and to print off multiple screens from the CD for your own private use or study. All rights not expressly granted to you are reserved to Alan Clements and/or his licensors, and you shall not adapt, modify, translate, reverse engineer, decompile or disassemble any part of the software on this CD, except to the extent permitted by law.

These terms shall be subject to English laws and the English courts shall have jurisdiction.

THIS CD-ROM IS PROVIDED 'AS IS' WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO IMPLIED WARRANTIES OF SATISFACTORY QUALITY OR FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT SHALL ANYONE ASSOCIATED WITH THIS PRODUCT BE LIABLE FOR ANY DIRECT, INDIRECT, SPECIAL, CONSEQUENTIAL, OR INCIDENTAL DAMAGES RESULTING FROM ITS USE.

THIS SOFTWARE IS SUBJECT TO THE INDIVIDUAL CONDITIONS STATED BY THE APPROPRIATE COPYRIGHT HOLDERS WHICH ARE GIVEN BELOW AND ON THE CD WALLET COVER.

THE SOFTWARE IS NOT SUPPORTED.

ONE ITEM OF SOFTWARE ON THE CD, WINZIP, IS SUPPLIED AS A DEMONSTRATION COPY AND MAY NOT BE USED FOR MORE THAN 21 DAYS WITHOUT PAYMENT. This software is required only if you cannot unzip the ARM development software.

DIGITAL WORKS 95 VERSION 2.04 is © John Barker 2000. TERMS OF USE: Digital Works 95 version 2.04 (The Product) shall only be used by the individual who purchased this book. The Product may not be used for profit or commercial gain. The Product shall only be installed on a single machine at any one time. No part of the Product shall be made available over a Wide Area Network or the internet. The title and copyright in all parts of the Product remain the property of David John Barker. The Product and elements of the Product may not be reverse engineered, sold, lent, displayed, hired out or copied. It shall only be installed on a single machine at any one time.

M68000PM/AD—MOTOROLA M68000 FAMILY PROGRAMMER'S REFERENCE MANUAL. Copyright of Motorola. Used by permission.
Schedule 2

Shrinkwrap Agreement

End User Licence Agreement for the ARM Software Development Toolkit 2.02u Version 2

IMPORTANT: READ CAREFULLY PRIOR TO ANY INSTALLATION OR USE OF THE SOFTWARE

You are in possession of certain software ("Software") identified in the attached Schedule 1. The Software is owned by ARM Limited ("ARM") or its licensors and is protected by copyright laws and international copyright treaties as well as other intellectual property laws and treaties. The Software is licensed, not sold. You were advised, at the time that the Software was provided to you, that any use, by you, of the Software will be regulated by the terms and conditions of this Agreement ("Agreement").

ACCEPTANCE

If you agree with and accept the terms and conditions of this Agreement it shall become a legally binding agreement between you and ARM Limited and you may proceed to install, copy and use the Software in accordance with the terms and conditions of the Agreement.

REJECTION AND RIGHT TO A REFUND

If you do not agree with or do not wish to be bound by the terms and conditions of this Agreement you may NOT install, copy or use the Software.

TERMS AND CONDITIONS

1. Software Licence Grant

ARM hereby grants to you, subject to the terms and conditions of this Agreement, a non-exclusive, non-transferable, worldwide licence, solely for non-commercial purposes, to:
● use and copy the Software identified in Schedule 1 Part A and Schedule 1 Part B;
● incorporate into software application programs that you develop, the Software identified in Schedule 1 Part B; and
● use the documentation identified in Schedule 1 Part C.

2. Restrictions on Use of the Software

Except for the making of one additional copy of the Software for backup purposes only, copying of the Software by you is limited to the extent necessary for: (a) use of the Software on a single computer; and (b) incorporation into software application programs developed by you as permitted under the terms of this Agreement.

Except to the extent that such activity is permitted by applicable law you shall not reverse engineer, decompile or disassemble any of the Software identified in Schedule 1 Part A. If the Software was provided to you in Europe you shall not reverse engineer, decompile or disassemble any of the Software identified in Schedule 1 Part A for the purposes of error correction.

You shall only use the Software on a single computer connected to a single monitor at any one time except that you may use the Software from a common disc running on a server and shared by multiple computers provided that one authorised copy of the Software has been licensed for each computer concurrently using the Software.

You shall not make copies of the documentation identified in Schedule 1 Part B.

You acquire no rights to the Software other than as expressly provided by this Agreement.

You shall not remove from the Software any copyright notice or other notice and shall ensure that any such notice is reproduced in any copies of the whole or any part of the Software made by you.

3. No Support

For the avoidance of doubt, this licence to use the Software does not provide you with any right to receive any support and maintenance in respect of the Software.

4. Restrictions on Transfer of Licensed Rights

The rights granted to you under this agreement may not be assigned, sublicensed or otherwise transferred by you to any third party without the prior written consent of ARM. You shall not rent or lease the Software.

5. Limitation of Liability

THE SOFTWARE IS LICENSED "AS IS". ARM EXPRESSLY DISCLAIMS ALL REPRESENTATIONS, WARRANTIES, CONDITIONS OR OTHER TERMS, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, SATISFACTORY QUALITY AND FITNESS FOR A PARTICULAR PURPOSE.

TO THE MAXIMUM EXTENT PERMITTED BY APPLICABLE LAW, IN NO EVENT SHALL ARM BE LIABLE FOR ANY INDIRECT, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES (INCLUDING LOSS OF PROFITS) ARISING OUT OF THE USE OR INABILITY TO USE THE SOFTWARE WHETHER BASED ON A CLAIM UNDER CONTRACT, TORT OR OTHER LEGAL THEORY, EVEN IF ARM WAS ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. ARM does not seek to limit or exclude liability for death or personal injury arising from ARM's negligence and because some jurisdictions do not permit the exclusion or limitation of liability for consequential or incidental damages the above limitation relating to liability for consequential damages may not apply to you.

6. Term and Termination

This Agreement shall remain in force until terminated by you or by ARM.

Without prejudice to any of its other rights, if you are in breach of any of the terms and conditions of this Agreement then ARM may terminate the Agreement immediately upon giving written notice to you.

You may terminate this Agreement at any time.

Upon termination of this Agreement by you or by ARM you shall stop using the Software and destroy all copies of the Software in your possession together with all documentation and related materials.

The provisions of Clauses 5, 6 and 7 shall survive termination of the Agreement.

7. General

This Agreement is governed by English Law.

This is the only agreement between you and ARM relating to the Software and it may only be modified by written agreement between you and ARM. This Agreement may not be modified by purchase orders, advertising or other representation by any person.

If any Clause in this Agreement is held by a court of law to be illegal or unenforceable the remaining provisions of the Agreement shall not be affected thereby.

The failure by ARM to enforce any of the provisions of this Agreement, unless waived in writing, shall not constitute a waiver of ARM's rights to enforce such provision or any other provision of the Agreement in the future.

Use, copying or disclosure by the US Government is subject to the restrictions set out in subparagraph (c)(1)(ii) of the Rights in Technical Data and Computer Software clause at DFARS 252.227-7013 or subparagraphs (c)(1) and (2) of the Commercial Computer Software – Restricted Rights at 48 CFR 52.227-19, as applicable.

You agree that you will not export or re-export the Software to any country, person or entity or end user subject to U.S.A. export restrictions. Restricted countries currently include, but are not necessarily limited to, Cuba, Iran, Iraq, Libya, North Korea, Syria and the Federal Republic of Yugoslavia (Serbia and Montenegro, U.N. Protected Areas and areas of Bosnia and Herzegovina under the control of Bosnian Serb forces).
I dedicate this edition to all those who have helped me run the IEEE Computer Society’s
International Design Competition since 2001. In particular, I express my gratitude
to the following who have become my friends and mentors.
Andy Bernat
Simon Ellis
Jerry Engle
Robert Graham
David Hennage
Ivan Joseph
Anne Marie Kelly
Kathy Land
Mike Lutz
Fernando Maymi
Stacy Saul
Deborah Scherrer
Janie Schwark
Steve Seidman
READING GUIDE
We’ve already said that this book provides a traditional Chapter 8 is concerned with the quest for performance. We
introductory course in computer architecture plus additional look at how performance is measured and describe three
material to broaden its scope and fill in some of the gaps left techniques used to accelerate processors. All students should
in such courses. To help students distinguish between fore- read about the first two acceleration techniques, pipelining
ground and background material, the following guide will and cache memory, but may omit parallel processing.
help to indicate the more fundamental components of the Chapter 9 describes two contrasting computer architectures.
course. Introductory texts on computer architecture are forced to
Chapter 2 introduces the logic of computers and deals with concentrate on one processor because students do not have
essential topics such as gates, Boolean algebra, and Karnaugh the time to plow through several different instruction sets.
maps. Therefore this chapter is essential reading. However, if we don’t cover other architectures, students can
Chapter 3 introduces sequential circuits such as the counter end the course with a rather unbalanced view of processors.
that steps through the instructions of a program and demon- In this chapter we provide a very brief overview of several
strates how sequential circuits are designed. We first intro- contrasting processors. We do not expect students to learn
duce the bistable (flip-flop) used to construct sequential the fine details of these processors. The purpose of this chap-
circuits such as registers and counters. We don’t provide a ter is to expose students to the range of processors that are
comprehensive introduction to the design of sequential cir- available to the designer.
cuits; we show how gates and flip-flops can be used to create Chapter 10 deals with input/output techniques. We are inter-
a computer. ested in the way in which information is transferred between
Chapter 4 deals with the representation of numbers and a computer and peripherals. We also examine the buses, or
shows how arithmetic operations are implemented. Apart data highways, along which data flows. This chapter is essen-
from some of the coding theory and details of multiplication tial reading.
and division, almost all this chapter is essential reading. Chapter 11 introduces some of the basic peripherals you’d
Multiplication and division can be omitted if the student is find in a typical PC such as the keyboard, display, printer, and
not interested in how these operations are implemented. mouse, as well as some of the more unusual peripherals that,
Chapter 5 is the heart of the book and is concerned with the for example, can measure how fast a body is rotating.
structure and operation of the computer itself. We examine Although these topics are often omitted from courses in com-
the instruction set of a processor with a sophisticated puter hardware, students should scan this chapter to get some
architecture. insight into how computers control the outside world.
Chapter 6 provides an overview of assembly language pro- Chapter 12 looks at the memory devices used to store data in
gramming and the design of simple 68K assembly language a computer. Information isn’t stored in a computer in just
programs. This chapter relies heavily on the 68K cross- one type of storage device. It’s stored in DRAM and on disk,
assembler and simulator provided with the book. You can use CD-ROM, DVD, and tape. This chapter examines the operat-
this software to investigate the behavior of the 68K on a PC. ing principles and characteristics of the storage devices found
in a computer. There’s a lot of detail in this chapter. Some
Chapter 7 begins with a description of the functional units
readers may wish to omit the design of memory systems (for
that make up a computer and the flow of data during the exe-
example, address decoding and interfacing) and just concen-
cution of an instruction. We then describe the operation of
trate on the reasons why computers have so many different
the computer’s control unit, which decodes and executes
types of memory.
instructions. The control unit may be omitted on a first read-
ing. Although the control unit is normally encountered in a Chapter 13 deals with hardware topics that are closely related
second- or third-level course, we’ve included it here for the to the computer’s operating system. The two most important
purpose of completeness and to show how the computer elements of a computer’s hardware that concern the operating
turns a binary-coded instruction into the sequence of events system are multiprogramming and memory management.
that carry out the instruction. These topics are intimately connected with interrupt handling
Reading guide vii
and data storage techniques and serve as practical examples of computer networks are not always covered by first-level texts
the use of the hardware described elsewhere. Those who on computer architecture. However, the growth of both local
require a basic introduction to computer hardware may omit area networks and the Internet have propelled computer
this chapter, although it best illustrates how hardware and communications to the forefront of computing. For this rea-
software come together in the operating system. son we would expect students to read this chapter even if
Chapter 14 describes how computers can communicate with some of it falls outside the scope of their syllabus.
each other. The techniques used to link computers to create
INDEX
ABC computer 11
Absolute address 215 260
Accelerating performance 325
Access time 497 499
Access time, disk 527
Accumulator 296 369
Accuracy 182
ACIA 426 586
ACIA format control 430
ACIA organization 429
ACIA status register 431
Acquisition time 479
Active matrix LCD 459
Actuator 525
Ada Gordon 8
ADC 218 468
ADC, error function 470
ADC, integrating 484
ADC, parallel 479
ADC, performance 469
ADC, potentiometric network 475
ADC, ramp feedback 481
ADC, successive
approximation 482
ADC, tracking 482
Add with carry 218
ADD 235
ADDA 236
ADDEQ 380
Adder, full 171
Adder, half 170
Adder, parallel 173
Adder, serial 173
Addition, extended 219
Addition, words 173
Additive colors 460
Address decoder 508
Address decoder 68K 516 517 518
Address decoder, PROM 512
Address field, HDLC 601
Address mapper 319
Address mapping table 564 565
Address path 294
Address register indirect 215 216 250
Address register indirect,
applications 252
Address register indirect, ARM 383
Address register indirect,
overview 251
Associative law 57
Associative memory 21
Associative-mapped cache 348
Asynchronous counter 128 129
Asynchronous system 114
Asynchronous transmission 428
ATA 529
Atanasoff, John 11
ATN (attention) 408
Attenuation 587
Audio visual drive 529
Autohandshaking, PIA 427
Auto-indexing 384
Automatic control 17
Avalanche effect, memory 506
Axioms, Boolean algebra 56 57
Babbage, Charles 7 10
Backplane 403
Band-gap device 463
Bandwidth 497 576
Base, number 148
Batch mode OS 548
Baudrate 576
Benchmark 326
BEQ 298
Berkeley RISC 330
Berners-Lee, Tim 15
Best effort service 582
BGE 246
B-H characteristic 529
Biased exponent 183 184
Bidirectional, data path 21
Big Endian 235
Binary arithmetic 169
Cable 8
Cable, coaxial 593
Cable, copper 592
Cable terminology 595
Cache, associative-mapped 348
Cache, design considerations 350
Cache, direct-mapped 346
CSMA/CD 595
CTS 593
Current processor status
register 376
Curriculum, hardware 2 3
Cursor 437
Cycle stealing DMA 422
Cycle time, memory 500
Cycles per instruction 326
Cyclic redundancy code 533
Cylinder, disk 526
D flip-flop 109
D flip-flop, use in registers 110
D flip-flop circuit 110
DAC 473
DAC, basic principles 473
DAC, R-2R 475
DAC errors 476
Daisy chaining 421
Data carrier detect 429 431
Data compressing code 161
Data density, disk 530
Data dependency, pipeline 338
Data direction register, PIA 425
Data encoding, recording 521
Data link layer 581 599
Data link layer, Ethernet 603
Data movement instructions 218
Data path 209
Data processor, computer 15 16
Data registers 235
Data setup time 501
Data structures 4
Data transfer, closed-loop 401
applications 488
Digital to analog converter, see DAC
Digital Works 172
Digital Works, binary up counter 130
Digital Works, clock speed 47
Digital Works, connecting
gates 43
Digital Works, creating a circuit 41
Digital Works, creating a register 111
Digital Works, embedded circuits 50
Digital Works, introduction 40
Digital Works, logic history 47
Digital Works, macro 50 52
Digital Works, pulse generator 131
Digital Works, recording outputs 46
Digital Works, running 45 46
Digital Works, sequence generator 48
Digital Works, tri-state gate 90
Diode bridge 478
Direct memory access 422
Directive, assembler 229
Direct-mapped cache 346
Disc capacity 519 526
Discrete signal 26
Disk, data density 530
Disk, head assembly 526
Disk, Winchester 527
Disk data structures 533
Disk drive history 525
Disk drive principles 524
Disk drive progress 530
Disk interface 529
Disk mirroring 531
Disk shock 528
Displacement, addressing 252
Displacement, relative 259
Faggin, Federico 13
Fast page mode DRAM 504
FAT 535
Fault 97
Fault, OS 555
Fault, undetectable 97
FCS field 602
Feedback ADC 480
Feedback memory 496
Ferrite core 11
Ferromagnetic material 517
Ferromagnetism 498
Fetch cycle 21
Fetch-execute cycle 296 301 320
Fetch-execute, flip-flop 314
Fiber optic links 595
Field (display) 445
History, computer 6
Hit, cache 344
Hit ratio 345
Hollerith, Herman 10
Hue 457
Huffman code 164 574
Hybrid topology 359
Hypercube topology 357
Hysteresis loop 519
Jacquard, Joseph 7
JK flip-flop 120 132
JK flip-flop, circuit 122
JK flip-flop, state diagram 128
Job control language 548
Joystick 440 442
JSR 266
Jump, delayed 338
LAN 570
LAN characteristics 571
Land, CD 537
Laser 537
Laser, color 461
Laser printer 455
Latency 404 497
LCD, color 459
LCD, transmissive mode 449
LCD, reflective mode 449
LCD cell 449
M68HC12 368
MAC 601 604
Machine, von Neumann 20
Machine level 206
Macroinstruction 315
Magnetic core 519
Magnetic disk 495
Magnetic surface recording 515
Magnetic tape 495
Magnetism 498
Magneto-optical disk 541
Magnetoresistive head 530
Mainframe 10
Majority logic 34 173
Malware 567
Manchester encoding 523
Mantissa 183
NaN 185
NAND gate 31
NAND logic 65
Navigation 6
N-bit 299
NDAC, IEEE bus 409
Negative logic 32
Negative number 175
Nematic liquid crystal 448 449
Nested subroutines 225
Network interface card 572
Network layer 581
Noise 584 590
Noise, quantization 469
Noise immunity 27 522
Non-linear error 477
Non-maskable interrupt 418
Non-restoring division 195
Non-return to zero encoding 522
NOR gate 31
NOR logic 65
Normalization 183
Normalize 186
Not a number 185
NOT gate 31
Noyce, Robert 12
NRFD, IEEE bus 409
Null byte 255
Number, floating point 181
ORG 230
Organic display 451
Organization, multiprocessor 353
Organization and architecture 205
Originate/answer 592
OSI 578
Output enable 499
Overflow 179
Push 264
QAM 590
Quad precision 185
Quadtree 167
Quantization 468
Quantization noise 469
QWERTY 10 436
Redundancy 98
Redundant bits 157 163
Reed relay 436
Reference model for OSI 578
Refreshing RAM 502
Register 207
Register, address 236
Register, CCR 218
Register, index 369
Register, link 376
Register, shadow 376
Register, using D flip-flops 110
Register selection, PIA 425
Register set 68K 211 217
Register sets 365
Register to register architecture 213 294
Register transfer language 208
Register window 330
Register windows, parameters 332
Registers 68K 234
Registers, ARM 375
Registers, windowed 333
Relative addressing 259
Relative branch 260
Relay 31
Relay, reed 436
Remnant magnetism 519
Request to send 429
Reset, logic element 110
Resolution, monitor 459
Restoring division 194
Return to zero encoding 522
Return from exception 558
Return to bias recording 522
Return, subroutine 225
Reverse subtract instruction 377
T flip-flop 121
Table search 556
Tachometer 462
Talker, IEEE 488 bus 407
Task control block 553
TCP/IP 607
Telegraph 8
Telegraph distortion 9
Telephone 9
Telephone network, origins 575
Templates, control structures 246
Ten’s complement arithmetic 176
Test equivalence instruction 378
Testing digital circuits 96
The last mile 591
Theorems, of Boolean algebra 56
Thermal printer 453
Thermal wax printer 461
Thermistor 463
Thermocouple 463
Thermoelectric effect 463
Thermoelectric junction 463
Thin film transistor 460
Three address instruction 211
Three-wire handshake 408
Time-division multiplexing 585
Time-to-live, routing 608
Timing delay 113
Timing diagram, memory 499
Timing diagram, static RAM 500
Timing pulse generator 313
Token rings 606
Toner 455
Topology, bus 356
Topology, cluster 360
Topology, hybrid 359
Topology, hypercube 357
Topology, ring 357
Topology, star 357
Topology, unconstrained 356
TOS 262
Total internal reflection 595
WAN 580
Xerography 455
XGA 448
Z transform 489
Z-bit 299
Zero address machine 213 366
Zilog 13
Zoning 528
Zuse, Konrad 10
This appendix provides details of the 68000's most important instructions (we have omitted some of the instructions that are not relevant to this book).

In each case, we have given the definition and assembly language format of the instruction. We have also provided its size (byte, word, or longword) and the addressing modes it takes for both source and destination operands. Finally, we have included the effect of the instruction on the 68000's condition code register. Each instruction either sets/clears a flag bit, leaves it unchanged, or has an 'undefined' effect, which is indicated by the symbols *, -, and U, respectively. A 0 in the CCR indicates that the corresponding bit is always cleared.
Application: ASL multiplies a two's complement number by 2. ASL is almost identical to the corresponding
logical shift, LSL. The only difference between ASL and LSL is that ASL sets the V-bit of the CCR
if overflow occurs, whereas LSL clears the V-bit to zero. An ASR divides a two's complement
number by 2. When applied to the contents of a memory location, all 68000 shift operations
operate on a word.
Condition codes: X N Z V C
* * * * *
The X-bit and the C-bit are set according to the last bit shifted out of the operand. If the shift
count is zero, the C-bit is cleared. The V-bit is set if the most-significant bit is changed at any
time during the shift operation and cleared otherwise.
Destination operand addressing modes
Data register direct addressing, Dn, uses a longword operand. Other modes use a byte operand.
Condition codes: X N Z V C
- - * - -
Z: set if the bit tested is zero, cleared otherwise.
Destination operand addressing mode for BSET Dn,ea form
CMP Compare
Operation: [destination] − [source]
Syntax: CMP <ea>,Dn
Sample syntax: CMP (Test,A6,D3.W),D2
Attributes: Size = byte, word, longword
Description: Subtract the source operand from the destination operand and set the condition codes accord-
ingly. The destination must be a data register. The destination is not modified by this
instruction.
Condition codes: X N Z V C
- * * * *
Source operand addressing modes
(F = false and T = true). Many assemblers permit the mnemonic DBF to be expressed as DBRA (i.e.
decrement and branch back).
The condition tested by the DBcc instruction works in the opposite sense to a Bcc. For example, BCC means branch on carry clear, whereas DBcc means continue (i.e. exit the loop) on carry clear. That is, the DBcc condition is a loop terminator. If the termination condition is not true, the low-order 16 bits of the specified data register are decremented. If the result is −1, the loop is not taken and the next instruction is executed. If the result is not −1, a branch is made to 'label'. The label is a 16-bit signed value, permitting a branch range of −32 to +32 kbytes. The loop may be executed up to 64K times.
We can use the instruction DBEQ, decrement and branch on zero, to mechanize the high-level language construct REPEAT...UNTIL.

LOOP ...                 REPEAT
     ...
     ...                 [D0] ← [D0] − 1
     ...
     DBEQ D0,LOOP        UNTIL [D0] = −1 OR [Z] = 1
Application: Suppose we wish to input a block of 512 bytes of data (the data is returned in register D1). If the
input routine returns a value zero in D1, an error has occurred and the loop must be exited.

      LEA    Dest,A0      Set up a pointer to the data destination
      MOVE.W #511,D0      512 bytes to be input
AGAIN BSR    INPUT        Get a data value in D1
      MOVE.B D1,(A0)+     Store it
      DBEQ   D0,AGAIN     REPEAT until [D1] = 0 OR 512 times
Condition codes: X N Z V C
- - - - -
Not affected.
The X-bit is not affected by a division. The N-bit is set if the quotient is negative. The Z-bit is set
if the quotient is zero. The V-bit is set if division overflow occurs (in which case the Z- and
N-bits are undefined). The C-bit is always cleared.
Source operand addressing modes
Description: Exchange the contents of two registers. This is a longword operation because the entire 32-bit
contents of two registers are exchanged. The instruction permits the exchange of address regis-
ters, data registers, and address and data registers.
Application: One application of EXG is to load an address into a data register and then process it using
instructions that act on data registers. Then the reverse operation can be used to return the
result to the address register. Using EXG preserves the original contents of the data register.
Condition codes: X N Z V C
- - - - -
register A0, offset by the contents of data register D0. Note that JMP provides several addressing
modes, while BRA provides a single addressing mode (i.e. PC relative).
Condition codes: X N Z V C
- - - - -
Source operand addressing modes
X-bits of the CCR. The shift count may be specified in one of three ways. The count may be a
literal, the contents of a data register, or the value 1. An immediate count permits a shift of 1
to 8 places. If the count is in a register, the value is modulo 64—from 0 to 63. If no count is
specified, one shift is made (e.g. LSL <ea> shifts the word at the effective address one posi-
tion left).
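The three forms can be sketched as follows (illustrative examples, not from the original entry):
      LSL.W  #4,D0     Shift the low word of D0 four places left (literal count 1 to 8)
      LSL.W  D2,D1     Shift D1 left by the count held in D2 (modulo 64)
      LSL    (A0)      Shift the word pointed at by A0 one place left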
Description: Move the contents of the user stack pointer to an address register or vice versa. This is a privileged
instruction and allows the operating system running in the supervisor state either to read the con-
tents of the user stack pointer or to set up the user stack pointer.
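As an illustrative sketch (not part of the original entry; the stack address is hypothetical), an operating system might initialize a user stack while in the supervisor state:
      LEA    $00080000,A0   A0 points to the hypothetical top of the user stack
      MOVE   A0,USP         Load the user stack pointer from A0
      MOVE   USP,A1         Later, read the user stack pointer back into A1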
Condition codes: X N Z V C
- - - - -
Condition codes: X N Z V C
- * * 0 0
Source operand addressing modes
NEG Negate
Operation: [destination] ← 0 - [destination]
Syntax: NEG <ea>
Attributes: Size = byte, word, longword
Description: Subtract the destination operand from 0 and store the result in the destination location. The dif-
ference between NOT and NEG is that NOT performs a bit-by-bit logical complementation,
whereas NEG performs a 2’s complement arithmetic subtraction. All bits of the condition code
register are modified by a NEG operation; e.g. if [D3.B] = 11100111₂, the operation
NEG.B D3 results in [D3] = 00011001₂ (XNZVC = 10001) and NOT.B D3 results in [D3] =
00011000₂ (XNZVC = -0000).
Condition codes: X N Z V C
* * * * * Note that the X-bit is set to the value of the C-bit.
Destination operand addressing modes
NOP No operation
Operation: None
Syntax: NOP
Attributes: Unsized
Description: The no operation instruction NOP performs no computation. Execution continues with the
instruction following the NOP instruction. The processor’s state is not modified by an NOP.
Application: NOPs can be used to introduce a delay in code. Some programmers use them to provide space
for patches—two or more NOPs can later be replaced by branch or jump instructions to fix a
bug. This use of the NOP is seriously frowned upon, as errors should be corrected by reassem-
bling the code rather than by patching it.
Condition codes: X N Z V C
- - - - -
OR OR logical
Operation: [destination] ← [source] + [destination]
Syntax: OR <ea>,Dn
OR Dn,<ea>
Attributes: Size = byte, word, longword
Description: OR the source operand to the destination operand and store the result in the destination
location.
Application: The OR instruction is used to set selected bits of the operand. For example, we can set the four
most-significant bits of a longword operand in D0 by executing:
OR.L #$F0000000,D0
Condition codes: X N Z V C
- * * 0 0
Source operand addressing modes
ORI OR immediate
Operation: [destination] ← <literal> + [destination]
Syntax: ORI #<data>,<ea>
Attributes: Size = byte, word, longword
Description: OR the immediate data with the destination operand. Store the result in the destination
operand.
Condition codes: X N Z V C
- * * 0 0
Application: ORI forms the logical OR of the immediate source with the effective address, which may be a
memory location. For example,
ORI.B #%00000011,(A0)+
Destination operand addressing modes
Condition codes: X N Z V C
- * * 0 *
The X-bit is not affected and the C-bit is set to the last bit rotated out of the operand (C is set to
zero if the shift count is 0).
Destination operand addressing modes
Condition codes: X N Z V C
* * * 0 *
The X- and C-bits are set to the last bit rotated out of the operand. If the rotate count is zero,
the X-bit is unaffected and the C-bit is set to the value of the X-bit.
Destination operand addressing modes
TRAP Trap
Operation: S ← 1;
[SSP] ← [SSP] - 4; [[SSP]] ← [PC];
[SSP] ← [SSP] - 2; [[SSP]] ← [SR];
[PC] ← [vector]
Syntax: TRAP #<vector>
Attributes: Unsized
Description: This instruction forces the processor to initiate exception processing. The vector number used by
the TRAP instruction is in the range 0 to 15 and, therefore, supports 16 traps (i.e. TRAP #0 to
TRAP #15).
Application: The TRAP instruction is used to perform operating system calls and is system dependent. That
is, the effect of the call depends on the particular operating environment. For example, the
University of Teesside 68000 simulator uses TRAP #15 to perform I/O. The ASCII character in
D1.B is displayed by the following sequence.
MOVE.B #6,D0 Set up to display a character parameter in D0
TRAP #15 Now call the operating system
Condition codes: X N Z V C
- - - - -
UNLK Unlink
Operation: [SP] ← [An]; [An] ← [[SP]]; [SP] ← [SP] + 4
Syntax: UNLK An
Attributes: Unsized
Description: The stack pointer is loaded from the specified address register and the old contents of the pointer
are lost (this has the effect of collapsing the stack frame). The address register is then loaded with
the longword pulled off the stack.
Application: The UNLK instruction is used in conjunction with the LINK instruction. The LINK creates a
stack frame at the start of a procedure, and the UNLK collapses the stack frame prior to a return
from the procedure.
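A sketch of a typical procedure skeleton (illustrative, not taken from the original) is:
Proc  LINK   A6,#-8    Create a stack frame with 8 bytes of local workspace
      ...              Body of the procedure; locals at -8(A6) to -1(A6)
      UNLK   A6        Collapse the stack frame
      RTS              Return to the caller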
Condition codes: X N Z V C
- - - - -