Notes Day8

The document discusses the construction and operation of basic logic gates using transistors, particularly focusing on CMOS technology for digital design. It explains how controlled switches, like transistors, are utilized to create logic gates such as inverters, NAND, and NOR gates, emphasizing the principle of complementarity in CMOS. Additionally, it covers integrated circuit manufacturing processes, including crystal growth, doping, photolithography, and the importance of clean rooms to ensure high yield in semiconductor fabrication.


MSc Electronic and Computer Engineering

Digital Design

38 Technologies for Logic Implementation


So far we have looked at how to write high-level descriptions of systems in VHDL,
which can then be synthesized into netlists of logic gates or other suitable building
blocks. In this unit we will look at how basic logic gates are constructed.

38.1 Switch logic


Let’s begin with the simplest case possible. How could we make an inverter? The key
to this is to find a device that can behave as a controlled switch.

[Figure: a controlled switch connecting a Left contact to a Right contact, operated by a Control terminal]

The switch connects the left contact to the right contact. The state of the switch (i.e.
open or closed) depends on what is happening at the control contact. If the control
contact is at one logic state, the switch is closed. If it is at the other logic state, the
switch is open. Now we can build an inverter:

[Figure: two switch-based inverters. With Input=1 the lower switch closes and steers a 0 to the Output; with Input=0 the upper switch closes and steers a 1 to the Output]

We have used two different versions of our controlled switch. One is closed when the
input is 1; the other is open when the input is 1. This arrangement steers a 0 to the
output when the input is 1, and steers a 1 to the output when the input is 0. It functions
as an inverter.

38.2 The transistor as a switch


There are many possible ways to make controlled switches; in the past mechanical
mechanisms, magnetic relays and vacuum tubes have been used. All modern digital
technology is based around using semiconductor transistors as the controlled
switches. Logic gates are built up from transistors, which act as controlled switches
that steer a 1 or 0 onto the output. Various logic families exist; they differ in the types
of transistors they use, and the configurations in which these transistors are used. In
this lecture we will look at the CMOS family, which is the technology used for almost
all commercial microprocessors and microcontrollers.

The desirable features of a logic family are as follows:

- High speed
- Low power dissipation (to avoid overheating, and to maximize battery lifetime)
- Low cost
- The ability to pack large numbers of gates onto a single chip

The relative priority of these features depends on the intended application. For most
applications, the best compromise is achieved by logic families based on MOS
transistors. These score well on power dissipation, cost and packing density, but
compromise somewhat on speed. The two most important types of MOSFET are
shown below:

Type         Switch behaviour                     VHDL-style model
n-channel    open when G=0, closed when G=1       d <= s when g='1' else 'Z';
p-channel    closed when G=0, open when G=1       d <= s when g='0' else 'Z';

(The circuit symbols are omitted here; G, S and D label the gate, source and drain terminals.)

The MOSFETs have three terminals, the gate (G), the source (S) and the drain (D).
For the n-channel transistor, the behaviour is as follows:

- The region between the drain and the source behaves like a switch.
- If the gate is 1, then the switch is closed, the source is connected to the drain, and the transistor is said to be on.
- If the gate is 0, then the switch is open, the source is disconnected from the drain, and the transistor is said to be off.

The p-channel device responds to the gate in exactly the opposite manner. It is turned
on by a 0 at the gate, and off by a 1 at the gate.
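The behaviour of the two transistor types can be mirrored in a couple of functions (a Python sketch of the VHDL-style descriptions above, not a circuit simulator; 'Z' stands for high impedance, i.e. the switch is open):

```python
def nmos(g, s):
    """n-channel: conducts (drain follows source) when the gate is 1."""
    return s if g == 1 else 'Z'   # 'Z' = high impedance, switch open

def pmos(g, s):
    """p-channel: conducts when the gate is 0 -- the opposite of n-channel."""
    return s if g == 0 else 'Z'
```

This is exactly the complementary pair of controlled switches needed to build the gates in the following sections.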

38.3 The CMOS inverter


The basic inverter is shown below.

[Figure: CMOS inverter. A pull-up transistor connects VDD to the Output, and a pull-down transistor connects the Output to VSS; both gates are driven by the Input]

VDD is the high voltage supply (which can be regarded as a source of logic 1’s). VSS is
the reference voltage, usually taken to be zero (and can be regarded as a source of
logic 0’s).

Whatever is connected between the output and VSS is called the pull-down network,
because it tends to pull the output down to 0. For this simple gate, the pull-down is a
single transistor. Similarly, whatever tends to connect the output to VDD is called the
pull-up network because it tends to pull the output up to 1.

The operation of the inverter is very simple. When Input=1, the pull-down is switched
on and the pull-up is off, so the output is pulled to 0. When Input=0, the pull-down is
switched off and the pull-up is on, so the output is pulled to 1.

38.4 The CMOS NAND and NOR gates


The techniques used for the inverter can be generalized to more complicated gates, by
putting more transistors into the pull-up and pull-down networks. For example, the
NAND gate is shown below.

[Figure: CMOS NAND gate. Two p-channel transistors in parallel connect VDD to Out, and two n-channel transistors in series connect Out to VSS; inputs A and B drive both networks]

CMOS NAND gate

Out = NOT(A AND B) = A NAND B = ¬(A·B)

A B A NAND B
0 0 1
0 1 1
1 0 1
1 1 0

The two pull-down transistors are in series. So there will only be a conducting path
from Out to VSS if both transistors are turned on. The pull-down transistors are both
on if A=1 and B=1.

The pull-up transistors are in parallel, so there will be a conducting path from Out to
VDD if either of the transistors is switched on. Since they are p-channel devices, this
will happen if A=0 or B=0.
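The series/parallel reasoning above can be captured in a few lines (a Python sketch of switch-level behaviour, not a transistor-level simulation):

```python
def cmos_nand(a, b):
    """Switch-level CMOS NAND: series n-channel pull-down, parallel
    p-channel pull-up. Inputs and output are 0 or 1."""
    pull_down_on = (a == 1) and (b == 1)   # both series nMOS must conduct
    pull_up_on = (a == 0) or (b == 0)      # either parallel pMOS conducts
    return 0 if pull_down_on else 1        # exactly one network is ever on
```

Evaluating the function for all four input combinations reproduces the NAND truth table given in the text.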

We can figure out how to build a NOR gate by using a similar procedure.

[Figure: CMOS NOR gate. Two p-channel transistors in series connect VDD to Out, and two n-channel transistors in parallel connect Out to VSS; inputs A and B drive both networks]
CMOS NOR gate

Out = NOT(A OR B) = A NOR B = ¬(A + B)

A B A NOR B
0 0 1
0 1 0
1 0 0
1 1 0

38.5 Complementarity
If you look carefully at the CMOS gates above, you will see that for any combination
of inputs, either the pull-up or the pull-down is on. They are never on simultaneously
or off simultaneously. This is the principle of complementarity, which gives the
CMOS family its name. CMOS stands for Complementary MOS. Most logic families
other than CMOS will sometimes create a low resistance path between the power
rails, which causes high power dissipation, draining the battery and overheating the
chip. CMOS does not have this undesirable property, because complementarity means
that the pull-up and pull-down will not be on at the same time, no matter what the
input condition. For this reason, almost all high density digital integrated circuits are
built using the CMOS approach.
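Complementarity can be checked exhaustively for the two gates above (a Python sketch; each network is modelled as a boolean predicate on the inputs):

```python
# Pull-down and pull-up networks for the NAND and NOR gates of the
# previous section, written as predicates that are true when the
# network conducts.
gates = {
    'NAND': (lambda a, b: a and b,                # nMOS in series
             lambda a, b: (not a) or (not b)),    # pMOS in parallel
    'NOR':  (lambda a, b: a or b,                 # nMOS in parallel
             lambda a, b: (not a) and (not b)),   # pMOS in series
}

def complementary(pull_down, pull_up):
    """True if, for every input combination, exactly one network conducts."""
    return all(bool(pull_down(a, b)) != bool(pull_up(a, b))
               for a in (0, 1) for b in (0, 1))
```

For both gates the check succeeds: there is never a direct path between the power rails, and the output is never left floating.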

You should now know...

- How transistors can be used to form controlled switches
- How logic gates can be built from transistors
- How CMOS ensures complementarity

39 Integrated Circuit Manufacture
In this lecture we will look at how silicon is processed to create transistors, and at the
factors that determine the yield of working devices from the manufacturing process.
This will lead to some important conclusions about cost and its impact on the
design process.

39.1 Integrated circuits


So far we have looked at how to split a complicated specification into logic gates, and
how to turn these logic gates into real circuits using transistors. In practice, these logic
circuits are manufactured as integrated circuits (“silicon chips”). An integrated circuit
is simply a piece of semiconductor (usually silicon) within which a number of devices
have been made. The dimensions of these devices can be made very small: much less
than a micron with present technology. It is therefore possible to place a very large
number of such devices onto a single small piece of silicon, and connect them up to
form a complicated circuit.

The key to the widespread deployment of cheap computing is the ability to place an
entire computer, or at least the major sub-systems of a computer, onto a single tiny
silicon chip. During the last three decades, enormous progress has been made in
increasing the complexity and speed of the systems that can be placed onto a chip.

An integrated circuit (IC) is a circuit fabricated on a small piece (usually called a chip
or a die) of very pure semiconductor (usually silicon). The conduction properties of
the silicon can be altered by introducing impurity atoms (dopants) in a process known
as doping. By altering the conduction properties in an appropriate pattern, various
devices can be made. Also, various layers can be deposited on top of the silicon.
Insulating layers are formed by deposits of glass (silicon dioxide), and conducting
layers are formed by metal (usually aluminium or copper).

A chip sits inside a package:

[Figure: cross-section of a packaged chip, showing the moulding compound, lead frame, bond wires, the chip itself, and the pins]

The package has the following functions:

- to provide mechanical support,
- to protect the chip from the outside world (ICs are easily corroded by moisture),
- to conduct away efficiently any heat generated by the chip during its operation (failure mechanisms are greatly accelerated by elevated temperatures),
- to enable the chip to make easy and reliable electrical contact with the outside world.

39.2 Integrated circuit fabrication

39.2.1 Crystal growth


Silicon crystals are grown in cylindrical ingots. These are cut down into wafers,
which are usually about 20 cm in diameter, and less than 1 mm thick. The integrated
circuits are formed on the wafer by the process that will be introduced in the next
section. ICs are typically 1cm x 1cm, so a large number of ICs can be made on one
wafer. Once the ICs have been completed, the wafer is sawn up into individual dice or
chips, each of which contains one of the integrated circuits. Finally, the chip is placed
in its package.

[Figure: silicon ingot; silicon wafer; wafer with integrated circuits fabricated]
39.2.2 Doping
Electrical current is the movement of charges. Pure silicon has few mobile charges,
and is not a particularly good conductor. The conduction properties of silicon can be
modified by introducing impurity atoms. This process is called doping, and the added
impurities are called dopants.

When neighbouring regions of silicon contain different amounts of dopants, there
tends to be a barrier to the free flow of charge between the regions. By controlling the
precise location and amount of dopants, we can control where these barriers form and
how high they are, and so we can build transistors, devices that act as controlled
switches.

39.2.3 Photolithography
The main stages involved in making an integrated circuit are introduced in this
section. The main idea is to have a picture of the size and shape of the feature that we
want to make. The feature could be a region of metal, insulator, or doping. This
picture is called a mask. The process used to transfer the pattern of the mask into a
feature on the silicon is a form of photographic process called photolithography. We
will illustrate the process by showing how we would make a doped region of a
particular shape.

The process starts by taking a very pure piece of silicon, and heating it up in an
oxygen atmosphere. The top surface of the silicon oxidises, forming a layer of silicon
dioxide (the stuff that common glass is made from). This oxide layer is used as a
barrier to provide protection for the silicon.

On top of this oxide we place a layer of photoresist. This is a photographic chemical
whose solubility changes on exposure to light.

[Figure: silicon substrate with an SiO2 layer; the same substrate after the photoresist coating is applied]

We then expose the top surface to ultra-violet light. We shield some of the top surface
by using a mask, which is opaque in places. Where the top surface was exposed to UV
light, the photoresist becomes insoluble. In the region that was shadowed by the mask,
the photoresist remains soluble. We then dissolve off the soluble photoresist using an
organic solvent, leaving an exposed silicon dioxide surface in exactly the pattern that
was present on the mask.

[Figure: UV radiation shining through the mask onto the photoresist; the developed image after the soluble resist is dissolved away]

Next we use an acid that attacks silicon dioxide but not photoresist. We use this to
etch a hole in the oxide layer to expose the surface of the silicon. We then use a
different type of solvent to get rid of the remaining photoresist.

[Figure: the SiO2 etched to expose the silicon; the surface after the remaining photoresist is removed]

Finally we heat the chip up and expose its surface to a gas that contains atoms of the
dopant. The dopant atoms will soak slowly into the top of the silicon, giving rise to a
doped region that has exactly the shape of the opaque region on the mask.

[Figure: the surface exposed to a gaseous form of the dopant, producing a doped region in the silicon]

This process is repeated many times to build up a series of layers of the desired
properties.

39.3 The need for clean rooms
The processing steps must be carried out in clean rooms. These are rooms that are
isolated from the outside world in order to prevent contamination. People who work
in these rooms must wear special clothing to prevent their dandruff, skin flakes,
exhaled moisture, and skin grease from contaminating the clean environment.

To illustrate this, imagine that we have removed the photoresist and are ready to
expose the surface to gaseous dopant. At this point a small speck of dust lands on the
surface. The dust will mask the surface from the dopants, and the doped region will be
the wrong shape.

[Figure: a speck of dust on the surface; the doped region is interrupted under the dust]

Such a fault would cause the entire integrated circuit to be useless.

39.4 The size-yield trade-off


Our chips can also be spoilt by imperfections in the silicon crystal. Imperfections can
be reduced by using very careful processing, but the presence of some imperfections
is unavoidable. The imperfections tend to occur in a random manner across the wafer.
These imperfections impose an upper limit on the size of chip that we can
economically make. For example, imagine that we have a wafer with the following
defect pattern:

The crosses represent faults in the crystal. Now let's look at three possible scenarios
for how we can use this wafer:
1. Making a large number of small chips
2. Making a medium number of medium chips
3. Making a small number of large chips

[Figure: the same wafer divided three ways - (1) many small chips, (2) a medium number of medium chips, (3) a few large chips]

Wherever there is a cross, that indicates a fault, and that chip will not work. Let's
calculate what percentage of our chips work (this number is called the yield of our
process).

1. 104 chips work out of a total of 112. The yield is 93%.
2. 16 chips work out of a total of 24. The yield is 67%.
3. None of the chips work. The yield is zero.
This illustrates the problem of making chips too large. In practice, it is rare to make a
chip much larger than 1cm x 1cm.
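The size-yield trade-off can be explored with a small simulation (a Python sketch; the square wafer, uniformly random point defects, and the particular sizes are simplifying assumptions, not figures from the text):

```python
import random

def simulated_yield(wafer_mm, chip_mm, n_defects, trials=300, seed=42):
    """Monte-Carlo yield estimate: point defects land uniformly at random
    on a square wafer divided into square chips; a chip works only if no
    defect falls inside it. Illustrative model only."""
    rng = random.Random(seed)
    per_side = wafer_mm // chip_mm
    n_chips = per_side * per_side
    good = 0
    for _ in range(trials):
        # set of (row, col) chip positions struck by at least one defect
        hit_chips = {(int(rng.uniform(0, wafer_mm)) // chip_mm,
                      int(rng.uniform(0, wafer_mm)) // chip_mm)
                     for _ in range(n_defects)}
        good += n_chips - len(hit_chips)
    return good / (trials * n_chips)
```

With, say, 8 random defects on a 100 mm wafer, small (5 mm) chips come out with a yield near 100%, while large (50 mm) chips come out with a yield far lower, mirroring the three scenarios above.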

This leads to an important conclusion. If we want to place more and more powerful
computers onto a single chip, we need to increase the number of transistors on the
chip. But we cannot increase the size of the chip, so the only way to fit more
transistors in is to reduce the size of the individual transistors.

You should now know...

- How integrated circuits are manufactured
- Why IC manufacture must be done in a clean room
- Why yield considerations impose an upper limit on die size

40 Trends in IC Power Dissipation
For the devices used in computers and smartphones, perhaps the most important
limiting factor is their power dissipation. For desktop and laptop
computers, this is because overheating can cause the device to malfunction, and there
are limits to how much cooling can be added to the computer within a reasonable
budget. For smartphones, tablets and laptops, it is because battery lifetime is highly
valued by potential customers, and a device with very poor lifetime won’t sell. In this
lecture we’ll look at the key factors that establish how much power is dissipated by an
integrated circuit. To do this, we will need to look at MOSFET operation.

40.1 MOSFET construction


We control electrical conduction within a semiconductor by introducing dopants at
the time of device manufacture. Depending on which atomic species we use for the
doping, we can get two different types of semiconductor:

- n-type is rich in mobile electrons, which carry negative charge
- p-type is rich in mobile holes (the lack of a binding electron where normally we would expect to find one), which carry positive charge

The manufacturing process for an n-channel MOSFET is shown below:

We start with a piece of silicon that has been doped p-type.

We oxidise the top surface to produce a very thin layer of silicon dioxide, which is
an insulator. This will become the gate of the transistor.

We use a photolithographic process to diffuse n-type dopants into two regions,
which will become the source and the drain.

We deposit metal on the top surface to make a good electrical contact to the wiring
for the source, the gate and the drain.

40.2 MOSFET operation
In an n-channel MOSFET, current flow between source and drain is caused by a
continuous flow of mobile electrons. These are abundant in the n-type regions of the
source and the drain. However, the region under the gate is p-type. Mobile electrons
experience an energy barrier on trying to cross over from an n-type region to a p-type
region, and are unable to cross unless voltages are applied to the gate in a way that
reduces this barrier.

Suppose we put a logic 1 on the drain and a logic 0 on the gate, and let’s assume that
we are using a 3V power supply:

This gate voltage doesn’t provide energy to electrons to assist them to cross through
the p-region between the source and the drain. Electron flow is blocked and the device
is turned off.

Now, suppose that we put a logic 1 on the gate:

The positive voltage on the gate drives positive charges onto the gate metal. These
positive charges are attractive to electrons, so it becomes energetically favourable for
electrons to be in the region under the gate. The barrier to electrons moving from the
source toward the drain is therefore lowered. Electrons form a conducting channel
between the source and the drain, and the device is switched on.

40.3 MOSFET energy storage


We now want to analyse how much energy is stored as the device switches. The
important dimensions are the length L and width W of the gate.

The positive charge on the gate metal and the negative charge in the MOSFET
channel are separated by an insulator. This forms a parallel plate capacitor:

The capacitance C of the parallel plate capacitor is

    C = εWL / t

where ε is the permittivity, a fundamental property of the material used to form the
insulating layer, in this case silicon dioxide.

When we charge the capacitor up to a voltage V, the energy E stored is

    E = CV² / 2
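To get a feel for the magnitudes, here is a numeric sketch (the gate dimensions, oxide thickness and voltage below are assumed, representative values, not figures from the text):

```python
eps0 = 8.854e-12      # vacuum permittivity, F/m
eps = 3.9 * eps0      # silicon dioxide permittivity (relative permittivity ~3.9)
W = L = 100e-9        # gate width and length, m (assumed)
t = 2e-9              # oxide thickness, m (assumed)
V = 1.0               # gate voltage, V (assumed)

C = eps * W * L / t   # parallel-plate capacitance of the gate
E = C * V**2 / 2      # energy stored when charged to V
```

For these figures the gate capacitance comes out at a fraction of a femtofarad, and the stored energy at around 10⁻¹⁶ J: tiny per transistor, but significant when multiplied by billions of transistors switching billions of times per second.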

40.4 Power dissipation of an inverter


Suppose we take an inverter and apply a logic 1 to its input:

Charge will flow from the VDD rail and the transistors’ gates will charge up to a
voltage of VDD. Each transistor will store

    E = C·VDD² / 2
Now if we apply a logic 0 to the gate, the charge on the gates will transfer to the VSS
rail. The energy that has been transferred from VDD to VSS is burned off as heat.

Suppose we charge and discharge many times per second at a frequency of f. The
power P dissipated by each transistor is

    P = CV²f / 2

The capacitance is proportional to the length L and the width W of the transistors. So
power P is proportional to

    P ∝ WLV²f

40.5 Power dissipation of an integrated circuit


Suppose we have an integrated circuit containing N transistors operating from a clock
frequency fclock. The overall power dissipation P of the integrated circuit will be
proportional to

    P ∝ N·W·L·VDD²·fclock

This is an important relation, and tells us what factors tend to increase the power
dissipation of a circuit:

- N: the number of transistors on the IC
- W: the width of the transistors
- L: the length of the transistors
- VDD: the supply voltage
- fclock: the clock frequency

Simple devices, such as microcontrollers that must operate for many months from a
single battery charge, can keep their power dissipation low by using a very simple
design (low value of N) and a very modest clock frequency.
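The proportionality can be turned into rough numbers (a Python sketch; the per-transistor capacitance and the switching-activity factor alpha, which reflects that not every transistor toggles on every clock cycle, are assumed values, not figures from the text):

```python
def dynamic_power(n_transistors, c_per_transistor, vdd, f_clock, activity=0.1):
    """P = alpha * N * C * VDD^2 * f / 2 -- the relation from the text,
    scaled by an assumed activity factor alpha."""
    return activity * n_transistors * c_per_transistor * vdd**2 * f_clock / 2

# ~1e9 transistors, ~0.2 fF each, 1 V supply, 3.5 GHz clock (all assumed)
p = dynamic_power(1e9, 2e-16, 1.0, 3.5e9)
```

For these figures the result is of order tens of watts, which is the right ballpark for a modern processor, and halving VDD cuts the power by a factor of four, which is why supply voltages have fallen so far.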

The devices used in computers, tablets and smartphones need maximum
computational speed and complexity, so N and fclock are inevitably as high as we can
make them. This means that we need to reduce supply voltage VDD. Early PCs ran
from a 5V supply, but the core of modern processors run from supplies in the range
0.9V to 1.3V. We can’t go much lower than this without losing speed (because
voltage is what pushes charges to move quickly) and noise immunity (because a small
voltage difference between a 0 and a 1 means that random noise voltages can cause
bits to be flipped to the wrong value). As a result, the only effective strategy left to us
is to shrink the transistor dimensions W and L.

You should now know...

- The factors that determine the power dissipation of an integrated circuit
- Why high speed devices need miniaturization of their transistors

41 Trends and Trade-Offs in Integrated Circuit Manufacture
We have seen in previous units that the number of transistors that we can fit onto a single
chip is limited by yield considerations. Miniaturising the transistors used for an
integrated circuit maximises the transistor count within a given chip size, and also
gives gains in speed and power dissipation. However, extreme miniaturisation makes
the manufacturing process very costly. In this lecture we will look at the various
factors involved in trading off these issues.

41.1 The benefits of miniaturisation


There are a number of reasons why it is useful to place an entire computer sub-system
onto a single chip. The most important are:

- reduced volume and weight (essential for portable and wearable equipment)
- reduced cost
- reduced power consumption (less battery drain)
- increased speed

The increased speed comes about mainly because everything is closer together in the
circuit. Signals travel at a finite speed, and in an IC the signals need travel only a few
microns, whereas in a discrete circuit they must travel a few centimetres.

The following graphs show the progress that has been made in the miniaturisation of
microprocessor chips1. The first graph shows the number of transistors on a single
chip, and the second shows the size of the basic features (e.g. the transistors and
wires) on the chips.

[Graphs: transistors on a single chip (millions) and transistor size (microns), both plotted on logarithmic scales against year, 1970-2020]

The two graphs on the next page show the speed that has been achieved. The left hand
graph shows the clock frequency, and the right hand graph shows the length of the
clock cycle. In all of these categories, massive progress has been sustained in the past
thirty years. The current state of the art is that speeds of about 3.8 GHz can be
achieved using transistors about 5 nm in size and chips containing a few billion
transistors. Clock frequencies have stopped rising in recent years because the power
dissipation of a CMOS chip scales linearly with its clock frequency. Once we go
above a sustained 3.5 GHz clock speed, the power dissipated within the chip rises to

¹ These figures refer to the processors used in cheap PCs, e.g. Intel 80X86 / Pentium / Core.

about 150W. Even with very large and powerful fans attached to the processor, it is
hard to stop the processor from overheating, causing malfunction.

[Graphs: clock frequency (MHz) and cycle time (ns), both plotted on logarithmic scales against year, 1970-2020]

To set up a manufacturing facility that can produce at this state of the art can cost a
few billion dollars. Once the factory has been set up, it can produce many millions of
chips per year, so the cost can be recouped quite quickly. Only the very biggest
companies, such as Intel and IBM, can afford to build their own manufacturing plants,
so most companies hire time on someone else’s manufacturing facility. Facilities that
are used mainly for manufacturing third-party designs are called silicon foundries. The most
well-known silicon foundry group is TSMC, whose most advanced process currently
manufactured is 3 nm. Some very large companies choose not to construct their own
fabrication plants, relying instead on silicon foundries, so that they can focus their
efforts on the design of the integrated circuits. Examples of these “fabless” companies
include Apple (for the processors in iPhones and iPads) and AMD (the main rival to
Intel for processors that go into PCs).

41.2 The impact of production volume


For a modern production process (transistor size below 65 nm), it costs about
£500,000 to produce the set of masks required to fabricate a design. The fabrication of
each individual chip then costs a modest amount. As an example, let’s assume a figure
of £25, and assume that the chip is small enough to have a 100% manufacturing yield.
Let’s look at the amount that we would have to charge for our chips simply in order to
cover our production costs:

Number of chips produced    Cost price of a chip
                1           £500,025
              100           £5,025
           10,000           £75
          100,000           £30
        1,000,000           £25.50
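The table follows from a simple amortisation formula (a Python sketch, using the assumed £500,000 mask-set cost and £25 marginal cost from the text):

```python
def cost_per_chip(n_chips, mask_set=500_000, per_chip=25):
    """Unit cost = amortised mask-set cost plus marginal fabrication cost (GBP)."""
    return mask_set / n_chips + per_chip

for n in (1, 100, 10_000, 100_000, 1_000_000):
    print(f"{n:>9} chips -> £{cost_per_chip(n):,.2f} each")
```

The fixed mask-set cost dominates at low volumes and becomes negligible at high volumes, which is exactly the economics behind the 100,000-unit rule of thumb below.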

You can see from this table that it rarely makes sense to produce a special purpose
chip unless the intended production volume exceeds 100,000. This sort of production
volume is associated with markets such as PCs, mobile phones, WiFi equipment, digital
TV and MP3 players. Programmable chips (e.g. microprocessors, PLDs and FPGAs)

can be produced in very large quantities, and then sold to a large number of different
customers, each of whom intends to programme their chips to perform a different
function. Designs that will be deployed in volumes of less than 100,000 will therefore
normally be implemented in a mass produced programmable chip.

41.3 Handling the cost-yield trade-off


Advanced processes are expensive, which means that we need most of the chips
produced to be saleable. One approach to achieving this is binning. We use a single
production line to manufacture chips containing several replicated units, designed for
a high target frequency. We then offer a differentiated product line at different price
points, corresponding to different numbers of working units and different operating
frequencies. If all the units on a chip work, we categorise that chip as a member
of the premium offering at the premium price. If some of the units on the chip don’t
work, or the operating frequency is lower than the target, we still sell it, but
categorised as a lower-spec product at a lower price.
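A binning policy might be sketched as follows (a Python sketch; the bin names, unit counts and frequency thresholds are entirely hypothetical, chosen only to illustrate the idea):

```python
def bin_chip(units_working, max_freq_ghz, total_units=8, target_ghz=3.5):
    """Assign a tested chip to a product bin. All thresholds are hypothetical."""
    if units_working == total_units and max_freq_ghz >= target_ghz:
        return 'premium'       # everything works at the target frequency
    if units_working >= total_units - 2 and max_freq_ghz >= 0.8 * target_ghz:
        return 'mainstream'    # a couple of units disabled, slightly slower
    if units_working >= total_units // 2:
        return 'budget'        # sold as a cut-down part
    return 'reject'            # too few working units to sell
```

The point is that a single production run yields saleable product at several price points, so imperfect chips still generate revenue.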

Another approach is to use overprovisioning. We put more units than required on a
chip, in the expectation that not all will work. The non-working units can be switched
out after test; as long as the number of working units meets the spec guaranteed to
the customer, the chip is OK. This allows the use of cheaper manufacturing processes,
which lead to larger die sizes.

You should now know...

- Why integration and miniaturization of systems tends to increase their performance
- That a manufacturing facility’s cost depends on the transistor size, and extreme miniaturization of transistors is very expensive
- Why special purpose chips will normally only be economical to produce if a large number of units are required

42 Computer Operation
In this lecture we look at the basic ideas of computer operation and introduce the
ideas of the memory map and the fetch-execute cycle.

42.1 The stored program concept


The key to the extreme flexibility of the computer is that all of the possible
instructions that it can perform can be expressed as simple binary numbers. Similarly,
all of the data that it can manipulate can also be expressed in simple binary numbers.
There is no fundamental difference between instructions and data items. They are held
in the same storage structure. Indeed, what one program treats as a data item can be
regarded as an instruction by another program.

These items, either instructions or data, expressed as binary numbers, are held in a
memory.

[Figure: a memory block holding Item 0 to Item 6, with Address and Data connections and R/W and Enable control lines]

We can access any location within the memory by supplying an address1. Say we
wanted to access item 5; we would supply the address 101 to the address input. If this
is a read operation, then the contents of item 5 would be sent to the data lines. If it is a
write operation, then the contents of the data lines would be stored in location 5,
overwriting its previous contents. The memory knows whether it should perform a read
or a write from the value of the R/W input. If this has a value of 1, a read is performed;
if it is 0, a write is performed. R/W is a control line; it tells the memory what to
do. Most memories also have a second control line, Enable. If Enable=1, the memory
behaves as normal. If Enable=0, then the memory does nothing, and does not respond
to any new addresses. This is useful because sometimes activity on the address and
data busses will be destined not for the memory, but for some other device within the
computer.
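The read/write protocol described above can be modelled in a few lines (a Python sketch; the class and method names are invented for illustration):

```python
class Memory:
    """Toy RAM with the R/W and Enable control lines described in the text."""

    def __init__(self, n_words):
        self.store = [0] * n_words

    def access(self, address, data=None, rw=1, enable=1):
        if not enable:                 # Enable=0: ignore activity on the buses
            return None
        if rw == 1:                    # R/W=1: read the addressed location
            return self.store[address]
        self.store[address] = data     # R/W=0: write, overwriting old contents
        return None
```

For example, writing 42 to location 5 (address 101 in binary) and reading it back returns 42, while the same read with Enable=0 returns nothing.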

Typically, the memory will contain many programs and many pieces of data.

¹ This type of memory is called RAM (random access memory). Random access means that we can
access any location in memory with equal speed and ease, simply by providing a different address. This
is in contrast to sequential memory (e.g. a video tape) where an item at the beginning of the tape can be
accessed quickly, but an item at the end of the tape can only be accessed after we have fast forwarded
through the entire tape, which takes a long time.

[Figure: a memory map. Address 000000: Accounting program; 057120: Word processor program; 091600: Game program; 103112: Payroll data; 120562: Book text. The memory has Address, Data, R/W and Enable connections]

By simply supplying an address, and then telling the processor to start executing from
that address, our computer can become a word processor, a games machine, or an
accountancy machine.

In order to be able to step through the memory locations that correspond to the
program, the processor needs to keep track of how far it has got. This is done through
a register called the program counter. This contains the address of the instruction to
be executed. After the completion of an instruction, the program counter is
automatically incremented to contain the address of the next instruction.
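The program-counter mechanism can be illustrated with a minimal fetch-execute loop (a Python sketch; the two-instruction "machine language" here is invented purely for illustration):

```python
# The program counter (pc) holds the address of the next instruction and is
# incremented automatically after each fetch, as described in the text.
memory = ['INC', 'INC', 'INC', 'HALT']   # a tiny program held in memory
pc, acc = 0, 0                            # program counter and an accumulator

while True:
    instruction = memory[pc]   # fetch the instruction the pc points at
    pc += 1                    # automatic increment to the next address
    if instruction == 'INC':   # execute: add one to the accumulator
        acc += 1
    elif instruction == 'HALT':
        break                  # stop fetching
```

After the loop, the accumulator holds 3 and the program counter has stepped past the HALT instruction; changing the starting value of pc would run whatever program happens to be stored at that address.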

42.2 Memory mapped devices


RAM is used to store the instructions and data for the programs that are currently in
execution. However, programs also need to communicate with input/output devices
(e.g. graphics, network) and also mass storage for files (hard disk, solid state drive).
This is normally achieved by taking certain ranges of addresses and allocating them to
be used by these devices as buffers for data transfer. So, to send data to the graphics
output, for example, the processor would write the data into a particular set of memory
addresses. The graphics system would then recognise anything destined for those
addresses as data that it should process. The addresses borrowed for use by the
memory-mapped devices are thus unavailable as address space for the RAM.

42.3 Logical and Physical addresses


The addresses that are sent to the RAM and to memory-mapped devices are real
voltage signals that appear on real busses; these are called physical addresses. It is
usual for software programs to be forbidden from directly using physical addresses.
Instead programs work on a notional set of addresses that are called logical addresses.
The memory management hardware and the operating system manage a translation
process between the logical addresses (used by software) and the physical addresses
(used by hardware). This separation of logical and physical addresses has many
advantages. These include allowing the operating system to enforce security by
preventing badly behaved programs from writing into memory areas that belong to
other programs. Also, it allows programs to use a larger address space than there is
available physical memory.

42.4 What’s special about computers?
The fact that the hardware isn’t specialised to any particular function, but can in
principle be used to accomplish any task, may sound trivial but it is revolutionary in
its impact. The computer is in several important respects completely unlike any other
engineering creation.

Firstly, computer manufacturers don’t have to produce one product for the games
market, another for the accountancy market, another for the secretarial market, and so
on. Instead they can mass produce one standard product that is suitable for every
market. The huge production runs associated with manufacture of computer chips are
the reason why it is tolerable for companies such as Intel to lay out a billion dollars on
a manufacturing plant. Without the economies of scale that result from having
hardware that is suitable for every market, the advances in speed and capability of
computer hardware would have ground to a halt due to the enormous set-up costs of
an integrated circuit manufacturing plant.

Secondly, much of the value of a computer system is in the form of intellectual
property: software and data. The production of software requires no manufacturing
equipment, no inventory of parts, and little or no unskilled labour. This is in sharp
contrast to manufacturing industries, and means that it can be easy for new start up
companies to enter the market place, and production costs can be kept very low. The
problem with intellectual property is that it is very easy to copy illicitly.

42.5 Computer organisation


Most general purpose computers have the conceptual organisation shown below.

[Figure: the processor (CPU), storage and memory, and input and output devices connected by the address bus, data bus and control bus.]

This type of computer where the processor is separate from the memory, but the
memory is unified and holds both instructions and data without distinction, is called a
von Neumann machine. The terms “Stored Program Computer” and “von Neumann
machine” are used almost interchangeably. The name comes from John von
Neumann, the Hungarian mathematician, who in 1945 became the first to publish
theoretical analyses of the capabilities of this type of machine. (Though critics point
out that the hard work of inventing such a machine had already been done by other
people a couple of years earlier.)

42.6 The instruction cycle
During the running of a program, the sequence of events is something like this¹:
• Firstly, the processor must get an instruction. To do this, the processor takes the
  address stored in the program counter and places it on the address bus. It also
  activates the appropriate lines on the control bus to activate the memory, and to
  enable a read operation.
• The memory responds by placing the contents of the memory location
  corresponding to the address onto the data bus. This is the instruction.
• The processor inspects (“decodes”) the instruction to find out what operands it
  needs, and where they can be found.
• In order to get the operand, the processor places an address on the address bus; it
  also activates the appropriate lines on the control bus to activate the memory, and
  to enable a read operation.
• The memory responds by placing the contents of the memory location
  corresponding to the address onto the data bus.
• The above two steps may be repeated if more operands are needed.
• The processor acts upon the instruction, producing a result.
• The processor writes this result back into memory by placing the appropriate
  address on the address bus; it also activates the appropriate lines on the control
  bus to activate the memory, and to enable a write operation.
• This instruction is now finished with, so the program counter is incremented in
  order to point to the next instruction.

This procedure is followed repeatedly until an instruction is encountered that marks
the end of the program.

42.7 The sequence of operations


The sequence of operations carried out can be summarised as follows:
1. Instruction fetch
2. Instruction decode (i.e. figuring out what the instruction means, and what
resources it requires) and operand fetch
3. Execution (of the instruction)
4. Write back (of completed results).

42.8 Timing of operations


For any synchronous digital system, actions are synchronised to a clock signal. The
frequency of the clock is a good measure of the speed of the system.

The processor normally runs at a very high clock speed; a clock frequency of 3 GHz
would be normal for a reasonable microprocessor nowadays. Inside the processor
there is one (or more) arithmetic logic unit(s) that run at the full processor clock
speed. There is also a very small amount of memory, called registers, which can be
accessed at the full processor speed. This is used to keep local copies within the
processor of data that is held within the main memory.

The busses of the computer run at a much lower clock speed (typically a few hundred
MHz). The sequence of operations whereby the processor drives an address onto the

¹ It varies slightly, depending on the type of computer, and the type of instruction being executed.

address lines, then acts on the data lines is called a bus cycle. A bus cycle will
typically take several cycles of the bus clock.

So in a modern computer system, if the data needed by the processor is held in
registers, the operand fetch stage can be completed extremely fast. If, on the other
hand, the operands are not stored within the registers in the processor, then an external
bus cycle must be run to access the main memory, and this entails a large delay.

You should now know...


• The characteristics of the von Neumann architecture
• How the fetch execute cycle works
• How a computer memory map is organised

43 Computer Memory
One of the key issues in the design of computer systems is the way that data is
communicated between memory and processor. In this lecture we will look in detail at
how memories are constructed, how the processor communicates with memory, and
the impact that this has on computer design.

43.1 Memory organisation


In this section, we will look very briefly at the basic ideas of memory with a simple
8×8-bit example. The address is decoded by a device that is built as follows:

[Figure: a 3-to-8 address decoder. The address inputs a2, a1, a0 are decoded onto the word lines w0 to w7.]

This decodes a 3-bit address, but the generalisation to larger addresses is
straightforward. For any possible combination of the address inputs (a2…a0), one
of the word lines (w7…w0) will go high, and all others will go low.
These word lines are then fed across to a bank of memory cells, like this:

[Figure: the address decoder (with Enable input and address inputs a2, a1, a0) drives the word lines of a bank of memory cells; the data lines d7 to d0 run through the bank.]

The simplest type of memory (static RAM) uses a latch in each memory cell. So the
memory is a bank of eight 8-bit latches. The word lines act as the enable
signal to the latches. Whichever latch is enabled will connect to the data lines d7…d0.
(This memory has used an 8-bit word, which is an unusual size. However, the
generalisation to other word lengths is obvious.)

43.2 Types of Memory
Each intersection of the word line and the data line contains a small circuit. Different
types of circuit give rise to different types of memory. The key characteristics that
differentiate different types of memory are:

Volatile/non-volatile: Non-volatile memory retains its memorised data when the
power is switched off. Examples include ROM (read-only memory) and flash
memory. Volatile memory loses its data at power-off. RAM is volatile. The initial
boot process of a PC (or smartphone, SmartTV, games console, etc.) has to be
handled by software stored in a ROM (because that software needs to be retained when
the device is switched off). This initial boot process will load the operating system
from hard disk or flash memory into a RAM and then the RAM will take over the
running of the operating system and application programs.

Speed/Density: We would like memory to be as fast as possible. Also, we would like
to squeeze as many bits as possible onto a single low-cost integrated circuit. However,
speed and density are inversely related, so we have to choose to optimise for one
criterion or the other.

Static/dynamic: These are two different types of RAM. Static RAM (SRAM) stores
each data bit in a small latch. It retains it data value for as long as the power is
applied. Static memory tends to be very fast, but low density (because it takes 6
transistors to build the latch to store one bit). Dynamic RAM (DRAM) stores a data
bit as a charge on the parasitic capacitance of a transistor. It requires only 1 transistor
to store 1 bit, so we can fit a very large number of bits onto a single RAM chip.
However, it is very slow (since the capacitance is not supplying any active drive to the
signal input/output). The name “dynamic” refers to the fact that the charges stored on
the capacitances leak away (over a timescale of about 1 ms) thus losing their stored
data. In order to avoid this loss of data, DRAM chips contain internal circuitry that
automatically reads and restores (“refreshes”) the values of the memory bits at a high
enough rate to complete the refresh process before the charge has leaked away.

43.3 Cache memory


Static RAM is much faster than dynamic RAM, but also much more expensive. The
speed of a RAM is also influenced by its size: the larger its capacity, the slower it is
likely to be. It is important that the main memory be large, so as to store large datasets
and programs. It would be too expensive to build the main memory from static RAM,
so dynamic RAM is normally used for main memory. However, a large dynamic
RAM (~1-4 GByte) is very slow compared to the processor speed. This imbalance of
speed creates a potential problem. The processor may spend almost all of its time
simply waiting for the main memory to respond to its requests, and will spend very
little of its time doing useful work.

This problem is solved by using a small amount of very high speed static RAM to
hold operands and instructions that the processor is likely to need in the near future.
This is called the cache memory. The fact that it is small means that it is economic to
use the fastest type of RAM, i.e. static RAM. The system now has the following
appearance.

[Figure: the processor (CPU), cache memory, memory controller, main memory and buffer connected by the data bus, address bus and control bus.]

When an address (and the corresponding read/write control signal) is sent by the
processor, this is not sent directly to the memory elements. Instead it goes to a
memory controller, which checks whether the required item is held in cache. If it is,
then the cache is instructed by the controller to perform the required read or write
signal directly onto the data bus. If the item is not in cache, then the controller
instructs the slower main memory to carry out the required read or write.

As long as all the data that the program needs is stored in the cache, then execution
can proceed quickly. However, the computer system can’t know for sure at the outset
of execution what data will be needed as the program progresses. So occasionally the
program will reach a point where the required data has not been loaded into the cache.
This is called a cache miss, and the processor will have to stop and wait whilst the
required data items are loaded from the slow main memory into the high speed cache.

43.4 Multilevel caches


If the cache could be put onto the same chip as the processor, then it could be
accessed at the full processor speed. This is what is normally done on modern
high-performance microprocessors. There is an extremely high speed cache integrated onto
the processor die, which runs at (or close to) the full processor speed, called the level
1 cache. This is driven by an internal cache bus that operates at much higher speed
than the external bus. On a PC, the external processor-memory bus is normally called
the front side bus (FSB), and the internal cache bus is the back side bus (BSB).

Sometimes it is too expensive to put a large cache onto the processor die, so there is
another cache (bigger but slower) on the external memory bus, which is called the
level 2 cache.

[Figure: main memory connected over the processor-memory bus to the memory controller and a level 2 cache; the processor chip contains the CPU and the level 1 cache, linked to the level 2 cache by the cache bus.]

On modern systems, a level 2 cache (and perhaps a level 3 cache) is also integrated
into the processor chip.

43.5 Locality of reference


We have seen that a cache system guesses what data is likely to be used next and
ensures that a copy of this data is kept in the high speed cache. If its guesses are good,
then we get a good speed. If the guesses are poor then we have many cache misses
and the data has to be fetched from main memory and we lose speed. How does a
computer guess what data is likely to be needed next? It uses a principle called
locality of reference. This means that if a program accesses one data item from
memory, then often the next few items that are needed are laid out in the following
memory locations. So when a data item is fetched from memory, the controller will
not only return this item, but also transfer the following block of memory (typically
64 bytes) into the cache.

Computer programs that have long run times tend to spend most of their time in
loops operating on arrays or similar data structures. Here is a simple example:
/* Compute total cost of 100 32-bit items */
int total = 0;
for (int i = 0; i < 100; i++) {
    total = total + item[i];
}

When the program is compiled the array will be laid out in memory as a sequence
of bytes. Each integer in the array is 32 bits in size, and thus requires 4 bytes to store.
Suppose the base address of the array is 1000. Then the elements of the array are
stored in memory like this:

Array item:   Memory address where it is stored:
Item[0]       bytes 1000-1003
Item[1]       bytes 1004-1007
Item[2]       bytes 1008-1011
Item[3]       bytes 1012-1015
And so on

When the program runs it will step through an array fetching one item at a time.
When we process item 0, it is not in cache and has to be fetched (slowly) from main
memory. The main memory system will transfer not only the data item that was
requested, but will also shift the surrounding 64 bytes into cache at the same time.
When we come to access items 1, 2 through to 15 in the following iterations of the
loop, they are all in cache by the time we need them. They are therefore accessed
quickly and we see a substantial gain of speed.

43.6 Modern RAM architecture: Synchronous dynamic RAM (SDRAM)


Modern dynamic RAM memories contain a very large number of cells, each storing a
single bit. However, each bit cell is extremely slow to read or write when compared to
the speed of the rest of the system (processor, front side bus, etc.). Such a large
number of cells requires a wide address, but too many address pins would increase the
complexity and expense of the memory chips. As a result, the address is multiplexed;
a typical example arrangement for a 64 Mbit array is shown below. On the first cycle,
16 bits of the address are applied and are used to select one row of 1024 cells. All
1024 cells are read into the output buffer. On the next cycle, 10 bits of the address are
applied to select one column’s data bit and this bit is then written or read. In this way,
only 16 address pins are needed on the chip. The chip will contain another 7 similar
arrays, each using the same addresses to produce bits d7 to d0 of the data byte to give a
64 Mbyte memory chip.

[Figure: a 64 Mbit SDRAM array. A 16-bit row address selects one of 65536 rows; the selected row of 1024 RAM bit cells is read into the output buffer; a 10-bit column address then selects one of columns 0 to 1023 to give the data bit.]
A very important feature of this arrangement is that although the memory array
responds very slowly, the output buffer is constructed from standard electronics and is
very fast. When we read one bit from the array (rather slowly), its 1023 neighbours
are simultaneously made available in the very high speed output buffer. The output
buffer can then burst these neighbouring data items out across a high speed bus into
the cache. So although the latency of RAM may be poor, the throughput can be high.

You should now know...


• The distinction between DRAM and SRAM
• That SRAM is fast, but expensive and low capacity
• That DRAM is slow, but cheap and high capacity
• How caches are used to speed up memory accesses
• How locality of reference is used to speed up accesses using caches and SDRAM

44 Managing Data Movement in a Computer
In this lecture we will look at how the flow of data around a PC is organised. This will
lead us to the idea of the PC chipset, a collection of bridges and associated devices
that manage the routing and flow control of data.

44.1 Evolution of PC architecture

44.1.1 Shared bus architectures


So far, in all our diagrams we have used a single shared bus connecting the CPU,
memory and IO together. In reality, modern computers are not normally built this
way. The reason for this is that a shared bus can be used by only one device at a time.
Components of a computer vary drastically in speed. Processor and cache memories
can respond in nanosecond timescales. Hard disks and DVDs may take milliseconds
to respond. The keyboard and mouse operate on timescales of hundreds of
milliseconds. In general, it is a bad idea to mix devices of greatly differing speed on
the same bus. The slow devices tend to hold up the fast ones. Also, busses capable of
the highest clock speeds are very expensive to construct and limited in the number of
devices they can connect. So in a real computer it is normal to split the busses into
separate unshared busses, and group the busses for fast devices around one hub and
the slower devices’ busses around a different hub.

44.1.2 Hub-based architectures


The bottleneck associated with a single shared bus approach can be solved by
star-connecting all of the devices to a hub. Each device has an unshared link to the hub,
thus simplifying and speeding up signalling on the link. The hub then has the rather
complicated job of switching the addresses and data items between its various input
and output ports to enable the devices to communicate as required.

[Figure: the CPU, memory, storage, input devices and output devices each connected by its own link to a central hub.]

The hub also performs the functions of a bridge. A bridge is a device that connects
two busses together. The bridge “translates” from the requirements of one bus to the
requirements of another. So, for example, the bridge may provide buffering in order to
temporarily store data that builds up in the bridge due to mismatch between the speed
of the input and output bus. The bridge will probably also provide some form of flow
control, i.e. a way of telling a bus that is delivering too much data too quickly that it
must temporarily suspend the transfer until the receiving bus has managed to catch up.
Also, the different busses are likely to be using different methods (“protocols”) to
attract attention from other devices, and to set-up transfers. The bridge will provide
translation between the protocols of the different busses.

44.2 Parallel bus or serial bus?


Suppose we have 32-bit data to transfer, and use 32-bit addresses to specify which
data item should be transferred. The natural way to do this is bit-parallel: we have
busses that are wide enough to simultaneously accommodate the entire data word or
address word, i.e. 32 parallel wires make up each of the busses. An alternative is bit-
serial: we use a single wire and send the bits one after the other, one new bit on each
clock cycle. Thus it takes 32 clock cycles to transfer one 32-bit word. (We can also
compromise and use, for example, 4 bit-serial links operating in parallel. We send the
data 4 bits at a time, taking 8 clock cycles to transfer the entire 32-bit data item.)

Historically bit-parallel has been seen as the desirable way to operate as it takes just
one clock cycle to transfer an entire data word. However, the movement away from
shared bus architectures to hub-based architectures means that using a full 32 bits for
every link into and out of the hub gives a very high number of bus wires. This has
made bit-serial communication more desirable.

Another issue that has increased the use of serial busses is that parallel busses are very
difficult to operate reliably at speeds much above a few hundred MHz. This is
because all bits on a parallel bus must maintain exact synchronisation as they move
from one end of the bus to the other. If some bits arrive much earlier than others, we
may end up with the bits from one data word getting muddled with bits from the
following data word. The longer the wires are, the greater this problem becomes.
Furthermore, if a bus has to go around a corner then the outer wires will have further
to travel than the inner wires so they will arrive at the destination later. So very high
speed bit-parallel busses are nowadays only used for very short straight
communication pathways with very predictable electrical loads. This effectively limits
them to links between processor and main memory.

By contrast, serial busses can operate at extremely high clock speeds: 8-16 GHz is
quite normal nowadays. Most devices other than memory are connected using some
form of high speed bit-serial bus.

44.3 The classic north bridge/south bridge architecture


These ideas are illustrated in the diagram below which shows the classic PC
architecture from about 1996-2008.

[Figure: the classic PC architecture. The CPU connects over the front side bus to the north bridge, which links the graphics slot (graphics bus) and main memory (memory bus). An internal bus joins the north bridge to the south bridge, which serves the network connector, peripheral slots (peripheral bus), disk slots, USB connectors (universal serial bus), and the legacy bus carrying slow devices and the boot ROM.]

The high-speed bridges and memory controller have been integrated into a single chip
called the North Bridge (also known as the Memory Controller Hub). Another chip,
the “South Bridge” (also known as the IO Controller Hub) acts as the link between the
lower speed busses. The whole architecture revolves around these two bridges, which
control the movement of data around the computer. The caches are contained
completely inside the processor chip.

The motherboard or main board of a computer is a printed circuit board that carries
all the wires associated with the busses, and also the chips that implement the bridge
circuitry. The bridge chips are often referred to as the chipset of the motherboard.
Different manufacturers make slightly different chipsets, with slightly different
capabilities and performance. The motherboard contains many slots that can be used
to plug in devices (such as graphics cards, Wi-Fi cards, etc.). The motherboard also
has connectors that protrude through the computer case to provide sockets for USB or
network cables to be plugged in.

The overall speed of the computer is limited by the amount of different traffic
(memory, graphics, I/O) that must pass through the front side bus. The FSB therefore
becomes a bottleneck that limits overall system speed.

44.4 Current PC architecture


In recent years, the number of transistors that can be integrated onto a single chip has
risen high enough that the Memory Controller Hub (North Bridge) can be integrated
inside the processor chip. This gives the architecture shown below, which is characteristic
of PCs based on modern Intel and AMD processors.

Index
38 Technologies for Logic Implementation
38.1 Switch logic
38.2 The transistor as a switch
38.3 The CMOS inverter
38.4 The CMOS NAND and NOR gates
38.5 Complementarity

39 Integrated Circuit Manufacture
39.1 Integrated circuits
39.2 Integrated circuit fabrication
39.2.1 Crystal growth
39.2.2 Doping
39.2.3 Photolithography
39.3 The need for clean rooms
39.4 The size-yield trade-off

40 Trends in IC Power Dissipation
40.1 MOSFET construction
40.2 MOSFET operation
40.3 MOSFET energy storage
40.4 Power dissipation of an inverter
40.5 Power dissipation of an integrated circuit

41 Trends and Trade-Offs in IC Manufacture
41.1 The benefits of miniaturisation
41.2 The impact of production volume
41.3 Handling the cost-yield trade-off

42 Computer Operation
42.1 The stored program concept
42.2 Memory mapped devices
42.3 Logical and Physical addresses
42.4 What’s special about computers?
42.5 Computer organisation
42.6 The instruction cycle
42.7 The sequence of operations
42.8 Timing of operations

43 Computer Memory
43.1 Memory organisation
43.2 Types of Memory
43.3 Cache memory
43.4 Multilevel caches
43.5 Locality of reference
43.6 Modern RAM architecture: SDRAM

44 Managing Data Movement in a Computer
44.1 Evolution of PC architecture
44.1.1 Shared bus architectures
44.1.2 Hub-based architectures
44.2 Parallel bus or serial bus?
44.3 The classic north bridge/south bridge architecture
44.4 Current PC architecture
