COMPUTER ORGANIZATION AND ARCHITECTURE
BASIC STRUCTURE OF COMPUTERS
Functional Units
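[Figure: Basic functional units of a computer: input, output, memory, arithmetic and logic, and control. The arithmetic and logic unit and the control unit form the processor, and the input and output units form the I/O units]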
Input Unit
The input unit reads data from the outside world into the computer
The most common input devices are keyboards, joysticks, trackballs,
microphones and mice
Output Unit
Counterpart of the input unit
Its function is to send processed results to the
outside world
A familiar example of an output device is the
printer (of various types)
Memory Unit
The function of the memory unit is to store programs and
data
There are 2 classes of storage:
Primary Storage:
Fast memory that operates at electronic speeds
The memory contains a large number of semiconductor
storage cells, each capable of storing 1 bit of information
These cells are processed in groups of fixed size called words
The number of bits in each word is known as the word length
Word lengths typically range from 16 to 64 bits
To provide easy access to any word in the memory, a
distinct address is associated with each word
location
Addresses are numbers that identify successive locations
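The following minimal Python model shows how memory is organized into words and addresses. It assumes a 16-bit word length and an 8-word memory. The class name WordMemory is hypothetical and does not describe any real machine's memory interface.

```python
class WordMemory:
    """Word-addressable memory: each address identifies one fixed-size word."""

    def __init__(self, num_words: int, word_length: int = 16):
        self.word_length = word_length          # bits per word (assumed 16)
        self.mask = (1 << word_length) - 1      # keeps a value within one word
        self.words = [0] * num_words            # addresses 0 .. num_words-1

    def write(self, address: int, value: int) -> None:
        # Any location is reached directly by giving its address
        self.words[address] = value & self.mask

    def read(self, address: int) -> int:
        return self.words[address]


mem = WordMemory(num_words=8, word_length=16)
mem.write(3, 0x1234)
mem.write(4, 0x1FFFF)    # 17-bit value: only the low 16 bits fit in one word
print(hex(mem.read(3)), hex(mem.read(4)))  # 0x1234 0xffff
```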
Memory in which any location can be reached in a
short and fixed amount of time after specifying its
address is called random-access memory (RAM)
Time required to access one word is called the
memory access time
Memory of a computer is normally implemented as
a memory hierarchy of 3 or 4 levels of
semiconductor RAM units with different speeds &
sizes
The small, fast units are called caches
The largest & slowest unit is referred to as the
main memory
Primary storage is expensive compared with secondary storage
Secondary Storage:
Is used when large amounts of data and
many programs have to be stored
It contains infrequently accessed
information
Additional & cheaper memory
Ex: Magnetic disks and tapes & optical
disks (CD-ROMs)
Arithmetic And Logic Unit
ALU performs all the arithmetic and logic
operations
For ex: addition, multiplication, division,
comparison etc
Any operation is initiated by bringing the
required operands into the processor, where
the operation is performed by the ALU
When operands are brought into the
processor, they are stored in high-speed
storage elements called registers
Each register can store one word of data
Registers can be accessed faster than the cache
unit
CU & ALU are many times faster than other
devices connected to a computer system
This enables a single processor to control a
number of external devices such as keyboards,
displays, magnetic & optical disks, sensors &
mechanical controllers
Control Unit
It coordinates the operations of all units of the computer
The control unit is the nerve centre that sends control
signals to other units and senses their states
The timing signals that govern the I/O transfers are
generated by control circuits
Timing signals are signals that determine when a
given action is to take place
Data transfers between the processor & memory are also
controlled by the CU through timing signals
A large set of control lines (wires) carries the signals
used for timing & synchronization of events in all
units
Operation of a Computer - Summarized
The computer accepts information in the form of
pgms & data through an I/P unit & stores it in the
memory
Information stored in the memory is fetched, under
pgm control, into an ALU, where it is processed
Processed information leaves the computer through
an O/P unit
All activities inside the machine are directed by the
CU
Information Vs Instructions
Instructions (machine instructions):
Are explicit commands that
Govern the transfer of information within the computer as
well as between the computer and its I/O devices
Specify the arithmetic and logic operations to be performed
Data:
Numbers and encoded characters that are used as operands
by the instructions
Any digital information
Programs can also be considered as data if they are to be
processed by another pgm
Ex: Compiling a HLL source pgm into machine language pgm,
called the object pgm
(Source pgm is I/P data to compiler & object pgm is O/P data)
The processed data is called information
Information must be encoded in a suitable format
Each number, character or instruction is encoded as a string of
binary digits called bits, each having the value 0 or 1
Ex:
BCD (Binary - Coded Decimal)
Each decimal digit is encoded by 4 bits
ASCII (American Standard Code for Information
Interchange)
Each char is represented as a 7-bit code
EBCDIC (Extended Binary-Coded Decimal
Interchange Code)
8 bits are used to denote a char
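This short Python sketch compares the three encodings. It assumes non-negative integers for BCD. For the EBCDIC case it uses Python's built-in cp037 codec, which is one of several EBCDIC code pages. The function names are illustrative only.

```python
def bcd_encode(number: int) -> str:
    """BCD: each decimal digit of a non-negative integer gets its own 4 bits."""
    return " ".join(format(int(digit), "04b") for digit in str(number))


def ascii_encode(text: str) -> str:
    """ASCII: each character is represented by a 7-bit code."""
    codes = []
    for ch in text:
        if ord(ch) > 0x7F:
            raise ValueError(f"{ch!r} is not an ASCII character")
        codes.append(format(ord(ch), "07b"))
    return " ".join(codes)


def ebcdic_encode(text: str) -> str:
    """EBCDIC: 8 bits per character, using the cp037 EBCDIC code page."""
    return " ".join(format(byte, "08b") for byte in text.encode("cp037"))


print(bcd_encode(295))     # 0010 1001 0101
print(ascii_encode("A1"))  # 1000001 0110001
print(ebcdic_encode("A"))  # 11000001
```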
Basic Operational Concepts
For processing, individual instructions are brought from memory into
the processor, which executes the specified operations
Data to be used as operands are also stored in memory
A typical instruction may be:
Add A,R0
Adds the operand at memory location A to the operand in register R0
and stores the result in R0
The original content of A is preserved, whereas the content of R0 is overwritten
This instruction requires several steps; the same effect can be obtained with the two-instruction sequence:
Load A, R1
Add R1,R0
The first instruction transfers the contents of A into the processor
register R1
The second instruction adds the contents of R1 and R0 and places the sum
in R0
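The following Python model compares the single Add A,R0 instruction with the Load/Add sequence. The contents of location A and of the registers are hypothetical values chosen for illustration. The model tracks only the final register and memory contents, not the individual processor steps.

```python
def add_single(memory, regs):
    """Add A,R0 : one instruction, R0 <- [A] + [R0]."""
    regs = dict(regs)                        # work on a copy of the registers
    regs["R0"] = memory["A"] + regs["R0"]
    return regs


def load_then_add(memory, regs):
    """Load A,R1 then Add R1,R0 : same result in R0, but R1 also changes."""
    regs = dict(regs)
    regs["R1"] = memory["A"]                 # Load A,R1 : R1 <- [A]
    regs["R0"] = regs["R1"] + regs["R0"]     # Add R1,R0 : R0 <- [R1] + [R0]
    return regs


memory = {"A": 7}                            # hypothetical contents of A
regs = {"R0": 5, "R1": 0}
print(add_single(memory, regs))     # {'R0': 12, 'R1': 0}
print(load_then_add(memory, regs))  # {'R0': 12, 'R1': 7}
print(memory)                       # {'A': 7}  (A is preserved)
```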
Connection between the processor and memory
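[Figure: Connection between the processor and the memory. The processor contains the control unit, the arithmetic and logic unit, and registers such as those listed below]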
MAR - Memory Address Register
PC - Program Counter
MDR - Memory Data Register
IR - Instruction Register
CPU = ALU + CU + registers
Different types of registers are:
IR (instruction register):
Holds the instruction that is currently being executed
Its o/p is available to control circuits, which generate the timing
signals that control the various processing elements involved in
executing the instruction
PC (program counter):
Keeps track of the execution of a pgm
Contains the memory address of next instruction to be fetched &
executed
During the execution of an instruction, the contents of PC are
updated to correspond to the address of next instruction that is to
be fetched from the memory
PC points to the next instruction that is to be fetched from memory
n general-purpose registers (R0 through Rn-1):
Are used for holding data and the intermediate results of operations
They are also known as scratch-pad registers.
MAR (memory address register):
Facilitates communication with memory
Holds the address of the location to be accessed
MDR (memory data register):
Facilitates communication with memory
Contains data to be written into or read out of the addressed
location
Execution of an instruction
Execution of an instruction by CPU during pgm
execution involves the following steps:
1. The CU takes the address of the next instruction to be
executed from the PC register & reads the instruction from the
corresponding memory address into the instruction register
of the CU
2. The CU sends the operation part & address part of the
instruction to the decoder & MAR respectively
3. The decoder interprets the instruction and accordingly the
CU sends signals to the appropriate unit which needs to be
involved in carrying out the task specified in the instruction
Ex: if it is an arithmetic/logical operation, the signal is sent to the ALU
4. As each instruction is executed, the address of the next
instruction to be executed is automatically loaded into
the PC register & steps 1 to 4 are repeated
Normal execution of pgms may be preempted
if some device requires urgent servicing
To deal with the situation immediately, normal
execution of current pgm must be interrupted
To do this, the device raises an interrupt
signal
Interrupt
Is a request from an I/O device for service by the
processor
Processor provides the requested service by
executing an appropriate interrupt-service
routine
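The effect of an interrupt on program flow can be sketched in Python as a loop that checks for a pending request between instructions. This is an illustrative model only. The instruction names, the interrupt_at parameter, and checking only at instruction boundaries are assumptions, and a real processor saves more state than the PC.

```python
def run(program, interrupt_at=None):
    """Run the program's instructions in order, servicing one interrupt.

    interrupt_at is the index of the instruction during which a device
    raises its interrupt-request signal (a hypothetical parameter).
    """
    trace = []
    pc = 0
    interrupt_pending = False
    while pc < len(program):
        trace.append(program[pc])            # execute the current instruction
        if pc == interrupt_at:
            interrupt_pending = True         # device raises an interrupt
        pc += 1                              # PC points to the next instruction
        if interrupt_pending:                # checked between instructions
            saved_pc = pc                    # save the return address
            trace.append("ISR")              # interrupt-service routine runs
            interrupt_pending = False
            pc = saved_pc                    # resume the interrupted program
    return trace


print(run(["I1", "I2", "I3"], interrupt_at=1))  # ['I1', 'I2', 'ISR', 'I3']
```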
Computer Instructions
Assembly Language      Register Transfer Notation
MOVE NUM1,R1           R1 ← [NUM1]
MOVE #1,R2             R2 ← 1
ADD #1,R1              R1 ← 1 + [R1]
ADD R1,R2              R2 ← [R1] + [R2]
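Executing the four instructions in Python confirms what the register transfer notation says. The sketch assumes that memory location NUM1 holds 10, a hypothetical value. It supports only the two-operand MOVE and ADD forms used above, with # marking an immediate operand.

```python
def execute(program, memory):
    """Apply each (opcode, source, destination) instruction in order."""
    regs = {"R1": 0, "R2": 0}

    def value(operand):
        if operand.startswith("#"):      # immediate operand: the value itself
            return int(operand[1:])
        if operand in regs:              # register operand: [Rn]
            return regs[operand]
        return memory[operand]           # memory operand: [LOC]

    for opcode, src, dst in program:
        if opcode == "MOVE":
            regs[dst] = value(src)               # dst <- [src]
        elif opcode == "ADD":
            regs[dst] = value(src) + regs[dst]   # dst <- [src] + [dst]
        else:
            raise ValueError(f"unknown opcode {opcode}")
    return regs


program = [("MOVE", "NUM1", "R1"), ("MOVE", "#1", "R2"),
           ("ADD", "#1", "R1"), ("ADD", "R1", "R2")]
print(execute(program, {"NUM1": 10}))  # {'R1': 11, 'R2': 12}
```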
The fetch-execute cycle
Fetch the instruction whose address is
in the program counter
Increment the PC so it holds the
address of the next instruction
Execute the instruction just fetched
Fetch the next instruction
Etc.
Example Instruction
MOVE NUM1,R1
Fetch
MAR ← [PC]
PC ← [PC] + 1
MDR ← [MEM([MAR])]
IR ← [MDR]
Execute
MAR ← NUM1
MDR ← [MEM([MAR])]
R1 ← [MDR]
Another Example
ADD #1,R1
Fetch
MAR ← [PC]
PC ← [PC] + 1
MDR ← [MEM([MAR])]
IR ← [MDR]
Execute
R1 ← 1 + [R1]
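The fetch and execute register transfers for MOVE NUM1,R1 and ADD #1,R1 can be traced with a small Python model of the fetch-execute cycle. The model makes several illustrative assumptions. The two instructions sit at addresses 0 and 1. NUM1 is a symbolic address holding 41. One memory word holds one whole instruction.

```python
def step(cpu, mem):
    """Perform one fetch-execute cycle using the register transfers above."""
    # Fetch
    cpu["MAR"] = cpu["PC"]                  # MAR <- [PC]
    cpu["PC"] = cpu["PC"] + 1               # PC  <- [PC] + 1
    cpu["MDR"] = mem[cpu["MAR"]]            # MDR <- [MEM([MAR])]
    cpu["IR"] = cpu["MDR"]                  # IR  <- [MDR]
    # Execute
    opcode, src, dst = cpu["IR"]
    if opcode == "MOVE":
        cpu["MAR"] = src                    # MAR <- NUM1
        cpu["MDR"] = mem[cpu["MAR"]]        # MDR <- [MEM([MAR])]
        cpu[dst] = cpu["MDR"]               # R1  <- [MDR]
    elif opcode == "ADD" and src.startswith("#"):
        cpu[dst] = int(src[1:]) + cpu[dst]  # R1 <- 1 + [R1]
    else:
        raise ValueError(f"unsupported instruction: {cpu['IR']}")


mem = {0: ("MOVE", "NUM1", "R1"), 1: ("ADD", "#1", "R1"), "NUM1": 41}
cpu = {"PC": 0, "MAR": None, "MDR": None, "IR": None, "R1": 0}
while cpu["PC"] in mem:      # stop once PC moves past the stored instructions
    step(cpu, mem)
print(cpu["R1"], cpu["PC"])  # 42 2
```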
Bus Structures
A group of lines that serves as a
connecting path for several devices is
called a bus
Bus must have lines for
Data
Address
Control
Single-bus Structure
[Figure: Input, output, memory and processor units all connected to a single bus]
The simplest way to interconnect functional units is to use
a single bus
Since the bus can be used for only one transfer at a
time, only two units can actively use the bus at any
given time
Its main advantages are low cost and flexibility for
attaching peripheral devices
Systems containing multiple buses achieve higher
performance (through concurrency in
operations), but at an increased cost
Buffer registers:
Hold the information during transfers with the
devices
Allow the processor to switch rapidly from one device
to another
Ex: use of a printer buffer during printing
Transfer of a character from a processor to a
character printer
Processor sends the character to the
printer buffer
Once buffer is loaded, the printer can start
printing without intervention by the
processor
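The printer-buffer behavior can be sketched in Python with a queue. This is a simplified model that assumes the buffer is large enough for the whole string. The function names are hypothetical. In hardware, the printer would drain the buffer concurrently while the processor does other work.

```python
from collections import deque

printer_buffer = deque()


def processor_send(text):
    """The processor loads the characters into the buffer and is then free."""
    printer_buffer.extend(text)


def printer_print():
    """The printer empties the buffer without help from the processor."""
    printed = []
    while printer_buffer:
        printed.append(printer_buffer.popleft())
    return "".join(printed)


processor_send("HELLO")
print(len(printer_buffer))  # 5
print(printer_print())      # HELLO
```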
The system bus is also called the front
side bus, memory bus, local bus, or
host bus.
[Figure: Simplified illustration of a bus]
Two-Bus Structure
[Figure: Two-bus structure: the processor, memory, input and output units interconnected through a memory bus and an I/O bus]
The structure uses two buses with distinct functions.
The processor interacts with the memory through a
memory bus and handles input/output functions over
the I/O bus.
The main advantage of this structure is higher
operating speed, at the expense of higher cost.
Software
A set of instructions/ programs is known as Software
Software is of 2 types:
System Software:
It is responsible for the coordination of all activities in a
computing system
It performs functions such as:
Receiving & interpreting user commands
Entering & editing application pgms & storing them
as files in secondary storage devices
Managing the storage & retrieval of files in
secondary storage devices
Running standard application pgms such as
word processors, spreadsheets or games
with data supplied by the user
Controlling I/O units to receive input
information & produce output results
Translating pgms from source form
prepared by the user into object form
consisting of machine instructions
Linking & running user-written application
pgms with existing standard library routines,
such as numerical computation packages
Examples of System Software
Compiler
High-level language → machine language
Assembler
Assembly language → machine language
Operating System
Controls sharing & interaction
Assigns & manages resources
Memory
Disk space
Handles I/O
Application Software:
Application software is used for
implementing a particular application
Ex: MS Word, Excel
Time-line diagram (an example)
[Figure 1.4: User program and OS routine sharing of the processor. The diagram plots the activity of the printer, the disk, the OS routines and the user program over time, with instants t0 through t5 marked]
Multiprogramming/multitasking
OS manages the concurrent execution of several
application pgms to make the best possible use of
computer resources
Performance
The most important measure of the
performance in a computer is how quickly it
can execute programs
The speed of execution depends on the
design of its hardware and its machine
language instructions and also the compiler
For better performance all components
should be designed in a coordinated way
Elapsed time
Total time required to execute the pgm
Measure of the computer's performance
Depends on speed of processor, disk &
printer
Processor time
Period during which the processor is
active to execute the pgm
The Processor Cache
[Figure: The processor cache: main memory connected over the bus to the processor, which contains the cache memory]
At the start of execution, all pgm instructions
& the required data are stored in main
memory
As execution proceeds, instructions are
fetched one by one over the bus into the
processor, and a copy is placed in the cache
When the execution of an instruction calls for
data located in the main memory, the data
are fetched & a copy is placed in the cache
Later, if the same instruction or data item is
needed a second time, it is read directly from
the cache
A program will be executed faster if the
movement of instructions and data between
the processor and main memory is
minimized, which is achieved by using
cache
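The benefit of the cache for repeated accesses can be sketched in Python by counting cache hits and main-memory fetches. The sketch assumes an unbounded cache with no replacement policy. The address trace is hypothetical and stands for a four-instruction loop executed three times.

```python
def run_with_cache(addresses):
    """Return (hits, misses) for a sequence of instruction/data addresses."""
    cache = set()                # unbounded cache: nothing is ever evicted
    hits = misses = 0
    for addr in addresses:
        if addr in cache:
            hits += 1            # read directly from the cache
        else:
            misses += 1          # fetched over the bus from main memory
            cache.add(addr)      # and a copy is placed in the cache
    return hits, misses


trace = [0, 1, 2, 3] * 3         # loop body at addresses 0-3, run 3 times
print(run_with_cache(trace))     # (8, 4)
```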
Processor Clock
Processor circuits are controlled by a timing signal
called a clock
The regular time intervals are known as clock cycles
To execute machine instructions the processor divides
the action into a sequence of steps such that each step
can be completed in one clock cycle
The length P of one clock cycle is an important
performance parameter of processor
Clock rate, R=1/P, measured in cycles/sec
(Hertz/Hz)
Million = Mega (M)
Billion = Giga (G)
500 million cycles/sec = 500 MHz
Clock period is 2 ns
1250 million cycles/sec = 1.25 GHz
Clock period is 0.8 ns
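The relation P = 1/R reproduces both examples above. This one-line Python sketch converts the period from seconds to nanoseconds.

```python
def clock_period_ns(rate_hz: float) -> float:
    """Clock period P = 1/R, converted from seconds to nanoseconds."""
    return 1e9 / rate_hz


print(clock_period_ns(500e6))   # 2.0
print(clock_period_ns(1.25e9))  # 0.8
```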
Basic Performance Equation
T=(N x S) / R
T is the processor time required to execute a program
N is the actual number of instruction executions
S is the average number of basic steps needed to execute
one machine instruction
R is the clock rate
To achieve high performance, reduce the value of T,
which means reducing N & S and increasing R
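The basic performance equation can be evaluated directly in Python. The values of N, S and R below are hypothetical. They show how T responds when S is halved and are not measurements of any real processor.

```python
def processor_time(n: float, s: float, r: float) -> float:
    """Basic performance equation T = (N x S) / R, in seconds."""
    return (n * s) / r


# Hypothetical: 500 million instruction executions, S = 4, R = 1.25 GHz
print(processor_time(500e6, 4, 1.25e9))  # 1.6
# Halving S (e.g. through pipelining) halves T
print(processor_time(500e6, 2, 1.25e9))  # 0.8
```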
Performance Equation
MIPS: Millions of instructions per
second
Megaflops: Millions of floating point
operations per second
Megahertz: Millions of clock cycles per
second
Pipelining and Superscalar Operation
So far, it has been assumed that instructions are executed one after another
Improvement in performance can be achieved by
overlapping the successive instructions using a
technique called pipelining
Consider the instruction Add R1,R2,R3
Which adds the contents of registers R1 & R2 and places the sum into
R3
Contents of R1 & R2 are first transferred to the inputs of ALU
After add operation is performed, sum is transferred to R3
The processor can read the next instruction while the addition operation
is being performed
Then, if that instruction also uses ALU, its operands can be transferred
to the ALU inputs at the same time that the result of Add instruction is
being transferred to R3
Pipelining
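[Figure: Sequential vs pipelined execution of instructions I1, I2, I3, each consisting of a fetch step (F1, F2, F3) and an execute step (E1, E2, E3). In sequential execution, each instruction is fetched and executed before the next one is fetched; in pipelined execution, the fetch of each instruction overlaps the execution of the previous one]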
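Ideally, with maximum overlap, one instruction completes in every clock cycle, so for the purpose of computing T the effective value of S is 1
(this ideal cannot be attained in practice)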
Pipelining increases rate of executing instructions
significantly & causes the value of S to approach 1
A higher degree of concurrency can be achieved if
multiple instruction pipelines are implemented in the
processor
This means that multiple functional units are used, creating
parallel paths through which different instructions can be
executed in parallel
With this, it becomes possible to start the execution of
several instructions in every clock cycle
This mode of operation is called Superscalar execution
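A short Python calculation compares the clock-cycle counts for the sequential and pipelined cases. It shows the effective S approaching 1 as the number of instructions grows. The sketch assumes an ideal two-step (fetch, execute) pipeline in which every step takes exactly one clock cycle and no instruction ever stalls. As noted above, this cannot be attained in practice.

```python
def cycles_sequential(n_instructions, steps=2):
    """Each instruction finishes all its steps before the next one starts."""
    return n_instructions * steps


def cycles_pipelined(n_instructions, steps=2):
    """Ideal pipeline: after the first instruction's `steps` cycles,
    one more instruction completes in every clock cycle."""
    if n_instructions == 0:
        return 0
    return steps + (n_instructions - 1)


for n in (3, 1000):
    seq, pipe = cycles_sequential(n), cycles_pipelined(n)
    # effective S = clock cycles per instruction executed
    print(n, seq, pipe, round(pipe / n, 3))
# prints: 3 6 4 1.333
#         1000 2000 1001 1.001
```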
Clock Rate
There are 2 possibilities for increasing the
clock rate, R
First, improving the Integrated Circuit (IC)
technology makes logic circuits faster, which
reduces the time needed to complete a basic step
This allows the clock period, P to be reduced and the
clock rate, R to be increased
Second, reducing the amount of processing done in
one basic step makes it possible to reduce clock
period, P
Instruction Set: CISC & RISC
Simple instructions require small number of basic
steps to execute (Reduced Instruction Set
Computers)
Complex instructions involve a larger number of
basic steps (Complex Instruction Set Computers)
For RISC, a larger number of instructions is needed,
leading to a larger value of N and a smaller value of S
For CISC, individual instructions perform more
complex operations, so fewer instructions are needed,
leading to a lower value of N and a larger value of S
It is not obvious if one choice is better than the
other
CISC vs RISC
Complex Instruction Set Computers (CISC): smaller N, larger S
Reduced Instruction Set Computers (RISC): larger N, smaller S, easier to pipeline
Compiler
High-level language → machine language
To reduce N, a suitable machine instruction set is needed, together with a compiler
that makes good use of it
An optimizing compiler must reduce the number of clock
cycles needed to execute a program
The number of clock cycles is dependent not only on the
choice of instructions but also on the order in which they
appear in the program
The compiler may rearrange the instructions to achieve
better performance
Such changes must not affect the result of the computation
The ultimate objective is to reduce the total number of clock cycles
needed to perform the required programming task
Performance Measurement
The only parameter that properly describes the performance of
a computer is the execution time T
A nonprofit organization called the Standard
Performance Evaluation Corporation (SPEC)
publishes representative application programs for
different application domains, together with test
results for many commercially available computers
$$\text{SPEC rating} = \frac{\text{Running time on the reference computer}}{\text{Running time on the computer under test}}$$
A SPEC rating of 50 means that the computer under test
is 50 times as fast as the reference computer (UltraSPARC10) for this
particular benchmark
The overall SPEC rating for the computer is the geometric mean of the individual ratings:
$$\text{SPEC rating} = \left( \prod_{i=1}^{n} \text{SPEC}_i \right)^{1/n}$$
where n is the number of programs in the suite and SPEC_i is the rating for program i
Because the actual execution time is measured,
the SPEC rating is a measure of the
combined effect of all factors affecting
performance, including the compiler, the OS,
the processor & the memory of the computer
system being tested
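Both SPEC formulas can be computed in a few lines of Python. The running times below are hypothetical and are not published SPEC results. Note that math.prod requires Python 3.8 or later.

```python
import math


def spec_ratio(reference_time, test_time):
    """Rating for one program: reference running time / running time under test."""
    return reference_time / test_time


def overall_spec_rating(ratios):
    """Geometric mean of the n per-program ratings."""
    return math.prod(ratios) ** (1 / len(ratios))


# Hypothetical running times (seconds) for a 3-program suite
ratios = [spec_ratio(500, 10), spec_ratio(400, 20), spec_ratio(1000, 8)]
print(ratios)                                  # [50.0, 20.0, 125.0]
print(round(overall_spec_rating(ratios), 6))   # 50.0
```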
Multiprocessors
Larger computer systems may contain a number of
processor units, in which case they are called
Multiprocessor Systems
These systems either execute a number of different
application programs in parallel or they execute the
subtasks of a single large task in parallel
All processors have access to all of the memory in
such systems and the term shared memory
multiprocessor systems is often used
These systems offer high performance, but at the price of increased
complexity and cost
Multicomputers
It is possible to use an interconnected group of
complete computers to achieve high total
computational power
These computers normally have access only to their own memory
units
When the tasks they are executing need to communicate
data, they do so by exchanging messages over a
communication network
This property leads to the name message-passing
multicomputers
Steps Involved in Instruction Fetch & Execution
Pgms reside in the memory and usually get
there through the input unit
INSTRUCTION FETCH
Execution of a program starts by setting the
PC to point to the first instruction of the
program
The contents of PC are transferred to the
MAR and a Read control signal is sent to the
memory
The addressed word (here it is the first
instruction of the program) is read out of
memory and loaded into the MDR
The contents of MDR are transferred to
the IR
Now the instruction is ready to be decoded &
executed
INSTRUCTION EXECUTION
The operation field of the instruction in IR
is examined to determine the type of
operation to be performed by the ALU
The specified operation is performed by
obtaining the operand(s) from the memory
locations or from GP registers in the
processor
Fetching the operands from the memory
requires sending the memory location address
to the MAR and initiating a Read cycle
The operand is read from the memory into the
MDR and then from MDR to the ALU
The ALU performs the desired operation on one
or more operands fetched in this manner and
sends the result either to memory location or to
a GP register
If the result of this operation is to be stored in the
memory, then the result is sent to MDR
The address of the location where the result is to
be stored is sent to MAR and a Write cycle is
initiated
Thus, the execute cycle ends for the current
instruction and the PC is incremented to point to
the next instruction for a new fetch cycle.
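The role of MAR and MDR in the Read and Write cycles described above can be modeled in Python. The sketch is hypothetical. The class MemoryInterface, the symbolic locations X, Y and Z, and an instruction that reads two memory operands and writes their sum back to memory are all assumptions made for illustration.

```python
class MemoryInterface:
    """Processor-memory interface through MAR and MDR (hypothetical model)."""

    def __init__(self, memory):
        self.memory = memory
        self.MAR = None
        self.MDR = None

    def read(self, address):
        self.MAR = address                    # address sent to MAR
        self.MDR = self.memory[self.MAR]      # Read cycle: word loaded into MDR
        return self.MDR                       # MDR contents go to the ALU

    def write(self, address, value):
        self.MDR = value                      # result sent to MDR
        self.MAR = address                    # destination address sent to MAR
        self.memory[self.MAR] = self.MDR      # Write cycle stores the word


mem = MemoryInterface({"X": 15, "Y": 27, "Z": 0})
a = mem.read("X")         # first operand fetched via MAR/MDR
b = mem.read("Y")         # second operand fetched the same way
mem.write("Z", a + b)     # ALU result written back to memory
print(mem.memory["Z"])    # 42
```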