COMPUTER
ORGANIZATION AND
ARCHITECTURE
Computer Architecture:
Computer architecture deals with the operational attributes of the computer, or of the processor specifically. It covers details such as physical memory, the ISA (Instruction Set Architecture) of the processor, the number of bits used to represent data types, input/output mechanisms, and techniques for addressing memory.
Computer Organization:
Computer organization is the realization of what is specified by the computer architecture. It deals with how the operational attributes are linked together to meet the requirements specified by the architecture. Organizational attributes include hardware details, control signals, and peripherals.
EXAMPLE:
Say you are in a company that manufactures cars. The design of the car comes under computer architecture (the abstract, programmer's view), while making its parts piece by piece and connecting the different components of that car together, keeping the basic design in mind, comes under computer organization (physical and visible).
Page 2
Computer Organization and Architecture
Computer architecture (a bit higher level):
• Programmer's view (i.e., the programmer has to be aware of which instruction set is used)
• Logic (instruction set, addressing modes, data types, cache optimization)
• What to do? (instruction set)

Computer organization, often called microarchitecture (low level):
• Transparent to the programmer (e.g., a programmer does not worry much about how addition is implemented in hardware)
• Physical components (circuit design, adders, signals, peripherals)
• How to do it? (implementation of the architecture)
GENERATIONS OF A COMPUTER
A generation, in computer terminology, is a change in the technology on which computers are built. Initially, the term was used to distinguish between varying hardware technologies, but nowadays a generation includes both hardware and software, which together make up an entire computer system.

Five computer generations are known to date. Each generation is discussed below along with its time period and characteristics. The approximate dates given against each generation are those normally accepted.
Following are the main five generations of computers:
1. First Generation (1946-1959): vacuum tube based.
2. Second Generation (1959-1965): transistor based.
3. Third Generation (1965-1971): integrated circuit based.
4. Fourth Generation (1971-1980): VLSI microprocessor based.
5. Fifth Generation (1980 onwards): ULSI microprocessor based.
First generation
The period of the first generation was 1946-1959. First-generation computers used vacuum tubes as the basic components for memory and for the circuitry of the CPU (Central Processing Unit). These tubes, like electric bulbs, produced a lot of heat and were prone to frequent burnout, so the installations were very expensive and could be afforded only by very large organizations. Mainly batch processing operating systems were used in this generation. Punched cards, paper tape, and magnetic tape were used as input and output devices. The computers of this generation used machine code as the programming language.
Second generation
The period of the second generation was 1959-1965. In this generation transistors were used; they were cheaper, consumed less power, and were more compact, more reliable, and faster than the vacuum tubes of first-generation machines. Magnetic cores were used as primary memory, with magnetic tape and magnetic disks as secondary storage devices. Assembly language and high-level programming languages such as FORTRAN and COBOL were used. The computers used batch processing and multiprogramming operating systems.
Third generation
The period of the third generation was 1965-1971. Third-generation computers used integrated circuits (ICs) in place of transistors. A single IC packs many transistors, resistors, and capacitors along with the associated circuitry. The IC was invented by Jack Kilby. This development made computers smaller, more reliable, and more efficient. Remote processing, time-sharing, and multiprogramming operating systems were used in this generation, along with high-level languages (FORTRAN II to IV, COBOL, PASCAL, PL/1, BASIC, ALGOL-68, etc.).
Fourth generation
The period of the fourth generation was 1971-1980. Fourth-generation computers used Very Large Scale Integration (VLSI) circuits. VLSI circuits, with about 5000 transistors and other circuit elements and their associated circuits on a single chip, made it possible to build the microcomputers of the fourth generation. These computers became more powerful, compact, reliable, and affordable, giving rise to the personal computer (PC) revolution. Time-sharing, real-time, network, and distributed operating systems were used in this generation, along with high-level languages such as C, C++, and DBASE.
Fifth generation
The period of the fifth generation is 1980 to date. In the fifth generation, VLSI technology became ULSI (Ultra Large Scale Integration) technology, resulting in the production of microprocessor chips having ten million electronic components. This generation is based on parallel processing hardware and AI (Artificial Intelligence) software. AI is an emerging branch of computer science concerned with the means and methods of making computers think like human beings. High-level languages such as C, C++, Java, and .NET are used in this generation.
AI includes:
• Robotics
• Neural Networks
• Game Playing
• Development of expert systems to make decisions in real life situations.
• Natural language understanding and generation.
COMPUTER TYPES
Classification based on Operating Principles
Based on their operating principles, computers can be classified into one of the following types:
1) Digital computers
2) Analog computers
3) Hybrid computers
Analog computers: An analog computer is a form of computer that uses continuously changeable aspects of physical phenomena, such as electrical, mechanical, or hydraulic quantities, to model the problem being solved. In contrast, digital computers represent varying quantities symbolically, as their numerical values change.

Hybrid computers: A hybrid computer combines both kinds; its digital component normally serves as the controller and provides logical operations, while the analog component normally serves as a solver of differential equations.
Notebook: These computers are as powerful as desktops, but their size is comparatively smaller than that of laptops and desktops. They weigh 2 to 3 kg and are more costly than laptops.
Palmtop (handheld): Also called a Personal Digital Assistant (PDA), these computers are small enough to be held in the hand. They are capable of word processing, spreadsheets, handwriting recognition, game playing, faxing, and paging, but they are not as powerful as desktop computers. Example: 3Com PalmV.
Wearable computer: This computer is small enough to be worn on the body and has limited processing power. It is used in the field of medicine; for example, a pacemaker to correct heart beats, or an insulin meter to measure the level of insulin in the blood.
COMPUTER TYPES
A computer can be defined as a fast electronic calculating machine that accepts digitized input information (data), processes it according to a list of internally stored instructions, and produces the resulting output information. The list of instructions is called a program, and the internal storage is called computer memory.
BASIC TERMINOLOGY
•Input: Whatever is put into a computer system.
•Data: Refers to the symbols that represent facts, objects, or ideas.
•Information: The results of the computer storing data as bits and bytes; the words, numbers, sounds, and graphics.
•Output: Consists of the processing results produced by a computer.
•Processing: Manipulation of the data in many ways.
•Memory: Area of the computer that temporarily holds data waiting to be processed, stored,
or output.
•Storage: Area of the computer that holds data on a permanent basis when it is not
immediately needed for processing.
•Assembly language program (ALP) – programs are written using mnemonics
•Mnemonic – an instruction written in an English-like form
•Assembler – software which converts an ALP to MLL (Machine Level Language)
•HLL (High Level Language) – programs are written using English-like statements
•Compiler – converts HLL to MLL; does this job by reading the source program all at once
•Interpreter – converts HLL to MLL; does this job statement by statement
•System software – program routines which aid the user in the execution of programs, e.g., assemblers and compilers
•Operating system –Collection of routines responsible for controlling and coordinating all
the activities in a computer system
FUNCTIONAL UNIT
The input device accepts coded information, such as a source program in a high-level language. This is either stored in the memory or used immediately by the processor to perform the desired operations. The program stored in the memory determines the processing steps; essentially, the computer converts a source program into an object program, i.e., into machine language.
Finally, the results are sent to the outside world through an output device. All of these actions are coordinated by the control unit.
Input unit:
The source program/high-level language program/coded information/data is fed to the computer through input devices, the keyboard being the most common type. Whenever a key is pressed, the corresponding letter or digit is translated into its equivalent binary code and sent over a cable to either the memory or the processor.
Memory unit:
Its function is to store programs and data. It is basically of two types:
1. Primary memory
2. Secondary memory
Word:
In computer architecture, a word is a unit of data of a defined bit length that can be addressed
and moved between storage and the computer processor. Usually, the defined bit length of a
word is equivalent to the width of the computer's data bus so that a word can be moved in a
single operation from storage to a processor register. For any computer architecture with an
eight-bit byte, the word will be some multiple of eight bits. In IBM's evolutionary
System/360 architecture, a word is 32 bits, or four contiguous eight-bit bytes. In Intel's PC
processor architecture, a word is 16 bits, or two contiguous eight-bit bytes. A word can
contain a computer instruction, a storage address, or application data that is to be
manipulated (for example, added to the data in another word space).
The number of bits in each word is known as the word length. Word length refers to the number of bits processed by the CPU in one go. In modern general-purpose computers, word sizes range from 16 to 64 bits.
The time required to access one word is called the memory access time. The small, fast,
RAM units are called caches. They are tightly coupled with the processor and are often
contained on the same IC chip to achieve high performance.
1. Primary memory: This is the memory exclusively associated with the processor; it operates at electronic speeds. Programs must be stored in this memory while they are being executed. The memory contains a large number of semiconductor storage cells, each capable of storing one bit of information. These are processed in groups of fixed size called words.
To provide easy access to any word in the memory, a distinct address is associated with each word location. Addresses are numbers that identify memory locations.

The number of bits in each word is called the word length of the computer. Programs must reside in the memory during execution. Instructions and data can be written into the memory or read out under the control of the processor. Memory in which any location can be reached in a short, fixed amount of time after specifying its address is called random-access memory (RAM).

The time required to access one word is called the memory access time. Memory which is only readable by the user and whose contents cannot be altered is called read-only memory (ROM); it contains the operating system.
Caches are small, fast RAM units that are coupled with the processor and are often contained on the same IC chip to achieve high performance. Although primary storage is essential, it tends to be expensive.
2. Secondary memory: This is used where large amounts of data and programs have to be stored, particularly information that is accessed infrequently.
Examples: magnetic disks and tapes, optical disks (i.e., CD-ROMs), floppies, etc.

The control unit and the ALU are many times faster than the other devices connected to a computer system. This enables a single processor to control a number of external devices such as keyboards, displays, magnetic and optical disks, sensors, and other mechanical controllers.
Output unit:
This is the counterpart of the input unit. Its basic function is to send the processed results to the outside world.

Control unit:
It is effectively the nerve center that sends signals to the other units and senses their states. The actual timing signals that govern the transfer of data between the input unit, processor, memory, and output unit are generated by the control unit.
The preceding Add instruction combines a memory access operation with an ALU operation. In other types of computers, these two operations are performed by separate instructions, for performance reasons:
Load LOCA, R1
Add R1, R0
Transfers between the memory and the processor are started by sending the address
of the memory location to be accessed to the memory unit and issuing the appropriate control
signals. The data are then transferred to or from the memory.
The figure shows how the memory and the processor can be connected. In addition to the ALU and the control circuitry, the processor contains a number of registers used for several different purposes.
Register:
It is a special, high-speed storage area within the CPU. All data must be represented in
a register before it can be processed. For example, if two numbers are to be multiplied, both
numbers must be in registers, and the result is also placed in a register. (The register can
contain the address of a memory location where data is stored rather than the actual data
itself.)
The number of registers a CPU has and the size of each (number of bits) help determine the power and speed of the CPU. For example, a 32-bit CPU is one in which each register is 32 bits wide; therefore, each CPU instruction can manipulate 32 bits of data. In high-level languages, the compiler is responsible for translating high-level operations into low-level operations that access registers.
Instruction Format:
Computer instructions are the basic components of a machine language program. They are
also known as macro operations, since each one is comprised of sequences of micro
operations.
Each instruction initiates a sequence of micro operations that fetch operands from registers
or memory, possibly perform arithmetic, logic, or shift operations, and store results in
registers or memory.
Instructions are encoded as binary instruction codes. Each instruction code contains an operation code, or opcode, which designates the overall purpose of the instruction (e.g., add, subtract, move, input). The number of bits allocated for the opcode determines how many different instructions the architecture supports.

In addition to the opcode, many instructions also contain one or more operands, which indicate where in registers or memory the data required for the operation is located. For example, an add instruction requires two operands, and a not instruction requires one.
 15          12 11           6 5            0
+--------------+--------------+--------------+
|    Opcode    |   Operand    |   Operand    |
+--------------+--------------+--------------+
The opcode and operands are most often encoded as unsigned binary numbers in order to
minimize the number of bits used to store them. For example, a 4-bit opcode encoded as a
binary number could represent up to 16 different operations.
The control unit is responsible for decoding the opcode and operand bits in the instruction
register, and then generating the control signals necessary to drive all other hardware in the
CPU to perform the sequence of micro operations that comprise the instruction.
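As an illustration, the following C sketch decodes a 16-bit instruction word laid out as in the diagram above (a 4-bit opcode and two 6-bit operand fields; the word value is an arbitrary example):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint16_t instr = 0x1A35;                  /* arbitrary 16-bit instruction word */
    unsigned opcode   = (instr >> 12) & 0x0F; /* bits 15-12: 4-bit opcode          */
    unsigned operand1 = (instr >> 6)  & 0x3F; /* bits 11-6:  first 6-bit operand   */
    unsigned operand2 =  instr        & 0x3F; /* bits 5-0:   second 6-bit operand  */
    printf("opcode=%u operand1=%u operand2=%u\n", opcode, operand1, operand2);
    return 0;
}

A 4-bit opcode field, as here, allows up to 16 distinct operations.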
INSTRUCTION CYCLE:
The instruction register (IR) holds the instruction that is currently being executed. Its output is available to the control circuits, which generate the timing signals that control the various processing elements involved in executing the instruction.
Besides the IR and the PC, there are n general-purpose registers, R0 through Rn-1.
The other two registers which facilitate communication with memory are: -
1. MAR – (Memory Address Register):- It holds the address of the location to be
accessed.
2. MDR – (Memory Data Register):- It contains the data to be written into or read out
of the address location.
4. After the time required to access the memory elapses, the addressed word is read out of the memory and loaded into the MDR.
5. Now contents of MDR are transferred to the IR & now the instruction is ready to be
decoded and executed.
6. If the instruction involves an operation by the ALU, it is necessary to obtain the
required operands.
7. An operand in the memory is fetched by sending its address to MAR & Initiating a
read cycle.
8. When the operand has been read from the memory to the MDR, it is transferred from
MDR to the ALU.
9. After one or two such repeated cycles, the ALU can perform the desired operation.
10. If the result of this operation is to be stored in the memory, the result is sent to MDR.
11. Address of location where the result is stored is sent to MAR & a write cycle is
initiated.
12. The contents of PC are incremented so that PC points to the next instruction that is to
be executed.
This diversion may change the internal state of the processor, so its state must be saved in memory locations before the interrupt is serviced. When the interrupt-service routine is completed, the state of the processor is restored so that the interrupted program may continue.

The task of entering and altering programs for the ENIAC was extremely tedious. The programming process could be made much easier if the program were represented in a form suitable for storing in memory alongside the data. Then a computer could get its instructions by reading them from memory, and a program could be set or altered by setting the values of a portion of memory. This idea is known as the stored-program concept. The first publication of the idea
was in a 1945 proposal by von Neumann for a new computer, the EDVAC (Electronic Discrete
Variable Computer).
In 1946, von Neumann and his colleagues began the design of a new stored-program
computer, referred to as the IAS computer, at the Princeton Institute for Advanced Studies.
The IAS computer, although not completed until 1952, is the prototype of all subsequent
general-purpose computers.
It consists of
❖ A main memory, which stores both data and instruction
❖ An arithmetic and logic unit (ALU) capable of operating on binary data
❖ A control unit, which interprets the instructions in memory and causes them to be
executed
❖ Input and output (I/O) equipment operated by the control unit
BUS STRUCTURES:
A bus is basically a subsystem which transfers data between computer components, either within a computer or between two computers. It can connect several peripheral devices at the same time.

A single bus structure is very simple: all units are connected to the same common bus. A multiple bus structure has several interconnected buses.

I) In a single bus structure all units are connected to the same bus, rather than to different buses as in a multiple bus structure.
II) The performance of a multiple bus structure is better than that of a single bus structure.
III) A single bus structure is cheaper than a multiple bus structure.
A group of lines that serves as a connecting path for several devices is called a bus (one bit per line).

Individual parts must communicate over a communication line or path for exchanging data, address, and control information, as shown in the diagram below; for example, the processor communicating with a printer. A common approach is to use buffer registers to hold the content during the transfer: buffer registers hold the data temporarily during a data transfer, e.g., while printing.
Types of Buses:
1. Data Bus:
The data bus is the most common type of bus. It is used to transfer data between the different components of the computer. The number of lines in the data bus affects the speed of data transfer between components. A data bus consists of 8, 16, 32, or 64 lines; a 64-line data bus can transfer 64 bits of data at one time.
2. Address Bus:
Many components are connected to one another through buses. Each component is assigned a unique ID, called the address of that component. If a component wants to communicate with another component, it uses the address bus to specify the address of that component. The address bus is a unidirectional bus: it can carry information in only one direction. It carries the address of a memory location from the microprocessor to the main memory.
3. Control Bus:
The control bus is used to transmit commands and control signals from one component to another. Suppose the CPU wants to read data from main memory: it uses the control bus to issue the read command. The control bus is also used to transmit acknowledgement (ACK) signals. A control signal carries the following:
1. Timing information: it specifies the time for which a device can use the data and address bus.
2. Command signal: it specifies the type of operation to be performed.

Suppose the CPU gives a command to the main memory to write data. The memory sends an acknowledgement signal to the CPU after writing the data successfully. The CPU receives the signal and then moves on to perform some other action.
SOFTWARE
If a user wants to enter and run an application program, system software is needed. System software is a collection of programs that are executed as needed to perform supporting functions.
Types of software
A layer structure showing where the operating system is located in generally used software systems on desktops.
System software
System software helps run the computer hardware and computer system. It includes a
combination of the following:
• device drivers
• operating systems
• servers
• utilities
• windowing systems
• compilers
• debuggers
• interpreters
• linkers
The purpose of system software is to unburden the application programmer from the often complex details of the particular computer being used, including such accessories as communications devices, printers, device readers, displays, and keyboards, and also to partition the computer's resources, such as memory and processor time, in a safe and stable manner. Examples are Windows XP, Linux, and Mac OS.
Application software
Application software allows end users to accomplish one or more specific (not directly computer development related) tasks. Typical applications include office suites, media players, and educational software.
Application software exists for, and has had an impact on, a wide variety of areas.
PERFORMANCE
The most important measure of the performance of a computer is how quickly it can
execute programs. The speed with which a computer executes program is affected by the
design of its hardware. For best performance, it is necessary to design the compiles, the
machine instruction set, and the hardware in a coordinated way.
The total time required to execute the program is elapsed time is a measure of the
performance of the entire computer system. It is affected by the speed of the processor, the
disk and the printer. The time needed to execute a instruction is called the processor time.
Just as the elapsed time for the execution of a program depends on all the units in a computer system, the processor time depends on the hardware involved in the execution of individual machine instructions. This hardware comprises the processor and the memory, which are usually connected by a bus, as shown in fig. c. The pertinent parts of fig. c are repeated in fig. d, which includes the cache memory as part of the processor unit.
Let us examine the flow of program instructions and data between the memory and the processor. At the start of execution, all program instructions and the required data are stored in the main memory. As execution proceeds, instructions are fetched one by one over the bus into the processor, and a copy is placed in the cache. Later, if the same instruction or data item is needed a second time, it is read directly from the cache.
The processor and a relatively small cache memory can be fabricated on a single IC chip. The internal speed of performing the basic steps of instruction processing on such a chip is very high, considerably faster than the speed at which instructions and data can be fetched from the main memory. A program will be executed faster if the movement of instructions and data between the main memory and the processor is minimized, which is achieved by using the cache.
For example:- Suppose a number of instructions are executed repeatedly over a short period
of time as happens in a program loop. If these instructions are available in the cache, they can
be fetched quickly during the period of repeated use. The same applies to the data that are
used repeatedly.
Processor clock:
Processor circuits are controlled by a timing signal called the clock. The clock defines regular time intervals called clock cycles. To execute a machine instruction, the processor divides the action to be performed into a sequence of basic steps such that each step can be completed in one clock cycle. The length P of one clock cycle is an important parameter that affects processor performance.

Processors used in today's personal computers and workstations have clock rates that range from a few hundred million to over a billion cycles per second.
Basic performance equation:
The compiler generates a machine language object program that corresponds to the source program. Assume that complete execution of the program requires the execution of N machine language instructions. The number N is the actual number of instruction executions; it is not necessarily equal to the number of machine instructions in the object program, because some instructions may be executed more than once (as is the case for instructions inside a program loop) and others may not be executed at all, depending on the input data used.

Suppose that the average number of basic steps needed to execute one machine language instruction is S, where each basic step is completed in one clock cycle. If the clock rate is R cycles per second, the program execution time is given by

T = (N x S) / R

This is often referred to as the basic performance equation.
We must emphasize that N, S, and R are not independent parameters; changing one may affect another. Introducing a new feature in the design of a processor will lead to improved performance only if the overall result is to reduce the value of T.
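As a quick illustration with invented numbers: if a program executes N = 10,000,000 instructions on a processor whose clock rate is R = 500 MHz (5 x 10^8 cycles per second), and each instruction takes an average of S = 4 basic steps, then

T = (10,000,000 x 4) / (5 x 10^8) = 0.08 seconds

Halving S (for example, through pipelining) would halve T to 0.04 seconds; doubling R would have the same effect.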
Consider the instruction Add R1, R2, R3, which adds the contents of R1 and R2 and places the sum into R3.

The contents of R1 and R2 are first transferred to the inputs of the ALU. After the addition operation is performed, the sum is transferred to R3. The processor can read the next instruction from the memory while the addition operation is being performed. If that instruction also uses the ALU, its operands can be transferred to the ALU inputs at the same time that the result of the Add instruction is being transferred to R3.
In the ideal case, if all instructions are overlapped to the maximum degree possible, execution proceeds at the rate of one instruction completed in each clock cycle.
Individual instructions still require several clock cycles to complete. But for the purpose of computing T, the effective value of S is 1.
Clock rate
There are two possibilities for increasing the clock rate R:
1. Improving the IC technology makes logic circuits faster, which reduces the time needed to complete a basic step. This allows the clock period P to be reduced and the clock rate R to be increased.
2. Reducing the amount of processing done in one basic step also makes it possible to reduce the clock period P. However, if the actions that have to be performed by an instruction remain the same, the number of basic steps needed may increase.
Increases in the value of R that are entirely caused by improvements in IC technology affect all aspects of the processor's operation equally, with the exception of the time it takes to access the main memory. In the presence of a cache, the percentage of accesses to the main memory is small; hence, much of the performance gain expected from the use of faster technology can be realized.
If individual instructions perform more complex operations, fewer instructions will be needed, leading to a lower value of N and a larger value of S. It is not obvious whether one choice is better than the other.

Complex instructions combined with pipelining (an effective value of S close to 1) would achieve the best performance. However, it is much easier to implement efficient pipelining in processors with simple instruction sets.
RISC and CISC are two instruction set architectures developed for computers. An instruction set, or instruction set architecture (ISA), is the part of the computer's structure that provides the commands which guide the computer in processing and manipulating data. An instruction set consists of instructions, addressing modes, native data types, registers, interrupt and exception handling, and memory architecture. An instruction set can be emulated in software by using an interpreter, or built into the hardware of the processor. The ISA can be considered a boundary between the software and the hardware. Microcontrollers and microprocessors can be classified according to whether they use a RISC or a CISC instruction set architecture.
Comparison between RISC and CISC:

                  RISC                            CISC
Calculations      Faster and precise.             Slower and precise.
The programs selected range from game playing, compilers, and database applications to numerically intensive programs in astrophysics and quantum chemistry. In each case, the program under test is compiled, and its running time on a real computer is measured. The same program is also compiled and run on one computer selected as a reference.
Multicomputers:
1. A computer made up of several computers.
2. Distributed computing deals with hardware and software systems containing more than one processing element, running multiple programs.
3. It can run faster.
4. A multicomputer is multiple computers, each of which can have multiple processors.
5. Used for true parallel processing.
6. Processors cannot share memory.
7. Called message-passing multicomputers.
8. Cost is higher.

Multiprocessors:
1. A computer that has more than one CPU on its motherboard.
2. Multiprocessing is the use of two or more central processing units (CPUs) within a single computer system.
3. Speed depends on the speed of all the processors.
4. A single computer with multiple processors.
5. Used for true parallel processing.
6. Processors can share memory.
7. Called shared-memory multiprocessors.
8. Cost is lower.
Instruction Codes
Computer instructions are the basic components of a machine language program. They
are also known as macro operations, since each one is comprised of sequences of
micro operations. Each instruction initiates a sequence of micro operations that fetch
operands from registers or memory, possibly perform arithmetic, logic, or shift
operations, and store results in registers or memory.
In addition to the opcode, many instructions also contain one or more operands, which indicate where in registers or memory the data required for the operation is located. For example, an add instruction requires two operands, and a not instruction requires one.
 15          12 11           6 5            0
+--------------+--------------+--------------+
|    Opcode    |   Operand    |   Operand    |
+--------------+--------------+--------------+
The opcode and operands are most often encoded as unsigned binary numbers in
order to minimize the number of bits used to store them. For example, a 4-bit opcode
encoded as a binary number could represent up to 16 different operations.
The control unit is responsible for decoding the opcode and operand bits in the
instruction register, and then generating the control signals necessary to drive all
other hardware in the CPU to perform the sequence of microoperations that comprise
the instruction.
Control unit design and implementation can be done by two general methods:
• A hardwired control unit is designed from scratch using traditional digital logic
design techniques to produce a minimal, optimized circuit. In other words, the
control unit is like an ASIC (application-specific integrated circuit).
• A micro-programmed control unit is built from some sort of ROM. The desired
control signals are simply stored in the ROM, and retrieved in sequence to drive
the micro operations needed by a particular instruction.
Hard-wired control:
Hardwired control is a control mechanism that generates control signals by means of an appropriate finite state machine (FSM). The pair of registers, the microinstruction register and the control storage address register, can be regarded as the "state register" for hardwired control. Note that the control storage can be regarded as a kind of combinational logic circuit: we can assign 0 or 1 values to each output corresponding to each address, which can be regarded as the input of the combinational logic circuit; in other words, it is a truth table.
Instruction Cycle
In this chapter, we examine the sequences of micro operations that the Basic
Computer goes through for each instruction. Here, you should begin to understand
how the required control signals for each state of the CPU are determined, and how
they are generated by the control unit.
The CPU performs a sequence of micro operations for each instruction. The sequence
for each instruction of the Basic Computer can be refined into 4 abstract phases:
1. Fetch instruction
2. Decode
3. Fetch operand
4. Execute
Program execution can be represented as a top-down design:
1. Program execution
a. Instruction 1
i. Fetch instruction
ii. Decode
iii. Fetch operand
iv. Execute
b. Instruction 2
i. Fetch instruction
ii. Decode
iii. Fetch operand
iv. Execute
c. Instruction 3 ...
Note that incrementing the PC at time T1 assumes that the next instruction is at
the next address. This may not be the case if the current instruction is a branch
instruction. However, performing the increment here will save time if the next
instruction immediately follows, and will do no harm if it doesn't. The incremented PC
value is simply overwritten by branch instructions.
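To make the fetch-decode-execute sequence concrete, here is a minimal C sketch of such a loop for a hypothetical toy machine (16-bit words, a 4-bit opcode, and a 12-bit address field; this is an illustration, not the Basic Computer's exact micro-operations):

#include <stdint.h>

enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2, OP_STORE = 3 };

void run(uint16_t mem[4096], uint16_t pc) {
    uint16_t ac = 0;                     /* accumulator */
    for (;;) {
        uint16_t ir = mem[pc++];         /* fetch the instruction; increment PC */
        unsigned op   = ir >> 12;        /* decode: opcode in bits 15-12        */
        unsigned addr = ir & 0x0FFF;     /* decode: address in bits 11-0        */
        switch (op) {                    /* fetch operand and execute           */
        case OP_LOAD:  ac = mem[addr];   break;
        case OP_ADD:   ac += mem[addr];  break;
        case OP_STORE: mem[addr] = ac;   break;
        default:       return;           /* OP_HALT or unknown opcode: stop     */
        }
    }
}

Note that the PC is incremented during the fetch, exactly as described above; a branch instruction would simply overwrite it.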
How many CPU clock cycles are needed to transfer a character from the keyboard to
the INPR register? (tricky)
Are the clock pulses provided by the CPU master clock?
RS232, USB, Firewire are serial interfaces with their own clock independent of the
CPU. ( USB speed is independent of processor speed. )
• RS232: 115,200 bps (some faster)
• USB: 12 Mbps
• USB2: 480 Mbps
• FW400: 400 Mbps
• FW800: 800 Mbps
• USB3: 4.8 Gbps
OUTR's inputs are connected to the bus in parallel, and its output is connected serially to the terminal. OUTR is another shift register, and the printer/monitor receives one bit during each clock pulse.
I/O Operations
Since input and output devices are not under the full control of the CPU (I/O events
are asynchronous), the CPU must somehow be told when an input device has new
input ready to send, and an output device is ready to receive more output. The FGI flip-
flop is set to 1 after a new character is shifted into INPR. This is done by the I/O
interface, not by the control unit. This is an example of an asynchronous input event
(not synchronized with or controlled by the CPU).
The FGI flip-flop must be cleared after transferring the INPR to AC. This must be
done as a micro operation controlled by the CU, so we must include it in the CU design.
The FGO flip-flop is set to 1 by the I/O interface after the terminal has finished
displaying the last character sent. It must be cleared by the CPU after transferring a
character into OUTR. Since the keyboard controller only sets FGI and the CPU only
clears it, a JK flip-flop is convenient:
                           +---------+
Keyboard controller ------>| J     Q |----->
                           |         |
     -------\              |         |
             ) OR >------->|> FGI    |
     -------/              |         |
                           |         |
CPU----------------------->| K       |
                           +---------+
How do we control the CK input on the FGI flip-flop? (Assume leading-edge
triggering.)
There are two common methods for detecting when I/O devices are ready,
namely software polling and interrupts. These two methods are discussed in the
following sections.
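A rough C sketch of the software polling approach for the keyboard input described above (FGI and INPR follow the text; modeling them as plain volatile variables is an assumption made for illustration):

volatile unsigned char FGI;  /* set to 1 by the I/O interface when INPR holds a new character */
volatile char INPR;          /* input register filled by the keyboard interface */

char read_char(void) {
    while (FGI == 0)         /* poll: spin until a character is available */
        ;
    char c = INPR;           /* transfer INPR to the processor (e.g., to AC) */
    FGI = 0;                 /* the CPU clears the flag after the transfer */
    return c;
}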
Stack Organization
A stack is a storage method in which the last item stored is the first one to be removed (last-in, first-out). Generally, a stack in a computer is a memory unit with an address register; the register holding the address of the top of the stack is known as the Stack Pointer (SP). A stack supports insertion and deletion operations, where inserting an item is known as a push and deleting an item is known as a pop. Push and pop operations result in incrementing and decrementing the stack pointer, respectively.
Register Stack
Registers or memory words can be organized to form a stack. The stack pointer is a register that holds the memory address of the top of the stack. When an item needs to be deleted from the stack, the item on the top of the stack is read and the stack pointer is decremented. Similarly, when an item needs to be added, the stack pointer is incremented and the word is written at the position indicated by the stack pointer. There are two 1-bit registers, FULL and EMTY, that describe the stack overflow and underflow conditions. The following micro-operations are performed when inserting or deleting an item.
Insert (push):
SP <- SP + 1                    // Increment the stack pointer to point to the next higher address //
M[SP] <- DR                     // Write the item on the top of the stack //
If (SP = 0) then (FULL <- 1)    // Check overflow condition //
EMTY <- 0                       // Mark that the stack is not empty //

Delete (pop):
DR <- M[SP]                     // Read an item from the top of the stack //
SP <- SP - 1                    // Decrement the stack pointer //
If (SP = 0) then (EMTY <- 1)    // Check underflow condition //
FULL <- 0                       // Mark that the stack is not full //
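These micro-operations can be mirrored in a small C sketch of a 64-word register stack (an illustrative model: the 6-bit SP is simulated by wrapping modulo 64, and, as in the description above, no error handling is done for pushing a full stack or popping an empty one):

#include <stdint.h>

#define STACK_WORDS 64                 /* 6-bit SP: 2^6 = 64 words */

static uint16_t stack_mem[STACK_WORDS];
static unsigned sp = 0;                /* stack pointer, behaves like a 6-bit register */
static int full = 0, emty = 1;         /* FULL and EMTY flag bits */

void push(uint16_t dr) {
    sp = (sp + 1) % STACK_WORDS;       /* SP <- SP + 1 */
    stack_mem[sp] = dr;                /* M[SP] <- DR  */
    if (sp == 0) full = 1;             /* SP wrapped around to 0: stack is full */
    emty = 0;                          /* stack is no longer empty */
}

uint16_t pop(void) {
    uint16_t dr = stack_mem[sp];       /* DR <- M[SP]  */
    sp = (sp + STACK_WORDS - 1) % STACK_WORDS;  /* SP <- SP - 1 */
    if (sp == 0) emty = 1;             /* SP back at 0: stack is empty */
    full = 0;                          /* stack is no longer full */
    return dr;
}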
Register Stack:
A stack can be placed in a portion of a large memory, or it can be organized as a collection of a finite number of memory words or registers.
In a 64-word stack, the stack pointer contains 6 bits, because 2^6 = 64. The one-bit register FULL is set to 1 when the stack is full, and the one-bit register EMTY is set to 1 when the stack is empty. DR is the data register that holds the binary data to be written into or read out of the stack. Initially, SP is cleared to 0, EMTY is set to 1, and FULL = 0, so that SP points to the word at address 0 and the stack is marked empty and not full.
PUSH:  SP <- SP + 1                     increment stack pointer
       M[SP] <- DR                      write item on top of the stack
       If (SP = 0) then (FULL <- 1)     check if stack is full
       EMTY <- 0                        mark the stack not empty

POP:   DR <- M[SP]                      read item from the top of the stack
       SP <- SP - 1                     decrement SP
       If (SP = 0) then (EMTY <- 1)     check if stack is empty
       FULL <- 0                        mark the stack not full
Three items are placed in the stack: A, B, and C, in that order. Item C is on the top of the stack, so the content of SP is now 3. To remove the top item, the stack is popped by reading the memory word at address 3 and decrementing the content of SP. Item B is now on top of the stack, since SP holds address 2. To insert a new item, the stack is pushed by incrementing SP and writing a word at the next higher location in the stack. Note that item C has been read out but not physically removed. This does not matter, because when the stack is pushed, a new item is written in its place.
In a 64-word stack, the stack pointer contains 6 bits, because 2^6 = 64. Since SP has only six bits, it cannot hold a number greater than 63 (111111 in binary). When 63 is incremented by 1, the result is 0, since the carry out of the six bits is lost.
The stack pointer is incremented so that it points to the address of the next higher word, and a memory write operation inserts the word from DR onto the top of the stack. Note that SP holds the address of the top of the stack and that M[SP] denotes the memory word specified by the address presently in SP. The first item stored in the stack is at address 1, and the last item is stored at address 0. If SP reaches 0, the stack is full of items, so FULL is set to 1.
This condition is reached if the top item prior to the last push was in location 63, so that after incrementing SP the last item is stored in location 0. Once an item is stored in location 0, there are no more empty registers in the stack. If an item is written in the stack, obviously the stack cannot be empty, so EMTY is cleared to 0.
The top item is read from the stack into DR, and the stack pointer is then decremented. If its value reaches zero, the stack is empty, so EMTY is set to 1. This condition is reached if the item read was in location 1: once this item is read out, SP is decremented and reaches the value 0, which is the initial value of SP. Note that if a pop operation reads the item from location 0 and SP is then decremented, SP changes to 111111, which is equal to decimal 63; in this configuration, the word in address 0 receives the last item in the stack. Note also that an erroneous operation will result if the stack is pushed when FULL = 1 or popped when EMTY = 1.
Memory Stack:
A stack can exist as a stand-alone unit, as in figure 4, or it can be implemented in random-access memory attached to the CPU. A stack is implemented in the CPU by assigning a portion of memory to stack operations and using a processor register as the stack pointer. The figure shows a portion of computer memory partitioned into three segments: program, data, and stack. The program counter PC points at the address of the next instruction in the program. The address register AR points at an array of data. The stack pointer SP points at the top of the stack. The three registers are connected to a common address bus, and any one of them can provide an address for memory. PC is used during the fetch phase to read an instruction. AR is used during the execute phase to read an operand. SP is used to push or pop items onto or from the stack.
As shown in figure 4, the initial value of SP is 4001 and the stack grows with decreasing addresses. Thus the first item stored in the stack is at address 4000, the second item is stored at address 3999, and the last address that can be used for the stack is 3000. No provisions are made for stack limit checks.
We assume that the items in the stack communicate with a data register DR. A new
item is inserted with the push operation as follows.
SP← SP-1
M[SP] ← DR
The stack pointer is decremented so that it points at the address of the next word, and a memory write operation inserts the word from DR onto the top of the stack. A new item is deleted with a pop operation as follows:

DR ← M[SP]
SP ← SP + 1

The top item is read from the stack into DR, and the stack pointer is then incremented to point at the next item in the stack. Most computers do not provide hardware to check for stack overflow (FULL) or underflow (EMTY). The stack limits can be checked by using two processor registers: one to hold the upper limit and the other to hold the lower limit. After a push or pop operation, SP is compared with the upper or lower limit register.
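A companion C sketch for this memory stack, which grows toward lower addresses and adds the software limit checks just described (the bounds 4001 and 3000 come from the figure; returning -1 on a violation is an illustrative choice):

#include <stdint.h>

#define UPPER_LIMIT 4001   /* initial SP; the stack occupies addresses 3000..4000 */
#define LOWER_LIMIT 3000   /* last address usable by the stack */

static unsigned msp = UPPER_LIMIT;     /* memory stack pointer */

int push_mem(uint16_t mem[], uint16_t dr) {
    if (msp <= LOWER_LIMIT) return -1; /* overflow: no room below the lower limit */
    mem[--msp] = dr;                   /* SP <- SP - 1; M[SP] <- DR */
    return 0;
}

int pop_mem(uint16_t mem[], uint16_t *dr) {
    if (msp >= UPPER_LIMIT) return -1; /* underflow: stack is empty */
    *dr = mem[msp++];                  /* DR <- M[SP]; SP <- SP + 1 */
    return 0;
}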
Reverse Polish notation, also known as postfix notation, is defined as follows: in postfix notation, the operator is written after the operands. Examples of postfix notation are AB+ and CD-; here A and B are two operands, and the operator is written after them. The conversion from infix expressions to postfix expressions is shown below.
▪ Convert the infix notation A x B + C x D + E x F into postfix notation?
SOLUTION
AxB+CxD+ExF
= [ABx] + [CDx] + [EFx]
= [ABxCDx] + [EFx]
= [ABxCDxEFx]
= ABxCDxEFx
So the postfix notation is ABxCDxEFx.
▪ Convert the infix notation {A – B + C x (D x E – F)} / G + H x K into postfix
notation?
{A – B + C x (D x E – F)} / G + H x K
= {A – B + C x ([DEx] – F)} / G + [HKx]
= {A – B + C x [DExF-]} / [GHKx+]
= {A – B + [CDExF-x]} / [GHKx+]
= {[AB-] + [CDExF-x]} / [GHKx+]
= [AB-CDExF-x+] / [GHKx+]
= [AB-CDExF-x+GHKx+/]
= AB-CDExF-x+GHKx+/
So the postfix notation is AB-CDExF-x+GHKx+/.
Now let's see how to evaluate a postfix expression. The algorithm for the evaluation of postfix notation is shown below:
ALGORITHM:
(Evaluation of postfix notation) This algorithm finds the result of a postfix expression.
Step 1: Insert a symbol (say #) at the right end of the postfix expression.
Step 2: Scan the expression from left to right and apply Steps 3 and 4 to each symbol encountered.
Step 3: If an operand is encountered, push it onto the stack.
Step 4: If an operator (say &) is encountered, pop the top element A (say) and the next-to-top element B (say), perform the operation x = B & A, and push x onto the top of the stack.
Step 5: If the symbol # is encountered, stop scanning.
▪ Evaluate the postfix expression 50 4 3 x 2 – + 7 8 x 4 / –
SOLUTION
Put the symbol # at the right end of the expression: 50 4 3 x 2 – + 7 8 x 4 / – #.
Scanning from the left: push 50, 4, 3; x gives 4 x 3 = 12; push 2; – gives 12 – 2 = 10; + gives 50 + 10 = 60; push 7 and 8; x gives 7 x 8 = 56; push 4; / gives 56 / 4 = 14; the final – gives 60 – 14 = 46. When # is encountered, scanning stops.
Result = 46
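The algorithm maps directly onto a short C program (a sketch: 'x' is used as the multiplication sign, as in the examples above, and the operand/operator test is simplified for single-character operators and non-negative numbers):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    char buf[] = "50 4 3 x 2 - + 7 8 x 4 / -";  /* the postfix expression */
    double stack[32];
    int top = -1;                               /* empty stack */
    for (char *tok = strtok(buf, " "); tok != NULL; tok = strtok(NULL, " ")) {
        if (strchr("+-x/", tok[0]) && tok[1] == '\0') {
            double a = stack[top--];            /* top element         */
            double b = stack[top--];            /* next-to-top element */
            double x = 0;                       /* x = B & A           */
            switch (tok[0]) {
            case '+': x = b + a; break;
            case '-': x = b - a; break;
            case 'x': x = b * a; break;
            case '/': x = b / a; break;
            }
            stack[++top] = x;                   /* push the result */
        } else {
            stack[++top] = atof(tok);           /* operand: push it */
        }
    }
    printf("Result = %g\n", stack[top]);        /* prints: Result = 46 */
    return 0;
}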
INSTRUCTION FORMATS
The most common fields found in instruction formats are:
(1) An operation code field that specifies the operation to be performed.
(2) An address field that designates a memory address or a processor register.
(3) A mode field that specifies the way the operand or the effective address is determined.
Computers may have instructions of several different lengths containing varying numbers of addresses. The number of address fields in the instruction format of a computer depends on the internal organization of its registers. Most computers fall into one of three types of CPU organization:
(1) Single accumulator organization:   ADD X           AC <- AC + M[X]
(2) General register organization:     ADD R1, R2, R3  R1 <- R2 + R3
(3) Stack organization:                PUSH X
General register organization is most common in commercial computers. Each address field can specify either a processor register or a memory word. The program below evaluates X = (A + B) * (C + D):

MOV R1, A      R1 <- M[A]
ADD R1, B      R1 <- R1 + M[B]
MOV R2, C      R2 <- M[C]
ADD R2, D      R2 <- R2 + M[D]
MUL R1, R2     R1 <- R1 * R2
MOV X, R1      M[X] <- R1
MUL            TOS <- (C + D) * (A + B)
POP X          M[X] <- TOS
Addressing Modes
The operation field of an instruction specifies the operation to be performed. This operation must be executed on data stored in computer registers or memory words. The way the operands are chosen during program execution depends on the addressing mode of the instruction. The addressing mode specifies a rule for interpreting or modifying the address field of the instruction before the operand is actually referenced. Computers use addressing mode techniques to accommodate one or both of the following provisions:
(1) To give programming versatility to the user by providing facilities such as pointers to memory, counters for loop control, indexing of data, and program relocation.
(2) To reduce the number of bits in the addressing fields of the instruction.
All computer architectures provide more than one of these addressing modes. The question arises as to how the control unit can determine which addressing mode is being used in a particular instruction. Several approaches are used. Often, different opcodes use different addressing modes. Also, one or more bits in the instruction format can be used as a mode field; the value of the mode field determines which addressing mode is to be used.
Direct Addressing:
A very simple form of addressing is direct addressing, in which the address field
contains the effective address of the operand:
EA = A
It requires only one memory reference and no special calculation.
Indirect Addressing:
With direct addressing, the length of the address field is usually less than the word length, which limits the address range. One solution is to have the address field refer to the address of a word in memory which in turn contains the full-length address of the operand. This is known as indirect addressing:
EA = (A)
Register Addressing:
Register addressing is similar to direct addressing. The only difference is that
the address field refers to a register rather than a main memory address:
EA = R
The advantages of register addressing are that only a small address field is needed in the instruction and no memory reference is required. The disadvantage is that the address space is very limited.

Register indirect addressing uses one less memory reference than indirect addressing, because the first piece of information, which is a memory address, is already available in a register; from that memory location we fetch the data. In general, register access is much faster than memory access.
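The effective-address rules seen so far can be summarized in a short C sketch (a toy model: mem is a small word-addressed memory, A stands for the instruction's address field, and R for a register's contents):

#include <stdint.h>

uint16_t mem[4096];   /* toy word-addressed memory */

/* Each function fetches the operand according to one addressing mode. */
uint16_t direct_mode(uint16_t A)            { return mem[A]; }      /* EA = A   */
uint16_t indirect_mode(uint16_t A)          { return mem[mem[A]]; } /* EA = (A) */
uint16_t register_indirect_mode(uint16_t R) { return mem[R]; }      /* EA = (R) */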
Displacement Addressing:
Displacement addressing combines direct addressing and register indirect addressing: the effective address is EA = A + (R). Three common uses of displacement addressing, described below, are relative addressing, base-register addressing, and indexing.
Relative Addressing:
For relative addressing, the implicitly referenced register is the program
counter (PC). That is, the current instruction address is added to the address field to
produce the EA. Thus, the effective address is a displacement relative to the address
of the instruction.
Base-Register Addressing:
The referenced register contains a memory address, and the address field contains a displacement from that address. The register reference may be explicit or implicit. In some implementations, a single segment/base register is employed and used implicitly. In others, the programmer may choose a register to hold the base address of a segment, and the instruction must reference it explicitly.
Indexing:
The address field references a main memory address, and the referenced register contains a positive displacement from that address. Here too, the register reference is sometimes explicit and sometimes implicit.
Index registers are generally used for iterative tasks, so it is typical that there is a need to increment or decrement the index register after each reference to it. Because this is such a common operation, some systems will automatically do it as part of the same instruction cycle. This is known as auto-indexing. There are two kinds of auto-indexing: auto-incrementing and auto-decrementing. If certain registers are devoted exclusively to indexing, auto-indexing can be invoked implicitly and automatically. If general-purpose registers are used, the auto-index operation may need to be signaled by a bit in the instruction.
In some machines, both indirect addressing and indexing are provided, and it is possible to employ both in the same instruction. There are two possibilities: the indexing is performed either before or after the indirection. If indexing is performed after the indirection, it is termed post-indexing:
EA = (A) + (R)
First, the contents of the address field are used to access a memory location containing an address; this address is then indexed by the register value.

With pre-indexing, the indexing is performed before the indirection:
EA = (A + (R))
An address is calculated; the calculated location contains not the operand, but the address of the operand.
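Continuing the same toy model as above, the two orderings differ only in where the indirection happens:

#include <stdint.h>

extern uint16_t mem[4096];   /* the same toy word-addressed memory as above */

/* Post-indexing: EA = (A) + (R) -- indirect first, then add the index. */
uint16_t post_indexed(uint16_t A, uint16_t R) { return mem[mem[A] + R]; }

/* Pre-indexing: EA = (A + (R)) -- add the index first, then go indirect. */
uint16_t pre_indexed(uint16_t A, uint16_t R)  { return mem[mem[A + R]]; }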
Stack Addressing:
A stack is a linear array or list of locations, sometimes referred to as a pushdown list or last-in-first-out queue. A stack is a reserved block of locations: items are appended to the top of the stack so that, at any given time, the block is partially filled. Associated with the stack is a pointer whose value is the address of the top of the stack. The stack pointer is maintained in a register, so references to stack locations in memory are in fact register indirect addresses.

The stack mode of addressing is a form of implied addressing: the machine instructions need not include a memory reference but implicitly operate on the top of the stack.
Various Addressing Modes with Examples
The most common names for addressing modes (names may differ among architectures):

Addressing mode     Example instruction   Meaning                              When used
Register            Add R4, R3            R4 <- R4 + R3                        When a value is in a register
Immediate           Add R4, #3            R4 <- R4 + 3                         For constants
Displacement        Add R4, 100(R1)       R4 <- R4 + Mem[100 + R1]             Accessing local variables
Register deferred   Add R4, (R1)          R4 <- R4 + Mem[R1]                   Accessing using a pointer or a computed address
Indexed             Add R3, (R1 + R2)     R3 <- R3 + Mem[R1 + R2]              Array addressing: R1 = base of array, R2 = index amount
Direct              Add R1, (1001)        R1 <- R1 + Mem[1001]                 Accessing static data
Memory deferred     Add R1, @(R3)         R1 <- R1 + Mem[Mem[R3]]              If R3 is the address of a pointer p, the mode yields *p
Auto-increment      Add R1, (R2)+         R1 <- R1 + Mem[R2]; R2 <- R2 + d     Stepping through arrays in a loop: R2 = start of array, d = size of an element
Auto-decrement      Add R1, -(R2)         R2 <- R2 - d; R1 <- R1 + Mem[R2]     Same as auto-increment; both can also implement a stack as push and pop
Scaled              Add R1, 100(R2)[R3]   R1 <- R1 + Mem[100 + R2 + R3*d]      Used to index arrays; may be applied to any base addressing mode in some machines

Notation:
<-   assignment
Mem  the name for memory: Mem[R1] refers to the contents of the memory location whose address is given by the contents of R1
Branch and program control instructions give a program decision-making capabilities and change the path taken by the program when it is executed in the computer.
Typical arithmetic instructions:

Name                       Mnemonic
Subtract                   SUB
Multiply                   MUL
Divide                     DIV
Add with carry             ADDC
Subtract with borrow       SUBB
Negate (2's complement)    NEG
The status register is a hardware register which contains information about the state of the processor. Individual bits are implicitly or explicitly read and/or written by the machine code instructions executing on the processor. The status register in a traditional processor design includes at least three central flags, Zero, Carry, and Overflow, which are set or cleared automatically as effects of arithmetic and bit manipulation operations. One or more of the flags may then be read by a subsequent conditional jump instruction (including conditional calls, returns, etc. in some machines) or by some arithmetic, shift/rotate, or bitwise operation, typically using the carry flag as input in addition to any explicitly given operands. There are also processors where other classes of instructions may read or write the fundamental status flags.
Some CPU architectures, such as MIPS and Alpha, do not use a dedicated flag
register. Others do not implicitly set and/or read flags. Such machines either do not
pass implicit status information between instructions at all, or they pass it in an
explicitly selected general-purpose register.
A status register may often have other fields as well, such as more specialized
flags, interrupt enable bits, and similar types of information. During an interrupt, the
status of the thread currently executing can be preserved (and later restored) by
storing the current value of the status register, along with the program counter and
other active registers, into the machine stack or some other reserved area of memory.
Common flags:
The most common CPU status register flags, implemented in almost all modern processors, are:
• Zero flag (Z): set when the result of an operation is zero
• Carry flag (C): set when an operation produces a carry (or borrow) out of the most significant bit
• Sign/Negative flag (S/N): set when the result is negative, i.e., its most significant bit is 1
• Overflow flag (V/O): set when a signed arithmetic result does not fit in the destination
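
As a rough illustration (not any particular processor's logic), here is how these four flags might be computed for an 8-bit addition:

#include <stdio.h>
#include <stdint.h>

/* Compute an 8-bit sum and the four common flags for it. */
void add8(uint8_t a, uint8_t b) {
    uint16_t wide = (uint16_t)a + b;   /* keep the 9th bit for carry */
    uint8_t  r    = (uint8_t)wide;

    int Z = (r == 0);                  /* zero flag          */
    int C = (wide >> 8) & 1;           /* carry out of bit 7 */
    int N = (r >> 7) & 1;              /* sign of the result */
    /* overflow: operands share a sign that the result lacks */
    int V = ((~(a ^ b) & (a ^ r)) >> 7) & 1;

    printf("%u + %u = %u  Z=%d C=%d N=%d V=%d\n", a, b, r, Z, C, N, V);
}

int main(void) {
    add8(200, 100);  /* unsigned carry: C=1       */
    add8(100, 100);  /* signed overflow: V=1, N=1 */
    add8(0, 0);      /* zero result: Z=1          */
    return 0;
}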
Unsigned Compare (A - B):
Mnemonic | Branch instruction        | Condition tested
BHI      | Branch if Higher          | A > B
BHE      | Branch if Higher or Equal | A >= B
BLO      | Branch if Lower           | A < B
BLE      | Branch if Lower or Equal  | A <= B
BE       | Branch if Equal           | A = B
BNE      | Branch if Not Equal       | A != B
Signed Compare (A - B):
Mnemonic | Branch instruction         | Condition tested
BGT      | Branch if Greater Than     | A > B
BGE      | Branch if Greater or Equal | A >= B
BLT      | Branch if Less Than        | A < B
BLE      | Branch if Less or Equal    | A <= B
BE       | Branch if Equal            | A = B
BNE      | Branch if Not Equal        | A != B
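
The same bit pattern can compare differently depending on whether it is interpreted as signed or unsigned, which is why the two sets of branch mnemonics exist; a small illustration:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint8_t a = 0xF0, b = 0x10;  /* 240 and 16 unsigned; -16 and 16 signed */

    /* Unsigned view: hardware would take BHI (branch if higher). */
    printf("unsigned: %u > %u is %d\n", a, b, a > b);

    /* Signed view: same bits, but BLT (branch if less than) is taken. */
    printf("signed: %d < %d is %d\n",
           (int8_t)a, (int8_t)b, (int8_t)a < (int8_t)b);
    return 0;
}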
Interrupts allow the operating system to take notice of an external event, such
as a mouse click. Software interrupts, better known as exceptions, allow the OS to
handle unusual events, such as divide-by-zero errors, arising from code execution. A
periodic timer interrupt lets the OS decide whether to keep the CPU with the same
process, do housekeeping, etc. The timer tick interrupt provides the foundation for the
concept of preemptive multitasking.
TYPES OF INTERRUPTS
Generally there are three types of interrupts:
1) Internal interrupts
2) External interrupts
3) Software interrupts
1. Internal Interrupt:
• When the hardware detects that the program is doing something wrong, it will
usually generate an interrupt:
– Arithmetic error – Invalid instruction
– Addressing error – Hardware malfunction
– Page fault – Debugging
• A page fault interrupt is not the result of a program error, but it does require the
operating system to get control.
Internal interrupts are those that occur due to some problem in execution, for
example when a program attempts an operation that contains an error or that is simply
not possible. Software interrupts, by contrast, are calls made to the system, for
example when, while processing some instructions, we want to execute another
application program.
2. External Interrupt:
• I/O devices tell the CPU that an I/O request has completed by sending an interrupt
signal to the processor.
• I/O errors may also generate an interrupt.
• Most computers have a timer which interrupts the CPU every few milliseconds.
An external interrupt occurs when an input or output device requests an
operation, and the CPU executes those instructions first. For example, when a
program is executing and we move the mouse on the screen, the CPU handles
this external interrupt first and after that it resumes its operation.
3. Software interrupts:
These types of interrupts can occur only during the execution of an instruction. They
can be used by a programmer to cause interrupts if need be. The primary purpose of
such interrupts is to switch from user mode to supervisor mode.
A processor fault, such as an access violation, is triggered by the processor itself when it
encounters a condition that prevents it from executing code, typically when it tries to
read or write unmapped memory or encounters an invalid instruction.
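
At the application level, an operating system typically exposes such events through mechanisms like POSIX signals; the following sketch (a user-level analogy, not the processor mechanism itself) installs a handler that runs when the process receives an interrupt-style notification:

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/* Handler invoked asynchronously when SIGINT is delivered. */
static void on_interrupt(int sig) {
    /* Only async-signal-safe calls belong here; write() is one. */
    const char msg[] = "interrupt received\n";
    write(STDOUT_FILENO, msg, sizeof msg - 1);
}

int main(void) {
    signal(SIGINT, on_interrupt);  /* register the handler        */
    printf("press Ctrl-C to raise SIGINT...\n");
    pause();                       /* wait until a signal arrives */
    return 0;
}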
CISC Characteristics
A computer with a large number of instructions is called a complex instruction set
computer, or CISC. Complex instruction set computers have mostly been used in
scientific computing applications requiring lots of floating-point arithmetic. The major
characteristics of CISC architecture are:
➢ A large number of instructions - typically from 100 to 250 instructions.
➢ Some instructions that perform specialized tasks and are used infrequently.
➢ A large variety of addressing modes - typically 5 to 20 different modes.
➢ Variable-length instruction formats
RISC Characteristics
A computer with few instructions and simple construction is called reduced
instruction set computer or RISC. RISC architecture is simple and efficient. The major
characteristics of RISC architecture are,
➢ Relatively few instructions
➢ Relatively few addressing modes
➢ Memory access limited to load and store instructions
➢ All operations are done within the registers of the CPU
➢ Fixed-length and easily-decoded instruction format.
➢ Single-cycle instruction execution
➢ Hardwired rather than microprogrammed control
http://www.laureateiit.com/projects/bacii2014/projects/coa_anil/i_o_interface.html
Control Memory:
Control memory is a random-access memory (RAM) consisting of addressable storage
registers. It is primarily used in mini and mainframe computers. It is used as temporary
storage for data. Access to control memory data requires less time than access to main
memory; this speeds up CPU operation by reducing the number of memory references for
data storage and retrieval. Access is performed as part of a control section sequence while
the master clock oscillator is running. The control memory addresses are divided into two
groups: a task mode and an executive (interrupt) mode.
Addressing of words stored in control memory is via the address select logic for each of
the register groups. There can be up to five register groups in control memory. These groups
select a register for fetching data for programmed CPU operation, or for display or storage
of data via a maintenance console or equivalent. During programmed CPU operations, these
registers are accessed directly by the CPU logic. Data routing circuits are used by control
memory to interconnect the registers used in control memory. Some of the registers
contained in a control memory that operate in the task and the executive modes include the
following: accumulators, indexes, monitor clock, status-indicating registers, and interrupt
data registers.
• The control memory address register (CAR) specifies the address of the microinstruction
• The control data register holds the microinstruction read from memory
• The microinstruction contains a control word that specifies one or more micro-operations
for the data processor
• The location of the next microinstruction may or may not be the next in sequence
• Some bits of the present microinstruction control the generation of the address of the next
microinstruction
• The next address may also be a function of external input conditions
• While the micro-operations are being executed, the next address is computed in the next-
address generator circuit (sequencer) and then transferred into the CAR to read the next
microinstruction
• A clock is applied to the CAR, and the control word and next-address information are taken
directly from the control memory
• The address value is the input to the ROM and the control word is the output
• No read signal is required for the ROM, as it would be for a RAM
• The main advantage of microprogrammed control is that, once the hardware configuration
is established, there should be no need for hardware or wiring changes; to establish a
different control sequence, one simply specifies a different set of microinstructions for
control memory (a minimal simulation of this fetch/sequence loop is sketched below)
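
The following is a minimal sketch in C of the loop those bullet points describe, assuming an invented control-word layout (the fields branch and next_addr are illustrative, not from any real machine):

#include <stdio.h>

/* An invented microinstruction format: some control bits plus */
/* next-address information, as described above.               */
struct microinstruction {
    unsigned control_bits;  /* drives micro-operations in the datapath */
    unsigned branch;        /* 1 = jump to next_addr, 0 = fall through */
    unsigned next_addr;     /* target when branch is taken             */
};

/* Control memory (ROM): a tiny routine ending in a branch to 0. */
static const struct microinstruction cm[4] = {
    {0x01, 0, 0}, {0x02, 0, 0}, {0x04, 0, 0}, {0x08, 1, 0},
};

int main(void) {
    unsigned car = 0;                         /* control address register */
    for (int tick = 0; tick < 8; tick++) {    /* eight clock pulses       */
        struct microinstruction cw = cm[car]; /* control data register    */
        printf("CAR=%u control=0x%02x\n", car, cw.control_bits);
        car = cw.branch ? cw.next_addr : car + 1;  /* sequencer           */
    }
    return 0;
}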
Address Sequencing:
Each machine instruction is executed through the application of a sequence of
microinstructions. Clearly, we must be able to sequence these; the collection of
microinstructions which implements a particular machine instruction is called a routine.
The MCU (microprogrammed control unit) typically determines the address of the first
microinstruction which implements a machine instruction based on that instruction's
opcode. Upon machine power-up, the CAR should contain the address of the first
microinstruction to be executed.
The MCU must be able to execute microinstructions sequentially (e.g., within routines), but
must also be able to "branch" to other microinstructions as required; hence the need for a
sequencer.
CAR – control address register
control ROM – control memory (CM); holds the control words (CWs)
opcode – opcode field from the machine instruction
mapping logic – hardware which maps an opcode into a microinstruction address
branch logic – determines how the next CAR value will be chosen from all the various possibilities
multiplexors – implement the branch logic's choice of the next CAR value
incrementer – generates CAR + 1 as a possible next CAR value
SBR – subroutine register; holds the return address for subroutine-call branch operations
Conditional branches are necessary in the microprogram. We must be able to perform
some sequences of micro-ops only when certain situations or conditions exist (e.g., for
conditional branching at the machine instruction level); to implement these, we need to be
able to conditionally execute or skip certain microinstructions within routines.
Subroutine branches are helpful to have at the microprogram level. Many routines
contain identical sequences of microinstructions; putting them into subroutines allows those
routines to be shorter, thus saving memory. Mapping of opcodes to microinstruction
addresses can be done very simply. When the CM is designed, a "required" length is
determined for the machine instruction routines (i.e., the length of the longest one). This is
rounded up to the next power of 2, yielding a value k such that 2^k microinstructions will be
sufficient to implement any routine.
The first instruction of each routine will be located in the CM at multiples of this
"required" length. Say this is N. The first routine is at 0; the next, at N; the next, at 2*N; etc.
This can be accomplished very easily. For instance, with a four-bit opcode and a routine length
of four microinstructions, k is two; generate the microinstruction address by appending two
zero bits to the opcode:
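
In code, appending k zero bits is just a left shift; a one-line illustration (the values are taken from the example above):

#include <stdio.h>

int main(void) {
    unsigned opcode = 0xB;              /* a four-bit opcode, 1011       */
    unsigned k = 2;                     /* routines of 2^k = 4 micro-ops */
    unsigned micro_addr = opcode << k;  /* append k zero bits: 101100    */
    printf("routine for opcode %u starts at %u\n", opcode, micro_addr);
    return 0;
}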
Alternately, the n-bit opcode value can be used as the "address" input of a 2^n x M ROM; the
contents of the selected "word" in the ROM will be the desired M-bit CAR address for the
beginning of the routine implementing that instruction. (This technique allows for variable-
length routines in the CM.) We choose between all the possible ways of generating CAR
values by feeding them all into a multiplexor bank, and implementing special branch logic
which will determine how the muxes will pass on the next address to the CAR.
As there are four possible ways of determining the next address, the multiplexor bank
is made up of N 4x1 muxes, where N is the number of bits in the address of a CW. The branch
logic determines which of the four possible "next address" values is to be passed on
to the CAR; its two output lines are the select inputs for the muxes.
THE MEMORY SYSTEM: Basic concepts, semiconductor RAM, types of read-only memory
(ROM), cache memory, performance considerations, virtual memory, secondary storage, RAID,
direct memory access (DMA).
Book: Carl Hamacher, Zvonko Vranesic, Safwat Zaky (2002), Computer Organization, 5th
edition, McGraw-Hill: Unit-5, Pages: 292-366
If the smallest addressable unit of information is a memory word, the machine is called
word-addressable. If individual memory bytes are assigned distinct addresses, the computer
is called byte-addressable. Most commercial machines are byte-addressable. For
example, in a byte-addressable 32-bit computer, each memory word contains 4 bytes. A
possible word-address assignment would be:

Word Address   Byte Addresses
0              0 1 2 3
4              4 5 6 7
8              8 9 10 11
...            ...

With the above structure a READ or WRITE may involve an entire memory word or only a
byte. In the case of a byte read, the other bytes of the word may also be read but are ignored
by the CPU. However, during a write cycle, the control circuitry of the MM (main memory)
must ensure that only the specified byte is altered. In this case, the higher-order 30 bits
specify the word and the lower-order 2 bits specify the byte within the word.
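
The split into a word address and a byte-within-word offset is simple bit arithmetic; a short sketch for the 32-bit, 4-bytes-per-word case above:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t byte_addr = 10;               /* byte 10 lives in word 8   */
    uint32_t word_addr = byte_addr & ~3u;  /* clear the low 2 bits      */
    uint32_t offset    = byte_addr & 3u;   /* byte position in the word */
    printf("byte %u -> word %u, offset %u\n", byte_addr, word_addr, offset);
    return 0;
}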
each location will be 'n' bits wide; that is, the word length is n bits. During a "memory
cycle", n bits of data may be transferred between the MM and the CPU.
This transfer takes place over the processor bus, which has k address lines (address
bus), n data lines (data bus), and control lines such as Read, Write, Memory Function
Completed (MFC), and byte specifiers (control bus). For a read operation, the CPU loads the
address into the MAR, sets READ to 1, and sets other control signals if required. The data
from the MM is loaded into the MDR and MFC is set to 1. For a write operation, the MAR and
MDR are suitably loaded by the CPU, Write is set to 1, and other control signals are set
suitably. The MM control circuitry loads the data into the appropriate locations and sets MFC
to 1. This organization is shown in the following block schematic.
[Figure: CPU with MAR and MDR connected to a main memory of up to 2^k addressable
locations (word length n bits) via the address bus (k bits), data bus (n bits), and control bus
(Read, Write, MFC, byte specifier, etc.).]
Cache Memory:-
The CPU of a computer can usually process instructions and data faster than they can be
fetched from a compatibly priced main memory unit. Thus the memory cycle time becomes
the bottleneck in the system. One way to reduce the memory access time is to use a cache
memory. This is a small, fast memory that is inserted between the larger, slower main
memory and the CPU. It holds the currently active segments of a program and its data.
Because of the locality of address references, the CPU can, most of the time, find the relevant
information in the cache memory itself (a cache hit) and only infrequently needs access to
the main memory (a cache miss). With a suitable size of cache memory, cache hit rates of
over 90% are possible, leading to a cost-effective increase in the performance of the system.
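
To see why a high hit rate pays off, the average memory access time can be computed as hit_rate * cache_time + (1 - hit_rate) * main_memory_time; the numbers below are illustrative, not from the text:

#include <stdio.h>

int main(void) {
    double hit_rate = 0.90;  /* the "over 90%" regime mentioned above */
    double t_cache  = 1.0;   /* cache access time, ns (assumed)       */
    double t_main   = 10.0;  /* main memory access time, ns (assumed) */

    double t_avg = hit_rate * t_cache + (1.0 - hit_rate) * t_main;
    printf("average access time = %.2f ns\n", t_avg);  /* 1.90 ns */
    return 0;
}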
Memory Interleaving: -
This technique divides the memory system into a number of memory modules and arranges
addressing so that successive words in the address space are placed in different modules.
When requests for memory access involve consecutive addresses, the access will be to
different modules. Since parallel access to these modules is possible, the
average rate of fetching words from the Main Memory can be increased.
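
With low-order interleaving across M modules, the module number and the address within the module come straight from the address; a minimal sketch (M = 4 is an assumption):

#include <stdio.h>

int main(void) {
    unsigned M = 4;                  /* number of memory modules */
    for (unsigned addr = 0; addr < 8; addr++) {
        unsigned module = addr % M;  /* consecutive words hit different modules */
        unsigned within = addr / M;  /* location inside the module              */
        printf("address %u -> module %u, word %u\n", addr, module, within);
    }
    return 0;
}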
Virtual Memory: -
In a virtual memory System, the address generated by the CPU is referred to as a virtual or
logical address. The corresponding physical address can be different and the required
mapping is implemented by a special memory control unit, often called the memory
management unit. The mapping function itself may be changed during program execution
according to system requirements.
Because of the distinction made between the logical (virtual) address space and the
physical address space, the former can be as large as the addressing capability of the
CPU, while the actual physical memory can be much smaller. Only the active portion of the virtual
address space is mapped onto the physical memory and the rest of the virtual address space
is mapped onto the bulk storage device used. If the addressed information is in the Main
Memory (MM), it is accessed and execution proceeds.
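
A minimal sketch of the mapping the memory management unit performs, assuming 4 KB pages and a single-level page table (all names and sizes here are illustrative):

#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT 12                 /* 4 KB pages (assumed) */
#define PAGE_SIZE  (1u << PAGE_SHIFT)

/* Toy page table: virtual page number -> physical frame number. */
static uint32_t page_table[4] = {7, 3, 0, 5};

uint32_t translate(uint32_t vaddr) {
    uint32_t vpn    = vaddr >> PAGE_SHIFT;      /* virtual page number */
    uint32_t offset = vaddr & (PAGE_SIZE - 1);  /* byte within page    */
    return (page_table[vpn] << PAGE_SHIFT) | offset;
}

int main(void) {
    uint32_t va = 0x1234;  /* page 1, offset 0x234 */
    printf("virtual 0x%x -> physical 0x%x\n", va, translate(va));
    /* page 1 maps to frame 3, so this prints physical 0x3234 */
    return 0;
}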
The following figure shows such an organization of a memory chip consisting of 16 words of 8
bits each, which is usually referred to as a 16 x 8 organization.
The data input and the data output of each Sense/Write circuit are connected to a single
bidirectional data line in order to reduce the number of pins required. One control line, the
R/W (Read/Write) input, is used to specify the required operation, and another control line,
the CS (Chip Select) input, is used to select a given chip in a multichip memory system. This
circuit requires 14 external connections and, allowing 2 pins for power supply and ground
connections, can be manufactured in the form of a 16-pin chip. It can store
16 x 8 = 128 bits. Another type of organization, the 1k x 1 format, is shown below:
The 10-bit address is divided into two groups of 5 bits each to form the row and column
addresses for the cell array. A row address selects a row of 32 cells, all of which are accessed
in parallel. One of these, selected by the column address, is connected to the external data
lines by the input and output multiplexers. This structure can store 1024 bits and can be
implemented in a 16-pin chip.
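
The row/column split described above is again bit slicing of the address; a sketch for the 1k x 1 chip (10-bit address, 5 bits per group, assuming the upper bits select the row):

#include <stdio.h>

int main(void) {
    unsigned addr = 679 & 0x3FF;  /* a 10-bit cell address               */
    unsigned row  = addr >> 5;    /* upper 5 bits: selects a row         */
    unsigned col  = addr & 0x1F;  /* lower 5 bits: selects 1 of 32 cells */
    printf("address %u -> row %u, column %u\n", addr, row, col);
    return 0;
}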
Two transistor inverters are cross-connected to implement a basic flip-flop. The cell is
connected to one word line and two bit lines as shown. Normally, the bit lines are kept at
about 1.6 V, and the word line is kept at a slightly higher voltage of about 2.5 V. Under these
conditions, the two diodes D1 and D2 are reverse-biased. Thus, because no current flows
through the diodes, the cell is isolated from the bit lines.
Read Operation:
Let us assume that Q1 on and Q2 off represents a 1. To read the contents of a given cell, the
voltage on the corresponding word line is reduced from 2.5 V to approximately 0.3 V. This
causes one of the diodes D1 or D2 to become forward-biased, depending on whether
transistor Q1 or Q2 is conducting. As a result, current flows from bit line b when the cell is in
the 1 state and from bit line b' when the cell is in the 0 state. The Sense/Write circuit at the
end of each pair of bit lines monitors the currents on lines b and b' and sets the output bit
line accordingly.
Write Operation:
While a given row of bits is selected, that is, while the voltage on the corresponding word line
is 0.3V, the cells can be individually forced to either the 1 state by applying a positive voltage
of about 3V to line b’ or to the 0 state by driving line b. This function is performed by the
Sense/Write circuit.
Dynamic Memories:-
The basic idea of dynamic memory is that information is stored in the form of a charge on the
capacitor. An example of a dynamic memory cell is shown below:
When the transistor T is turned on and an appropriate voltage is applied to the bit line,
information is stored in the cell, in the form of a known amount of charge stored on the
capacitor. After the transistor is turned off, the capacitor begins to discharge. This is caused
by the capacitor’s own leakage resistance and the very small amount of current that still flows
through the transistor. Hence, the data is read correctly only if it is read before the charge on
the capacitor drops below some threshold value. During a Read operation, the bit line is
placed in a high-impedance state, the transistor is turned on, and a sense circuit connected to
the bit line is used to determine whether the charge on the capacitor is above or below the
threshold value. During such a Read, the charge on the capacitor is restored to its original
value; thus the cell is refreshed with every read operation.
The cells are organized in the form of a square array such that the higher- and lower-order 8
bits of the 16-bit address constitute the row and column addresses of a cell, respectively. In
order to reduce the number of pins needed for external connections, the row and column
addresses are multiplexed on 8 pins.
To access a cell, the row address is applied first. It is loaded into the row address latch
in response to a single pulse on the Row Address Strobe (RAS) input. This selects a row of
cells. Now, the column address is applied to the address pins and is loaded into the column
address latch under the control of the Column Address Strobe (CAS) input, and this address
selects the appropriate sense/write circuit. If the R/W signal indicates a Read operation, the
output of the selected circuit is transferred to the data output, DO. For a Write operation, the
data on the DI line is used to overwrite the cell selected.
It is important to note that the application of a row address causes all the cells on the
corresponding row to be read and refreshed during both Read and Write operations. To
ensure that the contents of a dynamic memory are maintained, each row of cells must be
addressed periodically, typically once every two milliseconds. A refresh circuit performs this
function. Some dynamic memory chips incorporate a refresh facility within the chips
themselves and hence appear as static memories to the user; such chips are often referred to
as pseudostatic.
Another feature available on many dynamic memory chips is that once the row
address is loaded, successive locations can be accessed by loading only column addresses.
Such block transfers can typically be carried out at double the rate of transfers
involving random addresses. Such a feature is useful when memory accesses follow a regular
pattern, for example in a graphics terminal. Because of their high density and low cost,
dynamic memories are widely used in the main memory units of computers. Commercially
available chips range in size from 1k to 4M bits or more, and are available in various
organizations such as 64k x 1, 16k x 4, and 1M x 1.
RAID arrays appear to the operating system (OS) as a single logical hard disk. RAID
employs the technique of disk mirroring or disk striping, which involves partitioning each
drive's storage space into units ranging from a sector (512 bytes) up to several megabytes.
The stripes of all the disks are interleaved and addressed in order.
In a single-user system where large records, such as medical or other scientific images,
are stored, the stripes are typically set up to be small (perhaps 512 bytes) so that a single
record spans all disks and can be accessed quickly by reading all disks at the same time.
In a multi-user system, better performance requires establishing a stripe wide enough to hold
the typical or maximum size record. This allows overlapped disk I/O across drives.
RAID 1: Also known as disk mirroring, this configuration consists of at least two drives that
duplicate the storage of data. There is no striping. Read performance is improved since either
disk can be read at the same time. Write performance is the same as for single disk storage.
RAID 2: This configuration uses striping across disks with some disks storing error checking
and correcting (ECC) information. It has no advantage over RAID 3 and is no longer used.
RAID 3: This technique uses striping and dedicates one drive to storing parity information.
The embedded ECC information is used to detect errors. Data recovery is accomplished by
calculating the exclusive OR (XOR) of the information recorded on the other drives. Since an
I/O operation addresses all drives at the same time, RAID 3 cannot overlap I/O. For this
reason, RAID 3 is best for single-user systems with long record applications.
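
The XOR-based recovery that RAID 3 (and RAID 5) relies on can be shown in a few lines; the sketch below rebuilds a lost data byte from the surviving bytes and the parity (the byte values are made up):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint8_t d0 = 0x5A, d1 = 0xC3, d2 = 0x0F;  /* data on three drives   */
    uint8_t parity = d0 ^ d1 ^ d2;            /* stored on parity drive */

    /* Suppose the drive holding d1 fails: the XOR of everything else */
    /* recovers it, because x ^ x = 0.                                */
    uint8_t recovered = d0 ^ d2 ^ parity;
    printf("lost 0x%02X, recovered 0x%02X\n", d1, recovered);
    return 0;
}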
RAID 4: This level uses large stripes, which means you can read records from any single
drive. This allows you to use overlapped I/O for read operations. Since all write operations
have to update the parity drive, no I/O overlapping is possible. RAID 4 offers no advantage
over RAID 5.
RAID 5: This level is based on block-level striping with parity. The parity information is
striped across each drive, allowing the array to function even if one drive were to fail. The
array’s architecture allows read and write operations to span multiple drives. This results in
performance that is usually better than that of a single drive, but not as high as that of a RAID
0 array. RAID 5 requires at least three disks, but it is often recommended to use at least five
disks for performance reasons.
RAID 5 arrays are generally considered to be a poor choice for use on write-intensive
systems because of the performance impact associated with writing parity information. When
a disk does fail, it can take a long time to rebuild a RAID 5 array. Performance is usually
degraded during the rebuild time and the array is vulnerable to an additional disk failure until
the rebuild is complete.
RAID 6: This technique is similar to RAID 5 but includes a second parity scheme that is
distributed across the drives in the array. The use of additional parity allows the array to
continue to function even if two disks fail simultaneously. However, this extra protection
comes at a cost. RAID 6 arrays have a higher cost per gigabyte (GB) and often have slower
write performance than RAID 5 arrays.
In these situations, DMA can save processing time and is a more efficient way to move
data from the computer's memory to other devices. In order for devices to use direct memory
access, they must be assigned to a DMA channel. Each type of port on a computer has a set of
DMA channels that can be assigned to each connected device. For example, a PCI controller
and a hard drive controller each have their own set of DMA channels.
For example, a sound card may need to access data stored in the computer's RAM, but since it
can process the data itself, it may use DMA to bypass the CPU. Video cards that support DMA
can also access the system memory and process graphics without needing the CPU. Ultra DMA
hard drives use DMA to transfer data faster than previous hard drives that required the data
to first be run through the CPU.
After each data transfer, the current address registers are decremented or incremented
according to the current settings. The channel 1 current word count register is also
decremented by 1 after each data transfer. When the word count of channel 1 goes to FFFFH,
a terminal count (TC) is generated, which activates the EOP (End of Process) output,
terminating the DMA service.
Auto initialize
In this mode, during initialization, the microprocessor loads the base address and word
count registers simultaneously with the current address and word count registers. The
address and the count in the base registers remain unchanged throughout the DMA
service.
After the first block transfer i.e. after the activation of the EOP signal, the original
values of the current address and current word count registers are automatically restored
from the base address and base word count register of that channel. After auto initialization
the channel is ready to perform another DMA service, without CPU intervention.
DMA Controller
The controller is integrated into the processor board and manages all DMA data transfers.
Transferring data between system memory and an I/O device requires two steps: data goes
from the sending device to the DMA controller and then to the receiving device. The
microprocessor gives the DMA controller the location, destination, and amount of data that is
to be transferred. Then the DMA controller transfers the data, allowing the microprocessor to
continue with other processing tasks.
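
In code, giving the controller the location, destination, and amount usually means filling in a transfer descriptor and starting the channel. The sketch below uses a generic, hypothetical device model; the register names and layout are invented, not those of any real controller:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical memory-mapped DMA channel registers. */
struct dma_channel {
    uint32_t src;      /* source address             */
    uint32_t dst;      /* destination address        */
    uint32_t count;    /* number of bytes to move    */
    uint32_t control;  /* bit 0 = start the transfer */
};

/* Program a channel the way the text describes: location,   */
/* destination, amount, then go; the CPU is free afterwards. */
void dma_start(volatile struct dma_channel *ch,
               uint32_t src, uint32_t dst, uint32_t count) {
    ch->src = src;
    ch->dst = dst;
    ch->count = count;
    ch->control = 1;   /* the controller now moves the data itself */
}

int main(void) {
    struct dma_channel fake = {0};  /* stand-in for real hardware */
    dma_start(&fake, 0x1000, 0x8000, 256);
    printf("programmed: %u bytes from 0x%X to 0x%X\n",
           fake.count, fake.src, fake.dst);
    return 0;
}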
When a device needs to use the Micro Channel bus to send or receive data, it competes
with all the other devices that are trying to gain control of the bus. This process is known as
arbitration. The DMA controller does not arbitrate for control of the bus; instead, the I/O
device that is sending or receiving data (the DMA slave) participates in arbitration. It is the
DMA controller, however, that takes control of the bus when the central arbitration control
point grants the DMA slave's request.
DMA vs. interrupts vs. polling
[Figure: position of the DMA controller in relation to the peripheral devices, the CPU, and
internal memory.]
• DMA is used for moving large blocks of data, since doing so through the CPU would take
too much of its capacity
Interrupt Systems
• Interrupts take up time of the CPU
• They work by requesting the use of the CPU: the device sends an interrupt, to which the
CPU responds
o Note: in order to save time, the CPU does not check whether it has to respond
Polling
• Polling requires the CPU to actively monitor the process
• The major advantage is that polling can be adjusted to the needs of the device
• Polling is a low-level process, suited to peripheral devices that do not need a quick
response
Characteristics of Multiprocessors
A multiprocessor system is an interconnection of two or more CPUs, with memory and
input-output equipment. As defined earlier, multiprocessors can be put under the MIMD
category. The term multiprocessor is sometimes confused with the term multicomputer.
Though both support concurrent operations, there is an important difference between a
system with multiple computers and a system with multiple processors.
In a multicomputer system, there are multiple computers, with their own operating
systems, which communicate with each other, if needed, through communication links. A
multiprocessor system, on the other hand, is controlled by a single operating system, which
coordinates the activities of the various processors, either through shared memory or inter-
processor messages.
A single job can be divided into independent tasks, either manually by the programmer or by
the compiler, which finds the portions of the program that are data-independent and can be
executed in parallel. Multiprocessors are further classified into two groups depending on
the way their memory is organized. Processors with shared memory are called tightly
coupled or shared-memory processors.
The information in these processors is shared through the common memory. Each of
the processors can also have its own local memory. The other class of multiprocessors is
loosely coupled or distributed-memory multiprocessors. In this organization, each processor
has its own private memory, and they share information with each other through an
interconnection switching scheme or message passing.
This address activates a memory chip. The CPU then sends a read signal through the
control bus, in response to which the memory puts the data on the data bus. Similarly, in a
multiprocessor system, if any processor has to read a memory location from the shared areas,
it follows a similar routine.
There are buses that transfer data between the CPUs and memory; these are called
memory buses. An I/O bus is used to transfer data to and from input and output devices. A
bus that connects major components in a multiprocessor system, such as CPUs, I/O, and
memory, is called a system bus. A processor in a multiprocessor system requests access to
a component through the system bus.
If no processor is accessing the bus at that time, the requesting processor is given control of
the bus immediately. If a second processor is utilizing the bus, then this processor has
to wait for the bus to be freed. If, at any time, there are requests for the services of the bus
from more than one processor, then arbitration is performed to resolve the conflict. A bus
controller, placed between the local bus and the system bus, handles this.
A very common problem can occur when two or more processors try to access a
resource which can be modified. For example, suppose processors 1 and 2 are simultaneously
trying to access memory location 100, with processor 1 writing to the location while
processor 2 is reading it. The chances are that processor 2 will end up reading erroneous
data. Such resources, which need to be protected from simultaneous access by more than
one processor, are called critical sections. The following assumptions are made regarding
critical sections:
- Mutual exclusion: At most one processor can be in a critical section at a time.
- Termination: The critical section is executed in a finite time.
- Fair scheduling: A process attempting to enter the critical section will eventually do so in a
finite time.
A binary value called a semaphore is usually used to indicate whether a processor is currently
executing the critical section.
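
A minimal sketch of a binary semaphore built on an atomic test-and-set, using C11 atomics (the real primitive would be a hardware instruction; this is an illustration, not a production lock):

#include <stdatomic.h>
#include <stdio.h>

static atomic_flag lock = ATOMIC_FLAG_INIT;  /* 0 = free, 1 = taken */

void enter_critical(void) {
    /* Atomically set the flag and learn its old value; spin while */
    /* some other processor already holds it.                      */
    while (atomic_flag_test_and_set(&lock))
        ;  /* busy-wait */
}

void leave_critical(void) {
    atomic_flag_clear(&lock);   /* release the semaphore */
}

int main(void) {
    enter_critical();
    printf("inside the critical section\n");
    leave_critical();
    return 0;
}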
Cache Coherence
As discussed in Unit 2, cache memories are high-speed buffers which are inserted
between the processor and the main memory to capture those portions of the contents of
main memory which are currently in use. These memories are five to ten times faster than
main memory and therefore reduce the overall access time. In a multiprocessor system
with shared memory, each processor has its own private cache.
These multiple private caches reduce the access time; each processor, whenever it
accesses the shared memory, also updates its private cache. This introduces the problem of
cache coherence, which may result in data inconsistency: several copies of the same data
may exist in different caches at any given time.
For example, let us assume there are two processors x and y, both holding the same copy of
a cached value. Processor x produces data 'a' which is to be consumed by processor y.
Processor x updates the value of 'a' in its own private copy of the cache. As it does not have
any access to
the private cache of processor y, processor y continues to use the variable 'a' with the
old value unless it is informed of the change.
Thus, in such situations, if the system is to perform correctly, every update to the cache
must be communicated to all the processors, so that they can make the necessary changes
in their private copies of the cache.