UNIT 4 and 5 CAALP
The following pin function descriptions are for the microprocessor 8086 in either minimum
or maximum mode.
These lines constitute the time multiplexed memory/IO address during the first clock cycle (T1)
and data during T2, T3 and T4 clock cycles. A0 is analogous to BHE for the lower byte of the
data bus, pins D0-D7. A0 bit is Low during T1 state when a byte is to be transferred on the lower
portion of the bus in memory or I/O operations. 8-bit oriented devices tied to the lower half
would normally use A0 to condition chip select functions. These lines are active high and float to
tri-state during interrupt acknowledge and local bus "Hold acknowledge".
During T1 state these lines are the four most significant address lines for memory operations.
During I/O operations these lines are low. During memory and I/O operations, status information
is available on these lines during T2, T3, and T4 states.
S5:
The status of the interrupt enable flag bit is updated at the beginning of each cycle. The status
of the flag is indicated through this bus.
S6:
When Low, it indicates that 8086 is in control of the bus. During a "Hold acknowledge" clock
period, the 8086 tri-states the S6 pin and thus allows another bus master to take control of the
status bus.
S3 & S4:
After the first clock cycle of an instruction execution, the A17/S4 and A16/S3 pins specify which
segment register generates the segment portion of the 8086 address. This feature also provides a
degree of protection by preventing write operations to one segment from erroneously
overlapping into another segment and destroying information in that segment.
During T1 state the BHE should be used to enable data onto the most significant half of the data
bus, pins D15 - D8. Eight-bit oriented devices tied to the upper half of the bus would normally
use BHE to control chip select functions. BHE is Low during T1 state of read, write and interrupt
acknowledge cycles when a byte is to be transferred on the high portion of the bus.
The S7 status information is available during T2, T3 and T4 states. The signal is active Low and
floats to 3-state during "hold" state. This pin is Low during T1 state for the first interrupt
acknowledge cycle.
RD (O): READ
The Read strobe indicates that the processor is performing a memory or I/O read cycle. This
signal is active low during T2 and T3 states and the Tw states of any read cycle. This signal
floats to tri-state in "hold acknowledge cycle".
TEST (I): The TEST pin is examined by the "WAIT" instruction. If the TEST pin is Low, execution
continues. Otherwise the processor waits in an "idle" state. This input is synchronized internally
during each clock cycle on the leading edge of CLK.
INTR (I): Interrupt Request
It is a level triggered input which is sampled during the last clock cycle of each instruction to
determine if the processor should enter into an interrupt acknowledge operation. A subroutine is
vectored to via an interrupt vector look up table located in system memory. It can be internally
masked by software by resetting the interrupt enable bit. INTR is internally synchronized. This
signal is active HIGH.
NMI (I): Non-Maskable Interrupt
An edge triggered input, causes a type-2 interrupt. A subroutine is vectored to via the interrupt
vector look up table located in system memory. NMI is not maskable internally by software. A
transition from a LOW to HIGH on this pin initiates the interrupt at the end of the current
instruction. This input is internally synchronized.
RESET (I)
Reset causes the processor to immediately terminate its present activity. To be recognised, the
signal must be active high for at least four clock cycles, except after power-on which requires a
50 Micro Sec. pulse. It causes the 8086 to initialize registers DS, SS, ES, IP and flags to all
zeros. It also initializes CS to FFFF H. Upon removal of the RESET signal from the RESET pin,
the 8086 will fetch its next instruction from the 20 bit physical address FFFF0H. The reset signal
to 8086 can be generated by the 8284. (Clock generation chip). To guarantee reset from power-
up, the reset input must remain below 1.5 volts for 50 Micro sec. after Vcc has reached the
minimum supply voltage of 4.5V.
READY (I)
Ready is the acknowledgement from the addressed memory or I/O device that it will complete
the data transfer. The READY signal from memory or I/O is synchronized by the 8284 clock
generator to form READY. This signal is active HIGH. The 8086 READY input is not
synchronized. Correct operation is not guaranteed if the setup and hold times are not met.
Clock provides the basic timing for the processor and bus controller. It is asymmetric with 33%
duty cycle to provide optimized internal timing. Minimum frequency of 2 MHz is required, since
the design of 8086 processors incorporates dynamic cells. The maximum clock frequencies of
the 8086-4, 8086 and 8086-2 are 4 MHz, 5 MHz and 8 MHz respectively.
Since the 8086 does not have on-chip clock generation circuitry, an 8284 clock generator chip
must be connected to the 8086 clock pin. The crystal connected to the 8284 must have a frequency 3
times the 8086 internal frequency. The 8284 clock generation chip is used to generate READY,
RESET and CLK.
This pin indicates what mode the processor is to operate in. In minimum mode, the 8086 itself
generates all bus control signals. In maximum mode the three status signals are to be decoded to
generate all the bus control signals.
Minimum Mode Pins
The following 8 pin function descriptions are for the 8086 in minimum mode; MN/MX = 1. The
corresponding 8 pin function descriptions for maximum mode are explained later.
M/IO (O): Memory/IO
This pin is used to distinguish a memory access from an I/O access. When this pin is Low, it
accesses I/O and when high it accesses memory. M/IO becomes valid in the T4 state preceding a
bus cycle and remains valid until the final T4 of the cycle. M/IO floats to 3 - state OFF during
local bus "hold acknowledge".
WR (O): Write
Indicates that the processor is performing a write memory or write IO cycle, depending on the
state of the M /IOsignal. WR is active for T2, T3 and Tw of any write cycle. It is active LOW,
and floats to 3-state OFF during local bus "hold acknowledge ".
INTA (O): Interrupt Acknowledge
It is used as a read strobe for interrupt acknowledge cycles. It is active LOW during T2, T3, and
T4 of each interrupt acknowledge cycle.
ALE is provided by the processor to latch the address into the 8282/8283 address latch. It is an
active high pulse during T1 of any bus cycle. ALE signal is never floated.
DEN (O): Data Enable
It is provided as an output enable for the 8286/8287 in a minimum system which uses the
transceiver. DEN is active LOW during each memory and IO access. For a read or interrupt
acknowledge cycle it is active from the middle of T2 until the middle of T4, while for a write
cycle it is active from the beginning of T2 until the middle of T4. It floats to tri-state off during
local bus "hold acknowledge".
Hold indicates that another master is requesting a local bus "HOLD". To be acknowledged,
HOLD must be active HIGH. The processor receiving the "HOLD " request will issue HLDA
(HIGH) as an acknowledgement in the middle of the T1-clock cycle. Simultaneous with the issue
of HLDA, the processor will float the local bus and control lines. After "HOLD" is detected as
being Low, the processor will lower the HLDA and when the processor needs to run another
cycle, it will again drive the local bus and control lines.
Maximum Mode
The following pin function descriptions are for the 8086/8088 systems in maximum mode
(i.e., MN/MX = 0). Only the pins which are unique to maximum mode are described below.
S2, S1, S0 (O): Status Pins
These pins are active during T4, T1 and T2 states and return to the passive state (1,1,1) during
T3 or Tw (when READY is inactive). They are used by the 8288 bus controller to generate all
memory and I/O access control signals. Any change by S2, S1 or S0 during T4 is used
to indicate the beginning of a bus cycle. These status lines are encoded as shown in table 3.
S2 S1 S0 Characteristics
0 0 0 Interrupt acknowledge
0 0 1 Read I/O port
0 1 0 Write I/O port
0 1 1 Halt
1 0 0 Code access
1 0 1 Read memory
1 1 0 Write memory
1 1 1 Passive State
Table 3
Queue Status is valid during the clock cycle after which the queue operation is performed. QS0,
QS1 provide status to allow external tracking of the internal 8086 instruction queue. The
condition of queue status is shown in table 4.
Queue status allows external devices like In-circuit Emulators or special instruction set extension
co-processors to track the CPU instruction execution. Since instructions are executed from the
8086 internal queue, the queue status is presented each CPU clock cycle and is not related to the
bus cycle activity. This mechanism allows (1) A processor to detect execution of a ESCAPE
instruction which directs the co- processor to perform a specific task and (2) An in-circuit
Emulator to trap execution of a specific memory location.
QS1 QS0 Characteristics
0 0 No operation
0 1 First byte of opcode from queue
1 0 Empty the queue
1 1 Subsequent byte from queue
Table 4
LOCK (O)
It indicates to another system bus master, not to gain control of the system bus while LOCK is
active Low. The LOCK signal is activated by the "LOCK" prefix instruction and remains active
until the completion of the instruction. This signal is active Low and floats to tri-state OFF
during 'hold acknowledge'. Example:
LOCK XCHG reg, MEMORY ; reg is any register and MEMORY
; is the address of the semaphore.
RQ/GT0, RQ/GT1 (I/O): Request/Grant
These pins are used by other processors in a multi processor organization. Local bus masters of
other processors force the processor to release the local bus at the end of the processor's current
bus cycle. Each pin is bi-directional and has an internal pull-up resistor. Hence they may be left
un-connected.
II) Flag Register
The 8086 microprocessor has a 16-bit flag register. In this register 9 bits are active
as flags. These 9 flags are divided into two groups, which are as follows:
Conditional Flags
Conditional flags represent result of last arithmetic or logical instruction executed. Conditional
flags are as follows:
1. CF (Carry Flag)
This flag indicates an overflow condition for unsigned integer arithmetic. It is also used
in multiple-precision arithmetic.
2. AF (Auxiliary carry Flag)
This flag is set when there is a carry out of the lower nibble (bit 3) into the upper nibble
(bit 4), or a borrow for the lower nibble. It is used in BCD (decimal) arithmetic.
3. PF (Parity Flag)
This flag is used to indicate the parity of result. If lower order 8-bits of the result contains
even number of 1’s, the Parity Flag is set and for odd number of 1’s, the Parity Flag is
reset.
4. ZF (Zero Flag)
This flag is set if the result of the operation is zero; otherwise it is reset.
5. SF (Sign Flag)
In sign magnitude format the sign of number is indicated by MSB bit. If the result of
operation is negative, sign flag is set.
6. OF (Overflow Flag)
This stands for overflow flag. It occurs when signed numbers are added or subtracted.
An OF indicates that the result has exceeded the capacity of the machine. It is set if
the signed result cannot be expressed within the available number of bits.
Control Flags
Control flags are set or reset deliberately to control the operations of the execution unit. Control
flags are as follows:
1. TF (Trap Flag):
It is used for single step control. It allows user to execute one instruction of a program at
a time for debugging. When trap flag is set, program can be run in single step mode.
2. IF (Interrupt Flag):
It is an interrupt enable/disable flag. This stands for interrupt flag. This flag is used to
enable or disable the interrupt in a program. If it is set, the maskable interrupt of 8086 is
enabled and if it is reset, the interrupt is disabled. It can be set by executing the STI instruction
and can be cleared by executing the CLI instruction.
3. DF (Direction Flag):
This flag stands for direction flag and is used for the direction of strings. If it is set, string
bytes are accessed from higher memory address to lower memory address. When it is
reset, the string bytes are accessed from lower memory address to higher memory address
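For illustration, a short hedged sketch (the sequence and the enclosed operation are assumed, not
taken from the text above) showing how the control flags are set and cleared with their dedicated
instructions:
Eg. CLI ; IF = 0, maskable interrupts disabled
CLD ; DF = 0, string operations auto-increment SI/DI
STC ; CF = 1 (carry can also be set/cleared directly)
NOP ; ... protected or string-processing code would go here ...
STI ; IF = 1, maskable interrupts enabled again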
The 8086 instructions are categorized into the following main types: data transfer, arithmetic,
logical, branch (program transfer), string and processor control instructions.
MOV :
This instruction copies a word or a byte of data from some source to a destination. The
destination can be a register or a memory location. The source can be a register, a memory
location, or an immediate number.
MOV AX,BX
MOV AX,5000H
MOV AX,[SI]
MOV AX,[2000H]
MOV AX,50H[BX]
MOV [734AH],BX
MOV DS,CX
MOV CL,[357AH]
Direct loading of the segment registers with immediate data is not permitted.
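Because direct loading is not permitted, the usual two-step sequence goes through a general
purpose register. A small illustrative example (the segment value 2000H is assumed):
Eg. MOV AX, 2000H ; load the immediate segment value into AX first
MOV DS, AX ; then copy AX into the DS segment register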
This instruction pushes the contents of the specified register/memory location on to the
stack. The stack pointer is decremented by 2, after each execution of the instruction.
Eg. PUSH AX
PUSH DS
PUSH [5000H]
Fig. 2.2 Push Data to stack memory
POP :
This instruction, when executed, loads the specified register/memory location with the
contents of the memory location of which the address is formed using the current stack segment
and stack pointer. The stack pointer is incremented by 2 after each execution of the instruction.
Eg. POP DS
POP [5000H]
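A short illustrative sequence (the data values are assumed) showing how PUSH and POP move a
word through the stack:
Eg. MOV AX, 1234H
PUSH AX ; SP = SP - 2, the word 1234H is stored at SS:SP
MOV AX, 5678H ; AX is overwritten
POP BX ; BX = 1234H is retrieved, SP = SP + 2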
XCHG : Exchange
This instruction exchanges the contents of the specified source and destination
operands.
Eg. XCHG [5000H], AX
XCHG BX, AX
XLAT : Translate
This instruction replaces the byte in AL with a byte from a look-up table addressed by BX,
i.e. AL ← [BX + AL]. BX must point to the start of the table before XLAT is executed.
Eg. XLAT
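An illustrative table look-up with XLAT (the table name HEXTABLE and its contents are assumed
for this sketch):
Eg. MOV BX, OFFSET HEXTABLE ; BX points to a 16-byte look-up table in DS
MOV AL, 0AH ; index into the table (0 to 15)
XLAT ; AL = [BX + AL], e.g. the ASCII code of 'A'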
IN : Input
This instruction reads a byte or word from the specified input port into AL or AX. The port
address may be an 8-bit immediate value or may be taken from the DX register.
Eg. IN AL, 03H
IN AX, DX
OUT : Output
This instruction writes the contents of AL or AX to the specified output port. The port address
may be an 8-bit immediate value or may be taken from the DX register.
Eg. OUT DX, AX
LEA : Load Effective Address
This instruction loads the offset (effective address) of the source memory operand into the
specified 16-bit register.
LDS : Load register and DS
[reg] ← [mem]
[DS] ← [mem + 2]
LES : Load register and ES
[reg] ← [mem]
[ES] ← [mem + 2]
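A few illustrative examples (the label TABLE and the memory addresses are assumed):
Eg. LEA BX, TABLE ; BX = offset (effective address) of TABLE
LDS SI, [5000H] ; SI = word at 5000H, DS = word at 5002H
LES DI, [3000H] ; DI = word at 3000H, ES = word at 3002H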
LAHF : Load AH from Flags
Load (copy to) AH the low byte of the flag register.
[AH] ← [Flags low byte]
Eg. LAHF
SAHF : Store AH into Flags
The contents of AH are copied into the low byte of the flag register.
[Flags low byte] ← [AH]
Eg. SAHF
PUSHF : Push Flags onto the stack
[SP] ← [SP] – 2
[[SP]] ← [Flags]
Eg. PUSHF
POPF : Pop Flags from the stack
[Flags] ← [[SP]]
[SP] ← [SP] + 2
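A typical illustrative use of PUSHF/POPF (the intermediate instruction is assumed) is to preserve
the flag word around code that changes it:
Eg. PUSHF ; save the current flag word on the stack
STC ; some code that alters the flags (carry set here as an example)
POPF ; restore the original flags from the stack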
Arithmetic Instructions:
The 8086 provides many arithmetic operations: addition, subtraction, negation, multiplication
and comparing two values.
ADD :
The add instruction adds the contents of the source operand to the destination
operand.
Eg. ADD AX, 0100H
ADD AX, BX
ADD AX, [SI]
ADD AX, [5000H]
ADD [5000H], 0100H
ADC : Add with Carry
This instruction performs the same operation as the ADD instruction, but also adds the carry
flag to the result.
SUB : Subtract
The subtract instruction subtracts the source operand from the destination operand and the
result is left in the destination operand.
SBB : Subtract with Borrow
The subtract with borrow instruction subtracts the source operand and the borrow flag (CF),
which may reflect the result of the previous calculations, from the destination operand.
INC : Increment
This instruction increases the contents of the specified Register or memory location by 1.
Immediate data cannot be operand of this instruction.
DEC : Decrement
The decrement instruction subtracts 1 from the contents of the specified register or memory
location.
The negate instruction forms 2’s complement of the specified destination in the instruction.
The destination can be a register or a memory location. This instruction can be implemented by
inverting each bit and adding 1 to it.
Eg. NEG AL
CMP : Compare
This instruction compares the source operand, which may be a register, an immediate data
or a memory location, with a destination operand that may be a register or a memory location.
The result of the comparison is not stored; only the flags are affected.
CMP BX, CX
IMUL : Signed Multiplication
This instruction multiplies a signed byte in the source operand by a signed byte in AL or a signed
word in the source operand by a signed word in AX.
CBW : Convert signed Byte to Word
This instruction copies the sign of a byte in AL to all the bits in AH. AH is then said to be the
sign extension of AL.
Eg. CBW
AX = 0000 0000 1001 1000 ; convert the signed byte in AL into a signed word in AX
Result in AX = 1111 1111 1001 1000
CWD : Convert signed Word to Double word
This instruction copies the sign of the word in AX to all the bits of DX. DX is then said to be the
sign extension of AX.
Eg. CWD
DIV : Unsigned Division
This instruction is used to divide an unsigned word by a byte or to divide an unsigned double
word by a word.
AAA : ASCII Adjust after Addition
The AAA instruction is executed after an ADD instruction that adds two ASCII coded
operands to give a byte of result in AL. The AAA instruction converts the resulting contents of
AL to unpacked decimal digits.
AAM : ASCII Adjust after Multiplication
This instruction, after execution, converts the product available in AL into unpacked
BCD format.
Eg. MOV AL, 04 ; AL = 04
MOV BL ,09 ; BL = 09
MUL BL ; AX = AL*BL ; AX=24H
AAM ; AH = 03, AL=06
AAD : ASCII Adjust before Division
This instruction converts two unpacked BCD digits in AH and AL to the equivalent binary
number in AL. This adjustment must be made before dividing the two unpacked BCD digits in
AX by an unpacked BCD byte. In the instruction sequence, this instruction appears before the
DIV instruction.
Eg. AX = 0508H (AH = 05, AL = 08, the unpacked BCD form of decimal 58)
The result of AAD execution will give the hexadecimal number 3A in AL and 00 in AH,
i.e. AX = 003AH, the binary equivalent of 58.
DAA : Decimal Adjust Accumulator
This instruction is used to convert the result of the addition of two packed BCD numbers to a
valid BCD number. The result has to be only in AL.
Eg. AL = 53, CL = 29
ADD AL, CL ; AL ← (AL) + (CL)
; AL ← 53 + 29
; AL ← 7C
DAA ; AL ← 7C + 06 = 82 (as C > 9)
DAS : Decimal Adjust after Subtraction
This instruction converts the result of the subtraction of two packed BCD numbers to a valid
BCD number. The subtraction has to be in AL only.
Eg. AL = 75, BH = 46
SUB AL, BH ; AL ← (AL) – (BH) = 75 – 46 = 2F
; AF = 1
DAS ; AL ← 2F – 06 = 29 (as F > 9)
Logical Instructions
AND : Logical AND
This instruction bit by bit ANDs the source operand, which may be an immediate, a register or
a memory location, with the destination operand, which may be a register or a memory location.
The result is stored in the destination operand.
OR : Logical OR
This instruction bit by bit ORs the source operand, which may be an immediate, a register or
a memory location, with the destination operand, which may be a register or a memory location.
The result is stored in the destination operand.
XOR : Logical Exclusive OR
This instruction bit by bit XORs the source operand, which may be an immediate, a register
or a memory location, with the destination operand, which may be a register or a memory
location. The result is stored in the destination operand.
Eg. XOR AX, 0098H
XOR AX, BX
TEST : Logical Compare
The TEST instruction performs a bit by bit logical AND operation on the two operands.
The result of this ANDing operation is not available for further use, but the flags
are affected.
Eg. TEST AX, BX
TEST [0500], 06H
SAL/SHL : SAL / SHL destination, count.
SAL and SHL are two mnemonics for the same instruction. This instruction shifts each
bit in the specified destination to the left and 0 is stored at LSB position. The MSB is shifted into
the carry flag. The destination can be a byte or a word.
It can be in a register or in a memory location. The number of shifts is indicated
by count.
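An illustrative example (the operand values are assumed): shifting left by one multiplies an
unsigned value by two, and a count greater than one must first be placed in CL:
Eg. MOV AX, 0005H
SHL AX, 1 ; AX = 000AH, the bit shifted out of the MSB goes to CF
MOV CL, 03H
SAL AX, CL ; AX shifted left three more positions, AX = 0050H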
SHR : SHR destination, count
This instruction shifts each bit in the specified destination to the right and 0 is stored at the
MSB position. The LSB is shifted into the carry flag. The destination can be a byte or a word.
SAR : SAR destination, count
This instruction shifts each bit in the specified destination some number of bit positions
to the right. As a bit is shifted out of the MSB position, a copy of the old MSB is put in the MSB
position (the sign is preserved). The LSB will be shifted into CF.
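Illustrative examples (the value F0H is assumed) showing the difference between the logical and
the arithmetic right shift:
Eg. MOV AL, 0F0H
SHR AL, 1 ; AL = 78H, a 0 enters the MSB, the old LSB (0) goes to CF
MOV AL, 0F0H ; F0H = -16 as a signed byte
SAR AL, 1 ; AL = F8H (-8), the sign bit is preserved, the old LSB goes to CF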
ROL : Rotate Left
This instruction rotates all bits in a specified byte or word to the left some number of bit
positions. The MSB is placed as the new LSB and is also copied into CF.
Eg. ROL CX, 1
MOV CL, 03H
ROL BL, CL
ROR : Rotate Right
This instruction rotates all bits in a specified byte or word to the right some number of bit
positions. The LSB is placed as the new MSB and is also copied into CF.
RCL : Rotate Left through Carry
This instruction rotates all bits in a specified byte or word some number of bit positions
to the left along with the carry flag. The MSB is placed as the new carry and the previous carry
is placed as the new LSB.
Eg. RCL AL, CL
RCR : Rotate Right through Carry
This instruction rotates all bits in a specified byte or word some number of bit positions
to the right along with the carry flag. The LSB is placed as the new carry and the previous carry
is placed as the new MSB.
Eg. RCR CX, 1
MOV CL, 04H
RCR AL, CL
Branch Instructions :
Branch instructions transfer the flow of execution of the program to a new address
specified in the instruction, directly or indirectly. When this type of instruction is executed, the
CS and IP registers get loaded with the new values of CS and IP corresponding to the location
where control is to be transferred.
This instruction is used to call a Subroutine (Procedure) from a main program. Address of
procedure may be specified directly or indirectly.
There are two types of procedure depending upon whether it is available in the same
segment or in another segment.
On execution this instruction stores the incremented IP & CS onto the stack and loads the
CS & IP registers with segment and offset addresses of the procedure to be called.
At the end of the procedure, the RET instruction must be executed. When it is executed, the
previously stored contents of IP (and CS, in the case of a far procedure) are retrieved from the
stack into the IP and CS registers and execution of the main program continues further.
INTO : Interrupt on Overflow
This instruction causes an interrupt when the overflow flag OF is set. It is equivalent to a Type 4
interrupt instruction.
JMP : Unconditional Jump
This instruction unconditionally transfers the control of execution to the specified address
using an 8-bit or 16-bit displacement. No flags are affected by this instruction.
IRET : Return from Interrupt
When it is executed, the values of IP, CS and the flags are retrieved from the stack to
continue the execution of the main program.
LOOP : Loop Unconditionally
This instruction executes the part of the program from the label or address specified in
the instruction up to the LOOP instruction, CX number of times. At each iteration, CX is
decremented automatically and the jump is taken as long as CX is not zero (a jump-if-not-zero
structure).
OR BX, AX
LOOP Label 1
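A complete illustrative loop built on this pattern; the block address 2000H and the count of four
words are assumed for the sketch:
Eg. MOV CX, 0004H ; number of iterations
MOV BX, 0000H ; clear the accumulator
MOV SI, 2000H ; SI points to the block of words
Label1: MOV AX, [SI] ; fetch a word
OR BX, AX ; accumulate it into BX
ADD SI, 02H ; point to the next word
LOOP Label1 ; CX = CX - 1, jump to Label1 if CX is not zero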
JZ/JE Label : Jump if ZF = 1 (zero / equal)
JS Label : Jump if SF = 1 (negative result)
JNS Label : Jump if SF = 0
JO Label : Jump if OF = 1 (overflow)
JNO Label : Jump if OF = 0
JNP Label : Jump if PF = 0 (odd parity)
JP Label : Jump if PF = 1 (even parity)
JB Label : Jump if CF = 1 (below, unsigned)
JNB Label : Jump if CF = 0 (not below, unsigned)
JCXZ Label : Jump if CX = 0
LOOPZ / LOOPE Label : Loop through a sequence of instructions from the label while ZF = 1 and CX ≠ 0.
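A small illustrative fragment combining CMP with the conditional jumps listed above (the labels
EQUAL, SMALLER and BIGGER are assumed):
Eg. CMP AX, BX ; sets the flags according to AX - BX
JE EQUAL ; taken if ZF = 1 (AX equals BX)
JB SMALLER ; taken if CF = 1 (AX below BX, unsigned)
JMP BIGGER ; otherwise AX is above BX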
The 8086 supports a set of more powerful instructions for string manipulations. For
referring to a string, two parameters are required.
The length of the string is usually stored as count in the CX register.The incrementing or
decrementing of the pointer, in string instructions, depends upon the Direction Flag (DF) Status.
If it is a Byte string operation, the index registers are updated
by one. On the other hand, if it is a word string operation, the index registers are updated by two.
This instruction is used as a prefix to other instructions, the instruction to which the REP
prefix is provided, is executed repeatedly until the CX register becomes zero (at each iteration
CX is automatically decremented by one).
These are used for CMPS, SCAS instructions only, as instruction prefixes.
MOVS : Move String
This instruction moves a byte or word from the source string, whose address is given by the
SI (Source Index) and DS contents, to the destination location whose address is given by the
DI (Destination Index) and ES (Extra Segment) contents.
The CMPS instruction can be used to compare two strings of byte or words. The length
of the string must be stored in the register CX. If both the byte or word strings are equal, zero
Flag is set.
The REP instruction Prefix is used to repeat the operation till CX (counter) becomes zero
or the condition specified by the REP Prefix is False.
This instruction scans a string of bytes or words for an operand byte or word specified in
the register AL or AX. The string is pointed to by the ES:DI register pair. The length of the string is
stored in CX. The DF controls the mode for scanning of the string. Whenever a match to the
specified operand, is found in the string, execution stops and the zero Flag is set. If no match is
found, the zero flag is reset.
The LODS instruction loads the AL / AX register by the content of a string pointed to by
DS : SI register pair. The SI is modified automatically depending upon DF, If it is a byte transfer
(LODSB), the SI is modified by one and if it is a word transfer (LODSW), the SI is modified by
two. No other Flags are affected by this instruction.
The STOS instruction stores the AL / AX register contents to a location in the string
pointed to by the ES : DI register pair. The DI is modified accordingly. No flags are affected by this
instruction.
The direction Flag controls the String instruction execution, The source index SI and
Destination Index DI are modified after each iteration automatically. If DF=1, then the execution
follows autodecrement mode, SI and DI are decremented automatically after each iteration. If
DF=0, then the execution follows autoincrement mode. In this mode, SI and DI are incremented
automatically after each iteration.
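An illustrative byte-string copy using these conventions (the addresses 2000H and 3000H and the
length of 100 bytes are assumed):
Eg. CLD ; DF = 0, auto-increment mode
MOV SI, 2000H ; DS:SI points to the source string
MOV DI, 3000H ; ES:DI points to the destination string
MOV CX, 0064H ; length of the string (100 bytes)
REP MOVSB ; copy CX bytes, SI and DI incremented after each byte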
These instructions control the functioning of the available hardware inside the processor
chip. These instructions are categorized into two types: flag manipulation instructions and
machine control instructions.
The flag manipulation instructions directly modify some of the flags of the 8086, e.g.
CLC – Clear Carry Flag, CMC – Complement Carry Flag, STC – Set Carry Flag,
CLD – Clear Direction Flag, STD – Set Direction Flag, CLI – Clear Interrupt Flag and
STI – Set Interrupt Flag.
The machine control instructions control the bus usage and execution; they include
NOP, HLT, WAIT, ESC and LOCK.
Input-Output Organization
The input-output subsystem of a computer, referred to as I/O, provides an
efficient mode of communication between the central system and the outside
environment.
Programs and data must be entered into computer memory for processing and
results obtained from computations must be recorded or displayed for the user.
Peripheral Devices
Input or output devices attached to the computer are also called peripherals.
➢ The display terminal can operate in a single-character mode where all characters
entered on the screen through the keyboard are transmitted to the computer as soon as
they are typed.
➢ In the block mode, the edited text is first stored in a local memory inside the
terminal. The text is transferred to the computer as a block of data.
I/O operations are accomplished through external devices that provide a means of exchanging
data between external environment and computer. An external device attaches to the
computer by a link to an I/O module.
An external device linked to an I/O module is called peripheral device or peripheral. The figure
below shows attachment of external devices through I/O module.
1. Human readable: suitable for communicating with computer user. For example - video
display terminals and printers.
2. Machine readable: suitable for communicating with equipment. For example - sensor,
actuators used in robotics application.
3. Communication: suitable for communicating with remote devices. They may be
human readable device such as terminal and machine readable device such as
another computer.
1. The interface to I/O module: The interface to I/O module is in the form of
a) Control Signal – determines the function that the device will perform. E.g. send data to I/O
module (READ or INPUT), receive data from I/O module (WRITE or OUTPUT), report
status or perform some control function such as position a disk head.
b) Data Signal – send or receive the data from the I/O module. c) Status Signal – it indicates the
status of the device. E.g. READY/NOT READY.
1. Control Logic: associated with the device controls on specific operation as directed from
I/O module.
2. Transducer: converts the data from electrical form to other forms of energy during output and
from other forms to electrical form during input.
3. Buffer: is associated with transducer to temporarily hold data during data transmission
from I/O module and external environment. Buffer size of 8 to 16 bits is common.
An I/O interface is required whenever the I/O device is driven by the processor. The interface
must have necessary logic to interpret the device address generated by the processor.
Handshaking should be implemented by the interface using appropriate commands (like BUSY,
READY, and WAIT), and the processor can communicate with an I/O device through the
interface.
It would not be practical for every I/O device to be wired to the computer in a different way, so
we must have a scheme where the hardware connections are fixed, and yet the communication
with the device is flexible, so that the widely varying needs of devices can all be met.
An I/O device, from the viewpoint of the CPU, is a set of registers. The CPU communicates with
and controls the I/O device by reading and writing these registers. For example, SPIM, the MIPS
simulator, uses two registers to communicate with the keyboard.
• The keyboard data register contains the ASCII code of the last key pressed.
• The keyboard control register indicates when a new key has been pressed. If bit 0 is one,
a key has been pressed since the last character was read. The keyboard controller sets this
bit when a key is pressed. It clears this bit when the keyboard data register is read.
The CPU can find out whether a new character is available by reading the keyboard control
register and testing bit 0. If bit 0 is 1, it then reads the keyboard data register to get the new key.
Accessing I/O devices at the hardware level is a lot like accessing memory. The registers in the
I/O devices are connected to the CPU using buses. We need an address bus to specify which I/O
device register is to be accessed. We need control lines to specify what kind of access is desired
(read, write, reset, etc.) Finally, we need a data bus to transfer the data between the CPU and the
device.
Each device has one or more control, status, and data registers at various I/O addresses. A
hypothetical example:
Address Register
I/O read and write operations can be more complex than memory read and write operations, but
the basic idea is the same. I/O control generally involves more than just read and write control
lines. In a sense, memory can be viewed as a very simple, fast I/O device.
Whereas memory is just a large pool of slow, inexpensive registers for storing data, each I/O
device register has a unique purpose in controlling a specific I/O device. This does not affect
how the CPU accesses them at the hardware level, but it does affect how they are used by
software.
Simple device control, such as stating whether an I/O register is to be read or written, can be
done over the control lines. More complex devices are often controlled by sending special data
blocks called Peripheral Control Blocks (PCBs) over the data lines. This is the primary method for
communicating with disk drives, for example.
Since I/O devices are of a very different nature than CPU circuits, there must be interface
hardware to connect each device to the CPU.
It consists of two data registers called ports, a control register, a status register, bus buffers
and timing and control circuits.
The four registers communicate directly with the I/O device attached to the interface.
The I/O data to and from the device can be transferred into either port A or port B.
Port A may be defined as an input port and port B may be defined as an output port.
A device such as a magnetic disk transfers data in both directions, so a bidirectional
data bus is used. The CPU gives control information to the control register. The bits in the status
register are used for status conditions. It is also used for recording errors that may occur
during the data transfer.
The bus buffers use the bidirectional data bus to communicate with the CPU.
A timing and control circuit is used to detect the address assigned to the bus buffers.
CS RS1 RS0 Register selected
0  X   X   None: data bus in high-impedance
1  0   0   Port A register
1  0   1   Port B register
1  1   0   Control register
1  1   1   Status register
There are basically three types of input-output interfaces.
The processor of a computer communicates with several peripheral devices such as the keyboard,
VDU, printer, magnetic disk, magnetic tape, etc.
Each peripheral device has its own interface. Each interface communicates with the I/O bus. The
communication link between processor and peripherals is shown below:
Each interface decodes the address and control received from the input-output bus, interprets
them for the peripheral, and provides signals for the peripheral controller. It synchronizes the
data flow and supervises the transfer between the peripheral and the CPU. Each peripheral has its
own controller.
For example, a printer controller controls the paper motion, the print timing and the selection of
printing characters.
The input-output bus from the processor is attached to all peripheral interfaces.
The input-output bus has three sets of lines:
1. Data lines
2. Address lines
3. Control lines
1. Data lines:- The data lines of the input-output bus carry the data to and from the peripherals.
2. Address lines:- The address lines carry the address of the interface or peripheral being selected.
3. Control lines:- They carry control information in the form of function and input-output
commands. These commands are of four types:-
1. Control Command:- A control command is issued to activate the peripheral and to inform it
what to do.
2. Status Command:- A status command is used to test the various status conditions in the
interface and the peripheral.
3. Data Output Command:- A data output command causes the interface to transfer the data
from the bus into the peripheral.
4. Data Input Command:- A data input command causes the interface to receive the data from
the peripheral and place it onto the input-output bus.
Two units, such as a CPU and an I/O interface, are designed independently of each other.
If the registers in the interface share a common clock with the CPU registers, the transfer
between the two units is said to be synchronous. In most cases, the internal timing in each unit is
independent from the other in that each uses its own private clock for internal registers.
In that case, the two units are said to be asynchronous to each other. This approach is widely
used in most computer systems.
Asynchronous data transfer between two independent units requires that control signals be
transmitted between the communicating units to indicate the time at which data is being
transmitted.
One way of achieving this is by means of a strobe pulse supplied by one of the units to
indicate to the other unit when the transfer has to occur.
Another method commonly used is to accompany each data item being transferred with a
control signal that indicates the presence of data in the bus. The unit receiving the data item
responds with another control signal to acknowledge receipt of the data. This type of
agreement between two independent units is referred to as handshaking.
The strobe pulse method and the handshaking method of asynchronous data transfer are not
restricted to I/O transfers. In fact, they are used extensively on numerous occasions requiring the
transfer of data between two independent units. In the general case we consider the transmitting
unit as the source and the receiving unit as the destination.
For example, the CPU is the source unit during an output or a write transfer and it is the
destination unit during an input or a read transfer. It is customary to specify the asynchronous
transfer between two independent units by means of a timing diagram that shows the timing
relationship that must exist between the control signals and the data in the buses. The sequence
of control during an asynchronous transfer depends on whether the transfer is initiated by the
source or by the destination unit.
1. Strobe control
2. Handshaking.
Strobe Control
This method of asynchronous data transfer uses a single control line to time each transfer. The
strobe may be activated by the source or the destination unit.
• The data bus carries the information from source to destination. The strobe is a
single line. The signal on this line informs the destination unit when a data word is
available in the bus.
• In a source-initiated transfer, the strobe signal is given by the source after a brief delay,
after placing the data on the data bus. A brief period after the strobe pulse is disabled, the
source stops sending the data.
• In a destination-initiated transfer, the destination unit activates the strobe pulse informing the
source to send data. The source places the data on the data bus. The transmission is stopped
briefly after the strobe pulse is removed.
• The disadvantage of the strobe is that the source unit that initiates the transfer has no
way of knowing whether the destination unit has received the data or not.
• Similarly if the destination initiates the transfer it has no way of knowing whether the
source unit has placed data on the bus or not.
A Handshaking Protocol
• Three control lines
• ReadReq: indicate a read request for memory
• DataRdy: indicate the data word is now ready on the data lines
• Ack: acknowledge the ReadReq or DataRdy signal of the other unit
The transfer of data between two units may be done in parallel or serial. In parallel data
transmission, the complete message is transmitted at the same time. In serial data transmission,
each bit in the message is sent in sequence one at a time. In asynchronous transmission, binary
information is sent only when it is available and the line remains idle when there is no
information to be transmitted.
Modes of transfer operate between the CPU and peripherals. Input peripherals send the data to
memory, where it is processed by the CPU. The computed data is then sent back to the memory and
further to the output peripherals.
The CPU merely executes the input-output instructions and may accept the data temporarily, but
the ultimate source and destination is the memory unit.
Data transfer between the CPU and input-output devices may be handled in a variety of modes.
These are:-
1. Programmed input-output.
2. Interrupt-initiated input-output.
3. Direct memory access (DMA).
Programmed I/O
• Programmed I/O operations are the result of I/O instructions written in computer
program. Each data item transfer is initiated by an instruction in the program. The I/O
device does not have direct access to memory. A transfer from an I/O device to memory
requires the execution of several instructions by the CPU. The data transfer can be
synchronous or asynchronous depending upon the type and the speed of the I/O devices.
• If the speeds match then synchronous data transfer is used. When there is mismatch then
asynchronous data transfer is used. The transfer is to and from a CPU register and
peripheral. Other instructions are needed to transfer the data to and from CPU and
memory. This method requires constant monitoring of the peripheral by the CPU. Once a
data transfer is initiated, the CPU is required to monitor the interface to see when a transfer can
again be made. In this method the CPU stays in a loop till the I/O unit indicates that it is ready
for data transfer. This is a time-consuming process, which can be avoided by using an interrupt.
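A minimal programmed-I/O (polling) sketch for the 8086; the port numbers 80H/81H, the "data
ready" bit 0 and the memory address 2000H are assumptions made only for illustration:
POLL: IN AL, 81H ; read the status port of the device
TEST AL, 01H ; check the data-ready bit
JZ POLL ; not ready - keep polling (the CPU busy-waits)
IN AL, 80H ; ready - read the data port
MOV [2000H], AL ; store the received byte in memory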
It can be avoided by using an interrupt facility and special commands to inform the interface to
issue an interrupt request signal when the data are available from the device.
In the meantime the CPU can proceed to execute another program. The interface meanwhile
keeps monitoring the device. When the interface determines that the device is ready for data
transfer, it generates an interrupt request to the computer.
Upon detecting the external interrupt signal, the CPU momentarily stops the task it is processing,
branches to a service program to process the I/O transfer, and then returns to the task it was
originally performing.
1. Vectored interrupt
2. Non vectored interrupt
Vectored interrupt :
In a vectored interrupt, the source that interrupts supplies the branch information to the computer.
This information is called the interrupt vector. In a non-vectored interrupt, the branch address is
assigned to a fixed location in memory.
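In the 8086, for example, the interrupt vector table occupies addresses 00000H to 003FFH and
each interrupt type n has a 4-byte entry at address 4 x n (offset word, then segment word). A
hedged sketch of installing a handler for interrupt type 40H (the handler name ISR40 and the type
number are assumed); the routine itself must end with IRET:
Eg. CLI ; disable interrupts while the table is changed
XOR AX, AX
MOV DS, AX ; DS = 0000H, base of the vector table
MOV WORD PTR [0100H], OFFSET ISR40 ; 4 * 40H = 100H : handler offset
MOV WORD PTR [0102H], SEG ISR40 ; 102H : handler segment
STI ; enable interrupts again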
DMA Short for direct memory access, a technique for transferring data from main memory to
a device without passing it through the CPU. Computers that have DMA channels can transfer
data to and from devices much more quickly than computers without a DMA channel can. This
is useful for making quick backups and for real-time applications. Some expansion boards, such
as CD-ROM cards, are capable of accessing the computer's DMA channel. When you install the
board, you must specify which DMA channel is to be used, which sometimes involves setting a
jumper or DIP switch.
Direct Memory Access interactions
DMA controller is used to transfer the data between the memory and i/o device.
• The DMA controller needs the usual circuits to communicate with the CPU and i/o device.
• In addition to this, it needs an address register and address bus buffer.
• The address register contains an address of the desired location in memory.
• The word count register holds the number of words to be transferred. The control register
specifies the mode of transfer.
• The DMA communicates with the i/o devices through the DMA request and DMA
acknowledge line.
• The DMA communicates with the CPU through the data bus and control lines.
• The RD (Read) and WR (Write) signals are bidirectional.
• When the BG (Bus Grant) signal is 0, the CPU can communicate with the DMA registers
through the data bus.
• When BG is 1, the CPU has relinquished the buses and the DMA can communicate directly
with the memory.
DMA Transfer
The connection between the DMA controller and other components in a computer system for
DMA transfer is shown in figure.
DMA transfer in a computer system
Memory Hierarchy
The memory unit is an essential component in any digital computer since it is needed
for storing programs and data. The memory unit that communicates directly with the CPU is
called the main memory. Devices that provide backup storage are called auxiliary memory.
They are used for storing system programs, large data files, and other backup information.
Only programs and data currently needed by the processor reside in main memory. All other
information is stored in auxiliary memory and transferred to main memory when needed.
A special very-high-speed memory called a cache is sometimes used to increase the speed of
processing by making current programs and data available to the CPU at a rapid rate. Fig(29)
shows the Memory Hierarchy:
Main Memory The main memory is the central storage unit in a computer system. It is a
relatively large and fast memory used to store programs and data during the computer
operation. The principal technology used for the main memory is based on semiconductor
integrated circuits. Integrated circuit RAM chips are available in two possible operating
modes:
The static RAM consists essentially of internal flip-flops that store the binary information.
The dynamic RAM stores the binary information in the form of electric charges that are applied
to capacitors.
Associative Memory
Many data-processing applications require the search of items in a table stored in
memory. An assembler program searches the symbol address table in order to extract the
symbol's binary equivalent.
A memory unit accessed by content is called an associative memory or content
addressable memory (CAM). When a word is written in an associative memory, no address is
given; the memory is capable of finding an empty unused location to store the word. When a
word is to be read from an associative memory, the content of the word, or part of the word, is
specified. The memory locates all words which match the specified content and marks them for
reading.
The block diagram of an associative memory is shown in Fig(30):
To illustrate with a numerical example, suppose that the argument register A and the key
register K have the bit configuration shown below. Only the three left most bits of A are
compared with memory words because K has l's in these positions.
Word 2 matches the unmasked argument field because the three leftmost bits of the argument
and the word are equal.
Cache Memory
If the active portions of the program and data are placed in a fast small memory, the
average memory access time can be reduced, thus reducing the total execution time of the
program. Such a fast small memory is referred to as a cache memory. It is placed between the
CPU and main memory.
The basic operation of the cache is as follows. When the CPU needs to access memory,
the cache is examined. If the word is found in the cache, it is read from the fast memory. If the
word addressed by the CPU is not found in the cache, the main memory is accessed to read the
word. The performance of cache memory is frequently measured in terms of a quantity called
hit ratio. When the CPU refers to memory and finds the word in cache, it is said to produce a
hit. If the word is not found in cache, it is in main memory and it counts as a miss.
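As a worked sketch (the numbers are assumed, not taken from the text), one common simplified
estimate of the resulting average access time is t_avg = h * t_cache + (1 - h) * t_main. For
example, with a hit ratio h = 0.9, t_cache = 100 ns and t_main = 1000 ns, the average access time
is 0.9 * 100 + 0.1 * 1000 = 190 ns.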
Three types of mapping procedures are of practical interest when considering the
organization of cache memory:
1. Associative mapping
2. Direct mapping
3. Set-associative mapping
◼ If the active portions of the program and data are placed in a fast small memory, the
average memory access time can be reduced,
◼ The cache is the fastest component in the memory hierarchy and approaches the speed of
CPU component
◼ When CPU needs to access memory, the cache is examined
◼ If the word is found in the cache, it is read from the fast memory
◼ If the word addressed by the CPU is not found in the cache, the main memory is accessed
to read the word
◼ The performance of cache memory is frequently measured in terms of a quantity called
hit ratio
◼ When the CPU refers to memory and finds the word in cache, it is said to produce a hit
◼ Otherwise, it is a miss
◼ Hit ratio = hit / (hit+miss)
◼ The basic characteristic of cache memory is its fast access time,
◼ Therefore, very little or no time must be wasted when searching the words in the cache
◼ The transformation of data from main memory to cache memory is referred to as a
mapping process, there are three types of mapping:
◼ Associative mapping
◼ Direct mapping
◼ Set-associative mapping
Associative Mapping
◼ A CPU address of 15 bits is placed in the argument register and the associative memory is
searched for a matching address
◼ If the address is found, the corresponding 12-bit data is read and sent to the CPU
◼ If not, the main memory is accessed for the word
◼ If the cache is full, an address-data pair must be displaced to make room for a pair that is
needed and not presently in the cache
Direct Mapping
◼ Associative memory is expensive compared to RAM
◼ In general case, there are 2^k words in cache memory and 2^n words in main memory (in
our case, k=9, n=15)
◼ The n bit memory address is divided into two fields: k-bits for the index and n-k bits for
the tag field
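◼ As an illustration with the figures above (k = 9, n = 15): the index is the low-order 9 bits of
the address (address mod 2^9 = address mod 512) and the tag is the remaining 6 high-order
bits. For example, the octal address 02777 maps to cache index 777 (octal) with tag 02
(octal); the access is a hit only if the tag stored at that index equals 02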
Set-Associative Mapping
◼ The disadvantage of direct mapping is that two words with the same index in their
address but with different tag values cannot reside in cache memory at the same time
◼ Set-Associative Mapping is an improvement over the direct-mapping in that each word of
cache can store two or more word of memory under the same index address
◼ In this organization, each index address refers to two data words and their associated tags
◼ Each tag requires six bits and each data word has 12 bits, so the word length is 2*(6+12)
= 36 bits
UNIT-V
Pipelining
Pipelining is a technique of decomposing a sequential process into sub-operations; with
each sub-process being executed in a special dedicated segment that operates concurrently
with all other segments. A pipeline can be visualized as a collection of processing
segments through which binary information flows.
General Considerations
Any operation that can be decomposed into a sequence of sub-operations of about the same
complexity can be implemented by a pipeline processor. The general structure of a four-
segment pipeline is illustrated in Fig. 46. The operands pass through all four segments in a
fixed sequence.
Figure 48 shows how the instruction cycle in the CPU can be processed with a four-segment
pipeline. While an instruction is being executed in segment 4, the next instruction in sequence
is busy fetching an operand from memory in segment 3.
The four segments are represented in the flowchart:
1. FI is the segment that fetches an instruction.
2. DA is the segment that decodes the instruction and calculates the effective address.
3. FO is the segment that fetches the operand.
4. EX is the segment that executes the instruction.
A pipeline operation is said to have been stalled if one unit (stage) requires more time to
perform its function, thus forcing other stages to become idle. Consider, for example, the case
of an instruction fetch that incurs a cache miss. Assume also that a cache miss requires three
extra time units.
Instruction-Level Parallelism
Contrary to pipeline techniques, instruction-level parallelism (ILP) is based on the idea of
multiple issue processors (MIP). An MIP has multiple pipelined datapaths for instruction
execution. Each of these pipelines can issue and execute one instruction per cycle. Figure 49
shows the case of a processor having three pipes. For comparison purposes, we also show in
the same figure the sequential and the single pipeline case.
Arithmetic Pipeline
Pipeline arithmetic units are usually found in very high speed computers. They are used to
implement floating-point operations, multiplication of fixed-point numbers, and similar
computations encountered in scientific problems.
Fig. 49 shows an example of a pipeline unit for floating-point addition and subtraction. The inputs
to the floating-point adder pipeline are two normalized floating-point binary numbers.
A, B are two fractions that represent the mantissas and a, b are the exponents. The sub-
operations that are performed in the four segments are:
1. Compare the exponents.
2. Align the mantissas.
3. Add or subtract the mantissas.
4. Normalize the result.
A numerical example may clarify the sub-operations performed in each segment. For simplicity,
we use decimal numbers, although Fig. 49 refers to binary numbers. Consider two
normalized floating-point numbers:
The two exponents are subtracted in the first segment to obtain (3 − 2 = 1). The larger exponent
3 is chosen as the exponent of the result. The next segment shifts the mantissa of Y to the
right to obtain:
This aligns the two mantissas under the same exponent. The addition of the two mantissas in
segment 3 produces the sum:
Suppose that the time delays of the four segments are t1 = 60 ns, t2 = 70 ns, t3 = 100 ns,
t4 = 80 ns, and the interface registers have a delay of tr = 10 ns. The clock cycle is chosen to be
tp = t3 + tr = 110 ns. An equivalent non-pipeline floating-point adder-subtractor will have a
delay time tn = t1 + t2 + t3 + t4 + tr = 320 ns. In this case the pipelined adder has a speedup of
320/110 = 2.9 over the non-pipelined adder.
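More generally (a standard result stated here for reference, not given in the text above), a
k-segment pipeline processing n tasks with clock period tp needs (k + n − 1) * tp, against n * tn
for the equivalent non-pipelined unit, so the speedup is S = n * tn / ((k + n − 1) * tp). As n grows
large, S approaches tn / tp, which for the figures above is 320/110 = 2.9.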
Supercomputers
Supercomputers are very powerful, high-performance machines used mostly for scientific
computations. To speed up the operation, the components are packed tightly together to
minimize the distance that the electronic signals have to travel. Supercomputers also use
special techniques for removing the heat from circuits to prevent them from burning up
because of their close proximity.
A supercomputer is a computer system best known for its high computational speed, fast and
large memory systems, and the extensive use of parallel processing.
Delayed Branch
Consider now the operation of the following four instructions:
If the three-segment pipeline proceeds: (I: Instruction fetch, A:ALU operation, and E: Execute
instruction) without interruptions, there will be a data conflict in instruction 3 because the
operand in R2 is not yet available in the A segment. This can be seen from the timing of the
pipeline shown in Fig. 50(a). The E segment in clock cycle 4 is in a process of placing the
memory data into R2. The A segment in clock cycle 4 is using the data from R2, but the value
in R2 will not be the correct value since it has not yet been transferred from memory. It is up
to the compiler to make sure that the instruction following the load instruction uses the data
fetched from memory. It was shown in Fig. 50 that a branch instruction delays the pipeline
operation by NOP instruction until the instruction at the branch address is fetched.
MULTIPROCESSORS
A multiple processor system consists of two or more processors that are connected
in a manner that allows them to share the simultaneous (parallel) execution of a given
computational task. Parallel processing has been advocated as a promising approach for
building high-performance computer systems. The organization and performance of a
multiple processor system are greatly influenced by the interconnection network used to
connect them. On the one hand, a single shared bus can be used as the interconnection
network for multiple processors.
SIMD SCHEMES
Two main SIMD configurations have been used in real-life machines. These are shown in
Figure 56.
MIMD SCHEMES
MIMD machines use a collection of processors, each having its own memory, which can
be used to collaborate on executing a given task. In general, MIMD systems can be
categorized based on their memory organization into shared-memory and message-passing
architectures.
INTERCONNECTION NETWORKS
The classification of interconnection networks is based on topology. Interconnection
networks are classified as either static or dynamic. Figure 58 provides such a taxonomy.