UNIT4 and 5 CAALP

UNIT 3

Pin Diagram and Pin description of 8086

The following pin function descriptions are for the microprocessor 8086 in either minimum
or maximum mode.

AD0 - AD15 (I/O): Address Data Bus

These lines constitute the time multiplexed memory/IO address during the first clock cycle (T1)
and data during T2, T3 and T4 clock cycles. A0 is analogous to BHE for the lower byte of the
data bus, pins D0-D7. A0 bit is Low during T1 state when a byte is to be transferred on the lower
portion of the bus in memory or I/O operations. 8-bit oriented devices tied to the lower half
would normally use A0 to condition chip select functions. These lines are active high and float to
tri-state during interrupt acknowledge and local bus "Hold acknowledge".

A19/S6, A18/S5, A17/S4, A16/S3 (O): Address/Status

During T1 state these lines are the four most significant address lines for memory operations.
During I/O operations these lines are low. During memory and I/O operations, status information
is available on these lines during T2, T3, and T4 states.

S5:

The status of the interrupt enable flag bit is updated at the beginning of each clock cycle and is
indicated on this line.

S6:

When Low, it indicates that 8086 is in control of the bus. During a "Hold acknowledge" clock
period, the 8086 tri-states the S6 pin and thus allows another bus master to take control of the
status bus.

S3 & S4:

Lines are decoded as follows:


A17/S4 A16/S3 Function
0 0 Extra segment access
0 1 Stack segment access
1 0 Code segment access
1 1 Data segment access

After the first clock cycle of an instruction execution, the A17/S4 and A16/S3 pins specify which
segment register generates the segment portion of the 8086 address. This feature also provides a
degree of protection by preventing write operations to one segment from erroneously
overlapping into another segment and destroying information in that segment.

BHE/S7 (O): Bus High Enable/Status

During T1 state, BHE is used to enable data onto the most significant half of the data
bus, pins D15 - D8. Eight-bit oriented devices tied to the upper half of the bus would normally
use BHE to control chip select functions. BHE is Low during T1 state of read, write and interrupt
acknowledge cycles when a byte is to be transferred on the high portion of the bus.
The S7 status information is available during T2, T3 and T4 states. The signal is active Low and
floats to 3-state during "hold" state. This pin is Low during T1 state for the first interrupt
acknowledge cycle.

RD (O): READ

The Read strobe indicates that the processor is performing a memory or I/O read cycle. This
signal is active low during T2 and T3 states and the Tw states of any read cycle. This signal
floats to tri-state in "hold acknowledge cycle".

TEST (I)

The TEST pin is examined by the "WAIT" instruction. If the TEST pin is Low, execution
continues. Otherwise the processor waits in an "idle" state. This input is synchronized internally
during each clock cycle on the leading edge of CLK.

INTR (I): Interrupt Request

It is a level triggered input which is sampled during the last clock cycle of each instruction to
determine if the processor should enter into an interrupt acknowledge operation. A subroutine is
vectored to via an interrupt vector look up table located in system memory. It can be internally
masked by software resetting the interrupt enable bit. INTR is internally synchronized. This
signal is active HIGH.

NMI (I): Non-Maskable Interrupt

An edge triggered input which causes a type-2 interrupt. A subroutine is vectored to via the interrupt
vector look up table located in system memory. NMI is not maskable internally by software. A
transition from LOW to HIGH on this pin initiates the interrupt at the end of the current
instruction. This input is internally synchronized.

RESET (I)

Reset causes the processor to immediately terminate its present activity. To be recognised, the
signal must be active high for at least four clock cycles, except after power-on, which requires a
50 µs pulse. It causes the 8086 to initialize registers DS, SS, ES, IP and flags to all
zeros. It also initializes CS to FFFFH. Upon removal of the RESET signal from the RESET pin,
the 8086 will fetch its next instruction from the 20-bit physical address FFFF0H. The reset signal
to the 8086 can be generated by the 8284 clock generator chip. To guarantee reset from power-up,
the reset input must remain below 1.5 volts for 50 µs after Vcc has reached the
minimum supply voltage of 4.5 V.
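
The reset address FFFF0H follows from the usual 8086 rule that a 20-bit physical address is the segment value shifted left by four bits plus the offset. The following is only an illustrative Python sketch of that arithmetic, not 8086 code; the function name physical_address is an assumption made for this example.

```python
def physical_address(segment: int, offset: int) -> int:
    """Return the 20-bit physical address for a segment:offset pair.

    Hypothetical helper for illustration: the segment is shifted left
    by 4 bits (multiplied by 16) and the 16-bit offset is added.
    """
    return ((segment << 4) + offset) & 0xFFFFF  # keep only 20 address bits


# Register state after RESET as described above: CS = FFFFH, IP = 0000H
reset_vector = physical_address(0xFFFF, 0x0000)
print(f"{reset_vector:05X}H")  # FFFF0H
```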

READY (I)

Ready is the acknowledgement from the addressed memory or I/O device that it will complete
the data transfer. The READY signal from memory or I/O is synchronized by the 8284 clock
generator to form READY. This signal is active HIGH. The 8086 READY input is not
synchronized. Correct operation is not guaranteed if the setup and hold times are not met.

CLK (I): Clock

Clock provides the basic timing for the processor and bus controller. It is asymmetric with a 33%
duty cycle to provide optimized internal timing. A minimum frequency of 2 MHz is required, since
the design of 8086 processors incorporates dynamic cells. The maximum clock frequencies of
the 8086-4, 8086 and 8086-2 are 4 MHz, 5 MHz and 8 MHz respectively.
Since the 8086 does not have on-chip clock generation circuitry, an 8284 clock generator chip
must be connected to the 8086 clock pin. The crystal connected to the 8284 must have a frequency 3
times the 8086 internal frequency. The 8284 clock generation chip is used to generate READY,
RESET and CLK.

MN/MX (I): Minimum / Maximum

This pin indicates what mode the processor is to operate in. In minimum mode, the 8086 itself
generates all bus control signals. In maximum mode the three status signals are to be decoded to
generate all the bus control signals.

Minimum Mode Pins

The following 8 pin function descriptions are for the 8086 in minimum
mode; MN/MX = 1. The corresponding 8 pin function descriptions for maximum mode are
explained later.

M/IO (O): Status line

This pin is used to distinguish a memory access from an I/O access. When this pin is Low, the
processor accesses I/O, and when it is High, the processor accesses memory. M/IO becomes valid
in the T4 state preceding a bus cycle and remains valid until the final T4 of the cycle. M/IO
floats to 3-state OFF during local bus "hold acknowledge".

WR (O): Write

Indicates that the processor is performing a write memory or write I/O cycle, depending on the
state of the M/IO signal. WR is active for T2, T3 and Tw of any write cycle. It is active LOW,
and floats to 3-state OFF during local bus "hold acknowledge".

INTA (O): Interrupt Acknowledge

It is used as a read strobe for interrupt acknowledge cycles. It is active LOW during T2, T3, and
Tw of each interrupt acknowledge cycle.

ALE (O): Address Latch Enable

ALE is provided by the processor to latch the address into the 8282/8283 address latch. It is an
active high pulse during T1 of any bus cycle. ALE signal is never floated.

DT/ R (O): DATA Transmit/Receive


In minimum mode, 8286/8287 transceiver is used for the data bus. DT/ R is used to control the
direction of data flow through the transceiver. This signal floats to tri-state off during local bus
"hold acknowledge".

DEN (O): Data Enable

It is provided as an output enable for the 8286/8287 in a minimum system which uses the
transceiver. DEN is active LOW during each memory and I/O access. For a read or INTA cycle, it is
active from the middle of T2 until the middle of T4, while for a write cycle, it is active from
the beginning of T2 until the middle of T4. It floats to tri-state off during local bus "hold acknowledge".

HOLD & HLDA (I/O): Hold and Hold Acknowledge

Hold indicates that another master is requesting a local bus "HOLD". To be acknowledged,
HOLD must be active HIGH. The processor receiving the "HOLD" request will issue HLDA
(HIGH) as an acknowledgement in the middle of the T1 clock cycle. Simultaneously with the issue
of HLDA, the processor will float the local bus and control lines. After "HOLD" is detected as
being Low, the processor will lower HLDA, and when the processor needs to run another
cycle, it will again drive the local bus and control lines.

Maximum Mode

The following pin function descriptions are for the 8086/8088 systems in
maximum mode (i.e. MN/MX = 0). Only the pins which are unique to maximum mode are
described below.

S2, S1, S0 (O): Status Pins

These pins are active during T4, T1 and T2 states and are returned to the passive state (1,1,1) during
T3 or Tw (when ready is inactive). They are used by the 8288 bus controller to generate all
memory and I/O access control signals. Any change in S2, S1, S0 during T4 is used
to indicate the beginning of a bus cycle. These status lines are encoded as shown in table 3.

S2 S1 S0 Characteristics
0 0 0 Interrupt acknowledge
0 0 1 Read I/O port
0 1 0 Write I/O port
0 1 1 Halt
1 0 0 Code access
1 0 1 Read memory
1 1 0 Write memory
1 1 1 Passive State
Table 3
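
The encoding in table 3 is a plain three-bit lookup, which is the kind of decoding an external bus controller has to perform. The sketch below is an illustrative Python model of that lookup only; the names BUS_CYCLE and decode_status are assumptions for this example and do not describe the 8288's internal logic.

```python
# Table 3 as a lookup keyed by the 3-bit value S2 S1 S0 (S2 is the MSB).
BUS_CYCLE = {
    0b000: "Interrupt acknowledge",
    0b001: "Read I/O port",
    0b010: "Write I/O port",
    0b011: "Halt",
    0b100: "Code access",
    0b101: "Read memory",
    0b110: "Write memory",
    0b111: "Passive State",
}


def decode_status(s2: int, s1: int, s0: int) -> str:
    """Return the bus cycle type for the given status pin levels (0 or 1)."""
    return BUS_CYCLE[(s2 << 2) | (s1 << 1) | s0]


print(decode_status(1, 0, 1))  # Read memory
```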

QS0, QS1 (O): Queue – Status

Queue Status is valid during the clock cycle after which the queue operation is performed. QS0,
QS1 provide status to allow external tracking of the internal 8086 instruction queue. The
condition of queue status is shown in table 4.

Queue status allows external devices like in-circuit emulators or special instruction set extension
co-processors to track the CPU instruction execution. Since instructions are executed from the
8086 internal queue, the queue status is presented each CPU clock cycle and is not related to the
bus cycle activity. This mechanism allows (1) a processor to detect execution of an ESCAPE
instruction which directs the co-processor to perform a specific task and (2) an in-circuit
emulator to trap execution of a specific memory location.

QS1 QS0 Characteristics
0 0 No operation
0 1 First byte of opcode from queue
1 0 Empty the queue
1 1 Subsequent byte from queue
Table 4

LOCK (O)

It indicates to other system bus masters that they must not gain control of the system bus while
LOCK is active Low. The LOCK signal is activated by the "LOCK" prefix instruction and remains active
until the completion of the instruction that follows the prefix. This signal is active Low and floats to
tri-state OFF during 'hold acknowledge'. Example:
LOCK XCHG reg, memory ; reg is any register and memory
                      ; is the address of the semaphore.

RQ/GT0 and RQ/GT1 (I/O): Request/Grant

These pins are used by other processors in a multiprocessor organization. Local bus masters of
other processors use them to force the processor to release the local bus at the end of the
processor's current bus cycle. Each pin is bi-directional and has an internal pull-up resistor, so it
may be left unconnected.

II) Flag Register

The 8086 microprocessor has a 16-bit flag register, of which 9 bits are active as flags.
These 9 flags are divided into two groups, as follows.

Conditional Flags

Conditional flags represent result of last arithmetic or logical instruction executed. Conditional
flags are as follows:

1. CF (Carry Flag)

This flag indicates an overflow condition for unsigned integer arithmetic. It is also used
in multiple-precision arithmetic.

2. AF (Auxiliary Flag)

If an operation performed in the ALU generates a carry/borrow from the lower
nibble (i.e. D0 – D3) to the upper nibble (i.e. D4 – D7), the AF flag is
set, i.e. the carry given by the D3 bit to D4 is the AF flag. This is not a general-purpose flag; it is
used internally by the processor for BCD (decimal adjust) operations.

3. PF (Parity Flag)

This flag is used to indicate the parity of result. If lower order 8-bits of the result contains
even number of 1’s, the Parity Flag is set and for odd number of 1’s, the Parity Flag is
reset.

4. ZF (Zero Flag)

It is set if the result of an arithmetic or logical operation is zero; otherwise it is reset.

5. SF (Sign Flag)

The sign of a signed number is indicated by its MSB. The sign flag is a copy of the MSB of the
result, so it is set if the result of the operation is negative.

6. OF (Overflow Flag)

This is relevant when signed numbers are added or subtracted.
The OF flag indicates that the result has exceeded the capacity of the machine. It is set if
the signed result cannot be expressed within the number of bits of the destination.
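
To make the conditional flag definitions concrete, the following Python sketch computes the six flags for an 8-bit addition. It is a simplified model written for illustration, not processor code; the function name add8_flags and the dictionary layout are assumptions.

```python
def add8_flags(a: int, b: int) -> dict:
    """Model the conditional flags after an 8-bit ADD of a and b (0..255)."""
    total = a + b
    result = total & 0xFF
    return {
        "result": result,
        # CF: carry out of bit 7 (unsigned overflow)
        "CF": int(total > 0xFF),
        # AF: carry from the lower nibble (D3) into the upper nibble (D4)
        "AF": int((a & 0x0F) + (b & 0x0F) > 0x0F),
        # PF: set when the low 8 bits contain an even number of 1s
        "PF": int(bin(result).count("1") % 2 == 0),
        # ZF: set when the result is zero
        "ZF": int(result == 0),
        # SF: copy of the MSB of the result
        "SF": result >> 7,
        # OF: both operands have the same sign but the result's sign differs
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }


# 7FH + 01H = 80H: signed overflow (+127 + 1), but no unsigned carry
print(add8_flags(0x7F, 0x01))
```

In this model, 7FH + 01H sets OF, SF and AF while leaving CF clear, whereas FFH + 01H sets CF, ZF, PF and AF while leaving OF clear.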
Control Flags

Control flags are set or reset deliberately to control the operations of the execution unit. Control
flags are as follows:

1. TF (Trap Flag):

It is used for single step control. It allows user to execute one instruction of a program at
a time for debugging. When trap flag is set, program can be run in single step mode.

2. IF (Interrupt Flag):

It is an interrupt enable/disable flag. This flag is used to
enable or disable the maskable interrupt in a program. If it is set, the maskable interrupt of 8086 is
enabled and if it is reset, the interrupt is disabled. It can be set by executing the STI instruction
and can be cleared by executing the CLI instruction.

3. DF (Direction Flag):

This flag selects the direction of string operations. If it is set, string
bytes are accessed from higher memory address to lower memory address. When it is
reset, the string bytes are accessed from lower memory address to higher memory address.

III)Instruction set of 8086

The 8086 instructions are categorized into the following main types.

i. Data Copy / Transfer Instructions

ii. Arithmetic and Logical Instructions

iii. Branch Instructions

iv. Loop Instructions

v. Machine Control Instructions

vi. Flag Manipulation Instructions

vii. Shift and Rotate Instructions

viii. String Instructions


Data Copy / Transfer Instructions :

MOV :

This instruction copies a word or a byte of data from some source to a destination. The
destination can be a register or a memory location. The source can be a register, a memory
location, or an immediate number.

MOV AX,BX

MOV AX,5000H

MOV AX,[SI]

MOV AX,[2000H]

MOV AX,50H[BX]

MOV [734AH],BX

MOV DS,CX

MOV CL,[357AH]

Direct loading of the segment registers with immediate data is not permitted.

PUSH : Push to Stack

This instruction pushes the contents of the specified register/memory location on to the
stack. The stack pointer is decremented by 2, after each execution of the instruction.

Eg. PUSH AX

PUSH DS

PUSH [5000H]

Fig. 2.2 Push Data to stack memory

POP : Pop from Stack

This instruction, when executed, loads the specified register/memory location with the
contents of the memory location of which the address is formed using the current stack segment
and stack pointer.

The stack pointer is incremented by 2.

Eg. POP AX

POP DS

POP [5000H]

Fig 2.3 Popping Register Content from Stack Memory
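
The stack pointer movement for PUSH and POP can be traced with a small Python model. This is a sketch under stated assumptions: the class name Stack8086, the initial SS and SP values, and the dictionary used as memory are all hypothetical choices for the example. Words are stored low byte first, as the 8086 does.

```python
class Stack8086:
    """Toy model of the 8086 stack: SS:SP addresses a word-wide stack."""

    def __init__(self, ss: int = 0x2000, sp: int = 0xFFFE):
        self.ss = ss
        self.sp = sp
        self.mem = {}  # physical address -> byte

    def _addr(self, offset: int) -> int:
        # physical address = SS * 16 + offset, limited to 20 bits
        return ((self.ss << 4) + (offset & 0xFFFF)) & 0xFFFFF

    def push(self, word: int) -> None:
        # SP is decremented by 2, then the word is stored at SS:SP
        self.sp = (self.sp - 2) & 0xFFFF
        self.mem[self._addr(self.sp)] = word & 0xFF             # low byte
        self.mem[self._addr(self.sp + 1)] = (word >> 8) & 0xFF  # high byte

    def pop(self) -> int:
        # The word at SS:SP is read, then SP is incremented by 2
        low = self.mem.get(self._addr(self.sp), 0)
        high = self.mem.get(self._addr(self.sp + 1), 0)
        self.sp = (self.sp + 2) & 0xFFFF
        return (high << 8) | low


stack = Stack8086()
stack.push(0x1234)          # like PUSH AX with AX = 1234H
print(f"{stack.sp:04X}H")   # FFFCH
print(f"{stack.pop():04X}H")  # 1234H
```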

XCHG : Exchange byte or word

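This instruction exchanges the contents of the specified source and destination
operands.

Eg. XCHG [5000H], AX

XCHG BX, AX

XLAT :

Translate byte using look-up table. The byte in AL is used as an index into the table whose base
address is in BX, and AL is replaced by the table entry found there, i.e. [AL] ← [[BX] + [AL]].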

Eg. LEA BX, TABLE1

MOV AL, 04H

XLAT

Simple input and output port transfer Instructions:


IN:

Copy a byte or word from specified port to accumulator.

Eg. IN AL,03H

IN AX,DX

OUT:

Copy a byte or word from accumulator to specified port.

Eg. OUT 03H, AL

OUT DX, AX

LEA :

Load effective address of operand in specified register.

[reg] ← offset portion of address in DS

Eg. LEA reg, offset

LDS:

Load DS register and other specified register from memory.

[reg] ← [mem]

[DS] ← [mem + 2]

Eg. LDS reg, mem

LES:

Load ES register and other specified register from memory.

[reg] ← [mem]

[ES] ← [mem + 2]

Eg. LES reg, mem

Flag transfer instructions:


LAHF:

Load (copy to) AH with the low byte of the flag register.

[AH] ← [Flags low byte]

Eg. LAHF

SAHF:

Store (copy) AH register to low byte of flag register.

[Flags low byte] ← [AH]

Eg. SAHF

PUSHF:

Copy flag register to top of stack.

[SP] ← [SP] – 2
[[SP]] ← [Flags]

Eg. PUSHF

POPF :

Copy word at top of stack to flag register.

[Flags] ← [[SP]]
[SP] ← [SP] + 2

Arithmetic Instructions:

The 8086 provides many arithmetic operations: addition, subtraction, negation, multiplication,
division and comparison of two values.

ADD :

The add instruction adds the contents of the source operand to the destination
operand.

Eg. ADD AX, 0100H

ADD AX, BX

ADD AX, [SI]

ADD AX, [5000H]

ADD [5000H], 0100H

ADC : Add with Carry

This instruction performs the same operation as ADD instruction, but adds the carry flag to
the result.

Eg. ADC AX, 0100H

ADC AX, BX

ADC AX, [SI]

ADC AX, [5000H]

ADC [5000H], 0100H

SUB : Subtract

The subtract instruction subtracts the source operand from the destination operand and the
result is left in the destination operand.

Eg. SUB AX, 0100H

SUB AX, BX

SUB AX, [5000H]

SUB [5000H], 0100H

SBB : Subtract with Borrow

The subtract with borrow instruction subtracts the source operand and the borrow flag (CF)
which may reflect the result of the previous calculations, from the destination.

INC : Increment

This instruction increases the contents of the specified Register or memory location by 1.
Immediate data cannot be operand of this instruction.

Eg. INC AX

INC [BX]

INC [5000H]

DEC : Decrement

The decrement instruction subtracts 1 from the contents of the specified register or memory
location.

Eg. DEC AX

DEC [5000H]

NEG : Negate

The negate instruction forms 2’s complement of the specified destination in the instruction.
The destination can be a register or a memory location. This instruction can be implemented by
inverting each bit and adding 1 to it.

Eg. NEG AL

AL = 0011 0101 = 35H

Replace number in AL with its 2’s complement:

AL = 1100 1011 = CBH
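
The invert-and-add-1 rule can be checked with a one-line Python sketch. This is only an illustration of the arithmetic for a byte operand; the function name neg8 is an assumption.

```python
def neg8(value: int) -> int:
    """2's complement of an 8-bit value: invert every bit, then add 1."""
    return (~value + 1) & 0xFF  # mask keeps the result within one byte


print(f"{neg8(0x35):02X}H")  # CBH, matching the NEG AL example above
```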

CMP : Compare

This instruction compares the source operand, which may be a register, an immediate data
or a memory location, with a destination operand that may be a
register or a memory location. The comparison is made by subtracting the source from the
destination; the result is not stored, but the flags are affected.

Eg. CMP BX, 0100H

CMP AX, 0100H

CMP [5000H], 0100H

CMP BX, [SI]

CMP BX, CX

MUL :Unsigned Multiplication Byte or Word

This instruction multiplies an unsigned byte by the contents of AL, or an unsigned word by the
contents of AX.

Eg. MUL BH ; (AX) ← (AL) x (BH)

MUL CX ; (DX)(AX) ← (AX) x (CX)
MUL WORD PTR [SI] ; (DX)(AX) ← (AX) x ([SI])

IMUL :Signed Multiplication

This instruction multiplies a signed byte in source operand by a signed byte in AL or a signed
word in source operand by a signed word in AX.

Eg. IMUL BH

IMUL CX

IMUL [SI]

CBW : Convert Signed Byte to Word

This instruction copies the sign bit of the byte in AL to all the bits in AH. AH is then said to be
the sign extension of AL.

Eg. CBW

AX = 0000 0000 1001 1000
Convert signed byte in AL to signed word in AX.
Result in AX = 1111 1111 1001 1000

CWD : Convert Signed Word to Double Word

This instruction copies the sign bit of the word in AX to all the bits in DX. DX is then said to be
the sign extension of AX.

Eg. CWD

Convert signed word in AX to signed double word in DX : AX.

AX = 1111 0000 1100 0001
Result in DX = 1111 1111 1111 1111
Result in AX = 1111 0000 1100 0001
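
Sign extension as performed by CBW and CWD can be sketched in Python as follows. This is an illustrative model only; the function names cbw and cwd and their return conventions are assumptions made for the example.

```python
def cbw(al: int) -> int:
    """Return AX after CBW: AH is filled with copies of bit 7 of AL."""
    return (al | 0xFF00) if (al & 0x80) else al


def cwd(ax: int) -> tuple:
    """Return (DX, AX) after CWD: DX is filled with copies of bit 15 of AX."""
    dx = 0xFFFF if (ax & 0x8000) else 0x0000
    return dx, ax


print(f"{cbw(0x98):04X}H")            # FF98H = 1111 1111 1001 1000
dx, ax = cwd(0xF0C1)
print(f"{dx:04X}H : {ax:04X}H")       # FFFFH : F0C1H
```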

DIV : Unsigned division

This instruction is used to divide an unsigned word by a byte or to divide an unsigned double
word by a word.

Eg. DIV CL ; Word in AX / byte in CL
           ; Quotient in AL, remainder in AH
DIV CX     ; Double word in DX and AX / word in CX
           ; Quotient in AX, remainder in DX

AAA : ASCII Adjust After Addition

The AAA instruction is executed after an ADD instruction that adds two ASCII coded
operands to give a byte of result in AL. The AAA instruction converts the resulting contents of AL
to an unpacked decimal digit.

Eg.        ; [CL] = 32H = ASCII for 2
           ; [DL] = 35H = ASCII for 5
ADD CL, DL ; Result [CL] = 67H
MOV AL, CL ; Move ASCII result into AL since
           ; AAA adjusts only [AL]
AAA        ; [AL] = 07, unpacked BCD for 7

AAS : ASCII Adjust AL after Subtraction


This instruction corrects the result in AL register after subtracting two unpacked ASCII
operands. The result is in unpacked decimal format. The procedure is similar to AAA instruction
except for the subtraction of 06 from AL.

AAM : ASCII Adjust after Multiplication

This instruction, after execution, converts the product available in AL into unpacked
BCD format.
Eg. MOV AL, 04 ; AL = 04
MOV BL, 09 ; BL = 09
MUL BL ; AX = AL*BL ; AX = 24H
AAM ; AH = 03, AL = 06

AAD : ASCII Adjust before Division

This instruction converts two unpacked BCD digits in AH and AL to the equivalent binary
number in AL. This adjustment must be made before dividing the two unpacked BCD digits in
AX by an unpacked BCD byte. In the instruction sequence, this instruction appears Before DIV
instruction.

Eg. AX = 0508H ; AH = 05, AL = 08 (unpacked BCD for 58)

AAD ; AX = 003AH, since 58D = 3AH

The result of AAD execution will give the hexadecimal number 3A in AL and 00
in AH, where 3A is the hexadecimal equivalent of 58 (decimal).
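
The AAM and AAD adjustments are a division and a multiplication by 10 respectively, which the following Python sketch reproduces for the two examples above. It models only the register values, not the flags; the function names aam and aad are assumptions.

```python
def aam(al: int) -> tuple:
    """Return (AH, AL) after AAM: AH = AL // 10, AL = AL % 10."""
    return divmod(al, 10)


def aad(ah: int, al: int) -> tuple:
    """Return (AH, AL) after AAD: AL = AH * 10 + AL, AH = 0."""
    return 0, (ah * 10 + al) & 0xFF


print(aam(0x24))        # (3, 6): 24H = 36 decimal -> unpacked BCD 03 06
print(aad(0x05, 0x08))  # (0, 58): 58 decimal = 3AH
```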

DAA : Decimal Adjust Accumulator

This instruction is used to convert the result of the addition of two packed BCD numbers to a
valid BCD number. The result has to be only in AL.

Eg. AL = 53H, CL = 29H
ADD AL, CL ; AL ← (AL) + (CL)
           ; AL ← 53H + 29H
           ; AL ← 7CH
DAA        ; AL ← 7CH + 06H (as C > 9)
           ; AL ← 82H
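
The adjustment made by DAA can be sketched in Python as below. This is a simplified model that follows the commonly documented rule (add 06H if the low nibble exceeds 9 or AF is set, then add 60H if the original value exceeded 99H or CF was set); behaviour for inputs that are not the sum of two valid packed BCD numbers may differ on real hardware. The function name daa and its return convention are assumptions.

```python
def daa(al: int, cf: int = 0, af: int = 0) -> tuple:
    """Return (AL, CF, AF) after DAA, given AL and the flags left by ADD."""
    old_al, old_cf = al, cf
    if (old_al & 0x0F) > 9 or af:
        al = (al + 0x06) & 0xFF   # correct the low BCD digit
        af = 1
    else:
        af = 0
    if old_al > 0x99 or old_cf:
        al = (al + 0x60) & 0xFF   # correct the high BCD digit
        cf = 1
    else:
        cf = 0
    return al, cf, af


# 53H + 29H = 7CH with no carry and no auxiliary carry
al, cf, af = daa(0x7C)
print(f"{al:02X}H")  # 82H, i.e. BCD 53 + 29 = 82
```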

DAS : Decimal Adjust after Subtraction

This instruction converts the result of the subtraction of two packed BCD numbers to a valid
BCD number. The subtraction has to be in AL only.

Eg. AL = 75H, BH = 46H
SUB AL, BH ; AL ← 2FH = (AL) - (BH)
           ; AF = 1
DAS        ; AL ← 29H (as F > 9, F - 6 = 9)

Logical Instructions

AND : Logical AND

This instruction bit by bit ANDs the source operand, which may be an immediate, a register or
a memory location, to the destination operand, which may be a register or a memory location. The
result is stored in the destination operand.

Eg. AND AX, 0008H

AND AX, BX

OR : Logical OR

This instruction bit by bit ORs the source operand, which may be an immediate, a register or
a memory location, to the destination operand, which may be a register or a memory location. The
result is stored in the destination operand.

Eg. OR AX, 0008H

OR AX, BX

NOT : Logical Invert

This instruction complements the contents of an operand register or a memory location,
bit by bit.

Eg. NOT AX

NOT [5000H]

XOR : Logical Exclusive OR

This instruction bit by bit XORs the source operand, which may be an immediate, a register
or a memory location, to the destination operand, which may be a register or a memory location. The
result is stored in the destination operand.

Eg. XOR AX, 0098H

XOR AX, BX

TEST : Logical Compare Instruction

The TEST instruction performs a bit by bit logical AND operation on the two operands.
The result of this ANDing operation is not available for further use, but flags
are affected.
Eg. TEST AX, BX

TEST [0500], 06H

SAL/SHL : SAL / SHL destination, count.

SAL and SHL are two mnemonics for the same instruction. This instruction shifts each
bit in the specified destination to the left and 0 is stored at LSB position. The MSB is shifted into
the carry flag. The destination can be a byte or a word.
It can be in a register or in a memory location. The number of shifts is indicated
by count.

Eg. SAL CX, 1

SAL AX, CL

SHR : SHR destination, count

This instruction shifts each bit in the specified destination to the right and 0 is stored at
MSB position. The LSB is shifted into the carry flag. The destination can be a byte or a word.
It can be in a register or in a memory location. The number of shifts is indicated by
count.

Eg. SHR CX, 1

MOV CL, 05H

SHR AX, CL

SAR : SAR destination, count

This instruction shifts each bit in the specified destination some number of bit positions
to the right. As a bit is shifted out of the MSB position, a copy of the old MSB is put in the MSB
position. The LSB will be shifted into CF.

Eg. SAR BL, 1

MOV CL, 04H

SAR DX, CL

ROL Instruction : ROL destination, count

This instruction rotates all bits in a specified byte or word to the left some number of bit
positions. MSB is placed as a new LSB and a new CF.

Eg. ROL CX, 1

MOV CL, 03H

ROL BL, CL

ROR Instruction : ROR destination, count

This instruction rotates all bits in a specified byte or word to the right some number of bit
positions. LSB is placed as a new MSB and a new CF.

Eg. ROR CX, 1

MOV CL, 03H

ROR BL, CL

RCL Instruction : RCL destination, count

This instruction rotates all bits in a specified byte or word some number of bit positions
to the left along with the carry flag. MSB is placed as a new carry and previous carry is placed as
new LSB.

Eg. RCL CX, 1

MOV CL, 04H

RCL AL, CL

RCR Instruction : RCR destination, count

This instruction rotates all bits in a specified byte or word some number of bit positions
to the right along with the carry flag. LSB is placed as a new carry and previous carry is placed as
new MSB.

Eg. RCR CX, 1

MOV CL, 04H

RCR AL, CL
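
Rotation through the carry flag treats CF as an extra bit of the operand. The Python sketch below models RCL and RCR for a byte operand to show how the carry moves; it is an illustration only, and the function names rcl8 and rcr8 are assumptions.

```python
def rcl8(value: int, cf: int, count: int = 1) -> tuple:
    """Rotate an 8-bit value left through carry; return (value, CF)."""
    for _ in range(count):
        new_cf = (value >> 7) & 1            # MSB becomes the new carry
        value = ((value << 1) | cf) & 0xFF   # previous carry becomes the new LSB
        cf = new_cf
    return value, cf


def rcr8(value: int, cf: int, count: int = 1) -> tuple:
    """Rotate an 8-bit value right through carry; return (value, CF)."""
    for _ in range(count):
        new_cf = value & 1                   # LSB becomes the new carry
        value = (value >> 1) | (cf << 7)     # previous carry becomes the new MSB
        cf = new_cf
    return value, cf


print(rcl8(0x80, 0))  # (0, 1): the MSB moved into CF
print(rcr8(0x00, 1))  # (128, 0): the old carry moved into the MSB
```

Because the carry acts as a ninth bit, nine single-bit rotations of a byte in either direction restore the original value and carry.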


Branch Instructions :

Branch instructions transfer the flow of execution of the program to a new address
specified in the instruction directly or indirectly. When this type of instruction is executed, the
CS and IP registers get loaded with new values of CS and IP corresponding to the location to
which control is transferred (for a branch within the same segment, only IP changes).

The Branch Instructions are classified into two types

i. Unconditional Branch Instructions.

ii. Conditional Branch Instructions.

Unconditional Branch Instructions :

In unconditional control transfer instructions, the execution control is transferred to the
specified location independent of any status or condition. The CS and IP are unconditionally
modified to the new CS and IP.

CALL : Unconditional Call

This instruction is used to call a Subroutine (Procedure) from a main program. Address of
procedure may be specified directly or indirectly.

There are two types of procedure depending upon whether it is available in the same
segment or in another segment.

i. Near CALL i.e., ±32K displacement.

ii. Far CALL i.e., anywhere outside the segment.

On execution, a near CALL stores the incremented IP onto the stack and loads IP with the offset
address of the procedure. A far CALL stores both CS and the incremented IP onto the stack and loads the
CS & IP registers with segment and offset addresses of the procedure to be called.

RET: Return from the Procedure.

At the end of the procedure, the RET instruction must be executed. When it is executed, the
previously stored content of IP (and of CS, for a far return) is retrieved from the stack into the
IP and CS registers and execution of the main program continues further. The flags are not
restored by RET; that is done by IRET on return from an interrupt service routine.

INT N: Interrupt Type N.


In the interrupt structure of 8086, 256 interrupts are defined corresponding to the types
from 00H to FFH. When INT N instruction is executed, the type byte N is multiplied by 4 to give
the offset of the interrupt vector in segment 0000, and the contents of IP and CS of the interrupt
service routine are taken from the four bytes at that location.
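
The vector lookup described above can be sketched in Python. This is an illustrative model, assuming the usual layout of each 4-byte vector (IP in the first two bytes, CS in the next two, low byte first); the function names vector_address and read_vector and the sample handler address are hypothetical.

```python
def vector_address(n: int) -> int:
    """Return the offset in segment 0000 of the vector for interrupt type n."""
    if not 0 <= n <= 0xFF:
        raise ValueError("interrupt type must be in the range 00H to FFH")
    return n * 4


def read_vector(memory: bytes, n: int) -> tuple:
    """Return (CS, IP) of the service routine for interrupt type n."""
    base = vector_address(n)
    ip = memory[base] | (memory[base + 1] << 8)      # offset, low byte first
    cs = memory[base + 2] | (memory[base + 3] << 8)  # segment, low byte first
    return cs, ip


# A 1 KB vector table (256 vectors x 4 bytes) with a made-up type 2 handler
table = bytearray(1024)
table[8:12] = bytes([0x34, 0x12, 0x00, 0xF0])  # IP = 1234H, CS = F000H
cs, ip = read_vector(table, 2)
print(f"{cs:04X}:{ip:04X}")  # F000:1234
```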

INTO: Interrupt on Overflow

This instruction generates an interrupt only when the overflow flag OF is set; otherwise execution
continues with the next instruction. When taken, it is equivalent to a Type 4 interrupt instruction.

JMP: Unconditional Jump

This instruction unconditionally transfers the control of execution to the specified address
using an 8-bit or 16-bit displacement. No Flags are affected by this instruction.

IRET: Return from ISR

When it is executed, the values of IP, CS and Flags are retrieved from the stack to
continue the execution of the main program.

LOOP : Loop Unconditionally

This instruction executes the part of the program from the label or address specified in
the instruction up to the LOOP instruction CX number of times. At each iteration, CX is
decremented automatically and, if CX is not zero, control jumps back to the label (a JUMP IF
NOT ZERO structure).

Example: MOV CX, 0004H

MOV BX, 7526H

Label1: MOV AX, CODE

OR BX, AX

LOOP Label1

Conditional Branch Instructions

When this instruction is executed, execution control is transferred to the address
specified relatively in the instruction, provided the condition implicit in the opcode is satisfied.
Otherwise execution continues sequentially.

JZ/JE Label

Transfer execution control to address ‘Label’, if ZF=1.


JNZ/JNE Label

Transfer execution control to address ‘Label’, if ZF=0

JS Label

Transfer execution control to address ‘Label’, if SF=1.

JNS Label

Transfer execution control to address ‘Label’, if SF=0.

JO Label

Transfer execution control to address ‘Label’, if OF=1.

JNO Label

Transfer execution control to address ‘Label’, if OF=0.

JNP Label

Transfer execution control to address ‘Label’, if PF=0.

JP Label

Transfer execution control to address ‘Label’, if PF=1.

JB Label

Transfer execution control to address ‘Label’, if CF=1.

JNB Label

Transfer execution control to address ‘Label’, if CF=0.

JCXZ Label

Transfer execution control to address ‘Label’, if CX=0

Conditional LOOP Instructions.

LOOPZ / LOOPE Label

Loop through a sequence of instructions from label while ZF=1 and CX≠0.

LOOPNZ / LOOPNE Label

Loop through a sequence of instructions from label while ZF=0 and CX≠0.

String Manipulation Instructions

A series of data bytes or words available in memory at consecutive locations is referred to
as a Byte String or Word String. A string of characters may be located in consecutive memory
locations, where each character may be represented by its ASCII equivalent.

The 8086 supports a set of more powerful instructions for string manipulations. For
referring to a string, two parameters are required.

I. Starting or End Address of the String.

II. Length of the String.

The length of the string is usually stored as count in the CX register. The incrementing or
decrementing of the pointer, in string instructions, depends upon the Direction Flag (DF) status.
If it is a byte string operation, the index registers are updated
by one. On the other hand, if it is a word string operation, the index registers are updated by two.

REP : Repeat Instruction Prefix

This instruction is used as a prefix to other instructions. The instruction to which the REP
prefix is provided is executed repeatedly until the CX register becomes zero (at each iteration
CX is automatically decremented by one).

i. REPE / REPZ- repeat operation while equal / zero.


ii. REPNE / REPNZ - repeat operation while not equal / not zero.

These are used for CMPS, SCAS instructions only, as instruction prefixes.

MOVSB / MOVSW :Move String Byte or String Word

Suppose a string of bytes stored in a set of consecutive memory locations is to be moved
to another set of destination locations. The starting byte of the source string is located in the memory
location whose address may be computed using SI (Source Index) and DS (Data Segment)
contents.

The starting address of the destination locations where this string has to be relocated is
given by DI (Destination Index) and ES (Extra Segment) contents.

CMPS : Compare String Byte or String Word

The CMPS instruction can be used to compare two strings of byte or words. The length
of the string must be stored in the register CX. If both the byte or word strings are equal, zero
Flag is set.

The REP instruction Prefix is used to repeat the operation till CX (counter) becomes zero
or the condition specified by the REP Prefix is False.

SCAS : Scan String Byte or String Word

This instruction scans a string of bytes or words for an operand byte or word specified in
the register AL or AX. The string is pointed to by the ES:DI register pair. The length of the string is
stored in CX. The DF controls the mode for scanning of the string. Whenever a match to the
specified operand is found in the string, execution stops and the zero flag is set. If no match is
found, the zero flag is reset.

LODS : Load String Byte or String Word

The LODS instruction loads the AL / AX register with the content of a string pointed to by
the DS:SI register pair. The SI is modified automatically depending upon DF. If it is a byte transfer
(LODSB), the SI is modified by one and if it is a word transfer (LODSW), the SI is modified by
two. No flags are affected by this instruction.

STOS : Store String Byte or String Word

The STOS instruction stores the AL / AX register contents to a location in the string
pointed to by the ES:DI register pair. The DI is modified accordingly. No flags are affected by
this instruction.

The Direction Flag controls the execution of string instructions. The Source Index SI and
Destination Index DI are modified automatically after each iteration. If DF=1, the execution
follows autodecrement mode: SI and DI are decremented automatically after each iteration. If
DF=0, the execution follows autoincrement mode: SI and DI are incremented
automatically after each iteration.
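
The way CX, SI, DI and DF interact under a REP prefix can be sketched in Python. This is only an illustrative model, not 8086 code: the function name `rep_movs` is hypothetical, memory is a flat bytearray, and the segment registers DS and ES are ignored so that SI and DI are plain offsets.

```python
def rep_movs(mem, si, di, cx, df=0, width=1):
    """Model of REP MOVSB (width=1) or REP MOVSW (width=2) on a flat bytearray.

    Segment registers are ignored: si and di are plain offsets into mem.
    Returns the final (si, di, cx) values.
    """
    step = -width if df else width      # DF=1: autodecrement, DF=0: autoincrement
    while cx != 0:                      # REP repeats until CX reaches zero
        mem[di:di + width] = mem[si:si + width]   # copy one byte or one word
        si += step
        di += step
        cx -= 1                         # CX is decremented at each iteration
    return si, di, cx


mem = bytearray(b"HELLO" + bytes(5))
si, di, cx = rep_movs(mem, si=0, di=5, cx=5)
print(mem[5:10].decode(), si, di, cx)
```

With DF=1 the same loop walks downward, so SI and DI would have to start at the last element of each string instead of the first.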

Flag Manipulation and Processor Control Instructions

These instructions control the functioning of the available hardware inside the processor
chip. These instructions are categorized into two types:

1. Flag Manipulation instructions.


2. Machine Control instructions.

Flag Manipulation instructions

The Flag manipulation instructions directly modify some of the Flags of 8086.

i. CLC – Clear Carry Flag.

ii. CMC – Complement Carry Flag.

iii. STC – Set Carry Flag.


iv. CLD – Clear Direction Flag.

v. STD – Set Direction Flag.

vi. CLI – Clear Interrupt Flag.


vii. STI – Set Interrupt Flag.

Machine Control instructions

The Machine control instructions control the bus usage and execution.

i. WAIT – Wait for Test input pin to go low.

ii. HLT – Halt the processor.

iii. NOP – No operation.

iv. ESC – Escape to an external device such as the NDP (numeric data processor).
UNIT-IV

Input-Output Organization and Memory Organization

Input-Output Organization
The input-output subsystem of a computer, referred to as I/O, provides an
efficient mode of communication between the central system and the outside
environment.
Programs and data must be entered into computer memory for processing and
results obtained from computations must be recorded or displayed for the user.

Peripheral Devices
Input or output devices attached to the computer are also called peripherals.

➢ The display terminal can operate in a single-character mode where all characters
entered on the screen through the keyboard are transmitted to the computer
simultaneously.
➢ In the block mode, the edited text is first stored in a local memory inside the
terminal. The text is transferred to the computer as a block of data.

➢ Printers provide a permanent record on paper of computer output data.


➢ Magnetic tapes are used mostly for storing files of data.
➢ Magnetic disks have high-speed rotational surfaces coated with magnetic material.

Introduction about Input Output Organization:-

Input Output Organization:

I/O operations are accomplished through external devices that provide a means of exchanging
data between external environment and computer. An external device attaches to the
computer by a link to an I/O module.

An external device linked to an I/O module is called peripheral device or peripheral. The figure
below shows attachment of external devices through I/O module.

External Devices can be categorized as

1. Human readable: suitable for communicating with computer user. For example - video
display terminals and printers.
2. Machine readable: suitable for communicating with equipment. For example - sensor,
actuators used in robotics application.
3. Communication: suitable for communicating with remote devices. They may be
human readable device such as terminal and machine readable device such as
another computer.

Block diagram of external device is described below.

1. The interface to I/O module: The interface to I/O module is in the form of

a) Control Signal – determines the function that the device will perform. E.g. send data to I/O
module (READ or INPUT), receive data from I/O module (WRITE or OUTPUT), report
status or perform some control function such as position a disk head.

b) Data Signal – sends or receives the data from the I/O module.

c) Status Signal – indicates the status of the device. E.g. READY/NOT READY

2. Control Logic: associated with the device, it controls the specific operation as directed by
the I/O module.
3. Transducer: converts the data from electrical to other forms of energy during output and
from other forms to electrical during input.
4. Buffer: associated with the transducer to temporarily hold data being transferred
between the I/O module and the external environment. A buffer size of 8 to 16 bits is common.

Introduction about input-output interface:-

An I/O interface is required whenever the I/O device is driven by the processor. The interface
must have necessary logic to interpret the device address generated by the processor.
Handshaking should be implemented by the interface using appropriate commands (like BUSY,
READY, and WAIT), and the processor can communicate with an I/O device through the
interface.

It would not be practical for every I/O device to be wired to the computer in a different way, so
we must have a scheme where the hardware connections are fixed, and yet the communication
with the device is flexible, so that the widely varying needs of devices can all be met.

An I/O device, from the viewpoint of the CPU, is a set of registers. The CPU communicates with
and controls the I/O device by reading and writing these registers. For example, SPIM, the MIPS
simulator, uses two registers to communicate with the keyboard.

• The keyboard data register contains the ASCII code of the last key pressed.
• The keyboard control register indicates when a new key has been pressed. If bit 0 is one,
a key has been pressed since the last character was read. The keyboard controller sets this
bit when a key is pressed. It clears this bit when the keyboard data register is read.

The CPU can find out whether a new character is available by reading the keyboard control
register and testing bit 0. If bit 0 is 1, it then reads the keyboard data register to get the new key.
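
The polling sequence just described can be sketched in Python. The class below is a toy model of the two keyboard registers, not SPIM's actual interface: the names `Keyboard`, `press`, `read_data` and `poll_key` are assumptions made for illustration.

```python
class Keyboard:
    """Toy model of the keyboard control and data registers."""

    def __init__(self):
        self.control = 0   # bit 0 = 1 when an unread key is waiting
        self.data = 0      # ASCII code of the last key pressed

    def press(self, ch):
        # The keyboard controller stores the key code and sets bit 0.
        self.data = ord(ch)
        self.control |= 1

    def read_data(self):
        # Reading the data register clears bit 0 of the control register.
        self.control &= ~1
        return self.data


def poll_key(kbd):
    """Return the new character, or None if bit 0 of the control register is 0."""
    if kbd.control & 1:
        return chr(kbd.read_data())
    return None


kbd = Keyboard()
first = poll_key(kbd)      # no key pressed yet
kbd.press("A")
second = poll_key(kbd)     # a key is available
third = poll_key(kbd)      # the key was already consumed
print(first, second, third)
```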

Accessing I/O devices at the hardware level is a lot like accessing memory. The registers in the
I/O devices are connected to the CPU using buses. We need an address bus to specify which I/O
device register is to be accessed. We need control lines to specify what kind of access is desired
(read, write, reset, etc.) Finally, we need a data bus to transfer the data between the CPU and the
device.

Each device has one or more control, status, and data registers at various I/O addresses. A
hypothetical example:

Address   Register

ff00      keyboard status
ff01      keyboard data
ff02      display status
ff03      display data
ff04      disk status
ff05      disk block address
ff06      disk block size
ff07      disk data address
...

I/O read and write operations can be more complex than memory read and write operations, but
the basic idea is the same. I/O control generally involves more than just read and write control
lines. In a sense, memory can be viewed as a very simple, fast I/O device.
Whereas memory is just a large pool of slow, inexpensive registers for storing data, each I/O
device register has a unique purpose in controlling a specific I/O device. This does not affect
how the CPU accesses them at the hardware level, but it does affect how they are used by
software.

Simple device control, such as stating whether an I/O register is to be read or written, can be
done over the control lines. More complex devices are often controlled by sending special data
blocks called Peripheral Control Blocks (PCBs) over the data lines. This is the primary method for
communicating with disk drives, for example.

Since I/O devices are of a very different nature than CPU circuits, there must be interface
hardware to connect each device to the CPU.

Example of I/O Interface

An example of an I/O interface unit is shown in figure.

It consists of two data registers called ports, a control register, a status register, bus buffers
and timing and control circuits.

The four registers communicate directly with the I/O device attached to the interface.
The I/O data to and from the device can be transferred into either port A or port B.

Port A may be defined as an input port and port B may be defined as an output port.
A device such as a magnetic disk transfers data in both directions, so a bidirectional
data bus is used. The CPU gives control information to the control register. The bits in the
status register are used for status conditions and for recording errors that may occur
during the data transfer.

The bus buffers use the bidirectional data bus to communicate with the CPU.

A timing and control circuit detects the address assigned to the interface registers.

CS   RS1   RS0   Register selected
0    X     X     None: data bus in high-impedance
1    0     0     Port A register
1    0     1     Port B register
1    1     0     Control register
1    1     1     Status register
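
The selection table can be expressed as a small decoder. This is a sketch that assumes CS is the chip-select input and RS1, RS0 are the register-select inputs, each given as 0 or 1. The function name `select_register` is hypothetical.

```python
def select_register(cs, rs1, rs0):
    """Decode CS, RS1, RS0 (each 0 or 1) into the name of the selected register."""
    if cs == 0:
        return None                      # interface not selected: data bus in high impedance
    names = ["Port A", "Port B", "Control", "Status"]
    return names[(rs1 << 1) | rs0]       # RS1 is the high bit, RS0 the low bit


for cs, rs1, rs0 in [(0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]:
    print(cs, rs1, rs0, select_register(cs, rs1, rs0))
```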

There are basically three aspects of input-output interfacing to consider. These are:-

1. I/O bus and interface modules.

2. I/O versus memory bus.

3. Isolated versus memory-mapped I/O.

Introduction About Input-Output Bus And Interface Module:-

The processor of a computer communicates with several peripheral devices such as keyboard,
VDU, printer, magnetic disk, magnetic tape, etc.

Each peripheral device has its own interface. Each interface communicates with the I/O bus. The
communication link between processor and peripherals is shown below:-

Each interface decodes the address and control received from the input-output bus, interprets
them for the peripheral, and provides signals for the peripheral controller. It also synchronizes
the data flow and supervises the transfer between peripheral and CPU. Each peripheral has its
own controller.

For example:- A printer controller controls the paper motion, the print timing and the selection
of printing characters.

The input-output bus from the processor is attached to all peripheral interfaces.
The input-output bus consists of three sets of lines:

1. Data line

2. Address line.

3. Control line.

1. Data line:- Data lines of the input-output bus carry data to and from the peripherals.

2. Address line:- Address lines carry the address that selects the peripheral interface.

3. Control line:- Control lines carry a function code, also called an input-output
command. These commands are of four types:-

1. Control Command

2. Status Command

3. Data output Command

4. Data input Command

1. Control Command:- A control command is issued to activate the peripheral and to inform it
what to do.

2. Status Command:- A status command is used to test the various status conditions in the
interface and the peripheral.

3. Data output Command:- A data output command causes the interface to transfer data from
the bus into the peripheral.

4. Data input Command:- A data input command causes the interface to transfer data from
the peripheral onto the input-output bus.

Introduction About Asynchronous Data Transfer:-

Asynchronous Data Transfer


The internal operations in a digital system are synchronized by means of clock pulses supplied
by a common pulse generator.
Clock pulses are applied to all registers within a unit and all data transfers among internal
registers occur simultaneously during the occurrence of a clock pulse.

Two units, such as a CPU and an I/O interface, are designed independently of each other.

If the registers in the interface share a common clock with the CPU registers, the transfer
between the two units is said to be synchronous. In most cases, the internal timing in each unit is
independent from the other in that each uses its own private clock for internal registers.

In that case, the two units are said to be asynchronous to each other. This approach is widely
used in most computer systems.

Asynchronous data transfer between two independent units requires that control signals be
transmitted between the communicating units to indicate the time at which data is being
transmitted.

One way of achieving this is by means of a strobe pulse supplied by one of the units to
indicate to the other unit when the transfer has to occur.

Another method commonly used is to accompany each data item being transferred with a
control signal that indicates the presence of data in the bus. The unit receiving the data item
responds with another control signal to acknowledge receipt of the data. This type of
agreement between two independent units is referred to as handshaking.

The strobe pulse method and the handshaking method of asynchronous data transfer are not
restricted to I/O transfers. In fact, they are used extensively on numerous occasions requiring the
transfer of data between two independent units. In the general case we consider the transmitting
unit as the source and the receiving unit as the destination.

For example, the CPU is the source unit during an output or a write transfer and it is the
destination unit during an input or a read transfer. It is customary to specify the asynchronous
transfer between two independent units by means of a timing diagram that shows the timing
relationship that must exist between the control signals and the data in the buses. The sequence
of control during an asynchronous transfer depends on whether the transfer is initiated by the
source or by the destination unit.

There are two types of asynchronous data transmission methods:-

1. Strobe control

2. Handshaking.
Strobe Control
This method of asynchronous data transfer uses a single control line to time each transfer. The
strobe may be activated by the source or the destination unit.

(i) Source Initiated Data Transfer:

• The data bus carries the information from source to destination. The strobe is a
single line. The signal on this line informs the destination unit when a data word is
available in the bus.
• The strobe signal is given after a brief delay, after placing the data on the data bus. A
brief period after the strobe pulse is disabled the source stops sending the data.

Source - initiated strobe for data transfer

(ii) Destination Initiated Data Transfer:

• In this case the destination unit activates the strobe pulse informing the source to send
data. The source places the data on the data bus. The transmission is stopped briefly after
the strobe pulse is removed.

• The disadvantage of the strobe is that the source unit that initiates the transfer has no
way of knowing whether the destination unit has received the data or not.
• Similarly if the destination initiates the transfer it has no way of knowing whether the
source unit has placed data on the bus or not.

• This difficulty is solved by using hand shaking method of data transfer.

Destination - initiated strobe for data transfer

A Handshaking Protocol
• Three control lines
• ReadReq: indicate a read request for memory

Address is put on the data lines at the same time

• DataRdy: indicate the data word is now ready on the data lines

Data is put on the data lines at the same time

• Ack: acknowledge the ReadReq or the DataRdy of the other party
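
One possible ordering of the three signals during a single read can be traced in Python. This is a sketch only: the exact step order and the function name `handshake_read` are assumptions, and memory is modelled as a dictionary from address to data word.

```python
def handshake_read(memory, address):
    """Trace one read using ReadReq, DataRdy and Ack. Returns (data, trace)."""
    trace = []
    # The requester raises ReadReq and puts the address on the shared data lines.
    lines = address
    trace.append("ReadReq=1")
    # The memory latches the address and acknowledges the request.
    latched = lines
    trace.append("Ack=1 (memory)")
    trace.append("ReadReq=0")
    trace.append("Ack=0 (memory)")
    # The memory places the word on the data lines and raises DataRdy.
    lines = memory[latched]
    trace.append("DataRdy=1")
    # The requester reads the word and acknowledges it.
    data = lines
    trace.append("Ack=1 (requester)")
    trace.append("DataRdy=0")
    trace.append("Ack=0 (requester)")
    return data, trace


data, trace = handshake_read({0x10: 0xAB}, 0x10)
print(hex(data))
print(" -> ".join(trace))
```

Each side changes a signal only after seeing the other side's response, which is why neither unit needs to share a clock with the other.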

Asynchronous Serial Transfer

The transfer of data between two units may be done in parallel or serial. In parallel data
transmission, the total message is transmitted at the same time. In serial data transmission, each
bit in the message is sent in sequence one at a time. In asynchronous transmission, binary
information is sent only when it is available and the line remains idle when there is no
information to be transmitted.

Asynchronous serial transmission

Asynchronous serial transmission is character oriented. Each character transmitted consists
of a start bit, character bits, and stop bits.
The first bit is called the start bit. It is always a 0 and is used to indicate the beginning of a
character. The last bit, called the stop bit, is always a 1.
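
The framing of one character can be sketched as follows. The function names are hypothetical, and two details are assumptions not stated above: the character bits are sent least significant bit first, and eight data bits with one stop bit are used by default.

```python
def frame_character(ch, data_bits=8, stop_bits=1):
    """Return the line levels for one character: start bit, data bits, stop bits."""
    code = ord(ch)
    data = [(code >> i) & 1 for i in range(data_bits)]   # assumed LSB-first order
    return [0] + data + [1] * stop_bits                   # start bit is 0, stop bits are 1


def unframe(bits, data_bits=8):
    """Recover the character from a frame produced by frame_character."""
    if bits[0] != 0 or any(b != 1 for b in bits[1 + data_bits:]):
        raise ValueError("bad start or stop bit")
    code = sum(bit << i for i, bit in enumerate(bits[1:1 + data_bits]))
    return chr(code)


frame = frame_character("A")   # 'A' has ASCII code 0x41
print(frame)
print(unframe(frame))
```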

Introduction About Mode of transfer:-

Modes of transfer describe how data moves between the CPU and peripherals. Input peripherals
send data to memory, where it is processed by the CPU. The computed data is sent back to
memory and from there to output peripherals.

The CPU merely executes the input-output instructions and may accept the data temporarily, but
the ultimate source and destination is the memory unit.

Data transfer between the CPU and input-output devices may be handled in a variety of modes.
These are:-

1. Programmed input-output.

2. Interrupt initiated input-output.

3. Direct Memory Access input-output.

Programmed I/O

• Programmed I/O operations are the result of I/O instructions written in the computer
program. Each data item transfer is initiated by an instruction in the program. The I/O
device does not have direct access to memory. A transfer from an I/O device to memory
requires the execution of several instructions by the CPU. The data transfer can be
synchronous or asynchronous depending upon the type and the speed of the I/O devices.
• If the speeds match, synchronous data transfer is used. When there is a mismatch,
asynchronous data transfer is used. The transfer is between a CPU register and the
peripheral. Other instructions are needed to transfer the data between the CPU and
memory. This method requires constant monitoring of the peripheral by the CPU. Once a
data transfer is initiated, the CPU is required to monitor the interface to see when a
transfer can again be made.
• In this method the CPU stays in a loop until the I/O unit indicates that it is ready for data
transfer. This is a time-consuming process, which can be avoided by using interrupts.

Interrupt initiated I/O


In the programmed I/O method, the CPU stays in a program loop until the I/O unit indicates that
it is ready for data transfer. This is a time-consuming process since it keeps the processor busy
needlessly.

It can be avoided by using an interrupt facility and special commands to inform the interface to
issue an interrupt request signal when the data are available from the device.

In the meantime the CPU can proceed to execute another program. The interface meanwhile
keeps monitoring the device. When the interface determines that the device is ready for data
transfer, it generates an interrupt request to the computer.

Upon detecting the external interrupt signal, the CPU momentarily stops the task it is processing,
branches to a service program to process the I/O transfer, and then returns to the task it was
originally performing.

Example of Interrupt initiated I/O:

1. Vectored interrupt
2. Non vectored interrupt

Vectored interrupt :
In vectored interrupt, the source that interrupts supplies the branch information to the computer.
This information is called the interrupt vector.

Non vectored interrupt


In a non vectored interrupt, the branch address is assigned to a fixed location in memory.

Direct Memory Access

DMA, short for direct memory access, is a technique for transferring data between main memory
and a device without passing it through the CPU. Computers that have DMA channels can transfer
data to and from devices much more quickly than computers without a DMA channel can. This
is useful for making quick backups and for real-time applications. Some expansion boards, such
as CD-ROM cards, are capable of accessing the computer's DMA channel. When you install the
board, you must specify which DMA channel is to be used, which sometimes involves setting a
jumper or DIP switch.
Direct Memory Access interactions

Direct Memory Access Controller

DMA controller is used to transfer the data between the memory and i/o device.

• The DMA controller needs the usual circuits to communicate with the CPU and i/o device.
• In addition to this, it needs an address register and address bus buffer.
• The address register contains an address of the desired location in memory.
• The word count register holds the number of words to be transferred. The control register
specifies the mode of transfer.
• The DMA communicates with the i/o devices through the DMA request and DMA
acknowledge line.
• The DMA communicates with the CPU through the data bus and control lines.
• The RD (Read) and WR (Write) signals are bidirectional.
• When the BG (Bus Grant) signal is 0, the CPU can communicate with the DMA registers
through the data bus.
• When BG is 1, the CPU has relinquished the buses. The DMA can then communicate directly
with the memory.

DMA Transfer

The connection between the DMA controller and other components in a computer system for
DMA transfer is shown in figure.
DMA transfer in a computer system

• The DMA request line is used to request a DMA transfer.


• The bus request (BR) signal is used by the DMA controller to request the CPU to
relinquish control of the buses.
• The CPU activates the bus grant (BG) output to inform the external DMA that its buses
are in a high-impedance state (so that they can be used in the DMA transfer.)
• The address bus is used to address the DMA controller and memory at given location
• The Device select (DS) and register select (RS) lines are activated by addressing the
DMA controller.
• The RD and WR lines are used to specify either a read (RD) or write (WR) operation on
the given memory location.
• The DMA acknowledge line is set when the system is ready to initiate data transfer.
• The data bus is used to transfer data between the I/O device and memory.
• When the last word of data in the DMA transfer is transferred, the DMA controller
informs the termination of the transfer to the CPU by means of the interrupt line.
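
The role of the address register and the word count register during a transfer can be sketched in Python. This is a simplified model under stated assumptions: the function name `dma_transfer` is hypothetical, the address register is assumed to be incremented and the word count decremented after each word, and the returned `True` merely stands in for the interrupt sent to the CPU.

```python
def dma_transfer(memory, device_words, start_address):
    """Copy words from a device into memory the way a sequence of DMA cycles would.

    Returns True to stand in for the interrupt raised when the count reaches zero.
    """
    address_register = start_address        # next memory location to use
    word_count = len(device_words)          # words still to be transferred
    source = iter(device_words)
    while word_count > 0:
        memory[address_register] = next(source)   # one word per DMA cycle
        address_register += 1               # assumed: address register is incremented
        word_count -= 1                     # assumed: word count register is decremented
    return True                             # transfer complete: interrupt the CPU


memory = [0] * 8
done = dma_transfer(memory, [11, 22, 33], start_address=2)
print(memory, done)
```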

Memory Hierarchy
The memory unit is an essential component in any digital computer since it is needed
for storing programs and data. The memory unit that communicates directly with the CPU is
called the main memory. Devices that provide backup storage are called auxiliary memory.
They are used for storing system programs, large data files, and other backup information.
Only programs and data currently needed by the processor reside in main memory. All other
information is stored in auxiliary memory and transferred to main memory when needed.
A special very-high-speed memory called a cache is sometimes used to increase the speed of
processing by making current programs and data available to the CPU at a rapid rate. Fig(29)
shows the Memory Hierarchy:

Main Memory
The main memory is the central storage unit in a computer system. It is a
relatively large and fast memory used to store programs and data during the computer
operation. The principal technology used for the main memory is based on semiconductor
integrated circuits. Integrated circuit RAM chips are available in two possible operating
modes, static and dynamic:
The static RAM consists essentially of internal flip-flops that store the binary information.
The dynamic RAM stores the binary information in the form of electric charges that are applied
to capacitors.

Associative Memory
Many data-processing applications require the search of items in a table stored in
memory. An assembler program searches the symbol address table in order to extract the
symbol's binary equivalent.
A memory unit accessed by content is called an associative memory or content
addressable memory (CAM). When a word is written in an associative memory, no address is
given. The memory is capable of finding an empty unused location to store the word. When a
word is to be read from an associative memory, the content of the word, or part of the word, is
specified. The memory locates all words which match the specified content and marks them for
reading.
The block diagram of an associative memory is shown in Fig(30):
To illustrate with a numerical example, suppose that the argument register A and the key
register K have the bit configuration shown below. Only the three leftmost bits of A are
compared with memory words because K has 1's in these positions.

Word 2 matches the unmasked argument field because the three leftmost bits of the argument
and the word are equal.
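
The masked comparison can be sketched in Python. The bit patterns below are illustrative values chosen to agree with the description (three leftmost bits compared, word 2 matching). They are assumptions, not values taken from the figure, and the function name `cam_match` is hypothetical.

```python
def cam_match(words, argument, key):
    """Return indices of words that equal `argument` in every bit where `key` is 1."""
    return [i for i, w in enumerate(words) if ((w ^ argument) & key) == 0]


# Illustrative 9-bit values (assumed for this sketch).
A = 0b101_111100          # argument register
K = 0b111_000000          # key register: only the three leftmost bits take part
words = [0b100_111100,    # word 1: leftmost bits 100, no match
         0b101_000001]    # word 2: leftmost bits 101, match
print(cam_match(words, A, K))   # list index 1 corresponds to word 2
```

In hardware all words are compared at the same time. The list comprehension only imitates the result of that parallel search.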

Cache Memory
If the active portions of the program and data are placed in a fast small memory, the
average memory access time can be reduced, thus reducing the total execution time of the
program. Such a fast small memory is referred to as a cache memory. It is placed between the
CPU and main memory.
The basic operation of the cache is as follows. When the CPU needs to access memory,
the cache is examined. If the word is found in the cache, it is read from the fast memory. If the
word addressed by the CPU is not found in the cache, the main memory is accessed to read the
word. The performance of cache memory is frequently measured in terms of a quantity called
hit ratio. When the CPU refers to memory and finds the word in cache, it is said to produce a
hit. If the word is not found in cache, it is in main memory and it counts as a miss.
Three types of mapping procedures are of practical interest when considering the
organization of cache memory:

1. Associative mapping
2. Direct mapping
3. Set-associative mapping
◼ The cache is the fastest component in the memory hierarchy and approaches the speed of
CPU component
◼ Hit ratio = hit / (hit+miss)
◼ The basic characteristic of cache memory is its fast access time
◼ Therefore, very little or no time must be wasted when searching for words in the cache
◼ The transformation of data from main memory to cache memory is referred to as a
mapping process. There are three types of mapping:
◼ Associative mapping
◼ Direct mapping
◼ Set-associative mapping
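
The hit ratio formula can be checked with a few lines of Python. The access times and the average-time model are assumptions made for illustration: a miss is taken to cost the failed cache lookup plus a main-memory access. Both function names are hypothetical.

```python
def hit_ratio(hits, misses):
    """Hit ratio = hits / (hits + misses)."""
    return hits / (hits + misses)


def average_access_time(ratio, cache_ns, main_ns):
    # Assumption: a miss pays for the cache lookup and then the main-memory access.
    return ratio * cache_ns + (1 - ratio) * (cache_ns + main_ns)


h = hit_ratio(hits=90, misses=10)
print(h, average_access_time(h, cache_ns=100, main_ns=1000))
```

Under this model, a high hit ratio keeps the average access time close to the cache access time even when main memory is much slower.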

To help understand the mapping procedure, we have the following example:


Associative mapping
◼ The fastest and most flexible cache organization uses an associative memory
◼ The associative memory stores both the address and data of the memory word
◼ This permits any location in cache to store any word from main memory
◼ The address value of 15 bits is shown as a five-digit octal number and its corresponding
12-bit word is shown as a four-digit octal number

◼ A CPU address of 15 bits is placed in the argument register and the associative memory is
searched for a matching address
◼ If the address is found, the corresponding 12-bit data is read and sent to the CPU
◼ If not, the main memory is accessed for the word
◼ If the cache is full, an address-data pair must be displaced to make room for a pair that is
needed and not presently in the cache
Direct Mapping
◼ Associative memory is expensive compared to RAM
◼ In the general case, there are 2^k words in cache memory and 2^n words in main memory (in
our case, k=9, n=15)
◼ The n bit memory address is divided into two fields: k-bits for the index and n-k bits for
the tag field
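
The division of the address into tag and index fields can be sketched for the case k=9, n=15. The function name `split_address` and the sample address are assumptions chosen for illustration.

```python
K_BITS = 9     # index bits: the cache holds 2**9 = 512 words
N_BITS = 15    # address bits: main memory holds 2**15 = 32768 words


def split_address(address):
    """Split an n-bit address into (tag, index) for a direct-mapped cache."""
    index = address & ((1 << K_BITS) - 1)    # low k bits select the cache word
    tag = address >> K_BITS                  # remaining n-k = 6 bits form the tag
    return tag, index


tag, index = split_address(0o02777)          # hypothetical 15-bit address in octal
print(oct(tag), oct(index))
```

Two addresses with the same index but different tags, such as octal 02777 and 01777, map to the same cache word and therefore cannot be held at the same time.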
Set-Associative Mapping
◼ The disadvantage of direct mapping is that two words with the same index in their
address but with different tag values cannot reside in cache memory at the same time
◼ Set-associative mapping is an improvement over direct mapping in that each word of
cache can store two or more words of memory under the same index address
◼ In the slide, each index address refers to two data words and their associated tags

◼ Each tag requires six bits and each data word has 12 bits, so the word length is 2*(6+12)
= 36 bits
UNIT-V

Pipeline and Vector Processing and Multi Processors

Pipelining
Pipelining is a technique of decomposing a sequential process into sub-operations, with
each sub-operation being executed in a special dedicated segment that operates concurrently
with all other segments. A pipeline can be visualized as a collection of processing
segments through which binary information flows.
General Considerations
Any operation that can be decomposed into a sequence of sub-operations of about the same
complexity can be implemented by a pipeline processor. The general structure of a four-
segment pipeline is illustrated in Fig. 46. The operands pass through all four segments in a
fixed sequence.

The space-time diagram of a four-segment pipeline is demonstrated in Fig. 47.

The speedup (S) of pipeline processing over equivalent non-pipeline processing is
defined by the ratio:
$$S = \frac{n t_n}{(k + n - 1) t_p}$$
As the number of tasks increases, n becomes much larger than $k - 1$, and $k + n - 1$
approaches the value of n. Under this condition, the speedup becomes:
$$S = \frac{t_n}{t_p}$$
Numerical example: Let the time it takes to process a sub-operation in each segment be
equal to $t_p = 20$ ns. Assume that the pipeline has $k = 4$ segments and executes $n = 100$
tasks in sequence. The pipeline system will take
$$(k + n - 1) t_p = (4 + 99) \times 20 = 2060 \text{ ns}$$
to complete. Assuming that $t_n = k t_p = 4 \times 20 = 80$ ns, a non-pipeline system requires:
$$n k t_p = 100 \times 80 = 8000 \text{ ns}$$
to complete the 100 tasks. The speedup ratio is equal to:
$$\frac{8000}{2060} = 3.88$$
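
The arithmetic of this example can be reproduced with a short Python sketch. The function names are hypothetical. The values of k, n and the segment time are those of the numerical example.

```python
def pipeline_time(k, n, tp):
    """Time for n tasks on a k-segment pipeline with clock period tp."""
    return (k + n - 1) * tp


def nonpipeline_time(n, tn):
    """Time for n tasks when each task takes tn without pipelining."""
    return n * tn


k, n, tp = 4, 100, 20          # segments, tasks, nanoseconds per segment
tn = k * tp                    # 80 ns per task without pipelining
speedup = nonpipeline_time(n, tn) / pipeline_time(k, n, tp)
print(pipeline_time(k, n, tp), nonpipeline_time(n, tn), round(speedup, 2))
```

As n grows, the computed speedup approaches tn / tp = 4, the number of segments, which is the limiting case described above.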
Instruction Pipeline
The computer needs to process each instruction with the following sequence of steps:
1. Fetch the instruction from memory.
2. Decode the instruction.
3. Calculate the effective address.
4. Fetch the operands from memory.
5. Execute the instruction.
6. Store the result in the proper place.

Figure 48 shows how the instruction cycle in the CPU can be processed with a four-segment
pipeline. While an instruction is being executed in segment 4, the next instruction in sequence
is busy fetching an operand from memory in segment 3.
The four segments are represented in the flowchart:
1. FI is the segment that fetches an instruction.
2. DA is the segment that decodes the instruction and calculates the effective address.
3. FO is the segment that fetches the operand.
4. EX is the segment that executes the instruction.
A pipeline operation is said to have been stalled if one unit (stage) requires more time to
perform its function, thus forcing other stages to become idle. Consider, for example, the case
of an instruction fetch that incurs a cache miss. Assume also that a cache miss requires three
extra time units.

Instruction-Level Parallelism
Contrary to pipeline techniques, instruction-level parallelism (ILP) is based on the idea of
multiple issue processors (MIP). An MIP has multiple pipelined datapaths for instruction
execution. Each of these pipelines can issue and execute one instruction per cycle. Figure 49
shows the case of a processor having three pipes. For comparison purposes, we also show in
the same figure the sequential and the single pipeline case.
Arithmetic Pipeline
Pipeline arithmetic units are usually found in very high speed computers. They are used to
implement floating-point operations, multiplication of fixed-point numbers, and similar
computations encountered in scientific problems.
Consider an example of a pipeline unit for floating-point addition and subtraction. The inputs
to the floating-point adder pipeline are two normalized floating-point binary numbers:

X = A × 2^a
Y = B × 2^b

A, B are two fractions that represent the mantissas and a, b are the exponents. The sub-
operations that are performed in the four segments are:
1. Compare the exponents.
2. Align the mantissas.
3. Add or subtract the mantissas.
4. Normalize the result.
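The four sub-operations can be sketched in code, one step per segment. This is a minimal sketch that handles addition only and uses decimal numbers for readability; the function name fp_add and the operand values in the usage note are illustrative assumptions, not values taken from the text.

```python
from decimal import Decimal

def fp_add(mant_x, exp_x, mant_y, exp_y):
    """Add two decimal floating-point numbers of the form mant * 10**exp."""
    # Segment 1: compare the exponents by subtraction; keep the larger one.
    diff = exp_x - exp_y
    exp_z = max(exp_x, exp_y)
    # Segment 2: align the mantissas by shifting the number with the
    # smaller exponent to the right.
    if diff > 0:
        mant_y = mant_y.scaleb(-diff)
    elif diff < 0:
        mant_x = mant_x.scaleb(diff)
    # Segment 3: add the mantissas.
    mant_z = mant_x + mant_y
    # Segment 4: normalize so that 0.1 <= |mantissa| < 1.
    while abs(mant_z) >= 1:
        mant_z = mant_z.scaleb(-1)
        exp_z += 1
    while mant_z != 0 and abs(mant_z) < Decimal("0.1"):
        mant_z = mant_z.scaleb(1)
        exp_z -= 1
    return mant_z, exp_z
```

Calling fp_add(Decimal("0.9504"), 3, Decimal("0.8200"), 2) walks through all four steps: the exponent difference is 1, the second mantissa is shifted right by one digit, the sum overflows past 1, and normalization raises the exponent to 4.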
A numerical example may clarify the sub-operations performed in each segment. For simplicity,
we use decimal numbers, although Fig. 49 refers to binary numbers. Consider the two
normalized floating-point numbers:

The two exponents are subtracted in the first segment to obtain (3 − 2 = 1). The larger exponent
3 is chosen as the exponent of the result. The next segment shifts the mantissa of Y to the
right to obtain:

This aligns the two mantissas under the same exponent. The addition of the two mantissas in
segment 3 produces the sum:
Suppose that the time delays of the four segments are t1 = 60 ns, t2 = 70 ns, t3 = 100 ns,
t4 = 80 ns, and the interface registers have a delay of tr = 10 ns. The clock cycle is set by
the slowest segment: tp = t3 + tr = 110 ns. An equivalent non-pipelined floating-point
adder-subtractor will have a delay time tn = t1 + t2 + t3 + t4 + tr = 320 ns. In this case the
pipelined adder has a speedup of 320/110 ≈ 2.9 over the non-pipelined adder.
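The arithmetic above can be checked with a few lines of code. This is a small sketch; the function name pipeline_speedup is hypothetical, and the delays are the ones given in the paragraph.

```python
def pipeline_speedup(segment_delays_ns, register_delay_ns):
    """Return (clock cycle, non-pipelined delay, speedup), times in ns."""
    # The clock must accommodate the slowest segment plus the register delay.
    clock = max(segment_delays_ns) + register_delay_ns
    # Without pipelining, one result needs all segment delays plus one register.
    non_pipelined = sum(segment_delays_ns) + register_delay_ns
    return clock, non_pipelined, non_pipelined / clock

clock, total, speedup = pipeline_speedup([60, 70, 100, 80], 10)
```

The speedup is limited by the slowest segment: if all four segments had equal delays, it would approach 4.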
Supercomputers
Supercomputers are very powerful, high-performance machines used mostly for scientific
computations. To speed up the operation, the components are packed tightly together to
minimize the distance that the electronic signals have to travel. Supercomputers also use
special techniques for removing the heat from circuits to prevent them from burning up
because of their close proximity.
A supercomputer is a computer system best known for its high computational speed, fast and
large memory systems, and the extensive use of parallel processing.
Delayed Branch
Consider now the operation of the following four instructions:

If the three-segment pipeline (I: instruction fetch, A: ALU operation, and E: execute
instruction) proceeds without interruptions, there will be a data conflict in instruction 3
because the operand in R2 is not yet available in the A segment. This can be seen from the
timing of the pipeline shown in Fig. 50(a). The E segment in clock cycle 4 is in the process of
placing the memory data into R2. The A segment in clock cycle 4 is using the data from R2, but
the value in R2 will not be the correct value since it has not yet been transferred from
memory. It is up to the compiler to make sure that the instruction following the load
instruction uses the data only after it has been fetched from memory. It was shown in Fig. 50
that a branch instruction delays the pipeline operation, with NOP instructions filling the
delay, until the instruction at the branch address is fetched.
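The compiler's role in avoiding the data conflict can be sketched as a pass that inserts a NOP after any load whose result is needed by the very next instruction. This is a simplified illustration: the tuple format, the function name insert_nops, and the four-instruction program are assumptions chosen to match the description above (a load into R2 followed by an instruction that reads R2), not the listing from the original figure.

```python
def insert_nops(program):
    """Insert a NOP after each load whose destination is read by the next instruction.

    program is a list of (opcode, destination, sources) tuples.
    """
    scheduled = []
    for i, instr in enumerate(program):
        scheduled.append(instr)
        opcode, dest, _ = instr
        nxt = program[i + 1] if i + 1 < len(program) else None
        # Conflict: the next instruction reads the register still being loaded.
        if opcode == "LOAD" and nxt is not None and dest in nxt[2]:
            scheduled.append(("NOP", None, ()))
    return scheduled

# Hypothetical program: two loads, an add that uses both, and a store.
program = [
    ("LOAD", "R1", ()),
    ("LOAD", "R2", ()),
    ("ADD", "R3", ("R1", "R2")),
    ("STORE", None, ("R3",)),
]
result = insert_nops(program)
```

A real compiler would first try to move a useful, independent instruction into the slot and fall back to a NOP only when none is available.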
MULTIPROCESSORS
A multiple processor system consists of two or more processors that are connected
in a manner that allows them to share the simultaneous (parallel) execution of a given
computational task. Parallel processing has been advocated as a promising approach for
building high-performance computer systems. The organization and performance of a
multiple processor system are greatly influenced by the interconnection network used to
connect them. For example, a single shared bus can be used as the interconnection
network for multiple processors.

CLASSIFICATION OF COMPUTER ARCHITECTURES

The instruction stream is defined as the sequence of instructions performed by the
computer. The data stream is defined as the data traffic exchanged between the memory
and the processing unit. This leads to four distinct categories of computer architectures:

1. Single-instruction single-data streams (SISD)
2. Single-instruction multiple-data streams (SIMD)
3. Multiple-instruction single-data streams (MISD)
4. Multiple-instruction multiple-data streams (MIMD)
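The four categories follow directly from counting the two kinds of streams, as the short sketch below shows. The function name flynn_category is a hypothetical helper for illustration.

```python
def flynn_category(instruction_streams, data_streams):
    """Name the category for the given numbers of instruction and data streams."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("a machine has at least one stream of each kind")
    i = "S" if instruction_streams == 1 else "M"  # single or multiple instruction
    d = "S" if data_streams == 1 else "M"         # single or multiple data
    return f"{i}I{d}D"
```

For instance, one instruction stream applied to eight data streams gives SIMD, while four independent instruction streams on four data streams give MIMD.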

SIMD SCHEMES
Two main SIMD configurations have been used in real-life machines. These are shown in
Figure 56.

MIMD SCHEMES
MIMD machines use a collection of processors, each having its own memory, which can
be used to collaborate on executing a given task. In general, MIMD systems can be
categorized based on their memory organization into shared-memory and message-passing
architectures.

INTERCONNECTION NETWORKS
The classification of interconnection networks is based on topology. Interconnection
networks are classified as either static or dynamic. Figure 58 provides such a taxonomy.
