
Digital Design and Computer Organization

Module 4
Chapter 4
Input-Output Organization

Presented By
Dr. Maya B S
Assistant Professor
Department of CS&E
Bangalore Institute of Technology
Bangalore
❖Accessing I/O Devices
❖Interrupts – Interrupt Hardware
❖ Enabling and Disabling Interrupts
❖ Handling Multiple Devices
❖Direct Memory Access: Bus Arbitration
❖ Speed, size and Cost of memory systems
❖ Cache Memories – Mapping Functions
Text book 2: 4.1, 4.2.1, 4.2.2, 4.2.3, 4.4, 5.4, 5.5.1

20-02-2024 Dr. Maya B S, Assistant Professor, Department of CSE 2


4.1 Accessing I/O Devices

❖The components of a computer system communicate with each other


through an interconnection network, as shown in Figure 3.1.
❖The interconnection network consists of circuits needed to transfer
information between the processor, the memory unit, and a number of I/O
devices.
❖We described earlier the concept of an address space and how the processor may
access individual memory locations within such an address space.
❖ Load and Store instructions use addressing modes to generate effective
addresses that identify the desired locations.
❖ This idea of using addresses to access various locations in the memory
can be extended to deal with the I/O devices as well.

❖For this purpose, each I/O device must appear to the processor as consisting of some addressable
locations, just like the memory.
❖ Some addresses in the address space of the processor are assigned to these I/O locations, rather
than to the main memory.
❖ These locations are usually implemented as bit storage circuits (flip-flops) organized in the form of
registers, referred to as I/O registers.
❖Since the I/O devices and the memory share the same address space, this arrangement is called
memory-mapped I/O. It is used in most computers.
❖With memory-mapped I/O, any machine instruction that can access memory can be used to transfer
data to or from an I/O device.
❖For example, if DATAIN is the address of a register in an input device, the instruction
Load R2, DATAIN → reads the data from the DATAIN register and loads them into processor
register R2.
❖Similarly, the instruction Store R2, DATAOUT →sends the contents of register R2 to location
DATAOUT, which is a register in an output device.
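Because memory-mapped I/O makes device registers look like memory locations, the effect of the Load/Store pair above can be sketched in software. The following is a minimal Python model, not real hardware; the addresses and initial register contents are assumptions chosen for illustration.

```python
# Sketch: memory-mapped I/O modeled as a single address space.
# DATAIN/DATAOUT addresses are hypothetical, chosen outside main memory.
DATAIN  = 0x4000        # input-device data register
DATAOUT = 0x4004        # output-device data register

memory = {addr: 0 for addr in range(0x100)}      # small main memory
io_registers = {DATAIN: ord('A'), DATAOUT: 0}    # device registers share the space

def load(address):
    """Load Rn, address -- the same instruction reaches memory or a device."""
    return io_registers[address] if address in io_registers else memory[address]

def store(address, value):
    """Store Rn, address."""
    if address in io_registers:
        io_registers[address] = value
    else:
        memory[address] = value

r2 = load(DATAIN)       # Load R2, DATAIN
store(DATAOUT, r2)      # Store R2, DATAOUT
```

The point is that no special I/O instructions are needed: ordinary Load and Store instructions reach the device registers.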



4.1.1 I/O Device Interface

❖An I/O device is connected to the interconnection network by using a circuit, called the device
interface, which provides the means for data transfer and for the exchange of status and control
information needed to facilitate the data transfers and govern the operation of the device.
❖The interface includes some registers that can be accessed by the processor.
❖One register may serve as a buffer for data transfers, another may hold information about the
current status of the device, and yet another may store the information that controls the operational
behavior of the device.
❖These data, status, and control registers are accessed by program instructions as if they were
memory locations.
❖Typical transfers of information are between I/O registers and the registers in the processor.
❖Figure 3.2 illustrates how the keyboard and display devices are connected to the processor from the
software point of view.

4.1.2 Program-Controlled I/O
❖Let us begin the discussion of input/output issues by looking at two essential I/O
devices for human-computer interaction—keyboard and display.
❖Consider a task that reads characters typed on a keyboard, stores these data in the
memory, and displays the same characters on a display screen.
❖A simple way of implementing this task is to write a program that performs all
functions needed to realize the desired action.
❖ This method is known as program-controlled I/O.
❖In addition to transferring each character from the keyboard into the memory, and
then to the display, it is necessary to ensure that this happens at the right time.
❖ An input character must be read in response to a key being pressed.
❖ For output, a character must be sent to the display only when the display device is
able to accept it.
❖The rate of data transfer from the keyboard to a computer is limited by the typing
speed of the user, which is unlikely to exceed a few characters per second.

❖The rate of output transfers from the computer to the display is much higher.
❖It is determined by the rate at which characters can be transmitted to and displayed on the display
device, typically several thousand characters per second.
❖This is still much slower than the speed of a processor that can execute billions of instructions per
second.
❖The difference in speed between the processor and I/O devices creates the need for mechanisms to
synchronize the transfer of data between them.
❖One solution to this problem involves a signaling protocol.
❖On output, the processor sends the first character and then waits for a signal from the display that the
next character can be sent.
❖It then sends the second character, and so on. An input character is obtained from the keyboard in a
similar way.
❖The processor waits for a signal indicating that a key has been pressed and that a binary code that
represents the corresponding character is available in an I/O register associated with the keyboard.
❖Then the processor proceeds to read that code.

• For an input device such as a keyboard, a status flag SIN is included in the interface circuit as part of
the status register.
• This flag is set to 1 when a character is entered at the keyboard and cleared to 0 when this character
is read by the processor.
• Similarly, output operations can be controlled by an output status flag, SOUT.
• Figure 4.2 illustrates the hardware required to connect an I/O device to the bus.
Example:
❖Let us consider an example of I/O operations involving a keyboard and a display device in a
computer system.
❖The four registers shown below are used for data transfer operations.
❖STATUS register contains two control flags, SIN and SOUT which provide status
information for the keyboard and display unit respectively.
❖ Two flags KIRQ and DIRQ in STATUS register and KEN and DEN in CONTROL register
are used for interrupts.
❖Data from keyboard are made available in the DATAIN register, and the data sent to the display
are stored in DATAOUT register.

• The program below reads a line of characters from the keyboard and stores it in a memory buffer
starting at location LINE.
✓ Subroutine PROCESS is used to process the input line.
✓ As each character is read, it is echoed back to the display.
✓ Register R0 is used as a pointer to the memory buffer area.
✓ R0 is updated using the Autoincrement addressing mode so that successive characters are stored in
successive memory locations.
✓ Each character is checked to see whether it is the Carriage Return (CR) character, which has the ASCII
code 0D (hex).
✓ If it is, a Line Feed character (ASCII code 0A) is sent to move the cursor one line down on the
display, and subroutine PROCESS is called.
✓ Otherwise, the program waits for another character from the keyboard.
✓ This example illustrates program-controlled I/O, in which the processor repeatedly checks a status
flag to achieve synchronization between the processor and the I/O device.
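The polling behavior of that program can be sketched as follows, with the keyboard and display simulated in software (the typed input "HI" followed by CR is a made-up example):

```python
# Sketch of the read-line loop (program-controlled I/O). CR = 0x0D and
# LF = 0x0A as in the notes; the keyboard is simulated by a list of codes.
CR, LF = 0x0D, 0x0A

keystrokes = [ord(c) for c in "HI"] + [CR]   # hypothetical typed input
display = []                                 # characters echoed to the display
line = []                                    # memory buffer starting at LINE

def sin_flag():
    return len(keystrokes) > 0               # SIN = 1 when a character is ready

def read_datain():
    return keystrokes.pop(0)                 # reading DATAIN clears SIN

while True:
    while not sin_flag():                    # busy-wait on the status flag
        pass
    ch = read_datain()
    display.append(ch)                       # echo the character to the display
    line.append(ch)                          # store via the autoincrement pointer
    if ch == CR:
        display.append(LF)                   # move the cursor one line down
        break                                # then subroutine PROCESS is called
```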

4.2 INTERRUPTS
• Sometimes the processor's time is wasted while waiting for an I/O device to become ready.
• This waiting can be eliminated by having the I/O device send a hardware signal called an
interrupt to the processor. A bus control line called the interrupt-request line is dedicated for this purpose.
Example:
• Consider a task that requires some computations to be performed and the results to be printed on a line
printer.
• Let the program contain two routines, COMPUTE (to produce a set of n lines of output) and PRINT
(to print the output).
• The required task may be performed by repeatedly executing first the COMPUTE routine and then
the PRINT routine.
• The printer accepts only one line of text at a time.
• The disadvantage of this process is that the processor spends a considerable amount of time waiting for
the printer to get ready. So the COMPUTE and PRINT routines must be overlapped.

1. COMPUTE routine is executed to produce the first n lines of output. Then PRINT routine is
executed to send the first line of text to the printer.

2. Instead of waiting for the line to be printed, the PRINT routine may be temporarily suspended and
execution of COMPUTE routine continued.

3. Whenever the printer becomes ready, it alerts the processor by sending an interrupt request signal.

4. The processor interrupts the execution of the COMPUTE routine and transfers control to the
PRINT routine.

5. The PRINT routine sends the second line to the printer and is again suspended.

6. The interrupted COMPUTE routine resumes execution at the point of interruption.

7. This process continues until all n lines are printed and the PRINT routine ends.

• In the above example, the PRINT routine is the interrupt service routine.
• Assume that an interrupt request arrives during execution of instruction i in the figure below.
• The processor first completes execution of instruction i, then it loads the PC with the address of the first
instruction of interrupt service routine.
• To return back to the instruction i+1, the contents of PC are saved in processor stack.
• After the completion of interrupt service routine, the contents of processor stack are reloaded back into
PC.
• However, the processor must inform the device that its request has been recognized so that it may
remove its interrupt request signal.
• This can be accomplished by using an interrupt acknowledge signal.
• The status or data register in the device interface may also inform that its interrupt request has been
recognized.
• Before starting the execution of ISR, any information that may be altered during the execution of that
routine must be saved.
• The information that needs to be saved include condition code flags and the contents of any registers
used by both the interrupted program and interrupt service routine.

Difference between Subroutine and Interrupt Service Routine

❖The task of storing and restoring information can be done automatically by the processor. But this
increases the total execution time.
❖Saving registers also increases the delay between the time an interrupt request is received and the start
of execution of the interrupt service routine. This delay is called interrupt latency.
❖This latency should be kept minimum. So the processor saves only the contents of program counter and
processor status register.

• Duplicate sets of processor registers can be used by ISR. This eliminates the need to save and
restore registers.
• Real Time Processing is processing of certain routines which are accurately timed relative to
external events.

4.2.1 Interrupt Hardware:

• The figure below shows that a single interrupt line can serve n devices.
• To request an interrupt, the device closes its associated switch.
• If all interrupt request signals are inactive (all switches are open), the voltage on the interrupt request
line will be equal to Vdd. This is inactive state of the line.
• When a device requests an interrupt (switch is closed), the voltage on the line drops to 0, causing the
interrupt request signal INTR received by the processor to go to 1.
• The value of INTR is the logical OR of the requests from individual devices.
• INTR = INTR1 + INTR2 + · · · + INTRn
• In the electronic implementation of the circuit in fig. 4.6, special gates known as open collector (for
bipolar circuits) or open drain (for MOS circuits) are used to drive the INTR line.
• The resistor R is called a pull up resistor because it pulls the line voltage up to the high voltage state
when the switches are open.
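The pull-up behavior can be captured in a few lines of simulation (a software sketch; the supply value Vdd = 5.0 V is an arbitrary choice):

```python
# Sketch: n devices sharing one interrupt-request line with a pull-up resistor.
VDD = 5.0

def line_voltage(switch_closed):
    """Any closed switch pulls the line to 0 V; otherwise the pull-up holds Vdd."""
    return 0.0 if any(switch_closed) else VDD

def intr(switch_closed):
    """INTR = INTR1 + ... + INTRn: the processor sees 1 when the line is low."""
    return int(line_voltage(switch_closed) == 0.0)
```

This is why the connection behaves as a logical OR: any single device can assert the request without interfering with the others.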

4.2.2 Enabling and Disabling Interrupts
• Sometimes the interrupts may alter the sequence of events envisaged by the programmer.
• In some cases, the ISR may change some of the data used by the instruction in question.
• So enabling and disabling of interrupts must be available.
• The simplest way is to provide machine instructions such as Interrupt-enable and Interrupt-disable.
• When a device activates the interrupt-request signal, the signal remains active during the execution of the ISR.
• This active request should not lead to successive interruptions causing the system to enter an
infinite loop from which it cannot recover.
• There are three mechanisms to solve this problem.
• The first possibility is to have the processor hardware ignore the interrupt request line until the
execution of the first instruction of the ISR has been completed. The programmer can ensure that
no further interruptions will occur by using the Interrupt-disable instruction as the first instruction of
the ISR. The Interrupt-enable instruction is the last instruction of the ISR before the Return from
interrupt instruction.

• In the second option (for a processor with only one interrupt-request line), the processor can automatically
disable interrupts before starting the execution of the ISR.
• One bit of program status register (PS), called interrupt enable indicates whether interrupts are
enabled.
• After saving the contents of PS on stack, the processor clears the interrupt enable bit in its PS register
thus disabling further interrupts.
• When the Return-from-interrupt instruction is executed, the contents of PS are restored from the stack,
setting the Interrupt-enable bit back to 1.
• In the third option, the processor has a special interrupt request line for which the interrupt handling
circuit responds only to the leading edge of the signal.
• Such a line is said to be edge triggered. In this case, the processor will receive only one request. Hence no
danger of multiple interruptions.

• Following are the sequence of events involved in handling an interrupt request from a single device.
➢ The device raises an interrupt request.
➢ The processor interrupts the program currently being executed.
➢ Interrupts are disabled by changing the control bits in the PS.
➢ The device is informed that its request has been recognized, and in response, it deactivates the
interrupt request signal.
➢ The action requested by the interrupt is performed by the ISR.
➢ Interrupts are enabled and execution of the interrupted program is resumed.

4.2.3 Handling Multiple Devices

• There is a possibility that many devices may raise interrupt requests at the same time.
✓ One method of handling this situation is the polling scheme.
✓ Normally when a device raises an interrupt request, the IRQ bit in its status register is set to 1.
✓ In polling scheme, the ISR polls all the I/O devices connected to the bus.
✓ The first device encountered with its IRQ bit set is the device that should be served.
✓ Its main disadvantage is the time spent interrogating the IRQ bits of all the devices that may not be
requesting any service.
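The polling loop can be sketched as follows; the device list and the position of the IRQ bit are assumptions for illustration:

```python
# Sketch: the ISR polls the IRQ bit in each device's status register, in bus order.
IRQ_BIT = 1 << 1                     # assumed position of the IRQ flag

def poll(status_registers):
    """Return the first device whose IRQ bit is set, or None."""
    for device, status in enumerate(status_registers):
        if status & IRQ_BIT:
            return device
    return None
```

Note how devices earlier in the list are interrogated even when they need no service, which is the disadvantage mentioned above.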

Vectored Interrupts

❖The device requesting an interrupt can identify itself by sending a special code to the processor over
the bus.
❖The code supplied by the device may represent the starting address of the ISR for that device. The code
length is in the range of 4-8 bits.
❖This arrangement implies that the ISR for the given device must always start at the same location.
However, the location pointed to by the interrupting device can instead be used to store the starting
address of the ISR.
❖The processor reads this address called interrupt vector and loads it into the PC. Sometimes the
processor may not be ready to receive the interrupt vector code immediately.
❖So the interrupting device must wait to put the data on the bus only when the processor is ready to
receive it.
❖When the processor is ready to receive the interrupt vector code, it activates the interrupt acknowledge
line INTA.
❖The I/O device responds by sending its interrupt vector code and turning OFF the INTR signal.
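The vector lookup can be sketched as a table indexed by the device's code. The codes and ISR addresses below are hypothetical values chosen for illustration:

```python
# Sketch: vectored interrupts. The device's code selects an interrupt vector,
# i.e., the starting address of that device's ISR, which is loaded into the PC.
vector_table = {0x40: 0x2000,        # keyboard ISR starting address
                0x44: 0x3000}        # display ISR starting address

def accept_vector(code):
    """Processor action after activating INTA and receiving the device code."""
    pc = vector_table[code]          # PC <- interrupt vector
    return pc
```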

Interrupt Nesting:
❖A long delay in responding to an interrupt request might lead to erroneous
operation.
❖Consider, for example, a computer that keeps track of the time of day using a real-time clock. This is a
device that sends interrupt requests to the processor at regular intervals.
❖For each of these requests the processor executes a short ISR to increment a set of counters in the
memory that keep track of time in seconds, minutes and so on.
❖This interrupt request from the clock must be accepted even during the execution of the ISR of
another device.
❖Hence I/O devices must be organized in the priority structure.
➢ In a multiple-level priority organization, a priority level is assigned to the processor.
➢ The priority level of the processor is the priority of the program that is currently being executed.
➢ The processor accepts interrupts only from devices that have priorities higher than its own.
➢ This action disables interrupts from devices at the same level of priority or lower.
➢ However interrupt requests from higher priority devices will continue to be accepted.
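The acceptance rule reduces to a single comparison. A sketch, assuming a higher number means higher priority:

```python
# Sketch: multiple-level priority. The processor accepts a request only from
# a device whose priority is strictly higher than its own current priority.
def accepts(processor_priority, device_priority):
    return device_priority > processor_priority
```

Requests at the same level or lower are thereby disabled, while higher-priority requests continue to be accepted.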

• A few bits of the processor status word (PSW) denote the processor's priority.
• It can be changed by program instructions that write into the PS.
• These are privileged instructions which can be executed only when processor is executing OS routine.
• The user program cannot change the priority.
• An attempt to execute privileged instructions while in user mode leads to a special type of interrupt
called privilege exception.
➢ However separate Interrupt Request and Interrupt acknowledge lines can be used for each device.
➢ Each of the interrupt request lines is assigned a different priority level.
➢ Interrupt requests received over these lines are sent to a priority arbitration circuit in the processor.
➢ A request is accepted only if it has a higher priority level than that currently assigned to the processor.

Simultaneous Requests:
• There is a problem when simultaneous interrupt requests arrive.
• The request with the highest priority can be accepted, but a problem arises if several devices share one
interrupt-request line.
• A Daisy Chain method of connecting the devices can solve this problem to some extent.
• In this the Interrupt Request line is common to all the devices.
• The interrupt-acknowledge signal INTA propagates serially through the devices, as shown in Figure
4.8a.
• When one or more devices raise an interrupt request, the INTR line is activated, and the processor
responds by setting the INTA line to 1.
• This signal is received by device1. Device1 passes the signal to device2 only if it does not require any
service.
• If device1 has a pending request, it blocks INTA signal and proceeds to put its identifying code on the
data lines.
• In this arrangement the device that is electrically closest to the processor has the highest priority and so
on.
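The propagation of INTA through the chain can be sketched as:

```python
# Sketch: daisy-chained interrupt acknowledge. INTA enters at the device
# closest to the processor; the first device with a pending request blocks
# it and identifies itself on the data lines.
def daisy_chain(pending):
    """pending[i] is True if device i (0 = closest to the processor) has a
    request. Returns the device that captures INTA, or None."""
    for device, has_request in enumerate(pending):
        if has_request:
            return device        # blocks INTA and puts its code on the bus
    return None                  # no request: INTA passes through the chain
```

Electrical proximity fixes the priority: if devices 0 and 1 both request, device 0 wins.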

❖The scheme in Figure 4.8a requires considerably fewer wires than the individual
connections in Figure 4.7.
❖ The main advantage of the scheme in Figure 4.7 is that it allows the processor to
accept interrupt requests from some devices but not from others, depending upon
their priorities.
❖The two schemes may be combined to produce the more general structure in
Figure 4.8b.
❖ Devices are organized in groups, and each group is connected at a different
priority level.
❖Within a group, devices are connected in a daisy chain. This organization is used
in many computer systems.

4.4 DIRECT MEMORY ACCESS (V.V.V.IMP)

Direct Memory Access (DMA) is an approach in which large blocks of data are transferred at high speed
directly between an external device and the main memory, without continuous intervention by the processor.
• DMA Controller - It is a control circuit, part of the I/O device interface, that performs the DMA transfer.
• For each word transferred, it provides memory address and all the bus signals that control data transfer.
• The DMA controller must also increment the memory address for successive words and keep track of the
number of transfers.
• Although DMA controller can transfer data without intervention by the processor, its operation must be
under the control of a program executed by the processor.
• The processor sends the starting address, the number of words in the block and the direction of the transfer.
• On receiving this information the DMA controller performs the requested operation.
• When the entire block has been transferred, the controller informs the processor by raising an interrupt
signal.
• After the DMA transfer is completed, the processor can return to the program that requested the transfer.
The OS is responsible for suspending the program that requested the transfer (placing it in the Blocked
state), initiating the DMA operation, and starting another program.

The figure 4.18 below shows an example of the DMA controller registers.
❖Two registers are used for storing the starting address and the word count.
❖The third register contains status and control flags.
❖The R/W bit determines the direction of transfer.
❖When this bit is set to 1, the controller transfers data from memory to the I/O
device (read operation).
❖Otherwise it performs write operation.
❖When the controller has completed transferring a block of data, it sets the Done
flag to 1.
❖Bit 30 is the interrupt enable flag, IE, which causes the controller to raise an
interrupt after it has completed transfer of a block of data (when set to 1).
❖The controller sets the IRQ bit to 1 when it has requested an interrupt.
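The flag behavior can be sketched as bit operations on the 32-bit register. IE in bit 30 follows the notes; the positions of R/W, Done, and IRQ below are assumptions:

```python
# Sketch: DMA controller status and control register of Figure 4.18.
RW_BIT   = 1 << 0     # assumed: 1 = read (memory -> I/O device), 0 = write
DONE_BIT = 1 << 1     # assumed position: set when a block transfer completes
IE_BIT   = 1 << 30    # interrupt-enable flag (bit 30, as in the notes)
IRQ_BIT  = 1 << 31    # assumed position: set when an interrupt is requested

def finish_block(status):
    """Controller actions when the last word of a block has been transferred."""
    status |= DONE_BIT
    if status & IE_BIT:
        status |= IRQ_BIT     # raise an interrupt request to the processor
    return status
```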

Burst Mode:
The transfer of a block of data between the main memory and an I/O device by the DMA controller
without interruption is called burst transfer mode.
Cycle Stealing :
❖This is similar to burst transfer mode, but instead of data being transferred all at once, it is
transferred one byte at a time.
❖The DMA controller, after transferring one byte of data, releases control of the system buses, lets the
CPU process an instruction and then requests access to the bus by sending the bus request signal
through the control bus and then transfers another byte of data.
❖ This keeps going on until all the data has been transferred. The transfer rate is slower but it prevents
the CPU from staying idle for a long period of time.
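The difference between the two modes can be sketched by logging who holds the bus (the event strings are illustrative only):

```python
# Sketch: burst mode vs. cycle stealing for a DMA transfer of several bytes.
def burst(data):
    """DMA keeps the bus for the whole block; the CPU waits throughout."""
    return (["DMA acquires bus"]
            + [f"transfer byte {b}" for b in data]
            + ["DMA releases bus"])

def cycle_steal(data):
    """DMA releases the bus after every byte so the CPU can run in between."""
    events = []
    for b in data:
        events += ["DMA acquires bus", f"transfer byte {b}",
                   "DMA releases bus", "CPU executes"]
    return events
```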
❖The figure 4.19 below shows the use of a DMA controller in a computer system. Suppose a block of
data is to be transferred from the main memory to one of the disks.
❖A program writes the address and word count information into the registers of the corresponding
channel of the DMA controller.

4.4.1 BUS ARBITRATION (V. V.IMP)

❖The device that is allowed to initiate data transfers on the bus at any given time is
called the bus master.
❖When the current master relinquishes control of the bus, another device can acquire
this status.
❖Bus arbitration is the process by which the next device to become the bus master is
selected and bus mastership is transferred to it.
❖The selection of the bus master must take into account the needs of various devices by
establishing a priority system for gaining access to the bus.
❖There are two approaches to bus arbitration: Centralized and Distributed.
❖In centralized arbitration, a single bus arbiter performs the required arbitration.
❖In distributed arbitration, all devices participate in the selection of the next bus master.

Centralized Arbitration
❖The bus arbiter may be the processor or a separate unit connected to the bus.
❖Figure 4.20 illustrates a basic arrangement in which the processor contains the bus
arbitration circuitry.
❖In this case, the processor is normally the bus master unless it grants bus
mastership to one of the DMA controllers.
❖ A DMA controller indicates that it needs to become the bus master by activating
the Bus-Request line, BR.
❖ The signal on the Bus-Request line is the logical OR of the bus requests from all
the devices connected to it.
❖When Bus-Request is activated, the processor activates the Bus-Grant signal,
BG1, indicating to the DMA controllers that they may use the bus when it
becomes free.
❖ This signal is connected to all DMA controllers using a daisy-chain
arrangement.
❖Thus, if DMA controller 1 is requesting the bus, it blocks the propagation of the
grant signal to other devices.
❖Otherwise, it passes the grant downstream by asserting BG2.
❖ The current bus master indicates to all devices that it is using the bus by
activating another open-collector line called Bus-Busy, BBSY.
❖ Hence, after receiving the Bus-Grant signal, a DMA controller waits for
Bus-Busy to become inactive, then assumes mastership of the bus.
❖At this time, it activates Bus-Busy to prevent other devices from using the bus at
the same time.

❖The timing diagram in Figure 4.21 shows the sequence of events for the devices
in Figure 4.20 as DMA controller 2 requests and acquires bus mastership and later
releases the bus.
❖During its tenure as the bus master, it may perform one or more data transfer
operations, depending on whether it is operating in the cycle stealing or block
mode.
❖After it releases the bus, the processor resumes bus mastership.
❖This figure shows the causal relationships among the signals involved in the
arbitration process. Details of timing, which vary significantly from one computer
bus to another, are not shown.
❖Figure 4.20 shows one bus-request line and one bus-grant line forming a daisy
chain. Several such pairs may be provided, in an arrangement similar to that used
for multiple interrupt requests in Figure 4.8b.

❖This arrangement leads to considerable flexibility in determining the order in
which requests from different devices are serviced.
❖The arbiter circuit ensures that only one request is granted at any given time,
according to a predefined priority scheme.
❖ For example, if there are four bus request lines, BR1 through BR4, a fixed
priority scheme may be used in which BR1 is given top priority and BR4 is given
lowest priority.
❖Alternatively, a rotating priority scheme may be used to give all devices an equal
chance of being serviced.
❖Rotating priority means that after a request on line BR1 is granted, the priority
order becomes 2, 3, 4, 1.
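A rotating-priority arbiter over BR1 through BR4 can be sketched as:

```python
# Sketch: rotating priority over four bus-request lines BR1..BR4. After a
# grant on line k, the next search starts at line k+1 (wrapping around).
def arbitrate(requests, last_granted):
    """requests maps line number (1..4) to True/False; last_granted is the
    line granted previously. Returns the next line to grant, or None."""
    order = [(last_granted + i - 1) % 4 + 1 for i in range(1, 5)]
    for line in order:
        if requests[line]:
            return line
    return None
```

After a grant on BR1 the search order is 2, 3, 4, 1, so every device eventually gets an equal chance of being serviced.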

Distributed Arbitration
• Distributed arbitration means that all devices waiting to use the bus have equal responsibility in
carrying out the arbitration process, without using a central arbiter.
• A simple method for distributed arbitration is illustrated in Figure 4.22.
• Each device on the bus is assigned a 4-bit identification number.
• When one or more devices request the bus, they assert the Start-Arbitration signal and place their
4-bit ID numbers on four open-collector lines, ARB0 through ARB3.
• A winner is selected as a result of the interaction among the signals transmitted over these lines by
all contenders.
• The net outcome is that the code on the four lines represents the request that has the highest ID
number.
• The drivers are of the open-collector type. Hence, if the input to one driver is equal to 1 and the
input to another driver connected to the same bus line is equal to 0, the bus will be in the low-
voltage state.
• In other words, the connection performs an OR function in which logic 1 wins.

❖Assume that two devices, A and B, having ID numbers 5 and 6, respectively, are
requesting the use of the bus.
❖Device A transmits the pattern 0101, and device B transmits the pattern 0110.
❖The code seen by both devices is 0111.
❖Each device compares the pattern on the arbitration lines to its own ID, starting
from the most significant bit.
❖If it detects a difference at any bit position, it disables its drivers at that bit position
and for all lower-order bits. It does so by placing a 0 at the input of these drivers.
❖In the case of our example, device A detects a difference on line ARB1. Hence, it
disables its drivers on lines ARB1 and ARB0.
❖This causes the pattern on the arbitration lines to change to 0110, which means that
B has won the contention.
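The contest between A (0101) and B (0110) can be simulated directly. This sketch ORs the driven patterns to model the open-collector lines and lets each device disable its drivers from the first mismatched bit downward:

```python
# Sketch: distributed arbitration over ARB3..ARB0. Each device drives its
# 4-bit ID; the wired-OR of all drivers appears on the lines; a device that
# sees a mismatch disables its drivers at that bit and all lower-order bits.
def contest(ids):
    driven = list(ids)                        # pattern each device still drives
    for _ in range(4):                        # iterate until the lines settle
        lines = 0
        for pattern in driven:
            lines |= pattern                  # open-collector OR: logic 1 wins
        for i, dev_id in enumerate(ids):
            mask = 0
            for bit in (3, 2, 1, 0):          # compare from the MSB down
                if ((lines >> bit) & 1) != ((dev_id >> bit) & 1):
                    break                     # disable this bit and all below
                mask |= 1 << bit
            driven[i] = dev_id & mask
    lines = 0
    for pattern in driven:
        lines |= pattern
    return lines                              # the surviving code = winner's ID
```

For IDs 5 and 6 the lines settle to 6: device A drops its ARB1 and ARB0 drivers after seeing 0111, leaving B's 0110 on the lines.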

❖Note that, since the code on the priority lines is 0111 for a short period, device B
may temporarily disable its driver on line ARB0.
❖However, it will enable this driver again once it sees a 0 on line ARB1, resulting from
the action by device A.
❖Decentralized arbitration has the advantage of offering higher reliability, because
operation of the bus is not dependent on any single device.
❖Many schemes have been proposed and used in practice to implement distributed
arbitration.

5.4 SPEED, SIZE, AND COST
❖We have already stated that an ideal memory would be fast, large, and
inexpensive.
❖ It is clear that a very fast memory can be implemented if SRAM chips are used.
❖ But these chips are expensive because their basic cells have six transistors,
which precludes packing a very large number of cells onto a single chip.
❖Thus, for cost reasons, it is impractical to build a large memory using SRAM
chips.
❖ The alternative is to use Dynamic RAM chips, which have much simpler basic
cells and thus are much less expensive. But such memories are significantly
slower.
❖Although dynamic memory units in the range of hundreds of megabytes can be implemented at a reasonable cost, the affordable size is still small compared to the demands of large programs with voluminous data.



❖A solution is provided by using secondary storage, mainly magnetic disks, to
implement large memory spaces.
❖Very large disks are available at a reasonable price, and they are used extensively in
computer systems.
❖They are much slower than the semiconductor memory units.
❖So we conclude the following: A huge amount of cost-effective storage can be provided by magnetic disks. A large, yet affordable, main memory can be built with dynamic RAM technology.
❖This leaves SRAMs to be used in smaller units where speed is of the essence, such as in cache memories.
❖All of these different types of memory units are employed effectively in a computer.
❖The entire computer memory can be viewed as the hierarchy depicted in Figure
5.13.



❖The fastest access is to data held in processor registers. Therefore, if we consider the
registers to be part of the memory hierarchy, then the processor registers are at the top
in terms of the speed of access.
❖At the next level of the hierarchy is a relatively small amount of memory that can be
implemented directly on the processor chip. This memory, called a processor cache,
holds copies of instructions and data stored in a much larger memory that is provided
externally.
❖There are often two levels of caches. A primary cache is always located on the
processor chip. This cache is small because it competes for space on the processor
chip, which must implement many other functions. The primary cache is referred to as
level 1 (L1) cache.
❖A larger, secondary cache is placed between the primary cache and the rest of the
memory. It is referred to as level 2 (L2) cache. It is usually implemented using SRAM
chips.
❖It is possible not to have a cache on the processor chip at all. Also, it is possible to have both L1 and L2 caches on the processor chip.
❖The next level in the hierarchy is called the main memory. This rather large
memory is implemented using dynamic memory components, typically in the form
of SIMMs, DIMMs, or RIMMs.
❖The main memory is much larger but significantly slower than the cache memory.
❖ In a typical computer, the access time for the main memory is about ten times longer than the access time for the L1 cache.
❖Disk devices provide a huge amount of inexpensive storage.
❖They are very slow compared to the semiconductor devices used to implement the
main memory.
❖During program execution, the speed of memory access is of utmost importance.
❖ The key to managing the operation of the hierarchical memory system in Figure
5.13 is to bring the instructions and data that will be used in the near future as close
to the processor as possible.



5.5 Cache Memories
• The speed of the main memory is very low in comparison with the speed of
modern processors.
• For good performance, the processor cannot spend much of its time waiting to
access instructions and data in main memory.
• Hence, it is important to devise a scheme that reduces the time needed to access
the necessary information.
• Since the speed of the main memory unit is limited by electronic and packaging
constraints, the solution must be sought in a different architectural arrangement.
An efficient solution is to use a fast cache memory which essentially makes the
main memory appear to the processor to be faster than it really is.
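How much faster the memory "appears" can be quantified with the usual average-access-time estimate. The formula is standard; the numbers in the example below are illustrative, not taken from the text:

```python
def avg_access_time(hit_rate, cache_time, miss_penalty):
    # Time the processor sees, on average, per memory access:
    # hits are served at cache speed, misses pay the main-memory penalty.
    return hit_rate * cache_time + (1 - hit_rate) * miss_penalty

# With an illustrative 1 ns cache, a 10 ns miss penalty, and 95% hits,
# the memory appears to be about 1.45 ns fast on average.
```

Because the hit rate is typically high, the average is pulled close to the cache's own access time even though the main memory is ten times slower.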



❖The effectiveness of the cache mechanism is based on a property of computer
programs called locality of reference.
❖Analysis of programs shows that most of their execution time is spent on routines
in which many instructions are executed repeatedly. These instructions may
constitute a simple loop, nested loops, or a few procedures that repeatedly call
each other.
❖The actual detailed pattern of instruction sequencing is not important; the point is that many instructions in localized areas of the program are executed repeatedly during some time period, and the remainder of the program is accessed relatively infrequently. This is referred to as locality of reference.
❖It manifests itself in two ways: temporal and spatial.
❖The first means that a recently executed instruction is likely to be executed again
very soon.
❖The spatial aspect means that instructions in close proximity to a recently
executed instruction are also likely to be executed soon.
❖We will use the term block to refer to a set of contiguous address locations of some size.
❖Another term that is often used to refer to a cache block is cache line.
❖Consider the simple arrangement in Figure 5.14.

❖When a Read request is received from the processor, the contents of a block of memory words
containing the location specified are transferred into the cache one word at a time.
❖ Subsequently, when the program references any of the locations in this block, the desired contents are
read directly from the cache.
❖Usually, the cache memory can store a reasonable number of blocks at any given time, but this number is small compared to the total number of blocks in the main memory.



❖The correspondence between the main memory blocks and those in the cache is
specified by a mapping function.
❖ When the cache is full and a memory word (instruction or data) that is not in the
cache is referenced, the cache control hardware must decide which block should
be removed to create space for the new block that contains the referenced word.
❖ The collection of rules for making this decision constitutes the replacement
algorithm.
❖The processor does not need to know explicitly about the existence of the cache.
❖ It simply issues Read and Write requests using addresses that refer to locations
in the memory.
❖The cache control circuitry determines whether the requested word currently exists in the cache. If it does, the Read or Write operation is performed on the appropriate cache location.



❖When the addressed word in a Read operation is not in the cache, a read miss
occurs.
❖The block of words that contains the requested word is copied from the main
memory into the cache.
❖After the entire block is loaded into the cache, the particular word requested is
forwarded to the processor.
❖Alternatively, this word may be sent to the processor as soon as it is read from the
main memory.
❖The latter approach, which is called load through or early restart, reduces the
processor's waiting period somewhat, but at the expense of more complex
circuitry.



❖During a Write operation, if the addressed word is not in the cache, a write miss
occurs.
❖Then, if the write-through protocol is used, the information is written directly into
the main memory.
❖In the case of the write-back protocol, the block containing the addressed word is
first brought into the cache, and then the desired word in the cache is overwritten
with the new information.
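The two write policies can be contrasted with a deliberately tiny model, a hypothetical cache that holds a single one-word block (the class name and structure are our own, not the book's hardware):

```python
class TinyCache:
    # Toy cache holding a single one-word "block"; 'policy' selects
    # write-through or write-back behaviour on Write operations.
    def __init__(self, memory, policy="write-back"):
        self.memory = memory      # backing main memory (a list of words)
        self.policy = policy
        self.block = None         # address of the word currently cached
        self.data = None
        self.dirty = False

    def read(self, addr):
        if self.block != addr:    # read miss: fetch the block
            self._load(addr)
        return self.data

    def write(self, addr, value):
        if self.block != addr:    # write miss
            if self.policy == "write-through":
                self.memory[addr] = value   # write directly into memory
                return
            self._load(addr)      # write-back: bring the block in first
        self.data = value
        if self.policy == "write-through":
            self.memory[addr] = value       # memory kept up to date
        else:
            self.dirty = True     # memory updated only when evicted

    def _load(self, addr):
        if self.dirty:            # evict: copy the modified block back
            self.memory[self.block] = self.data
        self.block, self.data, self.dirty = addr, self.memory[addr], False
```

With write-through, memory is current after every write; with write-back, a modified word reaches memory only when its block is evicted to make room for another.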



5.5.1 Mapping Functions (V.V.IMP)
❖To discuss possible methods for specifying where memory blocks are placed in
the cache, we use a specific small example.
❖Consider a cache consisting of 128 blocks of 16 words each, for a total of 2048
(2K) words, and assume that the main memory is addressable by a 16-bit address.
❖ The main memory has 64K words, which we will view as 4K blocks of 16 words
each.
❖For simplicity, we will assume that consecutive addresses refer to consecutive
words.
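These example parameters can be checked quickly; the field widths computed below are the ones used by the mapping techniques that follow:

```python
import math

# Example configuration from the text.
cache_blocks, words_per_block, address_bits = 128, 16, 16

word_bits  = int(math.log2(words_per_block))        # 4 bits select a word in a block
block_bits = int(math.log2(cache_blocks))           # 7 bits select a cache block
tag_bits   = address_bits - block_bits - word_bits  # 5 bits remain for the tag

main_memory_blocks = 2**address_bits // words_per_block  # 4096 = 4K blocks
```

So a 16-bit address splits into 5 tag bits, 7 cache-block bits, and 4 word bits under direct mapping, and the 64K-word main memory holds 4K blocks.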



Direct Mapping

❖The simplest way to determine cache locations in which to store memory blocks is
the direct mapping technique.
❖In this technique, block j of the main memory maps onto block j modulo 128 of the cache, as depicted in Figure 5.15.
❖Thus, whenever one of the main memory blocks 0, 128, 256, … is loaded in the cache, it is stored in cache block 0.
❖Blocks 1, 129, 257, … are stored in cache block 1, and so on. Since more than one memory block is mapped onto a given cache block position, contention may arise for that position even when the cache is not full.
❖For example, instructions of a program may start in block 1 and continue in block 129, possibly after a branch.



❖As this program is executed, both of these blocks must be transferred to the block-1 position in the cache.
❖Contention is resolved by allowing the new block to overwrite the currently resident block. In this case, the replacement algorithm is trivial.
❖Placement of a block in the cache is determined from the memory address.
❖The memory address can be divided into three fields, as shown in Figure 5.15.



❖The low-order 4 bits select one of 16 words in a block.
❖ When a new block enters the cache, the 7-bit cache block field determines the
cache position in which this block must be stored.
❖The high-order 5 bits of the memory address of the block are stored in 5 tag bits
associated with its location in the cache.
❖They identify which of the 32 blocks that are mapped into this cache position is currently resident in the cache.
❖ As execution proceeds, the 7-bit cache block field of each address generated by the processor points to a particular block location in the cache.
❖The high-order 5 bits of the address are compared with the tag bits associated with
that cache location.
❖If they match, then the desired word is in that block of the cache.
❖If there is no match, then the block containing the required word must first be read from the main memory and loaded into the cache.
❖The direct-mapping technique is easy to implement, but it is not very flexible.
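A minimal sketch of the direct-mapped address split for this example (the function name is our own):

```python
def split_direct(addr):
    # 16-bit address -> (tag, block, word) for the example cache:
    # 5-bit tag, 7-bit cache-block field, 4-bit word field.
    word  = addr & 0xF            # low-order 4 bits: word within the block
    block = (addr >> 4) & 0x7F    # next 7 bits: cache block position
    tag   = addr >> 11            # high-order 5 bits: stored as the tag
    return tag, block, word
```

For instance, the first words of memory blocks 1 and 129 (addresses 16 and 2064) both map to cache block 1, distinguished only by their tags (0 and 1), which is exactly the contention scenario described above.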
Associative Mapping
❖Figure 5.16 shows a much more flexible mapping method, in which a main memory
block can be placed into any cache block position.
❖ In this case, 12 tag bits are required to identify a memory block when it is resident in the cache.
❖The tag bits of an address received from the processor are compared to the tag bits of each block of the cache to see if the desired block is present.
❖This is called the associative-mapping technique.
❖This gives complete freedom in choosing the cache location in which to place the memory block.
❖Thus, the space in the cache can be used more efficiently.
❖ A new block that has to be brought into the cache has to replace (eject) an existing
block only if the cache is full.
❖ In this case, we need an algorithm to select the block to be replaced. Many
replacement algorithms are possible.
• The cost of an associative cache is higher than the cost of a direct-mapped cache because of the need to search all 128 tag patterns to determine whether a given block is in the cache.
• A search of this kind is called an associative search. For performance reasons,
the tags must be searched in parallel.
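A software model of the lookup might look like this, with the caveat that the loop is sequential where real hardware compares all tags at once (the function name is our own):

```python
def assoc_lookup(cache_tags, addr, words_per_block=16):
    # In the fully associative example the 12-bit tag is simply the
    # main-memory block number. Hardware compares it against every cache
    # tag in parallel; this loop performs the same comparison sequentially.
    tag = addr // words_per_block
    for pos, stored in enumerate(cache_tags):
        if stored == tag:
            return pos            # hit: block resides at this cache position
    return None                   # miss: block must be fetched from memory
```

Because a block may sit anywhere, a miss never evicts anything until the cache is actually full, which is the flexibility advantage noted above.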



Set Associative Mapping

❖A combination of the direct and associative mapping techniques can be used.


❖ Blocks of the cache are grouped into sets, and the mapping allows a block of the main memory to reside in any block of a specific set.
❖ Hence, the contention problem of the direct method is eased by having a few
choices for block placement.
❖ At the same time, the hardware cost is reduced by decreasing the size of the
associative search.
❖An example of this set-associative mapping technique, for a cache with two blocks per set, is shown in Figure 5.17.
❖In this case, memory blocks 0, 64, 128, …, 4032 map into cache set 0, and they can occupy either of the two block positions within this set.



❖Having 64 sets means that the 6-bit set field of the address determines which set
of the cache might contain the desired block.
❖The tag field of the address must then be associatively compared to the tags of the
two blocks of the set to check if the desired block is present.
❖This two-way associative search is simple to implement.
❖The number of blocks per set is a parameter that can be selected to suit the
requirements of a particular computer.
❖ For the main memory and cache sizes in Figure 5.17, four blocks per set can be
accommodated by a 5-bit set field, eight blocks per set by a 4-bit set field, and so
on.
❖The extreme condition of 128 blocks per set requires no set bits and corresponds
to the fully associative technique, with 12 tag bits.
❖The other extreme of one block per set is the direct-mapping method. A cache that
has k blocks per set is referred to as a k-way set-associative cache.
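The two-way set-associative address split for this example can be sketched as follows (the function name is our own):

```python
def split_set_assoc(addr, sets=64, words_per_block=16):
    # Field split for the two-way set-associative example:
    # 6-bit tag, 6-bit set field, 4-bit word field (16-bit addresses).
    word_bits = words_per_block.bit_length() - 1   # 4
    set_bits  = sets.bit_length() - 1              # 6
    word = addr & (words_per_block - 1)
    set_ = (addr >> word_bits) & (sets - 1)
    tag  = addr >> (word_bits + set_bits)
    return tag, set_, word
```

Memory blocks 0, 64, and 4032 (word addresses 0, 1024, and 64512) all yield set 0 with different tags, matching the mapping described above; changing `sets` to 32 or 16 reproduces the 5-bit and 4-bit set fields mentioned for four and eight blocks per set.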
THANK YOU

