CA Unit - 3, 4 and 5 QR

Uploaded by Zia Rocker

UNIT III

Part A
1. Identify the types of MIPS instructions.
A) R Type instruction or Arithmetic Logical Instructions
B) Load and Store Instruction or Memory reference instructions
C) Branch Instruction
D) All of the above
Answer D.
2. Identify the data path elements in MIPS core architecture.
A) Instruction memory and Register File
B) ALU
C) Data Memory
D) All of the above
Answer D
3. What is pipelining?
A) It is a technique to achieve thread level parallelism
B) It is a technique in which multiple threads are overlapped in execution.
C) It is an implementation technique in which multiple instructions are overlapped in
execution.
D) All of above
Answer : C
4. Consider a non-pipelined machine with six execution stages of lengths
50 ns, 50 ns, 60 ns, 60 ns, 50 ns, and 50 ns. Find the instruction latency on this machine.
A) 320ns
B) 280ns
C) 100ns
D) 260 ns
Answer : A

5. Consider a pipelined machine with four execution stages of lengths
60 ns, 50 ns, 90 ns, and 80 ns, with a latch delay of 10 ns. Find the pipeline cycle time on this
machine.
A) 320ns
B) 280ns
C) 100ns
D) 260 ns
Answer : C
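The arithmetic behind the two timing questions above can be checked with a short Python sketch (stage lengths taken directly from the questions):

```python
# Q4: non-pipelined instruction latency is the sum of all stage lengths.
stages_q4 = [50, 50, 60, 60, 50, 50]  # ns
latency = sum(stages_q4)
print(latency)  # 320 ns -> option A

# Q5: pipeline cycle time is the slowest stage plus the latch delay.
stages_q5 = [60, 50, 90, 80]  # ns
latch_delay = 10               # ns
cycle_time = max(stages_q5) + latch_delay
print(cycle_time)  # 100 ns -> option C
```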
6. List the stages of a 5-stage pipeline.
A) IF, ID, WB, EX, MEM
B) IF, ID, EX, MEM, WB
C) IF, ID, MEM, WB, EX
D) IF, MEM, ID, EX, WB
Answer : B
7. Identify the mechanism for handling control hazards
A) Branch Prediction
B) Delayed branch
C) Stalling or Multiple Streams
D) All of the above
Answer D
8. Consider the following code sequence
add $2, $1, $3
sub $4, $2, $5
Identify the type of hazard appear on pipeline
A) Data Hazard
B) Control Hazard
C) Structural Hazard
D) None of the above
Answer : A

9. Consider the following code sequence,


36: sub $10, $4, $8
40: beq $1, $3, 7
…..
72: lw $4, 50($7)
Identify the type of hazard appear on pipeline

A) Data Hazard
B) Control Hazard
C) Structural Hazard
D) None of the above
Answer : B
10. What is the use of exception program counter?
A) A 32-bit register used to hold the address of the affected instruction.
B) A register used to record the cause of the exception.
C) A register to specify the location of control transfer
D) None of the above

Answer : A
Part B
1. How is a datapath built from the core MIPS components? Explain with a neat block
diagram.

A data path element is a unit used to operate on or hold data within a processor.
The major components required to execute each class of MIPS instructions are
1. Program counter (PC) : is a register that holds the address of the current instruction
2. Instruction memory : Memory to store instructions
3. Register File : A register file is a collection of registers in which any register can be read
or written by specifying the number of the register in the file. The register file contains
the register state of the computer
4. ALU – to operate on data
5. Data memory – to store data

Building a MIPS datapath involves constructing datapaths for

1. Fetching the instruction and incrementing the PC


2. Executing an arithmetic and logic instructions
3. Executing a memory-reference instruction
4. Executing a branch instruction

1. Data Path for fetching instruction and PC

To execute any instruction, we must start by fetching the instruction from memory. To prepare
for executing the next instruction, we must also increment the program counter so that it points
at the next instruction, 4 bytes later. This is shown in following figure.

PC = PC + 4, so that the PC holds the address of the next instruction.

2. Datapath For Executing Arithmetic and Logic Instructions

 The arithmetic-logical instructions use the ALU, with the inputs coming from the two
registers.
 The two elements needed to implement R-format ALU operations are the register file
and the ALU

 The register file always outputs the contents of the registers corresponding to the Read
register inputs on the outputs.
 The operation to be performed by the ALU is controlled with the ALU operation signal,
which will be 4 bits wide.
 The inputs carrying the register number to the register file are all 5 bits wide, whereas
the lines carrying data values are 32 bits wide

3. Data Path for Executing a memory-reference instruction

 The memory instructions can also use the ALU to do the address calculation, although
the second input is the sign-extended 16-bit offset field from the instruction.
 The value stored into a destination register comes from the ALU (for an R-type
instruction) or the memory (for a load)
 Thus, one multiplexor is placed at the ALU input and another at the data input to the
register file.

4. Data Path for Executing a branch instruction

 The datapath for a branch uses the ALU to evaluate the branch condition and a separate
adder to compute the branch target as the sum of the incremented PC and the sign-
extended, lower 16 bits of the instruction (the branch displacement), shifted left 2 bits.
 The branch instruction uses the main ALU for comparison of the register operands
 An additional multiplexor is required to select either the sequentially following
instruction address (PC + 4) or the branch target address to be written into the PC.

The final combined diagram for simple data path for MIPS core architecture is
Example 1: Operation of a Datapath for an R-Type
add $t1,$t2,$t3
 The instruction is fetched, and the PC is incremented.
 Two registers, $t2 and $t3, are read from the register file, and the main control unit
computes the setting of the control lines during this step .
 The ALU operates on the data read from the register file, using the function code to
generate the ALU function.
 The result from the ALU is written into the register file using the destination register
($t1)
Example 2: Operation of a Datapath for a LOAD/STORE
lw $t1, offset($t2) (or) sw $t1, offset($t2)
 An instruction is fetched from the instruction memory, and the PC is incremented.
 A register ($t2) value is read from the register file.
 The ALU computes the sum of the value read from the register file and the sign-
extended, lower 16 bits of the instruction (offset).
 The sum from the ALU is used as the address for the data memory.
 The data from the memory unit is written into the register file; the register destination is
given in the instruction ($t1).
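The address calculation in the load/store steps above can be sketched in Python; `sign_extend` and `lw_address` are hypothetical helper names, not parts of the MIPS datapath itself:

```python
def sign_extend(value16):
    """Sign-extend a 16-bit immediate to a signed value (two's complement)."""
    return value16 - 0x10000 if value16 & 0x8000 else value16

def lw_address(base_register_value, offset16):
    """Effective address for lw/sw: base register + sign-extended offset."""
    return (base_register_value + sign_extend(offset16)) & 0xFFFFFFFF

# lw $t1, 50($t2) with $t2 = 1000: address = 1000 + 50 = 1050
print(lw_address(1000, 50))       # 1050
# A negative offset, e.g. -4 encoded as 0xFFFC:
print(lw_address(1000, 0xFFFC))   # 996
```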
Example 3: Operation of a Datapath for a BRANCH
beq $t1,$t2,offset
 The instruction is fetched from the instruction memory, and the PC is incremented.
 Two registers, $t1 and $t2, are read from the register file.
 The ALU performs a subtraction on the data values read from the register file.
 The value of PC + 4 is added to the sign-extended, lower 16 bits of the instruction
(offset) shifted left by two; the result is the branch target address.
 The Zero result from the ALU is used to decide which adder result is written into the PC.
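The branch-target computation in the steps above can be sketched as follows (a minimal model assuming a 16-bit signed word offset, as in MIPS beq):

```python
def sign_extend(value16):
    # Interpret a 16-bit field as a signed two's-complement number.
    return value16 - 0x10000 if value16 & 0x8000 else value16

def branch_target(pc, offset16):
    """beq target = (PC + 4) + (sign-extended offset << 2)."""
    return (pc + 4) + (sign_extend(offset16) << 2)

# Matches the earlier code sequence: the beq at address 40 with offset 7
# branches to (40 + 4) + 7*4 = 72, the address of the lw instruction.
print(branch_target(40, 7))  # 72
```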
------------------------------------------------------------------------------------------------------------
2. Explain basic operation of a five-stage pipelining with a neat diagram.

Pipelining
 Pipelining is an implementation technique in which multiple instructions are
overlapped in execution
 Exploits instruction level parallelism
Five stage Pipelining : MIPS instructions classically take five steps:
1. IF: Instruction fetch
- Fetch instruction from memory.
2. ID: Instruction decode and register file read
- Read registers while decoding the instruction. The regular format
of MIPS instructions allows reading and decoding to occur simultaneously.
3. EX: Execution or address calculation
- Execute the operation or calculate an address.
4. MEM: Data memory access
- Access an operand in data memory.
5.WB : Write back
- Write the result into a register
Example:
Consider the sequence of instruction
lw $1, 100($0)
lw $2, 200($0)
lw $3, 300($0)

The pipelined execution of instructions is

It takes 1400 ps, whereas non-pipelined execution would take 2400 ps.
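The 1400 ps figure can be checked with a small timing sketch (assuming, as in the standard textbook example, a 200 ps pipeline cycle and 800 ps per lw when not pipelined):

```python
def pipelined_time(n_instructions, n_stages, cycle_ps):
    # The first instruction needs n_stages cycles; each later
    # instruction completes one cycle after the previous one.
    return (n_instructions + n_stages - 1) * cycle_ps

def nonpipelined_time(n_instructions, instr_time_ps):
    # Without pipelining, instructions run strictly one after another.
    return n_instructions * instr_time_ps

print(pipelined_time(3, 5, 200))   # 1400 ps for the three lw instructions
print(nonpipelined_time(3, 800))   # 2400 ps
```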
Advantages and Disadvantages:
 Pipelining increases the number of simultaneously executing instructions and the rate at
which instructions are started and completed.
 Pipelining does not reduce the time it takes to complete an individual instruction, also
called the latency
 pipelining improves instruction throughput rather than individual instruction execution
time or latency.
------------------------------------------------------------------------------------------------------------
3. Explain the different types of pipeline hazards with suitable examples.
Pipeline Hazard : There are situations in pipelining when the next instruction cannot execute in
the following clock cycle.
 “Any condition that causes a pipeline to stall (delay) is called a hazard.”
Types of Hazard :
They are categorized into three types:
 Structural hazard
 Data hazard
 Instruction (Control) hazard
1. Structural hazard
 These hazards are because of conflicts due to insufficient resources.
 When a planned instruction cannot execute in the proper clock cycle because the
hardware does not support the combination of instructions that are set to execute

Example :
1.Use of ALU
2.Memory Access
Solution : Use additional hardware (multiple cores; divide memory into separate instruction
memory and data memory)
2. Data Hazards
 Data hazards occur when the pipeline must be stalled because one step must wait for
another to complete.
 Instruction needs data from the result of a previous instruction still executing in
pipeline
 Example Consider the sequence of instructions
add $s0, $t0, $t1
sub $t2, $s0, $t3

sub $t2, $s0, $t3 – wait for completion of WB in previous instruction


Solution :
 Hardware solution – Data forwarding
 Software Solution – reordering of code
3. Control Hazards
 The third type of hazard is called a control hazard, arising from the need to make a
decision based on the results of one instruction while others are executing.
 Also called branch hazard.
 When the proper instruction cannot execute in the proper pipeline clock cycle because
the instruction that was fetched is not the one that is needed; that is, the flow of
instruction addresses is not what the pipeline expected.
Example :

If $1 equals $2, then the instructions fetched at addresses 44, 48, and 52 are not needed.

Solution :
1. Stall the pipeline
2. Branch Prediction
3. Delayed Branch
----------------------------------------------------------------------------------------------------------
4. What are data hazards? Explain any two mechanisms for handling data hazards with suitable
examples.
Data Hazards
 Data hazards occur when the pipeline must be stalled because one step must wait for
another to complete.
 Also called pipelined data hazard.
 Instruction needs data from the result of a previous instruction still executing in
pipeline
 Example Consider the sequence of instructions
add $s0, $t0, $t1
sub $t2, $s0, $t3
sub $t2, $s0, $t3 – wait for completion of WB in previous instruction
Solution :
 Hardware solution – Data forwarding
 Software Solution – Reordering of code
i) Data Forwarding:
 Also called bypassing.
 A method of resolving a data hazard by retrieving the missing data element from internal
buffers rather than waiting for it to arrive from programmer visible registers or memory.
 It uses a forwarding unit (extra hardware) to retrieve the missing item early from the
internal resources; this is called forwarding or bypassing.

Sometimes forwarding together with stalls is required.
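The forwarding decision can be sketched as a check on pipeline-register fields, in the spirit of the textbook forwarding unit (the function and parameter names here are illustrative, not real hardware signal names):

```python
def forward_from_ex_mem(ex_mem_regwrite, ex_mem_rd, id_ex_rs, id_ex_rt):
    """EX hazard check: forward the ALU result sitting in EX/MEM to an
    ALU input if the previous instruction writes the register the
    current instruction is about to read (and it is not $zero)."""
    fwd_a = ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == id_ex_rs
    fwd_b = ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == id_ex_rt
    return fwd_a, fwd_b

# add $s0, $t0, $t1 followed by sub $t2, $s0, $t3:
# $s0 (register 16) is written by add and read as rs by sub,
# so the first ALU input must be forwarded.
print(forward_from_ex_mem(True, 16, 16, 11))  # (True, False)
```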

ii) Reordering of Code


 Find the hazards in the preceding code segment and reorder the instructions to
avoid any pipeline stalls
Example:
------------------------------------------------------------------------------------------------------------------
5. Explain the hazards caused by unconditional branching statements and illustrate the
mechanism of handling such control hazards.
Control Hazards
 The third type of hazard is called a control hazard, arising from the need to make a
decision based on the results of one instruction while others are executing.
 Also called branch hazard.
 When the proper instruction cannot execute in the proper pipeline clock cycle because
the instruction that was fetched is not the one that is needed; that is, the flow of
instruction addresses is not what the pipeline expected.
Example :

If $1 equals $2, then the instructions fetched at addresses 44, 48, and 52 are not needed.

Solution :

1. Stall the pipeline - Hold the pipeline until the branch outcome is known.
2. Branch Prediction - One simple approach is to predict that branches will always be
untaken; only when branches are taken does the pipeline stall.

3. Delayed Branch
Here, the sequentially next instruction is executed, with the branch taking effect after a
one-instruction delay. It is the compiler’s job to find an instruction to place in the delay
slot that is independent of the branch outcome.

Example :
6. Explain in detail how exceptions are handled in MIPS architecture?
Exception – an unscheduled event that disrupts program execution.
The two types of exceptions that occur in the MIPS architecture are
1. Undefined instruction
Example : XXX $12, $13 - here XXX is an undefined opcode
2. Arithmetic overflow
Example : add $1, $2, $1
Exception handling in Processor :
The basic action that the processor must perform when an exception occurs is
 to save the address of the offending instruction in the exception program counter (EPC)
 Transfer control to the operating system at some specified address.
The operating system can then take the appropriate action, which may involve providing some
service to the user program, taking some predefined action in response to the exception (also
called an interrupt), or stopping the execution of the program and reporting an error.
Vectored Interrupt :
An interrupt for which the address to which control is transferred is determined by the cause of
the exception
Example :

Few registers used by processor are


 EPC: A 32-bit register used to hold the address of the affected instruction. (Such a
register is needed even when exceptions are vectored.)
 Cause: A register used to record the cause of the exception.
The pipelined data path with controls to handle exceptions:
1. The key additions include a new input with the value 8000 0180hex in the multiplexor
that supplies the new PC value
2. Cause register to record the cause of the exception; and an Exception PC register to save
the address of the instruction that caused the exception.
3. The 8000 0180hex input to the multiplexor is the initial address to begin fetching
instructions in the event of an exception overflow. The ALU overflow signal is an input
to the control unit.
4. A flush signal is passed to the pipeline stages to undo the actions carried out by the
affected instruction.
For example, an undefined instruction is discovered in the ID stage, and invoking the
operating system occurs in the EX stage. Exceptions are collected in the Cause register in a
pending exception field so that the hardware can interrupt based on later exceptions, once
the earliest one has been serviced. This is shown below

Part C
1. Build a simple data path with control unit and explain the process of designing
control unit with necessary signals.
Build a simple data path with control unit
Building a MIPS datapath involves constructing datapaths for

1. Fetching the instruction and incrementing the PC


2. Executing an arithmetic and logic instructions
3. Executing a memory-reference instruction
4. Executing a branch instruction

Designing a MIPS ALU


The MIPS ALU defines the following six combinations of its four control inputs.
ALUOp indicates whether the operation to be performed should be add (00) for loads and stores,
subtract (01) for beq, or determined by the operation encoded in the function field (10). The
output of the ALU control unit is a 4-bit signal that directly controls the ALU by generating one
of the 4-bit combinations shown previously.

This truth table shows how the 4-bit ALU control is set depending on these two input fields
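The truth table can be sketched as a lookup (the funct-code and 4-bit output encodings below follow the standard textbook scheme; only a few R-type operations are shown):

```python
def alu_control(aluop, funct=None):
    """Map the 2-bit ALUOp (and, for R-type, the 6-bit funct field)
    to the 4-bit ALU control signal."""
    if aluop == 0b00:                 # loads and stores: add
        return 0b0010
    if aluop == 0b01:                 # beq: subtract
        return 0b0110
    # aluop == 0b10: R-type, decode the funct field
    r_type = {0b100000: 0b0010,  # add
              0b100010: 0b0110,  # sub
              0b100100: 0b0000,  # and
              0b100101: 0b0001,  # or
              0b101010: 0b0111}  # slt
    return r_type[funct]

print(bin(alu_control(0b00)))            # 0b10  (add for lw/sw)
print(bin(alu_control(0b10, 0b101010)))  # 0b111 (slt)
```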
MIPS supports three types of instructions.

Using this information, instruction labels and an extra multiplexor are added.


The control signals are
The data path and its control unit is shown below
The input to the control unit is the 6-bit opcode field from the instruction. The outputs of the
control unit consist of three 1-bit signals that are used to control multiplexors (RegDst, ALUSrc,
and MemtoReg), three signals for controlling reads and writes in the register file and data
memory (RegWrite, MemRead, and MemWrite), a 1-bit signal used in determining whether to
possibly branch (Branch), and a 2-bit control signal for the ALU (ALUOp). An AND gate is used
to combine the branch control signal and the Zero output from the ALU; the AND gate output
controls the selection of the next PC.

2. What are pipeline Hazards? Explain the different types of hazards with example
and give solutions to the hazards.

Pipeline Hazard : There are situations in pipelining when the next instruction cannot execute in
the following clock cycle.
 “Any condition that causes a pipeline to stall(delay) is called a hazard.
Types of Hazard :
They are categorized in to three types:
 Structural hazard
 Data hazard
 Instruction (Control) hazard
1. Structural hazard
 These hazards are because of conflicts due to insufficient resources.
 When a planned instruction cannot execute in the proper clock cycle because the
hardware does not support the combination of instructions that are set to execute

Example :
1.Use of ALU
2.Memory Access
Solution : Use additional hardware (multiple cores; divide memory into separate instruction
memory and data memory)
2. Data Hazards
 Data hazards occur when the pipeline must be stalled because one step must wait for
another to complete.
 Instruction needs data from the result of a previous instruction still executing in
pipeline
 Example Consider the sequence of instructions
add $s0, $t0, $t1
sub $t2, $s0, $t3

sub $t2, $s0, $t3 – wait for completion of WB in previous instruction


Solution :
 Hardware solution – Data forwarding
 Software Solution – reordering of code
3. Control Hazards
 The third type of hazard is called a control hazard, arising from the need to make a
decision based on the results of one instruction while others are executing.
 Also called branch hazard.
 When the proper instruction cannot execute in the proper pipeline clock cycle because
the instruction that was fetched is not the one that is needed; that is, the flow of
instruction addresses is not what the pipeline expected.
Example :
If $1 equals $2, then the instructions fetched at addresses 44, 48, and 52 are not needed.

Solution :
1. Stall the pipeline
2. Branch Prediction
3. Delayed Branch
UNIT IV MCQS
1. The parallelism obtained by overlapping the execution of multiple instructions is called
a) Instruction level parallelism
b) Data level Parallelism
c) Task level parallelism
d) Bit level Parallelism
Answer: a) Instruction level parallelism

2. Which of these is NOT involved in the case of a memory write operation?


a) Data bus
b) MDR
c) MAR
d) PC
Answer: (d) PC

3. What is the minimum time delay present between the initiations of two separate,
independent memory operations known as?
a) Cycle Time
b) Latency Time
c) Access Time
d) Transfer Rate
Answer: (a) Cycle time

4. The sequence of events that take place in the computer when it is interpreting and
executing an instruction is called
a) executing cycle
b) instruction cycle
c) machine cycle
d) decoding cycle
Answer: (b) instruction cycle

5. A mechanism by which the instruction stream is divided into several smaller streams and
executed in parallel is called
a) threading
b) hardware multithreading
c) instruction streaming
d) data streaming
Answer: (b) hardware multithreading

6. An instruction cycle consists of


a) fetching, and decoding
b) decoding, and executing
c) fetching, decoding, executing, and storing
d) fetching, executing, and storing
Answer: (c), The instruction cycle (also known as machine cycle) represents the sequence of
events that takes place when instruction is read from memory and executed.

Prepared By Dr V. Uma Rani, Associate Professor/ CSE


7. A CPU register that keeps the track of execution of the program and contains the
instructions currently being executed is called
a) Index register
b) Memory address register
c) Instruction register
d) Stack pointer
Answer: (c), Instruction register holds an instruction until it is decoded.
8. These determine the functions to be performed by the processor and its interaction with
memory.
A. Operation Performed
B. Operands used
C. Execution sequencing
D. None of them
Answer: A) operation performed
9. The stages of the pipeline are an instruction—————— and an —————— that
executes the instruction
A. fetch
B. execute/memory
C. both a and b
D. none of them
Answer: C) both a and b
10. The interconnection between ALU & Registers is collectively known as:
A. Information path
B. Data path
C. Process route
D. Information trail
Answer :B

Unit IV
PART B

1. Explain in detail the Hardware multithreading.


Hardware multithreading
Hardware multithreading allows multiple threads to share the functional units of a single
processor in an overlapping fashion.
To permit this sharing, the processor must duplicate the independent state of each thread. For
example, each thread would have a separate copy of the register file and the program counter.
The instruction stream is divided into several smaller streams, known as threads, such that
the threads can be executed in parallel.
TYPES:
There are three main approaches to hardware multithreading.
o Fine-grained multithreading / Interleaved Multithreading
o Coarse-grained multithreading / Blocking Multithreading
o Simultaneous multithreading (SMT)
FINE-GRAINED MULTITHREADING
• The processor switches between threads on each instruction, resulting in interleaved
execution of multiple threads.
• Also called interleaving
• This interleaving is often done in a round-robin fashion, skipping any threads that are
stalled at that clock cycle.

Example : Consider the four threads and its sequential execution, Fine grained multithreading
solution

Advantages
• it can hide the throughput losses that arise from both short and long stalls, since
instructions from other threads can be executed when one thread stalls.
• the interleaving of threads mostly eliminates idle clock cycles
[Note: stall is a delay in execution of an instruction]

Disadvantages

• It slows down the execution of the individual threads, since a thread that is ready to
execute without stalls will be delayed by instructions from other threads.
• It has idle slots within some clock cycles.
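A toy round-robin scheduler illustrates this interleaving (a simplified model, not real hardware: each thread is just a count of remaining instructions, and a thread is skipped in any cycle in which it is stalled):

```python
def fine_grained_schedule(remaining, stalled_cycles):
    """remaining[i]: instructions left in thread i.
    stalled_cycles[i]: set of cycle numbers in which thread i is stalled.
    Returns which thread issues in each cycle (round-robin, skipping
    stalled threads, as in fine-grained multithreading)."""
    schedule, cycle, i = [], 0, 0
    while any(r > 0 for r in remaining):
        for k in range(len(remaining)):
            t = (i + k) % len(remaining)
            if remaining[t] > 0 and cycle not in stalled_cycles[t]:
                schedule.append(t)
                remaining[t] -= 1
                i = t + 1  # continue round-robin from the next thread
                break
        cycle += 1
    return schedule

# Two threads of 3 instructions each; thread 0 stalls in cycle 2,
# and thread 1 fills that cycle instead of leaving it idle.
print(fine_grained_schedule([3, 3], [{2}, set()]))
# [0, 1, 1, 0, 1, 0]
```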

Coarse-grained multithreading

• Coarse-grained multithreading switches threads only on costly stalls, such as last-level
cache misses (i.e., it switches away only from a thread that is stalled waiting for a
time-consuming operation to complete).
• Also called blocking. Here a switch is made to another thread when the current thread
causes a stall; a second thread is then scheduled, and so on.
• Example : Sequential execution of thread and Coarse grained multithreading



Advantages:
• Coarse-grained multithreading is much more useful for reducing the penalty of high-cost
stalls, where pipeline refill time is negligible compared to the stall time.
• It reduces the number of completely idle clock cycles, although the pipeline start-up
overhead still leads to some idle cycles.

Disadvantages

• It is limited in its ability to overcome throughput losses, especially from shorter stalls.
• All issue slots will not be used.
Simultaneous multithreading (SMT)

• A variation of hardware multithreading.


• Allows multiple threads to execute simultaneously.
• Relies on dynamic multiple Issue and Register renaming.
• Schedule instructions from multiple threads
• Instructions from independent threads execute when function units are available
• Within threads, dependencies handled by scheduling and register renaming
• Used in Intel Pentium-4 HT
• SMT relies on dynamic multiple issue mechanisms; it does not switch resources every
cycle.
SMT = ILP + TLP
• Instead, SMT is always executing instructions from multiple threads, leaving it up to the
hardware to associate instruction slots and renamed registers with their proper threads.

• [Note : ILP is achieved by replicating the internal components of the computer, called
multiple issue.]
• [Note : with register renaming, all WAW and WAR hazards are avoided.]

Example : Given four threads execute running on a processor sequential and using SMT is
shown below



Advantages:

• In the SMT case, thread-level parallelism and instruction-level parallelism are both
exploited, with multiple threads using the issue slots in a single clock cycle.

Disadvantages

• In practice, issue slot usage is limited by imbalances in the resource needs and resource
availability over multiple threads.

---------------------------------------------------------------------------------------------------------------------

2. Compare and contrast Fine grained Multi-Threading and Coarse grained Multi-
Threading.

Fine grained Multi-Threading vs Coarse grained Multi-Threading:

1. Fine-grained: the processor switches between threads on each instruction, resulting in
interleaved execution of multiple threads. Coarse-grained: switches threads only on costly
stalls, such as last-level cache misses.
2. Fine-grained is also called interleaving; coarse-grained is also called blocking.
3. Example: with four fine-grained threads, every clock cycle a different thread is chosen in
turn. With four coarse-grained threads, thread 1 runs on the processor until it needs a costly
stall, during which thread 2 runs, and so on. (In the figure: black – thread 1, blue – thread 2,
grey – thread 3, light blue – thread 4.)
4. Fine-grained can hide the throughput losses that arise from both short and long stalls;
coarse-grained is much more useful for reducing the penalty of high-cost stalls.
5. Fine-grained slows down the execution of the individual threads, since a thread ready to
execute without stalls is delayed by instructions from other threads; coarse-grained does not
slow down individual threads, but it is hard for it to overcome throughput losses from shorter
stalls, due to pipeline start-up costs.
6. Example uses: fine-grained on the Sun Niagara; coarse-grained on the IBM AS/400.

-------------------------------------------------------------------------------------------------------

3. Discuss shared memory multiprocessor with a neat diagram.


Shared memory multiprocessor (SMP):
• Shared memory multiprocessor (SMP) is one that offers the programmer a single physical
address space across all processors
• Processors communicate through shared variables in memory, with all processors capable
of accessing any memory location via loads and stores.
Types of SMP
1. UMA
2. NUMA
3. COMA

• Uniform memory access (UMA) multiprocessors


• All processors have access to all parts of main memory using loads and stores.
• The memory access time of a processor to all regions of memory is the same
• Caches are used to reduce latency and to lower bus traffic



• It needs hardware to ensure that caches and memory are consistent (cache
coherency)
• UMA needs a hardware mechanism to support process synchronization (the process
of coordinating the behavior of two or more processes, which may be running on
different processors)

• Nonuniform memory access (NUMA) multiprocessors


• All processors have access to all parts of main memory using loads and stores.
• The memory access time of a processor differs depending on which region of
main memory is accessed. (ie Not all processors have equal access time to all
memories)
• One SMP can directly access memory of another SMP.
• Memory access across link is slower
• If cache coherence is maintained, called CC-NUMA

Here core 1 accesses its own memory quickly but takes a long time to access core 2's memory.
Similarly, core 2 accesses its own memory quickly but takes a long time to access core 1's memory.
• Cache Only Memory Access (COMA)
• Here data has no specific “permanent” location; the entire physical address space is
considered one huge, single cache.
• Data can be read into the local caches and/or modified and then updated at its
“permanent” location.
• Data can migrate and/or be replicated in the various memory banks of the central
main memory.

4. Explain Multi core processors with a suitable diagram and examples.

A multi-core processor is an integrated circuit with two or more processor cores, for faster
simultaneous processing of several tasks, reduced power consumption, and greater
performance. Each core can read and execute program instructions independently.

Multicore Architecture

A multi-core processor's design enables communication between all available cores, and
processing duties are divided and assigned appropriately among them. Once all of the
processing operations have finished, the processed data from each core is transmitted back to
the computer's main board (motherboard) via a single common gateway. This method beats a
single-core CPU in terms of total performance.
Benefits of Multicore Processor:
o When compared to single-core processors, a multicore processor has the potential of doing
more tasks.
o Better application and Hardware performance
o Low energy consumption when doing many activities at once.
o Data takes less time to reach its destination since both cores are integrated on a single chip.
o With the use of a small circuit, the speed can be increased.
o Detecting infections with anti-virus software while playing a game is an example of
multitasking.
o With the use of low frequency, it can accomplish numerous tasks at the same time.
o In comparison to a single-core processor, it is capable of processing large amounts of data.
Limitations of Multicore Processor:
o Although it contains several processors, it is not twice as fast as a simple processor.
o The task of managing is more complicated as compared to managing a single-core CPU.
o The performance of a multi-core processor is entirely dependent on the tasks that users
execute.
o If a task demands linear/sequential processing, multi-core processors take longer to
process it.
o The battery drains more quickly.
o Its consumption of power is so high as compared to a simpler processor.
o Furthermore, in comparison to a single-core processor, it is more expensive.
Applications of the multicore processor are as follows:
o Games with high graphics, such as Overwatch and Star Wars Battlefront, as well as 3D
games.
o The multicore processor is more appropriate in Adobe Premiere, Adobe Photoshop,
iMovie, and other video editing software.
o Solidworks with computer-aided design (CAD).
o High network traffic and database servers.
o Industrial robots, for example, are embedded systems.

5. Draw and explain the block diagram of GPU architecture.

GPU architecture:



GPU architecture is simpler than that of a CPU. A graphics processing unit has many more
cores than a CPU, to achieve parallel data processing while tolerating higher latency. The
GPU is good at data-parallel processing and at high single-precision floating-point arithmetic intensity.

Main Features of GPU

• Hundreds of simple cores, operating on a common memory (like the PRAM model)
• High compute power but high memory latency (1:500)
• No caching, prefetching, etc
• High arithmetic intensity needed for good performance
• Graphics rendering, image/signal processing, matrix manipulation, FFT, etc

What is GPU?

o Dedicated graphics chip that handles all processing required for rendering 3D objects on
the screen
o Typically placed on a video card, which contains its own memory and display interfaces
(HDMI, DVI, VGA, etc)
o Primitive GPUs were developed in the 1980s, although the first “complete” GPUs appeared
in the mid-1990s.

GPU implements the graphics pipeline consisting of:

• Vertex transformations - Compute camera coords, lighting


• Geometry processing - Primitive-wide properties
• Rasterizing polygons to pixels - Find pixels falling on each polygon
• Processing the pixels - Texture lookup, shading, Z-values
• Writing to the framebuffer - Colour, Z-value

Example : NVIDIA GeForce



• First GPU is GeForce 256 by NVIDIA in 1999.
• These GPU chips can process a minimum of 10 million polygons per second.
• The NVIDIA GPU - 128 cores on a single chip.
• Each core can handle 8 threads of instructions.
• In total, 1,024 (8 × 128) threads are executed concurrently on a single GPU.
• It has many Streaming Multiprocessors (SM) and Memory controller
• Multiple SMs can be built on single GPU chip.
• Each SM is associated with a private L1 Data Cache.
• Each SM has 16 load/store units allowing source and destination address to be calculated
for 16 threads/clock.
• Each SM has 32 CUDA cores (Totally 16*32 =512 CUDA Cores)
• Memory Controller (MC) is associated with a shared L2 cache for faster access to the
cached data.
• Both MC and L2 are on-chip
• GPU uses a programming model called CUDA (Compute Unified Device Architecture)
• CUDA is an extension of the C language.
• It enables the programmer to write C programs to execute on GPUs.
• It is used to control the device.
• The programmer specifies CPU and GPU functions. Host code (CPU) can be C++;
device code (GPU) may only be C.
• GPU memory systems are designed for data throughput with wide memory buses
• Much larger bandwidth than typical CPUs, typically 6 to 8 times higher
• The on-chip SMEM memory is local to each Streaming Multiprocessor.

6. Draw and discuss about the cluster architecture and its types.

A cluster is a set of loosely (or tightly) connected computers (nodes) working together as a unified
computing resource.



• Each node performs the same task
• Nodes are controlled by software.
• Connected to each other by I/O interconnect via standard network switches and cables
• Each computer has private memory and OS.
• It is easier to expand the system
• It is also easier to replace a computer
• Easy to scale down gracefully

Types of Clustered System

1) Asymmetric Clustering System


• One of the nodes is in standby mode
• All the others run the required applications.
• The standby node continuously monitors the server and, if it fails, the standby node
takes the place of the server.
2) Symmetric Clustering System
• All the nodes run applications as well as monitor each other.
• More efficient
• Doesn't keep a node merely as a standby.

ATTRIBUTES OF CLUSTERED SYSTEMS

1) Load Balancing Clusters


• Share the workload to provide a better performance.
• System performance is optimized.
• Use a round robin mechanism.
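The round-robin mechanism above can be sketched in a few lines of Python. The `round_robin_dispatch` helper below is purely illustrative, not part of any real cluster software:

```python
from itertools import cycle

def round_robin_dispatch(nodes, requests):
    """Assign each incoming request to the next node in circular order."""
    rotation = cycle(nodes)
    return [(req, next(rotation)) for req in requests]

# With 3 nodes, the 4th request wraps around to node1 again,
# spreading the workload evenly across the cluster.
assignments = round_robin_dispatch(["node1", "node2", "node3"],
                                   ["r1", "r2", "r3", "r4"])
```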

2) High Availability Clusters


• Improve the availability of the clustered system.
• Have extra nodes to be used if some of the system components fail.
• Removes single point of failure
• Also known as failover clusters or HA clusters.



BENEFITS OF CLUSTERED SYSTEMS
• Performance
• Fault Tolerance
• Scalability
APPLICATIONS OF CLUSTERS
• Amazon, Facebook ,Google, Microsoft have multiple datacenters each with clusters of
tens of thousands of servers.
• Large clusters are called Warehouse-Scale Computers; they are used to provide internet
services at Google, Facebook, YouTube, and Amazon, and cloud computing services such as
Rackspace.
o WSC is a cluster comprised of tens of thousands of servers.
o Acts as one giant computer.
o Internet services necessitated the construction of new buildings to house, power,
and cool 100,000 servers.
o WSC often use a hierarchy of networks for interconnection
PART C

1. Explain in detail Flynn’s classification of parallel hardware.

FLYNN’S CLASSIFICATION:
Flynn's Taxonomy: In 1966, Michael Flynn proposed a classification for computer
architectures based on the number of instruction streams and data streams.
1. SISD (Single Instruction stream, Single Data stream)
2. SIMD (Single Instruction stream, Multiple Data streams)
3. MISD (Multiple Instruction streams, Single Data stream)
4. MIMD (Multiple Instruction streams, Multiple Data streams)

SISD: (Single-Instruction stream, Single-Data stream)

 SISD corresponds to the traditional mono-processor ( von Neumann computer).


 A single data stream is being processed by one instruction stream
 A single-processor computer (uni-processor) in which a single stream of instructions
is generated from the program.



 Example :
IBM 704, VAX 11/780, CRAY-1, Older mainframe computers
Single Instruction, Multiple Data (SIMD)
• Executes a single instruction on multiple data values simultaneously using many
processors. (ie All processors execute the same instruction at the same time)
• A single control unit does the fetch and decoding for all processors.
• SIMD architectures include array processors.
• It simplifies synchronization
• Reduced instruction control hardware
• Works best for highly data-parallel applications.
• E.g – ILLIAC-IV, MPP, CM-2, STARAN
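The SIMD idea — one instruction applied to many data elements at once — can be modelled with a minimal Python sketch. Real SIMD hardware performs all the lanes in lockstep in a single step; this loop only models the effect:

```python
def simd_add(vec_a, vec_b):
    # One instruction (addition) applied element-wise across all data lanes;
    # on SIMD hardware every processing element executes the same add
    # at the same time on its own pair of operands.
    return [a + b for a, b in zip(vec_a, vec_b)]

# A single "vector add" instruction over 4 data lanes.
result = simd_add([1, 2, 3, 4], [10, 20, 30, 40])
```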

Multiple Instruction, Single Data (MISD)


Executing different instructions but all of them operating on the same data stream.
• This structure is not commercially implemented.
• Systolic array is one example of an MISD architecture

Multiple Instruction, Multiple Data (MIMD)


 Execute multiple instructions simultaneously on multiple data streams.
 Each processor must include its own control unit



 Shared-memory multiprocessor or distributed-memory multicomputer.
 E.g – CRAY-XMP, IBM 370/168 M

SPMD : Single Program Multiple Data


It is a subcategory of MIMD.
• Tasks are split up and run simultaneously on multiple processors with different input in
order to obtain results faster.
---------------------------------------------------------------------------------------------------------------
2. Explain in detail about message passing multi processor with neat diagram.

Message passing multi-processor :

• A multiprocessor communicates via explicit message passing


• Each processor has private physical address space
• All processors have their own memory and they communicate through message passing.

• SW and HW interfaces for send/receive messages between processors


• Some concurrent applications run well on parallel HW, independent of shared-
address or message-passing



• The message can be thought of as a remote procedure call.
• Much easier for the hardware designer, compared to implementing a cache-coherence
protocol.
• Communication is explicit, so there are fewer performance surprises than with the implicit
communication in cache-coherent shared-memory computers.
• Harder to port a sequential program to a message-passing computer, since every
communication must be identified in advance
• Message Passing is a way of communicating between processors by explicitly sending
and receiving messages

• Coordination is built-in with message passing


• If sender needs confirmation that the message has arrived, the receiving processor can then
send an acknowledgment message back to the sender
• Types of message-passing
o Synchronous message passing systems - require the sender and receiver to wait
for each other while transferring the message.
o Asynchronous message passing - the sender and receiver do not wait for each
other and can carry on their own computations while transfer of messages is being
done
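The explicit send/receive and acknowledgment pattern described above can be sketched with Python threads and queues standing in for processors with private memories. Names such as `channel` and `ack_channel` are illustrative, not a real message-passing API:

```python
import queue
import threading

# Two "processors" share no data; they communicate only via these channels.
channel = queue.Queue()
ack_channel = queue.Queue()

def receiver():
    msg = channel.get()            # blocks until a message arrives (receive)
    ack_channel.put(f"ack:{msg}")  # send confirmation back to the sender

t = threading.Thread(target=receiver)
t.start()
channel.put("compute-task-1")      # explicit send
reply = ack_channel.get()          # sender waits for the acknowledgment
t.join()
```

Because the sender blocks on `ack_channel.get()`, this sketch behaves like synchronous message passing; dropping the acknowledgment wait would make it asynchronous.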

Advantages of Message Passing Model:

• Easier to build than scalable shared memory machines


• Easy to scale
• Coherency and synchronization are the responsibility of the user, so the system designer
need not worry about them.

Disadvantage of Message Passing Model

• Large overhead: copying of buffers requires large data transfers


• Programming is more difficult.
• Blocking nature of SEND/RECEIVE can cause increased latency and deadlock issues



Unit V
Part A
1. Which one is not a characteristic of SRAM?
A. Information will be available as long as power is available.
B. More complex hardware compared to DRAM
C. Access time is 10 ns which is low compared to DRAM
D. Refreshing is needed

Answer : D
2. The reason for the implementation of the cache memory is ________
a) To increase the internal memory of the system
b) The difference in speeds of operation of the processor and memory
c) To reduce the memory access and cycle time
d) All of the mentioned
Answer : b)

3. Consider the following statements and Identify the correct statement about TLB.
I. Translation look aside buffer (TLB) is a cache that keeps track of recently
used address mappings to try to avoid an access to the page table.
II. The TLB is a memory type that is both cheaper and bigger than the register,
and faster and smaller than the main memory.
III. TLB is a part of the processor's memory management unit
IV. Also known as Address translation cache
A. I & III
B. I & II
C. I,III,IV
D. I,II,III,IV
Answer D.
4. What will be the width of address and data buses for a 512 K * 8 memory chip?
A. 19, 8
B. 8, 10
C. 9, 8
D. 9,16
Answer : A
Explanation : Use the formula
x = log2(y)
x = log2(512 × 1024) = log2(2^19) = 19 address lines
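The same calculation can be checked in Python; `bus_widths` is a hypothetical helper, not a standard function:

```python
import math

def bus_widths(words, bits_per_word):
    # Address lines needed to select one of `words` locations;
    # data lines equal the width of one word.
    return math.ceil(math.log2(words)), bits_per_word

# 512K x 8 chip: 512 * 1024 = 2^19 words of 8 bits each.
addr_lines, data_lines = bus_widths(512 * 1024, 8)
```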

5. Order the steps to be taken in an instruction cache miss


I. Send the original PC value to the memory
II. Write the cache entry, putting the data from memory in the data portion of the
entry, writing the upper bits of the address into the tag field, and turning the
valid bit on.
III. Restart the instruction execution at the first step, which will prefetch the
instruction, this time finding it in the cache.
IV. Instruct the main memory to perform read and wait for the memory to
complete its access.
A. I,II,III,IV
B. I,IV,III,II
C. I,IV,II,III
D. IV,II,III,I
Answer : C
6. What is rotational latency?
A. Once the head has reached the correct track, we must wait for the desired sector to
rotate under the read /write head. This time is called the rotational latency or
rotational delay.
B. latency is generally induced by the hardware components
C. It is induced by the software components
D. It is induced by I/O devices
Answer: A
7. In memory-mapped I/O ____________
A. The I/O devices and the memory share the same address space
B. The I/O devices have a separate address space
C. The memory and I/O devices have an associated address space
D. A part of the memory is specifically set aside for the I/O operation
Answer : A
8. The signal sent to the device from the processor to the device after receiving an
interrupt is ___________
A. Interrupt-acknowledge
B. Return signal
C. Service signal
D. Permission signal
Answer : A)
9. The DMA transfers are performed by a control circuit called as __________
A) Device interface
B) DMA controller
C) Data controller
D) Overlooker
Answer : B
10. To resolve the clash over the access of the system BUS we use ______
a) Multiple BUS
b) BUS arbitrator
c) Priority access
d) None of the mentioned
Answer : B
PART B

1. Explain in detail about memory Hierarchy with neat diagram


2. Explain in detail about SRAM and DRAM.
3. Explain in detail about Cache memory in detail.
4. Discuss the methods used to measure and improve the performance of the cache.
5. Explain in detail about interrupts with diagram.
6. Write short notes on USB.
PART C

1. Describe the basic operations of cache in detail with diagram and discuss the various
mapping schemes used in cache design with example.
2. Draw the typical block diagram of a DMA controller and explain how it is used for
direct data transfer between memory and peripherals
UNIT V
PART B

1. Explain in detail about memory Hierarchy with neat diagram


The memory hierarchy is one of the most important aspects of computer memory design, as it
helps in optimizing the memory available in the computer. There are multiple levels of
memory, each with a different size, speed, and cost.
Types of Memory Hierarchy
This Memory Hierarchy Design is divided into 2 main types:
 External Memory or Secondary Memory: Comprising of Magnetic Disk, Optical
Disk, and Magnetic Tape i.e. peripheral storage devices which are accessible by the
processor via an I/O Module.
 Internal Memory or Primary Memory: Comprising of Main Memory, Cache
Memory & CPU registers. This is directly accessible by the processor.

Registers
Registers are small, high-speed memory units located in the CPU. They are used to store the
most frequently used data and instructions. Registers have the fastest access time and the
smallest storage capacity, typically ranging from 16 to 64 bits.
Cache Memory
Cache memory is a small, fast memory unit located close to the CPU. It stores frequently used
data and instructions that have been recently accessed from the main memory. Cache memory
is designed to minimize the time it takes to access data by providing the CPU with quick
access to frequently used data.
Main Memory
Main memory is the primary memory of a computer system. It has a larger storage capacity
than cache memory, but it is slower. Main memory is used to store data and instructions that
are currently in use by the CPU.

RAM
 RAM is also known as Read/Write memory.
 The information stored in RAM can be read and also written.
 It is volatile .

TYPES OF RAM
 Static RAM: Static RAM stores the binary information in transistors and
information remains valid until power is supplied. It has a faster access time and is
used in implementing cache memory.
 Dynamic RAM: It stores the binary information as a charge on the capacitor. It
requires refreshing circuitry to maintain the charge on the capacitors after a few
milliseconds. It contains more memory cells per unit area as compared to SRAM.
ROM
 Memory is called a read-only memory, or ROM, when information can be written into it
only once at the time of manufacture.
 The information stored in ROM can then only be read.
 It is used to store programs that are permanently resident in the computer.
 ROM is non-volatile.
Secondary Storage
Secondary storage, such as hard disk drives (HDD) and solid-state drives (SSD), is a non-
volatile memory unit that has a larger storage capacity than main memory. It is used to store
data and instructions that are not currently in use by the CPU. Secondary storage has the
slowest access time and is typically the least expensive type of memory in the memory
hierarchy.
Magnetic Disk
Magnetic disks are circular platters fabricated from metal or plastic and coated with a
magnetizable material. Magnetic disks rotate at high speed inside the computer and are
frequently used.
Magnetic Tape
Magnetic tape is a magnetic recording medium covered with a plastic film. It is
generally used for the backup of data. Tape access is sequential, so the access time is a
little slower, and some amount of time is required to reach the desired part of the
strip.
Characteristics of Memory Hierarchy
 Capacity: It is the global volume of information the memory can store. As we
move from top to bottom in the Hierarchy, the capacity increases.
 Access Time: It is the time interval between the read/write request and the
availability of the data. As we move from top to bottom in the Hierarchy, the access
time increases.
 Performance: Earlier when the computer system was designed without a Memory
Hierarchy design, the speed gap increased between the CPU registers and Main
Memory due to a large difference in access time.
Advantages of Memory Hierarchy
 It helps in reducing memory bottlenecks and managing the memory in a better way.
 It helps in distributing the data appropriately across the computer system.
 It saves the user's cost and time.
--------------------------------------------------------------------------------------------

2. Explain in detail about SRAM and DRAM.


SRAM (STATIC RANDOM ACCESS MEMORY)
Memories that consist of circuits capable of retaining their state as long as power is applied are
known as static memories.
SRAM is implemented with two inverters cross-connected to form a latch.
• The latch is connected to two bit lines by transistors T1 and T2.
• These transistors act as switches that can be opened or closed under control of the word line.
• When the word line is at ground level, the transistors are turned off and the latch retains its
state.

Advantages:
 Very low standby power consumption
 Can be accessed very quickly (no refresh needed)
Disadvantages:
 More expensive than DRAM
 Lower density than DRAM
DRAM (DYNAMIC RANDOM ACCESS MEMORY)
 DRAMs do not retain their state for a long period, unless they are accessed frequently for
Read or Write operations.
 The information is stored in a dynamic memory cell in the form of a charge on a
capacitor.
 The contents must be periodically refreshed.
 The contents may be refreshed while accessing them for reading.
 To store information in this cell, transistor T is turned on and an appropriate voltage is
applied to the bit line.
 This causes a known amount of charge to be stored in the capacitor

TYPES OF DRAM
1. SDRAM
2. DDR SDRAM
SDRAM (SYNCHRONOUS DRAM)
• DRAMs whose operation is synchronized with a clock signal are known as synchronous
DRAM (SDRAM).
• SDRAMs have built-in refresh circuitry
• SDRAMs operate with clock speeds that can exceed 1 GHz.
• SDRAMs have high data rate
DOUBLE-DATA-RATE SDRAM (DDR SDRAM)
• Data are transferred externally on both the rising and falling edges of the clock.
• They offer increased storage capacity, lower power, and faster clock speeds.
• The earliest version is known as DDR. Later versions, called DDR2, DDR3, and DDR4.
• DDR2 and DDR3 can operate at clock frequencies of 400 and 800 MHz, respectively.
• They transfer data using the effective clock speeds of 800 and 1600 MHz, respectively
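The effective-rate figures above follow directly from transferring on both clock edges, as a one-line Python check shows:

```python
def ddr_effective_rate(clock_mhz):
    # DDR transfers data on both the rising and falling clock edges,
    # so the effective transfer rate is twice the clock frequency.
    return 2 * clock_mhz

# DDR2 at 400 MHz and DDR3 at 800 MHz.
rates = [ddr_effective_rate(f) for f in (400, 800)]
```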

--------------------------------------------------------------------------------------------------------
3. Explain in detail about Cache memory in detail.
Cache Memory :
A faster and smaller segment of memory whose access time is close to that of registers is
known as cache memory. In the memory hierarchy, cache memory has a lower access time
than primary memory. Cache memory is much smaller and hence is used as a
buffer.
 Cache memory is faster, they can be accessed very fast
 Cache memory is smaller, a large amount of data cannot be stored

Need of cache memory
Data in primary memory can be accessed faster than in secondary memory, but access times
of primary memory are still generally a few microseconds, whereas the CPU is capable of performing
operations in nanoseconds. Due to this time lag between requesting data and receiving it, the
performance of the system decreases, as the CPU is not utilized properly and may remain idle for
some time. In order to minimize this time gap, a new segment of memory is introduced, known as
cache memory.
Types of Cache Memory (Based on location)
L1 or Level 1 Cache: It is the first level of cache memory that is present inside the processor.
It is present in a small amount inside every core of the processor separately. The size of this
memory ranges from 2KB to 64 KB.
L2 or Level 2 Cache: It is the second level of cache memory that may be present inside or
outside the CPU. If not present inside the core, it can be shared between two cores depending
upon the architecture and is connected to the processor by a high-speed bus. The size of this
memory ranges from 256 KB to 512 KB.

L3 or Level 3 Cache: It is the third level of cache memory that is present outside the CPU and
is shared by all the cores of the CPU. Some high-end processors have this cache. It is
used to increase the performance of the L2 and L1 caches. The size of this memory ranges from
1 MB to 8 MB.

TYPES OF CACHE (Based on data)


1. Unified cache : Data and instructions are stored together (Von Neumann
Architecture)
2. Split cache: Data and instructions are stored separately (Harvard architecture)
If a process needs some data, it first searches in the cache memory.
• If the data is available in the cache, this is termed as a cache hit and the data is accessed as
required.
• If the data is not in the cache, then it is termed as a cache miss. Then the data is obtained from
the main memory. A copy of the data is stored in cache and then forwarded to processor.

TYPES OF CACHE (Based on characteristics)


• Read architecture and Write architecture
Read Architecture use two policy to read data
1) Look Aside
CPU requests memory from cache and main memory simultaneously. If the data is in the
cache then it is returned, otherwise the CPU waits for the data from the main memory.
2) Look Through - The CPU requests memory from the cache. Only if the data is not present
is the main memory queried.
Write architecture uses two policies to write data
1) Write Back - Data is written only to the cache; the modified (dirty) block is written
back to main memory later, when it is replaced.
2) Write Through - When data is stored, it is written to the cache and main
memory at the same time.
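The difference between the two write policies can be sketched in Python. The class names and counters below are illustrative, not a real cache implementation:

```python
class WriteThroughCache:
    def __init__(self):
        self.cache, self.memory_writes = {}, 0
    def write(self, addr, value):
        self.cache[addr] = value
        self.memory_writes += 1          # every write also goes to main memory

class WriteBackCache:
    def __init__(self):
        self.cache, self.dirty, self.memory_writes = {}, set(), 0
    def write(self, addr, value):
        self.cache[addr] = value
        self.dirty.add(addr)             # defer the main-memory update
    def evict(self, addr):
        if addr in self.dirty:
            self.memory_writes += 1      # flush the dirty block only on eviction
            self.dirty.discard(addr)
        self.cache.pop(addr, None)

# Five writes to the same address: write-through hits memory every time,
# write-back hits memory once, when the block is finally evicted.
wt, wb = WriteThroughCache(), WriteBackCache()
for _ in range(5):
    wt.write(0x10, 1)
    wb.write(0x10, 1)
wb.evict(0x10)
```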
Advantages
 Cache memory is faster than main memory.
 It consumes less access time as compared to main memory.
 It stores the program that can be executed within a short period of time.
 It stores data for temporary use.

--------------------------------------------------------------------------------------------------

4. Discuss the methods used to measure and improve the performance of the cache.

CACHE PERFORMANCE
If a process needs some data, it first searches in the cache memory.
• If the data is available in the cache, this is termed as a cache hit and the data is accessed as
required.
• If the data is not in the cache, then it is termed as a cache miss. Then the data is obtained
from the main memory. A copy of the data is stored in cache and then forwarded to processor
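The hit/miss behaviour described above can be modelled with a minimal Python sketch, where dictionaries stand in for the cache and main memory:

```python
def access(cache, main_memory, addr):
    # Returns (value, "hit" or "miss"); on a miss the block is copied
    # into the cache before being forwarded to the processor.
    if addr in cache:
        return cache[addr], "hit"
    value = main_memory[addr]
    cache[addr] = value
    return value, "miss"

memory = {100: "A", 200: "B"}
cache = {}
_, first = access(cache, memory, 100)    # miss: fetched from main memory
_, second = access(cache, memory, 100)   # hit: served from the cache
```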
There are two different techniques for improving cache performance.
Techniques for Reducing the miss rate by more flexible block replacement
1) Direct Mapping
2) Set-associative cache
3) Fully associative cache
Techniques for Reducing the miss penalty by an additional level
1) Multilevel caching
Cache mapping

 Cache mapping refers to a technique using which the content present in the main
memory is brought into the memory of the cache
 The correspondence between the main memory blocks and cache is specified by a
“Mapping Function”.
 Mapping functions determine how memory blocks are placed in the cache.

Types of Cache Mapping

Direct Mapping
 In direct mapping, a certain block of the main memory can map only to one particular
line of the cache.
 The total line numbers of cache to which any distinct block can map are given by the
following:
 Cache line number = (Address of the Main Memory Block ) Modulo (Total number
of lines in Cache)

 A block of the main memory is able to map to a certain line of the cache only.
 Thus, the incoming (new) block always replaces the block that already exists,
if any, in that certain line.
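The modulo formula for direct mapping can be tried out in a short Python sketch, assuming a hypothetical cache with 8 lines:

```python
def direct_mapped_line(block_address, num_lines):
    # Cache line number = (main memory block address) mod (lines in cache)
    return block_address % num_lines

# With 8 cache lines, blocks 3, 11, and 19 all compete for line 3,
# so a newly arriving block evicts whichever of them occupies it.
lines = [direct_mapped_line(b, 8) for b in (3, 11, 19)]
```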
K-way Set Associative Mapping
 The grouping of the cache lines occurs into various sets where all the sets consist of k
number of lines.
 Any given main memory block can map only to a particular cache set.
 However, within that very set, the block of memory can map any cache line that is freely
available.
 The cache set to which a certain main memory block can map is basically given as
follows:
Cache set number = ( Block Address of the Main Memory ) Modulo (Total Number of sets
present in the Cache)

 k = 2 would suggest that every set consists of two cache lines.


 Since the cache consists of 6 lines, the total number of sets that are present in the cache =
6 / 2 = 3 sets.
 The block ‘j’ of the main memory is capable of mapping only to set number (j mod 3)
of the cache.

 Here, within this very set, the block ‘j’ is capable of mapping to any cache line that is
freely available at that moment.
 In case all the available cache lines happen to be occupied, then one of the blocks that
already exist needs to be replaced.
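The set-number formula for the 2-way example above (6 lines, hence 3 sets) can be checked with a small Python sketch:

```python
def cache_set(block_address, num_sets):
    # Cache set number = (main memory block address) mod (sets in cache)
    return block_address % num_sets

# 6 cache lines, 2-way set associative -> 6 // 2 = 3 sets.
num_sets = 6 // 2
# Blocks 4, 7, and 10 all map to set 1, but each may occupy either
# of that set's two lines, whichever is free.
sets = [cache_set(j, num_sets) for j in (4, 7, 10)]
```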
Fully Associative Mapping
In the case of fully associative mapping,

 The main memory block is capable of mapping to any given line of the cache that’s
available freely at that particular moment.
 It helps us make a fully associative mapping comparatively more flexible than direct
mapping.

 Every single line of cache is available freely.


 Thus, any main memory block can map to a line of the cache.
 In case all the cache lines are occupied, one of the blocks that exists already needs to be
replaced.

5. Explain in detail about interrupts with diagram.

An interrupt is a signal from a device attached to a computer or from a program within the
computer that requires the operating system to stop and figure out what to do next.
Interrupt is the method of creating a temporary halt during program execution and allows
peripheral devices to access the microprocessor.
Whenever an interrupt occurs, it causes the CPU to stop executing the current program. Then,
comes the control to interrupt handler or interrupt service routine.

The processor responds to that interrupt with an ISR (Interrupt Service Routine), which is a
short program to instruct the microprocessor on how to handle the interrupt.

These are the steps in which the ISR (Interrupt Service Routine) handles interrupts. They are as
follows −
Step 1 − When an interrupt occurs, assume the processor is executing the i-th instruction; the
program counter points to the next instruction, the (i+1)-th.
Step 2 − When the interrupt occurs, the program counter value is stored on the process stack and the
program counter is loaded with the address of the interrupt service routine.
Step 3 − Once the interrupt service routine is completed, the address on the process stack is
popped and placed back in the program counter.
Step 4 − Execution now resumes at the (i+1)-th instruction.
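The saving and restoring of the program counter in Steps 2–4 can be modelled with a minimal Python sketch; the addresses used are illustrative:

```python
def run_with_interrupt(pc, stack, isr_start):
    # Step 2: save the return address (pc) on the stack and jump to the ISR.
    stack.append(pc)
    pc = isr_start
    # ... ISR body would execute here ...
    # Step 3: pop the saved address back into the program counter.
    pc = stack.pop()
    # Step 4: execution resumes at the saved (i+1)-th instruction.
    return pc

# Interrupt arrives while the PC points at instruction 101;
# the ISR (at a hypothetical address 5000) runs, then control returns.
resume_at = run_with_interrupt(pc=101, stack=[], isr_start=5000)
```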

Types of interrupts

Interrupts are classified into two types:

Hardware interrupt

Interrupt signals generated by external devices and I/O devices interrupt the CPU
when those devices are ready.
For example − In a keyboard if we press a key to do some action this pressing of the keyboard
generates a signal that is given to the processor to do action, such interrupts are called hardware
interrupts.
Hardware interrupts are classified into two types which are as follows −
 Maskable Interrupt − A hardware interrupt that can be delayed when a
higher-priority interrupt has occurred to the processor.
 Non-Maskable Interrupt − A hardware interrupt that cannot be delayed and must
immediately be serviced by the processor.
Software interrupts
The interrupt signal is generated internally, for example when a software program needs to
access a system call; such interrupts are called software interrupts.
Software interrupt is divided into two types. They are as follows −
 Normal Interrupts − The interrupts that are caused by software instructions
are called normal interrupts.
 Exception − An exception is an unplanned interruption that occurs while executing a
program. For example, a division by zero encountered while executing a program
raises an exception.

6. Write short notes on USB.

USB

Universal Serial Bus (USB) is an industry-standard that establishes specifications for


connectors, cables, and protocols for communication, connection, and power supply between
personal computers and their peripheral devices.
The Universal Serial Bus (USB) is the most widely used interconnection standard.
A large variety of devices are available with a USB connector, including mouse, memory keys,
disk drives, printers, cameras, and many more.
The commercial success of the USB is due to its simplicity and low cost.
The original USB specification supports two speeds of operation, called low-speed (1.5
Megabits/s) and full-speed (12 Megabits/s). Later, USB 2, called High-Speed USB, was
introduced.
It enables data transfers at speeds up to 480 Megabits/s.
As I/O devices continued to evolve with even higher speed, USB 3 (called Superspeed) was
developed
There have been 3 generations of USB specifications:
1. USB 1.x
2. USB 2.0

3. USB 3.x
The first USB specification was formulated in the mid-1990s. USB 1.1 was announced in 1995 and released
in 1996. It was very popular and dominated the market until about the year 2000. During the lifetime of USB
1.1, Intel announced a USB host controller and Philips announced USB audio for isochronous
communication with consumer electronics devices.
In April of 2000, USB 2.0 was announced, bringing multiple updates and additions. The
USB Implementers Forum (USB-IF) currently maintains the USB standard.
USB was designed to standardize the connection of peripherals like pointing devices,
keyboards, digital still, and video cameras.

Host Controller:
The host controller initiates all data transfer, and the root hub provides a connection between
devices and the host controller. The root hub receives transactions generated by the host
controller and transmits them to the USB devices. The host controller uses polling to detect when a
device is connected to the bus or disconnected from it.

Root Hub:
The root hub performs power distribution to the devices, enables and disables the ports, and
reports the status of each port to the host controller. The root hub provides the connection
between the host controller and USB ports.

Hub:
Hubs are used to expand the number of devices connected to the USB system. Hubs can detect
when a device is attached or removed from the port. The below figure shows the architecture of
the hub. The upstream port is connected to the host, and USB devices are connected to the
downstream port.

USB Cable:
A USB port has four pins, corresponding to the cable's four wires, with the Vbus wire used to power the
devices.

USB Device:
USB devices are divided into classes such as hub, printer, or mass storage. A USB device
carries information about its configuration, such as class, type, manufacturer ID, and data rate. The
host controller uses this information to load device software from the hard disk.

Advantages of USB –
The Universal Serial Bus was designed to simplify and improve the interface between personal
computers and peripheral devices when compared with previously existing standard or ad-hoc
proprietary interfaces.
1. The USB interface is self-configuring. This means that the user need not adjust
settings on the device and interface for speed or data format, or configure interrupts,
input/output addresses, or direct memory access channels.
2. USB connectors are standardized at the host, so any peripheral can use any
available receptacle. USB takes full advantage of the additional processing power
that can be economically put into peripheral devices so that they can manage
themselves. USB devices mostly do not have user-adjustable interface settings.
3. The USB interface is hot-pluggable, or plug and play, meaning devices can be
exchanged without rebooting the host computer. Small devices can be powered
directly from the USB interface thus removing extra power supply cables.
4. The USB interface defines protocols for improving reliability over previous
interfaces and recovery from common errors.
5. Installing a device that relies on the USB standard requires minimal operator
action.
Disadvantages of USB –
1. USB cables are limited in length.

2. USB has a strict “tree” topology and “master-slave” protocol for addressing
peripheral devices. Peripheral devices cannot interact with one another except via
the host, and two hosts cannot communicate over their USB ports directly.
3. Some very high-speed peripheral devices require sustained speeds not available in
the USB standard.
4. For a product developer, the use of USB requires the implementation of a complex
protocol and implies an intelligent controller in the peripheral device.
5. Use of the USB logos on the product requires annual fees and membership in the
organization.

PART C
1. Describe the basic operations of cache in detail with a diagram and discuss the
various mapping schemes used in cache design with examples.
Cache Memory :
Cache memory is a smaller, faster segment of memory whose access time is close to that
of the processor registers. In the memory hierarchy, cache memory has a lower access time
than primary memory. Because it is very small, cache memory is generally used as a
buffer.
 Because cache memory is faster, data in it can be accessed very quickly
 Because cache memory is smaller, it cannot hold a large amount of data

Cache mapping
 Cache mapping refers to the technique by which blocks of main memory are brought
into the cache
 The correspondence between the main memory blocks and cache is specified by a
“Mapping Function”.
 Mapping functions determine how memory blocks are placed in the cache.

Types of Cache Mapping


Direct Mapping
 In direct mapping, a given block of the main memory can map only to one particular
line of the cache.
 The cache line to which a block maps is given by the following:
 Cache line number = (Address of the Main Memory Block) Modulo (Total number
of lines in Cache)
 Thus, the incoming (new) block always replaces the block, if any, that already exists
in that line.
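The modulo rule above can be sketched in a few lines of Python; the cache size and block addresses are made-up illustrative values:

```python
# Direct-mapped cache: each main-memory block maps to exactly one line.

NUM_LINES = 8  # total number of lines in the cache (illustrative)

def cache_line(block_address: int, num_lines: int = NUM_LINES) -> int:
    """Line a main-memory block maps to: block address mod number of lines."""
    return block_address % num_lines

# Blocks 3, 11 and 19 all compete for the same line,
# since 3 mod 8 == 11 mod 8 == 19 mod 8 == 3:
for block in (3, 11, 19):
    print(f"block {block} -> line {cache_line(block)}")
```

This is why direct mapping can suffer conflict misses: distinct blocks that share the same line evict each other even when the rest of the cache is empty.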
K-way Set Associative Mapping
 The grouping of the cache lines occurs into various sets where all the sets consist of k
number of lines.
 Any given main memory block can map only to a particular cache set.

 However, within that set, the memory block can map to any cache line that is freely
available.
 The cache set to which a certain main memory block can map is basically given as
follows:
Cache set number = ( Block Address of the Main Memory ) Modulo (Total Number of sets
present in the Cache)

 For example, k = 2 means that every set consists of two cache lines.


 Since the cache consists of 6 lines, the total number of sets that are present in the cache =
6 / 2 = 3 sets.
 The block ‘j’ of the main memory is capable of mapping to the set number only (j mod 3)
of the cache.
 Here, within this very set, the block ‘j’ is capable of mapping to any cache line that is
freely available at that moment.
 In case all the available cache lines happen to be occupied, then one of the blocks that
already exist needs to be replaced.
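As a sketch of the set computation and replacement behaviour described above, here is a toy 2-way set-associative model in Python using the text's numbers (6 lines, k = 2, 3 sets); the FIFO replacement policy is an illustrative assumption:

```python
# 2-way set-associative cache: 6 lines grouped into 6 / 2 = 3 sets;
# block j maps to set (j mod 3) and may occupy either line in that set.

K = 2                          # lines per set (associativity)
TOTAL_LINES = 6
NUM_SETS = TOTAL_LINES // K    # = 3

def cache_set(block_address: int) -> int:
    return block_address % NUM_SETS

# sets[i] holds the block numbers currently resident in set i
sets = [[] for _ in range(NUM_SETS)]

def access(block: int) -> str:
    s = cache_set(block)
    if block in sets[s]:
        return "hit"
    if len(sets[s]) == K:      # set full: evict the oldest block (FIFO, assumed)
        sets[s].pop(0)
    sets[s].append(block)
    return "miss"

for b in (0, 3, 6, 0):         # blocks 0, 3 and 6 all map to set 0
    print(f"block {b} -> set {cache_set(b)}: {access(b)}")
```

Note that even though the cache has six lines, blocks 0, 3, and 6 contend for the two lines of set 0 only; the other two sets stay empty in this run.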
Fully Associative Mapping
In the case of fully associative mapping,

 Any main memory block can map to any line of the cache that is freely available at
that moment.

 This makes fully associative mapping more flexible than direct mapping.

 Every single line of cache is available freely.


 Thus, any main memory block can map to a line of the cache.
In case all the cache lines are occupied, one of the blocks that exists already needs to be
replaced.
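A toy fully associative cache can be sketched in Python; any block may occupy any line, so lookup checks every line, and the LRU replacement policy used here is one common choice, assumed for illustration:

```python
from collections import OrderedDict

class FullyAssociativeCache:
    """Toy fully associative cache with LRU replacement (illustrative)."""

    def __init__(self, num_lines: int):
        self.num_lines = num_lines
        self.lines = OrderedDict()          # block -> None, ordered by recency

    def access(self, block: int) -> str:
        if block in self.lines:
            self.lines.move_to_end(block)   # mark as most recently used
            return "hit"
        if len(self.lines) == self.num_lines:
            self.lines.popitem(last=False)  # evict the least recently used block
        self.lines[block] = None
        return "miss"

cache = FullyAssociativeCache(num_lines=2)
print([cache.access(b) for b in (1, 2, 1, 3, 2)])
# -> ['miss', 'miss', 'hit', 'miss', 'miss']
```

Because any block can occupy any line, there are no conflict misses, but every lookup must compare against all lines, which is why real fully associative caches are kept small.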

2. Draw the typical block diagram of a DMA controller and explain how it is used for
direct data transfer between memory and peripherals.

A DMA controller is a control unit that acts as an interface between the data bus and the I/O
devices. It transfers data without the intervention of the processor; the processor only initiates
and sets up the transfer. The DMA controller also contains an address unit, which generates the
memory address and selects the I/O device for the transfer of data.

Working of DMA Controller
The DMA controller has three registers, as follows.
 Address register – It contains the address to specify the desired location in
memory.
 Word count register – It contains the number of words to be transferred.
 Control register – It specifies the transfer mode.

The CPU initializes the DMA by sending the given information through the data bus.
 The starting address of the memory block where the data is available (to read) or
where data are to be stored (to write).
 It also sends word count which is the number of words in the memory block to be
read or written.
 Control to define the mode of transfer such as read or write.
 A control to begin the DMA transfer
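The register setup above can be illustrated with a toy Python model; the register names follow the text, while the encoding of the control register and the data values are hypothetical simplifications:

```python
# Toy model of a DMA transfer: the "CPU" programs the address, word-count
# and control registers, then the controller moves the words without any
# further CPU involvement.

READ, WRITE = 0, 1   # control-register transfer modes (hypothetical encoding)

def dma_transfer(memory, io_buffer, address_reg, word_count_reg, control_reg):
    """Move word_count_reg words between memory[address_reg:] and io_buffer."""
    for i in range(word_count_reg):
        if control_reg == READ:              # memory -> I/O device
            io_buffer.append(memory[address_reg + i])
        else:                                # I/O device -> memory
            memory[address_reg + i] = io_buffer[i]

memory = [10, 20, 30, 40, 50]
device = []
dma_transfer(memory, device, address_reg=1, word_count_reg=3, control_reg=READ)
print(device)   # the three words starting at memory address 1
```

In real hardware the word-count register is decremented on each transfer and the address register is incremented; the loop above models both in one step.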

Direct data transfer between memory and peripherals

 DMA is a process of communication for data transfer between memory and input/output,
controlled by an external circuit called DMA controller, without involvement of CPU.
 8085 MP has two pins HOLD and HLDA which are used for DMA operation.
 First, the DMA controller sends a request by driving the Bus Request (BR) control line
high. When the MP receives a high signal on its HOLD pin, it first completes the
execution of the current machine cycle, which takes a few clock cycles, and then sends
the HLDA signal to the DMA controller.
 After receiving HLDA through its Bus Grant (BG) pin, the DMA controller takes
control of the system bus and transfers data directly between memory and I/O without
involving the CPU. During the DMA operation, the processor is free to perform other
work that does not need the system bus.
 At the end of the data transfer, the DMA controller terminates the request by driving
the HOLD pin low, and the MP regains control of the system bus, making HLDA low.
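The HOLD/HLDA handshake described above can be sketched as a small Python state machine; the sequencing is illustrative, not a cycle-accurate 8085 model:

```python
# Toy sketch of the HOLD/HLDA bus handshake between a microprocessor (MP)
# and a DMA controller.

class Microprocessor:
    def __init__(self):
        self.hold = False      # HOLD input pin (driven by the DMA controller)
        self.hlda = False      # HLDA output pin (acknowledge to the controller)
        self.owns_bus = True   # whether the MP currently drives the system bus

    def tick(self):
        if self.hold and not self.hlda:
            # finish the current machine cycle, then grant the bus
            self.hlda = True
            self.owns_bus = False
        elif not self.hold and self.hlda:
            # request withdrawn: reclaim the bus
            self.hlda = False
            self.owns_bus = True

mp = Microprocessor()
mp.hold = True                  # DMA controller raises Bus Request -> HOLD high
mp.tick()
print(mp.hlda, mp.owns_bus)     # True False: DMA controller now owns the bus
mp.hold = False                 # transfer complete, request withdrawn
mp.tick()
print(mp.hlda, mp.owns_bus)     # False True: MP regains the bus
```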
Modes of Data Transfer in DMA
There are 3 modes of data transfer in DMA that are described below.
 Burst Mode: In burst mode, the DMA controller hands the buses back to the CPU
only after the whole block of data has been transferred, not before.
 Cycle Stealing Mode: In cycle stealing mode, the buses are handed back to the
CPU after the transfer of each byte, so the DMA controller must repeatedly
request bus control. This mode interferes less with higher-priority CPU tasks.
 Transparent Mode: In transparent mode, the DMA controller transfers data only
during cycles in which the CPU is not using the system buses, so the CPU is
never delayed.
Advantages of DMA Controller
 Direct Memory Access speeds up memory operations and data transfer.
 CPU is not involved while transferring data.
 DMA requires very few clock cycles while transferring data.
 DMA distributes workload very appropriately.
 DMA helps the CPU in decreasing its load.
Disadvantages of DMA Controller
 Direct Memory Access is costly because it requires additional hardware.
 DMA suffers from Cache-Coherence Problems.
 DMA Controller increases the overall cost of the system.
 DMA Controller increases the complexity of the software.

