
Modern University for Technology & Information

Faculty of Computers & Information


Fall 2018

Computer Architecture
Revision
+
Assignments

Contents:
Lecture 1: Computer Evolution & Performance
Lecture 2: Computer Function and Interconnection
Lecture 3: Memory Organization
Lecture 4: Computer Arithmetic Basic Concept
Lecture 5: Arithmetic Logic Unit (ALU)
Lecture 6: Instruction Sets

Important:

L1: Assignments: 1, 4, 5, 6
    Notes: "the formula will not come on the question paper" / "if it comes up, the formula will be on the question paper".
    Important theory: slides 38 to 47, on Performance Assessment.

L2: Assignments: 7, 9
    Important theory: slides 18 to 32, on Interrupts.

L3: Assignments: 11, 14
    Note: the formula will not be given on the question paper.
    Important theory: Memory Types.

L4:
• The diagram of the hardware for signed-magnitude addition and subtraction
• Assignment #15
• Types of adders, slide #18, plus its assignment
• Multiplication, slide 31
• Division

L5:
• ALU Structure
• ALU Design (explanation)
• Important theory: slides 6 to 13

L6:
• Instruction sets: slides 12 to 21
• Addressing modes: slides 26 to 42 (the formula is given on the question paper)
Lecture 1: Computer Evolution & Performance
Definition of Computer Architecture: it is the design of a computer system,
i.e. the art of assembling logical elements into a computing device.

Computer architecture covers:

• Attributes that have a direct impact on the logical execution of a program, e.g.
   − the instruction set
   − the number of bits used to represent various data types
   − I/O mechanisms
   − techniques for addressing memory
• Specifications such as the operational units and their interconnections

4 main basic functions for the computer:


1. Data processing
2. Data storage
3. Data movement
4. Control

Structure View: It is the way in which the components for the computer are interrelated.
There are 4 main components:
1. Central processing unit (CPU or processor) which consists of Control unit, ALU ,
registers and CPU interconnection
2. Main memory to store data
3. I/O to move data between the computer and outside world
4. System interconnection for communication among CPU, main memory, and I/O

A performance balance is needed because of the gap between the speed of the processor and that of
other critical computer components (e.g. memory and I/O devices).
To achieve this balance:

• For the memory:


1. Increase the number of bits that can be retrieved at one time.
2. Include a cache or other buffer scheme on the memory chip.
3. Reduce the frequency of memory access.
4. Increase the interconnect bandwidth between processors and memory by
using higher-speed buses.

• For the I/O devices:


1. Cache and buffering schemes
2. Higher speed interconnection buses
3. Multiple processor configuration
But we still need to increase the processor speed (How?)
There are three approaches to improve the chip organization and architecture:
1. Increase the H/W speed of the processor
2. Increase the size and speed of caches between memory and processor.
3. Make change to the processor organization and architecture to increase the effective
speed of instruction execution.
In practice, Intel makes changes to processor organization and architecture (the third approach) to
increase the clock speed and logic density, which requires solving the following serious problems:
1. Power density (watts/cm2): and its corresponding heat dissipation, due to the
   increase in the density of logic and the clock speed on the chip.
2. RC delay: the speed at which electrons can flow on a chip between transistors is
   limited by the resistance and capacitance of the metal wires connecting them.
   As components on the chip decrease in size, the wire interconnects become thinner,
   increasing resistance (R = (ρ * L)/A). Also, the wires are closer together, increasing
   capacitance. So the RC product increases and the delay τ increases (τ ∝ RC).
3. Memory latency: memory speeds lag processor speeds (addressed by management and
   technological techniques).

Performance Assessment
Clock Speed
In a computer, clock speed refers to the number of pulses per second generated by
an oscillator that sets the tempo for the processor.
• Key parameters
Performance, cost, size, security, reliability, power consumption
• System clock speed
Measured in Hz or multiples of it
o Clock rate, clock cycle, clock tick, cycle time
• Signals in CPU take time to settle down to 1 or 0
• Signals may change at different speeds
• Operations need to be synchronised
• Instruction execution in discrete steps
Fetch, decode, load and store, arithmetic or logical
Usually require multiple clock cycles per instruction
• Pipelining gives simultaneous execution of instructions
Conclusion: clock speed is not the whole story

Recent Alternatives

• Designers improve performance without increasing the clock rate by placing multiple
processors on the same chip, with a large shared cache (multiple cores)
• Studies indicate that, within a processor, the increase in performance is roughly proportional
to the square root of the increase in complexity

(performance ∝ √complexity)

• But if the software can support the effective use of multiple processors, then doubling
the number of processors almost doubles performance.
• So, the strategy is to use two simpler processors on the chip with more caches rather
than one more complex processor (IBM Power4 in 2001)

Instruction Execution Rate
• Millions of instructions per Second (MIPS)
• Millions of Floating-point Operations per Second (MFLOPS)
• Heavily dependent on instruction set, compiler design, processor implementation,
cache & memory hierarchy

SPEC
• SPEC is an acronym for the Standard Performance Evaluation Corporation.
• SPEC is a non-profit organization composed of computer vendors, systems
integrators, universities, research organizations, publishers and consultants whose
goal is to establish, maintain and endorse a standardized set of relevant
benchmarks for computer systems

Benchmarks
• Programs designed to test performance
• Written in high level language
• Represents style of task
Systems, numerical, commercial
• Easily measured
• Widely distributed

SPEC Speed Metric


• Single task
• Base runtime is defined for each benchmark using a reference machine
• Results are reported as the ratio of reference time to system run time
   − Tref_i: execution time for benchmark i on the reference machine
   − Tsut_i: execution time of benchmark i on the test system

      r_i = Tref_i / Tsut_i

• Overall performance is calculated by averaging the ratios for all 12 integer benchmarks
   − Use the geometric mean (the mean of the 12 ratios)
   − Appropriate for normalized numbers such as ratios

      r_G = (r_1 * r_2 * ... * r_n)^(1/n)

Amdahl’s Law

• Gene Amdahl [AMDA67]


• Potential speed up of program using multiple processors
• Concluded that:
− Code needs to be parallelizable
− Speed up is bound, giving diminishing returns for more processors
• Task dependent
− Servers gain by maintaining multiple connections on multiple processors
− Databases can be split into parallel tasks
− For program running on single processor

• Amdahl’s Law Formula


− f: Fraction f of code infinitely parallelizable with no scheduling overhead.
− (1-f): Fraction (1-f) of code inherently serial.
− T: is total execution time for program on single processor.
− N: is number of processors that fully exploit parallel portions of code.

speedup = (time to execute program on a single processor) / (time to execute program on N parallel processors)

        = [T(1 − f) + T·f] / [T(1 − f) + T·f/N]

        = 1 / [(1 − f) + f/N]

Conclusions:
If f is small: using parallel processors has little effect.
As N tends to infinity (N → ∞): speedup is bounded by 1/(1 − f), so there are diminishing
returns for using more processors.

Moore, cofounder of Intel, stated in 1965 that the number of transistors doubles every year and
that performance doubles every 18 months, a compounded rate of about 48% per year.

Assignment # 1

Given that the final performance was 32, what was its initial value 3 years
before? (Show the detailed computation in the backward direction.)

Solution
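A minimal Python sketch of one possible backward computation, assuming the compounded 48% yearly
growth rate quoted above is applied backwards over the 3 years (a doubling-every-18-months reading
is noted in a comment, since the handout does not show which rule the instructor intended):

```python
final = 32
rate = 0.48                      # compounded yearly growth rate quoted above

value = final
for years_back in range(1, 4):   # step backwards one year at a time
    value /= (1 + rate)
    print(f"{years_back} year(s) before: {value:.2f}")
# -> 21.62, 14.61, 9.87   (i.e. initial value ~= 32 / 1.48**3 ~= 9.87)

# Under a doubling-every-18-months reading instead, 3 years = 2 doublings,
# so the initial value would be 32 / 2**2 = 8.
```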

Assignment #4

a) Given that τ = 0.6 * RC, and assuming that the specific resistance (ρ) is 0.8 ohm.m, the
   length of the wire (L) is 0.2 m and the cross-section area of the wire (A) is 0.01 m2. The
   wiring capacitance is C = 10 µF.

• Compute the delay τ.

b) Suppose that Intel's researchers tested another wiring material having a different ρ2, with
   the same wiring length (L1 = L2), half the cross-section area (A2 = 0.5 A1) and double the
   capacitance (C2 = 2 C1). The result of the second delay experiment was one-fourth that of
   the first delay experiment (τ2 = 0.25 τ1).

• Compute R2 and ρ2 for the new wiring material.

c) What will be the impact on performance and cost in case (b) compared to case (a)?

Solution

a)
Given: ρ1 = 0.8 ohm.m, L1 = 0.2 m, A1 = 0.01 m2, C1 = 10 µF
Rules: τ = 0.6 * RC and R = (ρ * L)/A

R1 = (0.8 * 0.2) / 0.01 = 16 ohm

τ1 = 0.6 * 16 * 10 * 10^-6 = 9.6 * 10^-5 seconds

b)
Given: L2 = 0.2 m, A2 = 0.005 m2, C2 = 20 µF, τ2 = (1/4) τ1 = 2.4 * 10^-5 sec; R2 = ?, ρ2 = ?

Since τ = 0.6 * RC:
   2.4 * 10^-5 = 0.6 * R2 * 20 * 10^-6
   R2 = 2.4 * 10^-5 / (0.6 * 20 * 10^-6) = 2 ohm

Since R = (ρ * L)/A, then ρ = (R * A)/L:
   ρ2 = (2 * 0.005) / 0.2 = 0.05 ohm.m

c)
Performance can be determined from the delay time (τ).
Cost can be determined from the cross-section area.

        Case a              Case b
ρ       0.8 ohm.m           0.05 ohm.m
L       0.2 m               0.2 m
A       0.01 m2             0.005 m2
C       10 µF               20 µF
R       16 ohm              2 ohm
τ       9.6 * 10^-5 sec     2.4 * 10^-5 sec

Conclusion:
Using a different wiring material in case (b), with a lower specific resistance (ρ), led to a
decrease in the resistance (R) and therefore a decrease in the delay time (τ), so performance
increases; the cost may change depending on the chosen wiring material.
Here we find that case (b) is better.
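A minimal Python sketch of the same computation, using the delay model τ = 0.6·R·C and
R = ρ·L/A from the assignment (the function and variable names are mine):

```python
def wire_delay(rho, length, area, capacitance, k=0.6):
    """Return (R, tau) for the delay model tau = k*R*C with R = rho*L/A."""
    resistance = rho * length / area
    return resistance, k * resistance * capacitance

# Case (a): rho = 0.8 ohm.m, L = 0.2 m, A = 0.01 m2, C = 10 uF
R1, tau1 = wire_delay(0.8, 0.2, 0.01, 10e-6)
print(R1, tau1)                      # 16.0 ohm, 9.6e-05 s

# Case (b): tau2 = 0.25 * tau1, with C2 = 20 uF, A2 = 0.005 m2, L2 = 0.2 m
tau2 = 0.25 * tau1
R2 = tau2 / (0.6 * 20e-6)            # invert tau = 0.6*R*C  -> 2.0 ohm
rho2 = R2 * 0.005 / 0.2              # invert R = rho*L/A    -> 0.05 ohm.m
print(R2, rho2)
```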

Assignment #5
Compute ri, rG based on the given table of running 12 benchmarks on reference m/c and
test system.
Benc # i Trefi Tsuti
1 15 18
2 14 19
3 13 25
4 16 19
5 12 17
6 18 21
7 11 13
8 15 20
9 16 19
10 14 22
11 17 23
12 14 18

Solution
Benc # i Trefi Tsuti ri
1 15 18 0.833
2 14 19 0.736
3 13 25 0.52 Trefi: execution time for benchmark i on
4 16 19 0.842 reference machine
5 12 17 0.705 Tsuti: execution time of benchmark i on test
6 18 21 0.857 system
7 11 13 0.846 ri: ratio of reference time to system run time
8 15 20 0.75
9 16 19 0.842
10 14 22 0.636
11 17 23 0.739
12 14 18 0.777
Overall performance is calculated by averaging the ratios for all 12 integer benchmarks:

r_G = (r_1 * r_2 * ... * r_12)^(1/12) = (0.0318)^(1/12) ≈ 0.75
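A minimal Python sketch reproducing the table above (the ratios r_i and their geometric mean):

```python
import math

tref = [15, 14, 13, 16, 12, 18, 11, 15, 16, 14, 17, 14]   # reference-machine times
tsut = [18, 19, 25, 19, 17, 21, 13, 20, 19, 22, 23, 18]   # test-system times

ratios = [r / s for r, s in zip(tref, tsut)]               # r_i = Tref_i / Tsut_i
r_g = math.prod(ratios) ** (1 / len(ratios))               # geometric mean

print([round(r, 3) for r in ratios])   # 0.833, 0.737, 0.52, ...
print(round(r_g, 2))                   # ~0.75
```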
Assignment 6
Compute the speedup based on the given table: a program is run on 5 processors, then the parallel
portion f is increased and it is rerun on 50 and then 500 processors.
Give your comments.
Case #    No. of processors    f (parallel portion)    1 - f (serial portion)
1         5                    0.45                    0.55
2         50                   0.75                    0.25
3         500                  0.93                    0.07

Solution
Use Amdahl's law formula:

speedup = (time to execute program on a single processor) / (time to execute program on N parallel processors)
        = [T(1 − f) + T·f] / [T(1 − f) + T·f/N]
        = 1 / [(1 − f) + f/N]

Bound = 1 / (1 − f)

(Note: if this problem comes in the exam, the formula will not be given.)

Case #    No. of processors    f (parallel)    1 - f (serial)    Speedup = 1/((1-f) + f/N)    Bound = 1/(1-f)
default   1                    0.45            0.55              1                            1.818
1         5                    0.45            0.55              1.562                        1.818
2         50                   0.75            0.25              3.773                        4
3         500                  0.93            0.07              13.915                       14.285

Comment:
By increasing the parallel portion relative to the serial portion and increasing the number of
processors, we get a higher speedup.
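A minimal Python sketch of Amdahl's law reproducing the table (the function names are mine):

```python
def speedup(f, n):
    """Amdahl's law: fraction f of the code is parallelizable, n processors."""
    return 1 / ((1 - f) + f / n)

def bound(f):
    """Limiting speedup as the number of processors goes to infinity."""
    return 1 / (1 - f)

for n, f in [(1, 0.45), (5, 0.45), (50, 0.75), (500, 0.93)]:
    print(n, f, round(speedup(f, n), 3), round(bound(f), 3))
# speedups: 1.0, 1.562, 3.774, 13.916   bounds: 1.818, 1.818, 4.0, 14.286
```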

Lecture 2: Computer Function & Interconnection
At the top level, we can explore the functionality of the computer system by describing:
➢ Data and control signals: those that each component exchanges with the other
components
➢ Interconnection structure: and its control
Studying the top-level structure and function gives insight into:
➢ System bottlenecks (between CPU and Memory)
➢ Alternate pathways (using caches)
➢ Magnitude of system failure if a component fails
➢ Ease of adding performance enhancement

Computer Components
Top-level Major Components:
• CPU: which contains
1. Memory Address Register (MAR): specify the address in memory for the next read
or write
2. Memory Buffer Register (MBR): contains the data to be written into memory or
receives the data read from memory
3. I/O Address Register (I/O AR): specify I/O device
4. I/O Buffer Register (I/O BR): to exchange data between an I/O module and the CPU
5. PC: contains the address of the next instruction-pair to be fetched from memory
6. AC: hold temporarily operands and results of ALU operations
7. IR: contains the 8-bit opcode instruction being executed
• Memory module:

− consists of a set of locations, defined by sequentially numbered addresses.

− Each location can contain binary numbers that could be interpreted as either an
instruction or data
• I/O module:

− transfer data from external devices to CPU and memory, and vice versa.

− It contains internal buffers for temporarily holding these data until they can be sent
on

Computer Function
Instruction Fetch & Execute
The basic function performed by the computer is the execution of a program, which consists of a
set of instructions stored in memory. Each instruction is processed in an instruction cycle as follows:
1. Fetch Cycle: the processor reads (fetches) instructions from memory one at a time (each is a
   16-bit word whose address is held in the PC) and loads each into its IR.
2. Execute Cycle: the processor executes each instruction by performing one or a mixture of
   actions (processor-memory, processor-I/O, data processing and control).
Partial Program Execution Example
The next program fragment adds the contents of the memory word at address 940 (0003) to
the contents of the memory word at address 941 (0002) and stores the result (0005) in the
latter location (941). Three instructions, which can be described as three fetch and three
execute cycles, are required:
1. The PC contains 300 which is the address of the first instruction. This instruction (the
value 1940 in hexadecimal) is loaded into the instruction register IR and the PC is
incremented (300 to 301)
2. The instruction (1940) is executed: the first hex digit (which is 1) in the IR
indicates that the AC is to be loaded from memory. The remaining 12 bits (3 hex digits),
which are (940), specify the address in memory from which the data (0003) is loaded
into the AC.

3. The next instruction in memory (5941), pointed to by the PC (301), is fetched from its location
(301) into the IR and the PC is incremented (301 to 302). Note that 5 is 0101, which means add
to AC from memory.
4. The old contents of AC (0003) and the contents of the location 941(0002) are added
and the result (0005) is stored in the AC
5. Now, the PC contains 302. So, the next instruction fetched from memory is (2941)
and loaded to IR. 0010 which is 2 in hexa means store AC to memory. PC is
incremented (302 to 303).
6. The contents of the AC (0005) are stored in location 941 (the latter location) as a result
of the performed addition operation.

[Figure: memory contents and CPU registers (PC, IR, AC) at each of the six steps; in step 4 the
addition 3 + 2 = 5 is carried out in the AC.]
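A minimal Python sketch of the fetch-execute cycle for this example program (the memory layout
and the opcodes 1 = load AC, 5 = add to AC, 2 = store AC are taken from the example above):

```python
# memory: address -> 16-bit word (addresses and contents in hexadecimal, as in the example)
memory = {0x300: 0x1940, 0x301: 0x5941, 0x302: 0x2941,
          0x940: 0x0003, 0x941: 0x0002}
pc, ac = 0x300, 0

for _ in range(3):                          # three instruction cycles
    ir = memory[pc]                         # fetch cycle: load the instruction into IR
    pc += 1                                 # and increment the PC
    opcode, address = ir >> 12, ir & 0xFFF  # 4-bit opcode, 12-bit address
    if opcode == 0x1:                       # load AC from memory
        ac = memory[address]
    elif opcode == 0x5:                     # add to AC from memory
        ac += memory[address]
    elif opcode == 0x2:                     # store AC to memory
        memory[address] = ac

print(hex(ac), hex(memory[0x941]))          # 0x5 0x5
```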

Interrupts
1. Program Flow of Control without Interrupts
• Suppose that the processor is transferring data to a printer. After each write
operation, the processor must pause and remain idle until the printer catches up
(wasting time, with the memory not being used)
• The user program contains series of system calls in form of WRITE calls interleaved
with code processing segments 1,2,3 without use of I/O
• The I/O program consists of:
− The preparation code, a sequence of instructions (4) to prepare for the I/O
operation (copy data to buffer and prepare the device)
− The actual I/O command where the processor waits for the I/O device to perform
the operation with periodic testing instructions
− A sequence of instructions (5) to complete the operation with operation
success/failure flag

2. Program Flow of Control with Interrupts & short I/O wait
• With interrupts, processor can execute other instructions while an I/O operation is
in progress
• The user program contains series of system calls in form of WRITE calls
• The I/O program contains the preparation code & the actual I/O command. After
these few instructions have been executed, control returns to the user program
• Meanwhile, the external device is busy accepting data from computer memory and
printing it as an I/O operation and user program executes its instructions
concurrently
• When the external device is ready to accept more data from processor, its I/O
module sends an interrupt request signal to the processor indicating that it is ready
to be serviced (I am ready)
• The processor suspends its operation, branch off to an interrupt handler program (as
a part of the operating system) to service that particular I/O device and resume the
original execution after the device is serviced (2a,2b,3a,3b)
• Short time means, time for I/O operation is less than the time required to complete
the execution of instructions between write operation in the user program (Time of
I/O operation < 2b or 3b)

3. Program Flow of Control with Interrupts & Long I/O wait
• Consider the case of slow I/O device such as printer, where the I/O operation will
take much more time than executing a sequence of user instructions
• The user program will reach the second WRITE call before the I/O operation required by the
first call has completed, and the user program will hang at this point.
• When the preceding I/O operation is completed, the new WRITE call may be
processed, and the new I/O operation may be started.

Multiple Interrupts
• Suppose a program may be receiving data from a communications line (interrupt 1)
and printing results ( interrupt 2 )
• There are two approaches to handle:
1) Sequential interrupt processing: (FIFO) disable interrupts while an interrupt is
being processed until finishing and enable interrupts later before resuming
the user program to check for existence of interrupts
2) Nested Interrupt processing: follow a priority policy and allow an interrupt of
higher priority to cause a lower priority interrupt handler to be itself
interrupted

Transfer of control with Multiple Interrupts

Example Time Sequence of Multiple Interrupts (Nested interrupt processing)

Tabular Form
Step t Description

1 0 • Program begins

2 10 • Printer interrupt occurs


• Place user information on the system stack
• Execution continues at the printer interrupt service routine (ISR)
3 15 • Routine in (2) still executing
     • Communication interrupt occurs, with higher priority than the printer
4 20 • Routine in (3) still executing
     • Disk interrupt occurs, with lower priority than the communication ISR but
       higher priority than the printer
5 25 • Communication ISR is completed
• Continue to execute disk ISR
6 35 • Disk ISR is completed
• Continue to execute printer ISR
7 40 • Routine completes
• Return to the user program
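A minimal Python sketch that reproduces this timeline by simulating preemptive priority
scheduling of the ISRs (the arrival times, priorities and 10-unit service times are taken from
the example; the function and variable names are mine):

```python
def simulate(interrupts, horizon=60):
    """Nested interrupt processing: each time unit, run the highest-priority pending ISR
    (a larger priority number means higher priority)."""
    priority = {name: prio for name, _, prio, _ in interrupts}
    remaining = {}                       # name -> remaining service time
    completions = []
    for t in range(horizon):
        for name, arrival, _, service in interrupts:
            if arrival == t:
                remaining[name] = service
        pending = [n for n, r in remaining.items() if r > 0]
        if pending:
            running = max(pending, key=lambda n: priority[n])
            remaining[running] -= 1
            if remaining[running] == 0:
                completions.append((t + 1, running))
        # otherwise the user program runs during this time unit
    return completions

# printer (priority 2) at t=10, communication (priority 5) at t=15,
# disk (priority 4) at t=20, each ISR needing 10 time units
for t, name in simulate([("printer", 10, 2, 10),
                         ("communication", 15, 5, 10),
                         ("disk", 20, 4, 10)]):
    print(t, name, "ISR completes")
# 25 communication, 35 disk, 40 printer -- matching the table above
```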

Assignment #7
Assume that a partial list of CPU opcode is as follows:
0001 Load AC from memory
0010 store AC to memory
0101 Add to AC from memory
0011 load AC from I/O
0111 Store AC to I/O

• If you have the following program:


1. Load AC from device 5 (in memory)
2. Add contents of memory location 940
3. Store AC to device 6 (in memory)
• Assume that the next value retrieved from device 5 is 4 and that location 940 contains a value of 3
Show the six program execution steps, expanding your description in each step to include
the use of the MAR and MBR
Solution

Assignment #8
Repeat the 6 program execution steps to subtract 3 from 5
0001 Load AC from memory
0010 store AC to memory
0101 Add to AC from memory
Solution
Without using MAR and MBR

Explanation:

Three instruction cycles have been implemented, each consisting of a fetch cycle and an execute
cycle (2 steps per instruction, 6 steps in total).

Explanation of the 6 steps:

1. The PC contains the address of the first instruction, 300, which holds the value 1620 in
   hexadecimal. This instruction is loaded into the instruction register IR and the PC is
   incremented by 1. (Fetch)
2. The AC is loaded from memory address 620, which contains the data (0011), and the PC holds
   the address of the next instruction. (Execute)
3. The next instruction in memory, 5621, pointed to by the PC (301), is fetched from its
   location (301) into the IR. The PC is incremented (301 to 302). (Fetch)
4. The old contents of the AC (1110) and the contents of location 620 (0011) are subtracted and
   the result is stored in the AC. (Execute)
5. Now the PC contains 302, so the next instruction, 2621, is fetched from memory location 302,
   loaded into the IR, and the PC is incremented (302 to 303). (Fetch)
6. In the last step, the contents of the AC are stored in location 622 as the result of the
   performed subtraction operation. (Execute)

Assignment #9
Repeat the multiple interrupt example, with the same priorities, in the following
three cases:

a) The interrupt comes first from the printer interrupt service routine (P2), then from the disk
   interrupt service routine (P4) and finally from the communication interrupt service
   routine (P5). Given that the execution time of each interrupt is 10 time units.
b) The interrupt comes first from the printer interrupt service routine (P2), then from the disk
   interrupt service routine (P4), and finally from the communication interrupt
   service routine (P5). Given that the execution time of interrupt 2 is 10 time units, the
   execution time of interrupt 4 is 8 time units and the execution time of interrupt 5 is
   12 time units.
c) The interrupt comes first from the communication interrupt service routine (P5), then from
   the printer interrupt service routine (P2), and finally from the disk interrupt service
   routine (P4). Given that the execution time of each interrupt is 10 time units.
Note: put your results in tabular form.

Lecture 3: Memory Organization

Contents:
1. Memory Hierarchy
2. Main Memory
3. Auxiliary Memory
4. Associative Memory
5. Virtual Memory
Memory Hierarchy:

• The memory unit is used for storing programs and data.


• Not all of the stored information is needed by the processor at the same time, so
additional auxiliary storage units are needed beyond the limited capacity of the main
memory.
• The total memory capacity of a computer system can be visualized as being a
hierarchy of components.
• The hierarchy ranges from
   − Slow but high-capacity Auxiliary Memory (magnetic disks and tapes):
      ▪ stores information not currently in main memory and transfers it to
        main memory when needed.
   − To the relatively faster Main Memory:
      ▪ holds the programs and data currently needed by the processor.
   − To a smaller and faster Cache Memory:
      ▪ stores segments of programs currently being executed in the CPU and
        temporary data frequently needed in the present calculations.
      ▪ it is directly accessible to the CPU (unlike auxiliary memory, which is reached
        through the I/O processor; see the hierarchy figure below).
   − To the high-speed processing logic.

Main Memory (RAM+ROM):

• It is the central storage unit in the computer system consisting of two parts:
➢ RAM (Random access memory or read / write memory)
➢ ROM (Read only memory)
• The principal technology for the main memory is the semiconductor IC.
• IC RAM chips, being volatile memory (their contents are lost when power goes off and comes on
  again), are used for storing the bulk of programs and data that are subject to change.
RAM Types:
1. Static RAM:
Consists of internal flip flops (an electronic circuit that has two stable states and thereby
is capable of serving as one bit of memory ) that store binary information as long as
power is applied to the unit.
Advantage:

• it is easier to use
• it has short read and write cycles

2. Dynamic RAM:
Stores binary information in the form of electric charges on capacitors, which need periodic
refreshing (cycling through the words every few milliseconds) to recharge them, since capacitors
discharge over time.
Advantage:

• it offers reduced power consumption


• it has larger storage capacity
ROM:
• ROM (read only memory) is used for storing:
1. Programs that are permanently resident in the computer (e.g. the bootstrap loader
   program)
2. Tables of constants that do not change in value once the production of the computer
   is completed (e.g. π (pi), τ (tau)).
• It is non-volatile memory (its contents remain unchanged after power is turned off and on
again)

[Figure: RAM chip design and ROM chip design block diagrams; see Assignment #11 below.]

Cache Memory:
• The basic operation of the cache is as follows:
   1. When the CPU needs to access memory, the cache memory is examined first to check
      whether it contains the requested word or not; if it does, the word is read from the cache.
   2. Otherwise, the main memory is accessed to read the word.
   3. A block of words containing the one just accessed is then transferred from
      the main memory to the cache memory (a block may be 1-16 words).
   4. Data related to this word is transferred so that the next reference to
      memory finds what it needs in the cache.
• Locality of reference: analysis of a large number of typical programs has shown that
  references to memory in any given interval of time tend to be concentrated in a few
  localized areas of memory.
• We can therefore place the active portion of the program and data in a fast, small memory
  (the cache memory) --> the average memory access time is reduced --> the total execution
  time of the program is reduced --> enhanced performance.

Cache Memory Performance:


• The performance of the cache memory is measured in terms of a quantity called hit
ratio
• When the CPU refers to main memory and finds the word in the cache, it is said to
produce a hit.
Otherwise, the word must be fetched from main memory, and this produces a miss.

Cache read and write policies:

Auxiliary Memory:
• The most common auxiliary memory devices are:

magnetic disks and tapes


• Other auxiliary memory devices:
− magnetic drums
− magnetic bubble memory
− optical disks
• The important characteristics of each device is:
access mode, access time (seek time + transfer time), transfer rate, capacity, and cost
• Magnetic drums and disks consist of high-speed rotating surfaces
(cylinder for drums and round flat plate for disks) coated with a magnetic
recording medium.
• Disks have more surface available for recording than drums.
• Magnetic disk storage surface is divided into tracks and sectors.

Memory hierarchy connection in a computer system

[Figure: magnetic tapes and magnetic disks (auxiliary memory) connect through the I/O processor
to the main memory; the CPU accesses the main memory directly and through the cache memory.]

Assignment #11
▪ Design a microprocessor system that has the following specifications:
1) 30 Kbytes ROM to hold the main program. The available ROM units are of 4
Kbytes each.
2) Eight RAM chips ( each of 2 Kbytes), to hold the sampled data and processing
results. The available RAM units are of 2 Kbytes each.
▪ Illustrate the ROM & RAM design issues
▪ Illustrate the worksheet for RAM(s) & ROM(s)

Solution
RAM Design Issues
1- Analyze the program RAM requirements:
   • We need to add to the system 8 RAM chips, each of 2 Kbytes capacity.
   • 2K = 2 * 1024 = 2048 locations (0 to 2047), i.e. 000 to 7FF in hex.
   • 2K = 2^1 * 2^10 = 2^11, so we need 11 address lines to address each chip: A1-A11.

2- Determine the requirements of the select lines:
   • Since we have 8 RAM chips and 8 = 2^3, we use 3 select lines: A12, A13, A14.
   • A15 = 0 for RAM.

3- Notice that between the lowest and highest address of each chip some address lines do not
   change (they select the chip); the lines that do change are written as don't care <x> in the
   worksheet.

4- Last memory address occupied by RAM = 03FFF.

ROM Design Issues

1- Analyze the program ROM requirements:
   • We need 30 Kbytes of memory storage for the program.
   • We can use 8 ROMs, each of 4 Kbytes, to have enough memory (32 Kbytes) to hold the program.
   • 4K = 4 * 1024 = 4096 locations (0 to 4095), i.e. 000 to FFF in hex.
   • 4K = 2^2 * 2^10 = 2^12, so we need 12 address lines to address each chip: A1-A12.

2- Determine the requirements of the select lines:
   • Since we have 8 ROM chips and 8 = 2^3, we use 3 select lines: A13, A14, A15.
   • The higher address lines (A15, A16) are used to discriminate between the RAM and ROM
     chips (A16 = 1 for ROM, A16 = 0 for RAM).

3- Notice that between the lowest and highest address of each chip some address lines do not
   change (they select the chip); the lines that do change are written as don't care <x> in the
   worksheet.

4- First memory address occupied by ROM = 03FFF + 00001 = 04000.

RAM Chip Design

[Block diagram] A 2048 x 8 RAM chip with: chip select inputs CS1, CS2, CS3; read (RD) and write
(WR) control inputs; an 11-bit address input AD11 (2^11 = 2048 address space); and an 8-bit data
bus.

ROM Chip Design

[Block diagram] A 4096 x 8 ROM chip with: chip select inputs CS1, CS2, CS3; a 12-bit address
input AD12 (2^12 = 4096 address space); and an 8-bit data bus.

Memory Address Map (MAM) for RAM

Compo- Hexadecimal Address Bus


nent Address 20 ---- 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
00000 0 ---- 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
RAM 1 007FF 0 ---- 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1
00800 0 ---- 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
RAM2
00FFF 0 ---- 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1
01000 0 ---- 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
RAM3 017FF 0 ---- 0 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1
01800 0 ---- 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0
RAM 4 01FFF 0 ---- 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1
02000 0 ---- 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
RAM 5 027FF 0 ---- 0 0 1 0 0 1 1 1 1 1 1 1 1 1 1 1
02800 0 ---- 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0
RAM 6 02FFF 0 ---- 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1
03000 0 ---- 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0
RAM 7
037FF 0 ---- 0 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1
03800 0 ---- 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0
RAM 8
03FFF 0 ---- 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1

Memory Address Map (MAM) for ROM

Compo- Hexadecimal Address Bus


nent Address 20 ---- 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
04000 0 ---- 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ROM 1 04FFF 0 ---- 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1
05000 0 ---- 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
ROM 2
05FFF 0 ---- 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1
06000 0 ---- 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
ROM 3 06FFF 0 ---- 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1
07000 0 ---- 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ROM 4 07FFF 0 ---- 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1
08000 0 ---- 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ROM 5 08FFF 0 ---- 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1
09000 0 ---- 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0
ROM 6 09FFF 0 ---- 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1
0A000 0 ---- 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
ROM 7
0AFFF 0 ---- 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1
0B000 0 ---- 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
ROM 8 0BFFF 0 ---- 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
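A minimal Python sketch that reproduces the hexadecimal address ranges in the two memory address
maps above (the helper name is mine):

```python
def address_map(start, chip_bytes, n_chips, label):
    """Return (name, first address, last address) for n consecutive memory chips."""
    rows = []
    for i in range(n_chips):
        lo = start + i * chip_bytes
        hi = lo + chip_bytes - 1
        rows.append((f"{label} {i + 1}", f"{lo:05X}", f"{hi:05X}"))
    return rows

# 8 RAM chips of 2 KB starting at 00000, then 8 ROM chips of 4 KB starting at 04000
for name, lo, hi in address_map(0x00000, 2 * 1024, 8, "RAM") + \
                    address_map(0x04000, 4 * 1024, 8, "ROM"):
    print(name, lo, "-", hi)
# RAM 1 00000 - 007FF ... RAM 8 03800 - 03FFF, ROM 1 04000 - 04FFF ... ROM 8 0B000 - 0BFFF
```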

Assignment #14

Compute the hit ratio for the cache memory if the CPU hits it 4420 times and misses it 80
times during a total of 4500 references to memory.

If the time per hit is 100 ns and the time per miss is 300 ns,
compute the effective access time of the cache memory.

Solution

Hit ratio = (No. of times the referenced words are in the cache) / (Total number of memory accesses)

Eff. access time = [(#hits)(Time per hit) + (#misses)(Time per miss)] / (Total number of memory accesses)

Given:
• #hits = 4420 times
• #misses= 80 times
• Total number of memory access = 4500 times
• Time per hit  100 ns
• Time per miss  300 ns
Answer:
4420
➢ Hit Ratio = = 0.98
4500
(4420)(100∗10−9 )+(80)(300∗10−9 )
➢ Eff. Access time = = 1.035 *10-7 sec. =103.5 ns.
4500
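A minimal Python sketch of the same calculation (the variable names are mine):

```python
hits, misses = 4420, 80
t_hit, t_miss = 100e-9, 300e-9             # seconds
total = hits + misses                      # 4500 memory references

hit_ratio = hits / total
eff_access = (hits * t_hit + misses * t_miss) / total

print(round(hit_ratio, 3))                 # ~0.982
print(eff_access)                          # ~1.036e-07 s, i.e. about 103.5 ns
```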

Lecture 4: Computer Arithmetic Basic Concept
• Arithmetic is branch of mathematics concerned with the study of numbers and their
properties

• Modular or modulo arithmetic, sometimes known as residue arithmetic or clock


arithmetic, can take only a specific number of digits, whatever the value.

• For example, in modulo 4 (mod 4) the only values any number can take are 0, 1, 2, or 3.

• In this system, 7 is written as 3 mod 4, and 35 is also 3 mod 4. Notice 3 is the residue, or
remainder, when 7 or 35 is divided by 4.

• To obtain results for solutions of computational problems, we use arithmetic processor


to perform four basic arithmetic operations:
− addition
− subtraction
− multiplication
− division
• Significant issues:
− fixed point arithmetic(for integers and fractions)
− floating point arithmetic
− overflow and underflow
− handling of signed numbers
− Performance

Fixed Point Representation


▪ When an integer binary number is positive, the sign is represented by 0 and the
  magnitude by a positive binary number.
▪ When the number is negative, the sign is represented by 1 but the rest of the number
  may be represented in one of the following possible ways:
  1. Signed-magnitude representation: needs comparison of the signs and magnitudes
     and then performing either addition or subtraction.
  2. Signed-1's complement representation: it is only used in old computers and for logical
     operations.
  3. Signed-2's complement representation: does not need comparisons or subtractions,
     only addition and complementation.
Example:

Hardware for signed-magnitude Addition and subtraction


For Addition (mode control M = 0)

[Block diagram] The B register, with flip-flop Bs holding the sign of B, feeds a complementer
built from exclusive-OR gates and controlled by the mode input M. With M = 0 the complementer
passes B unchanged (B XOR 0 = B). The parallel adder (full-adder circuits) receives A, the
complementer output and an input carry of 0, and produces Sum = A + B + 0. The output carry is
held in flip-flop E and any add overflow in flip-flop AVF. The sum is loaded into the A register,
whose sign is held in flip-flop As.
For Subtraction (mode control M = 1)

[Block diagram] The same hardware is used with M = 1: the complementer (exclusive-OR gates)
produces the 1's complement of B, and the input carry to the parallel adder is 1, so the adder
computes S = A + B' + 1 = A + (2's complement of B) = A - B. As before, the output carry is held
in flip-flop E, any overflow in AVF, and the result is loaded into the A register (sign in As).
Arithmetic Addition/Subtraction for signed-2’s complement

(+6) + (+13):                                  (-6) + (+13):
  +6    00000110  (6)10                          -6    11111010  (256 - 6)10 = (250)10
 +13    00001101  (13)10                        +13    00001101  (13)10
 +19    00010011  (19)10                         +7    00000111  (+7)10

(+6) + (-13):                                  (-6) + (-13):
  +6    00000110  (6)10                          -6    11111010  (256 - 6)10 = (250)10
 -13    11110011  (256 - 13)10 = (243)10        -13    11110011  (256 - 13)10 = (243)10
  -7    11111001  (249)10 = (256 - 7)10         -19    11101101  (237)10 = (256 - 19)10

Only addition is performed: we add the two numbers, including their sign bits, and discard any
carry out of the sign bit position.
Negative numbers must initially be in 2's complement form, and if the sum obtained after the
addition is negative, it is in 2's complement form.

Assignment #15

Given that:
X = 1010100 = (84)10 , Y = 1000011 = (67) 10
1. Compute using 2’s complement X-Y =?
2. Compute using 2’s complement Y-X =?

Solution
1. X-Y = (+X)+(-Y)
X= 1010100 (84)10
2’s complement of Y = 10111101 (256-67)10 =(189)10
1 1 1 1
𝟎 𝟏 𝟎 𝟏 𝟎 𝟏 𝟎 𝟎
+ 𝟏 𝟎 𝟏 𝟏 𝟏 𝟏 𝟎 𝟏
Carry Discarded 1 0 0 0 1 0 0 0 1 = (17)10
Therefore X-Y=(17)10

2. Y-X = (+Y)+(-X)
Y = 1000011 (67)10
2’s complement of X (-x) = 10101100 (256-84)10 =(172)10

𝟎 𝟏 𝟎 𝟎 𝟎 𝟎 𝟏 𝟏
+ 𝟏 𝟎 𝟏 𝟎 𝟏 𝟏 𝟎 𝟎
There is no carry 1 1 1 0 1 1 1 1 = (239)10=(256-17)10

Therefore Y-X =(-17)10
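A minimal Python sketch of 8-bit two's-complement subtraction, reproducing both results (the
helper names are mine):

```python
def twos_complement(value, bits=8):
    """Two's complement of value within the given word size."""
    return (2**bits - value) % 2**bits

def subtract(x, y, bits=8):
    """Compute x - y by adding the two's complement of y and discarding the carry out."""
    raw = (x + twos_complement(y, bits)) % 2**bits          # discard any carry out
    return raw - 2**bits if raw >= 2**(bits - 1) else raw   # interpret the sign bit

print(subtract(84, 67))   #  17  (X - Y)
print(subtract(67, 84))   # -17  (Y - X; the raw result 11101111 is in 2's complement form)
```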

Types of adders
1. A half adder is a logic circuit that performs an addition operation on
   two binary digits. The half adder produces a sum and a carry value, which
   are both binary digits.
   It has two inputs, generally labeled A and B, and two outputs: the sum S
   and the carry C. S is the XOR of A and B, and C is the AND of A and B.
   Essentially, the output of a half adder is the sum of two one-bit numbers,
   with C being the more significant of the two outputs.

   S = A ⊕ B
   C = A · B

2. A full adder is a logic circuit that performs an addition operation on
   three binary digits. The full adder produces a sum and a carry value, which
   are both binary digits.
   It has three inputs - A, B, and a carry-in - so that multiple adders can be
   used to add larger numbers. To remove ambiguity between the input
   and output carry lines, the carry in is labeled Ci or Cin while the carry out
   is labeled Co or Cout.
   (A bit-level sketch of these adders is given after this list.)

3. Ripple Carry Adder (4-bit binary adder)
• Add microoperation is an elementary addition operation performed
on data stored in registers
• Two binary numbers A and B are added from right to left, creating a
sum and a carry at the outputs of each full adder for each bit
position.

4. Larger Adders
• A 16-bit adder can be made up of a cascade of four 4-bit ripple-carry
adders.
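As referenced above, a minimal bit-level Python sketch of the half adder, the full adder (built
from two half adders) and a 4-bit ripple-carry adder (the function names are mine):

```python
def half_adder(a, b):
    """Half adder: sum = A XOR B, carry = A AND B."""
    return a ^ b, a & b

def full_adder(a, b, cin):
    """Full adder built from two half adders plus an OR gate for the carry out."""
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, cin)
    return s2, c1 | c2

def ripple_carry_add(a_bits, b_bits):
    """Ripple-carry adder over equal-length bit lists (least significant bit first)."""
    carry, result = 0, []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        result.append(s)
    return result, carry

# full-adder truth table (matches the table in Assignment #16 below)
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            print(a, b, cin, "->", full_adder(a, b, cin))

# 0110 (6) + 0111 (7), bits given LSB first -> ([1, 0, 1, 1], 0), i.e. 1101 = 13, carry out 0
print(ripple_carry_add([0, 1, 1, 0], [1, 1, 1, 0]))
```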

Multiplication and Division
• No problem with unsigned (always positive) numbers; just use the same standard techniques as
in base 10 (remembering that an n-bit number times an n-bit number gives a 2n-bit result:
x[n] × y[n] = z[2n])

Multiplication
Example 1: (1100101)2 × (111101)2 = (101)10 × (61)10
Solution 1: we add all results in one step

              1 1 0 0 1 0 1
            ×     1 1 1 1 0 1
            ------------------
              1 1 0 0 1 0 1            (x 1)
            0 0 0 0 0 0 0              (x 0, shifted left 1)
          1 1 0 0 1 0 1                (x 1, shifted left 2)
        1 1 0 0 1 0 1                  (x 1, shifted left 3)
      1 1 0 0 1 0 1                    (x 1, shifted left 4)
  + 1 1 0 0 1 0 1                      (x 1, shifted left 5)
  ----------------------------
    1 1 0 0 0 0 0 0 1 0 0 0 1

(1100000010001)2 = (6161)10

Remember: 1 + 1 = 10 = (2)10,  1 + 1 + 1 = 011 = (3)10,  1 + 1 + 1 + 1 = 100 = (4)10
Solution 2: (Intermediary Results) we add results every step

              1 1 0 0 1 0 1
            ×     1 1 1 1 0 1
            ------------------
              1 1 0 0 1 0 1            (x 1)
  +         0 0 0 0 0 0 0              (x 0, shifted left 1)
  +       1 1 0 0 1 0 1                (x 1, shifted left 2)
          1 1 1 1 1 1 0 0 1            (running sum)
  +     1 1 0 0 1 0 1                  (x 1, shifted left 3)
        1 0 1 0 0 1 0 0 0 0 1          (running sum)
  +   1 1 0 0 1 0 1                    (x 1, shifted left 4)
      1 0 1 1 0 1 1 1 0 0 0 1          (running sum)
  + 1 1 0 0 1 0 1                      (x 1, shifted left 5)
    1 1 0 0 0 0 0 0 1 0 0 0 1

(1100000010001)2 = (6161)10
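A minimal Python sketch of the same shift-and-add multiplication, checked against the example
above (the function name is mine):

```python
def shift_and_add_multiply(x, y):
    """Multiply unsigned integers by adding shifted copies of x for each 1 bit of y."""
    product, shift = 0, 0
    while y:
        if y & 1:                     # current multiplier bit is 1
            product += x << shift     # add x shifted into position
        y >>= 1
        shift += 1
    return product

result = shift_and_add_multiply(0b1100101, 0b111101)   # 101 * 61
print(result, bin(result))                             # 6161 0b1100000010001
```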

Example 2: Multiplication of unsigned Integers
Multiplication of two 4-bit unsigned binary integers produces an 8-bit result

Example 3: Multiplication of Signed Integers

Place values: 128 64 32 16 8 4 2 1  (i.e. 2^7 2^6 2^5 2^4 2^3 2^2 2^1 2^0)

Division
• The division algorithm is a theorem in mathematics which precisely expresses the
outcome of the usual process of division of integers.
• In particular, the theorem asserts that integers called the quotient q and remainder r
always exist and that they are uniquely determined by the dividend a and divisor d,
with d ≠ 0.
• Formally, the theorem is stated as follows:
There exist unique integers q and r such that: a = qd + r and 0 ≤ r < | d |, where | d |
denotes the absolute value of d.

Example 1: (100101)2 ÷ (101)2 = (37)10 ÷ (5)10

divisor d = (101)2 = 5        quotient q = (000111)2 = 7        dividend a = (100101)2 = 37

              0 0 0 1 1 1      <- quotient
      101 ) 1 0 0 1 0 1        <- dividend
              - 1 0 1          (1001 - 101 = 100)
              1 0 0 0
              - 1 0 1          (1000 - 101 = 011)
                1 1 1
              - 1 0 1          (111 - 101 = 10)
                  1 0          <- remainder r = (10)2 = 2

Watch this helpful Video: https://www.youtube.com/watch?v=RTJ5iGqkm3k
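A minimal Python sketch of binary long division, checked against the example above (the function
name is mine):

```python
def binary_divide(dividend, divisor):
    """Long division over the dividend's bits, from most to least significant."""
    quotient, remainder = 0, 0
    for i in reversed(range(dividend.bit_length())):
        remainder = (remainder << 1) | ((dividend >> i) & 1)  # bring down the next bit
        quotient <<= 1
        if remainder >= divisor:                              # the divisor "goes into" it
            remainder -= divisor
            quotient |= 1
    return quotient, remainder

print(binary_divide(0b100101, 0b101))   # (7, 2): 37 = 7*5 + 2
```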


Assignment #16
Using the truth table and circuit diagram for the half adder and full adder, compute:

a. The function of S and C for the half adder


b. The function of S and Cout for the full adder

Solution
a. Half Adder

Output of half adder circuit


Input        Output                               Minterm
A  B         Sum (XOR gate)    Carry (AND gate)
0  0         0 + 0 = 0         0.0 = 0            0
0  1         0 + 1 = 1         0.1 = 0            1
1  0         1 + 0 = 1         1.0 = 0            2
1  1         1 + 1 = 0         1.1 = 1            3

Deriving the equation of half adder from K-Map


K-Map for Sum                          K-Map for Carry

        B = 0   B = 1                          B = 0   B = 1
A = 0     0       1                    A = 0     0       0
A = 1     1       0                    A = 1     0       1

Equation of Sum (XOR) = A'B + AB'      Equation of Carry (AND) = A.B

Truth table of Half-Adder circuit

Input Sum Carry


A B A' B' A'B AB' A`B + AB` A.B
0 0 1 1 0 0 0 0
0 1 1 0 1 0 1 0
1 0 0 1 0 1 1 0
1 1 0 0 0 0 0 1

Detailed Calculations

Essential Output of half adder is sum of two one-bit numbers:

CASE 1: Inputs: A = 0, B = 0
   Sum = A ⊕ B = (A.B') + (A'.B) = (0.1) + (1.0) = 0
   Carry = A.B = 0.0 = 0

CASE 2: Inputs: A = 0, B = 1
   Sum = A ⊕ B = (A.B') + (A'.B) = (0.0) + (1.1) = 1
   Carry = A.B = 0.1 = 0

CASE 3: Inputs: A = 1, B = 0
   Sum = A ⊕ B = (A.B') + (A'.B) = (1.1) + (0.0) = 1
   Carry = A.B = 1.0 = 0

CASE 4: Inputs: A = 1, B = 1
   Sum = A ⊕ B = (A.B') + (A'.B) = (1.0) + (0.1) = 0
   Carry = A.B = 1.1 = 1

b. Full Adder

AB
ABCin

(AB).Cin
(A.B)+((AB).Cin)

A.B

• We know that the essential output of the full adder circuit is the sum of two one-bit numbers
with an input carry. It is used in circuits that add a sequence of bits by connecting multiple
full adders together.

Output of full adder circuit


Input Output
A B Cin Sum Carry
0 0 0 0 0 0
0 0 1 1 0 1
0 1 0 1 0 2
0 1 1 0 1 3
1 0 0 1 0 4
1 0 1 0 1 5
1 1 0 0 1 6
1 1 1 1 1 7

Deriving the equations of the full adder from K-Maps

K-Map for Sum                                    K-Map for Carry

        B,Cin = 00   01   11   10                        B,Cin = 00   01   11   10
A = 0           0    1    0    1                 A = 0           0    0    1    0
A = 1           1    0    1    0                 A = 1           0    1    1    1

Equation of Sum = AB'C'in + A'B'Cin + ABCin + A'BC'in
Equation of Carry = ACin + BCin + AB

Truth table of Full-Adder circuit

Input      Complements    Minterms of Sum                       Sum    Carry terms         Cout
A B Cin    A' B' C'in     AB'C'in  A'B'Cin  ABCin  A'BC'in             A.Cin  B.Cin  A.B   (= A.Cin + B.Cin + A.B)
0 0 0 1 1 1 0 0 0 0 0 0 0 0 0
0 0 1 1 1 0 0 1 0 0 1 0 0 0 0
0 1 0 1 0 1 0 0 0 1 1 0 0 0 0
0 1 1 1 0 0 0 0 0 0 0 0 1 0 1
1 0 0 0 1 1 1 0 0 0 1 0 0 0 0
1 0 1 0 1 0 0 0 0 0 0 1 0 0 1
1 1 0 0 0 1 1 0 0 0 0 0 0 1 1
1 1 1 0 0 0 0 0 1 0 1 1 1 1 1

Detailed Calculations

Essential Output of Full adder is sum of two one-bit numbers with input carry

CASE 1: Inputs: A = 0, B = 0, Cin = 0
   Sum = A ⊕ B ⊕ Cin = AB'C'in + A'B'Cin + ABCin + A'BC'in
       = 0.1.1 + 1.1.0 + 0.0.0 + 1.0.1 = 0 + 0 + 0 + 0 = 0
   Carry = A.Cin + B.Cin + A.B = 0.0 + 0.0 + 0.0 = 0 + 0 + 0 = 0

CASE 2: Inputs: A = 0, B = 0, Cin = 1
   Sum = A ⊕ B ⊕ Cin = AB'C'in + A'B'Cin + ABCin + A'BC'in
       = 0.1.0 + 1.1.1 + 0.0.1 + 1.0.0 = 0 + 1 + 0 + 0 = 1
   Carry = A.Cin + B.Cin + A.B = 0.1 + 0.1 + 0.0 = 0 + 0 + 0 = 0

.
.
.

CASE 8: Inputs: A = 1, B = 1, Cin = 1
   Sum = A ⊕ B ⊕ Cin = AB'C'in + A'B'Cin + ABCin + A'BC'in
       = 1.0.0 + 0.0.1 + 1.1.1 + 0.1.0 = 0 + 0 + 1 + 0 = 1
   Carry = A.Cin + B.Cin + A.B = 1.1 + 1.1 + 1.1 = 1 + 1 + 1 = 1

Lecture 5: Arithmetic Logic Unit (ALU)
Contents:
1. ALU Structure
2. ALU Design
3. ALU Integrated Circuit
1. ALU Structure
• ALU is a combinational circuit that performs:
1. Arithmetic Operations:
These include addition, subtraction, multiplication, division, and negation.
2. Logical Operations:
These include AND, OR, XOR, complement, and sign extension.
3. Miscellaneous operations:
These include rotate left/right, shift left/right and comparison.

• The ALU has a number of selection lines used to determine the operation to be
performed
• The selection lines are decoded within the ALU, so K selection lines can specify
  up to 2^K distinct operations.
• The n data inputs from A are combined with the n data inputs from B to generate
  the result of an operation at the G outputs.
• The mode-select input S2 distinguishes between arithmetic and logic operations.
• The two operation select inputs S1 and S0 and the carry input Cin specify the eight
  arithmetic operations (with S2 = 0).
• The operation select inputs S0 and S1 specify the four logic operations (with S2 = 1).

2. ALU Design
Design of ALU is implemented in three stages:

Stage 1
Arithmetic section

Stage 2
Logic section

Stage 3
Integration to form the ALU

Stage 1: Arithmetic section

Block Diagram of an Arithmetic Circuit

Function Table for Arithmetic Circuit

[Table not reproduced. Notes from the slide: an input of all 1's equals -1 in 2's complement
representation; B' + 1 is the 2's complement of B.]

B input logic for one stage of the Arithmetic Circuit

[Truth table not reproduced.]

Stage 2: Logic section (one stage of logic circuit)

Function Table

Stage 3: Integration (One stage of ALU)

Arithmetic and Logic Functions of ALU


Operation Select
S2  S1  S0  Cin     Operation                Function

0   0   0   0       G = A                    Transfer A
0   0   0   1       G = A + 1                Increment A
0   0   1   0       G = A + B                Addition
0   0   1   1       G = A + B + 1            Add with carry input of 1
0   1   0   0       G = A + B'               A plus 1's complement of B
0   1   0   1       G = A + B' + 1 = A - B   Subtraction
0   1   1   0       G = A - 1                Decrement A
0   1   1   1       G = A                    Transfer A
1   0   0   X       G = A ∧ B                AND
1   1   0   X       G = A ∨ B                OR
1   0   1   X       G = A ⊕ B                XOR
1   1   1   X       G = A'                   NOT (1's complement of A)
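A minimal Python sketch of one possible implementation of this function table for an n-bit word
(the function and parameter names are mine; the select encodings follow the table above):

```python
def alu(a, b, s2, s1, s0, cin, bits=8):
    """Return G for the ALU function table (results truncated to the word size)."""
    mask = (1 << bits) - 1
    if s2 == 0:                                     # arithmetic operations
        y = [0, b, b ^ mask, mask][(s1 << 1) | s0]  # 0, B, B', or all 1's (-1)
        return (a + y + cin) & mask
    # logic operations (the carry input is a don't care)
    return {(0, 0): a & b, (1, 0): a | b,
            (0, 1): a ^ b, (1, 1): (~a) & mask}[(s1, s0)]

# examples with an 8-bit word: A = 9, B = 5
print(alu(9, 5, 0, 0, 1, 0))   # 14  -> A + B
print(alu(9, 5, 0, 1, 0, 1))   # 4   -> A - B (A + B' + 1)
print(alu(9, 5, 1, 0, 1, 0))   # 12  -> A XOR B
```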

Assignment # 17
Using slide 11, justify the truth table in slide 12 including the 8 arithmetic
functions and the 4 logic functions.

Answer
Integration (one stage of ALU)

Integration (one stage of ALU) in details

Arithmetic and Logic Functions of ALU


Operation Select
S2  S1  S0  Cin     Operation                Function
0   0   0   0       G = A                    Transfer A
0   0   0   1       G = A + 1                Increment A
0   0   1   0       G = A + B                Addition
0   0   1   1       G = A + B + 1            Add with carry input of 1
0   1   0   0       G = A + B'               A plus 1's complement of B
0   1   0   1       G = A + B' + 1 = A - B   Subtraction
0   1   1   0       G = A - 1                Decrement A
0   1   1   1       G = A                    Transfer A
1   0   0   X       G = A ∧ B                AND
1   1   0   X       G = A ∨ B                OR
1   0   1   X       G = A ⊕ B                XOR
1   1   1   X       G = A'                   NOT (1's complement of A)

