
CSE 213

Computer Architecture

Lecture 2: Computer Performance

Military Institute of Science and Technology
Performance

• Performance is the key to understanding the underlying motivation for the hardware and its organization
• Why is some hardware better than others for different programs?
• What factors of system performance are hardware related? (e.g., do we need a new machine, or a new operating system?)
Performance

What do we measure?
Define performance….

• How much faster is the Concorde compared to the 747?


• How much bigger is the Boeing 747 than the Douglas DC-8?
Defining Performance

Computer Performance: TIME, TIME, TIME!!!

• Response Time (elapsed time, latency): how long does it take to complete a task, start to finish? (Individual user concerns…)
  • E.g., how long must I wait for the database query?

• The individual user is more interested in response time: as the user of a smartphone/laptop, the one that responds faster is better!

• Response time is the total time required by the computer to complete a task, including:
  disk access, memory access, I/O activities, OS overheads, CPU execution time, etc.
Computer Performance: TIME, TIME, TIME!!!

• Throughput: total work done per unit time (per hour, day, etc.) (Systems manager concerns…)
  • How many jobs can the machine run at once?
  • What is the average execution rate?
  • How much work is getting done?
Response Time and Throughput
• If we upgrade a machine with a new processor, what do we increase?
Relative Performance

• Performance is defined as the reciprocal of execution time: Performance = 1 / Execution time
• "X is n times faster than Y" means Performance_X / Performance_Y = Execution time_Y / Execution time_X = n
Execution Time
• Elapsed Time
• counts everything (disk and memory accesses, waiting for I/O, running other
programs, etc.) from start to finish
• a useful number, but often not good for comparison purposes
Elapsed time = CPU time + wait time (I/O, other programs, etc.)

• CPU time
• doesn't count waiting for I/O or time spent running other programs
• can be divided into user CPU time and system CPU time (OS calls)
CPU time = user CPU time + system CPU time

• Our focus:
• user CPU time (CPU execution time or, simply, execution time)
• time spent executing the lines of code that are in our program
• For brevity, user CPU time is referred to simply as CPU time in the rest of these notes.
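
To make the elapsed-time vs. CPU-time distinction concrete, here is a minimal Python sketch; the timer calls are standard library, and the summation is just a placeholder workload of my own.

```python
import time

t_wall0 = time.perf_counter()      # wall-clock (elapsed) timer
t_cpu0 = time.process_time()       # user + system CPU time of this process

sum(range(10_000_000))             # placeholder workload

elapsed = time.perf_counter() - t_wall0
cpu = time.process_time() - t_cpu0
print(f"elapsed: {elapsed:.3f} s, CPU: {cpu:.3f} s, wait: {elapsed - cpu:.3f} s")
```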
Execution Time
CPU Clocking
Clock cycle: the speed of a computer processor, or CPU, is determined by the clock cycle, which is the amount of time between two pulses of an oscillator. The clock rate is the inverse of the clock cycle time; for example, a 2 GHz clock has a cycle time of 0.5 ns.
CPU Time

Performance Equation - I:
CPU time = CPU clock cycles × clock cycle time = CPU clock cycles / clock rate
Example

• Our favorite program runs in 10 seconds on computer A, which has a 2 GHz clock.
• We are trying to help a computer designer build a new machine B, that
will run this program in 6 seconds. The designer can use new (or
perhaps more expensive) technology to substantially increase the clock
rate, but has informed us that this increase will affect the rest of the CPU
design, causing machine B to require 1.2 times as many clock cycles as
machine A for the same program.

• What clock rate should we tell the designer to target?
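
A minimal Python sketch of the arithmetic this example calls for, using the relation CPU time = clock cycles / clock rate; the variable names are my own.

```python
# Machine A: clock cycles = CPU time x clock rate
cpu_time_a = 10            # seconds
clock_rate_a = 2e9         # 2 GHz
cycles_a = cpu_time_a * clock_rate_a          # 20e9 clock cycles

# Machine B: 1.2x as many cycles, must finish in 6 seconds
cycles_b = 1.2 * cycles_a                     # 24e9 clock cycles
cpu_time_b = 6             # seconds
clock_rate_b = cycles_b / cpu_time_b

print(f"target clock rate for B: {clock_rate_b / 1e9:.1f} GHz")   # 4.0 GHz
```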


CPU Time Example
Instruction Count and CPI

Performance Equation - II:
CPU clock cycles = instruction count × average CPI, so
CPU time = instruction count × CPI × clock cycle time = instruction count × CPI / clock rate
Factors Influencing Performance

Execution time = clock cycle time x number of instrs x avg CPI

• Clock cycle time: manufacturing process (how fast is each transistor), how much work gets done in each pipeline stage (more on this later)

• Number of instrs: the quality of the compiler and the instruction set architecture

• CPI: the nature of each instruction and the quality of the architecture implementation
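As a hedged illustration of the equation above, here is a short Python sketch; the clock rate, instruction count, and CPI values are made up purely for illustration.

```python
def execution_time(clock_cycle_time_s, instruction_count, avg_cpi):
    """Execution time = clock cycle time x number of instructions x average CPI."""
    return clock_cycle_time_s * instruction_count * avg_cpi

# Illustrative values only (not from the slides)
t = execution_time(clock_cycle_time_s=0.5e-9,   # 2 GHz clock
                   instruction_count=10e9,
                   avg_cpi=1.5)
print(f"execution time: {t:.2f} s")              # 7.50 s
```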
CPU Time Example
Self Help

• Suppose we have two implementations of the same instruction set architecture (ISA). For some program:
  • machine A has a clock cycle time of 10 ns and a CPI of 2.0
  • machine B has a clock cycle time of 20 ns and a CPI of 1.2

• Which machine is faster for this program, and by how much?

• If two machines have the same ISA, which of our quantities (e.g., clock rate, CPI, execution time, # of instructions, MIPS) will always be identical?
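
A quick sketch of the first question in Python; since both machines run the same program for the same ISA, the instruction count cancels out of the ratio.

```python
instrs = 1.0                        # same instruction count on both machines

time_a = instrs * 2.0 * 10e-9       # CPI_A x cycle time_A
time_b = instrs * 1.2 * 20e-9       # CPI_B x cycle time_B

print(f"time_B / time_A = {time_b / time_a:.2f}")   # 1.20 -> A is 1.2x faster
```

That same observation answers the second question: the instruction count is the quantity that stays identical across two machines with the same ISA running the same compiled program.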
CPI Example
• A compiler designer is trying to decide between two code sequences for
a particular machine.
• Based on the hardware implementation, there are three different classes of instructions: Class A, Class B, and Class C.

• Which code sequence has the most instructions? Which sequence will be
faster? How much? What is the CPI for each sequence?
CPI Example
• Which code sequence has the most instructions?
• Which sequence will be faster?
• What is the CPI for each sequence?
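
The instruction-mix table for this example did not survive extraction, so the sketch below uses hypothetical per-class CPIs and instruction counts (my own numbers, not the slide's) just to show how the three questions are answered.

```python
cpi = {"A": 1, "B": 2, "C": 3}              # assumed cycles per instruction class

# Hypothetical instruction counts per class for the two code sequences
seq1 = {"A": 2, "B": 1, "C": 2}
seq2 = {"A": 4, "B": 1, "C": 1}

def stats(mix):
    instrs = sum(mix.values())
    cycles = sum(mix[c] * cpi[c] for c in mix)
    return instrs, cycles, cycles / instrs   # count, total cycles, average CPI

for name, mix in (("sequence 1", seq1), ("sequence 2", seq2)):
    i, c, avg = stats(mix)
    print(f"{name}: {i} instructions, {c} cycles, CPI = {avg:.2f}")

# On the same clock, the sequence with fewer total cycles is the faster one,
# even if it executes more instructions.
```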


Self Help

• Two different compilers are being tested for a 500 MHz machine with
three different classes of instructions: Class A, Class B, and Class C,
which require 1, 2 and 3 cycles (respectively). Both compilers are used
to produce code for a large piece of software.
• Compiler 1 generates code with 5 billion Class A instructions, 1 billion
Class B instructions, and 1 billion Class C instructions.
• Compiler 2 generates code with 10 billion Class A instructions, 1 billion
Class B instructions, and 1 billion Class C instructions.

• Which sequence will be faster according to MIPS?


• Which sequence will be faster according to execution time?
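
A sketch of the arithmetic for this exercise, using only the figures given in the problem statement above.

```python
clock_rate = 500e6                       # 500 MHz
cpi = {"A": 1, "B": 2, "C": 3}           # cycles per instruction class

compilers = {
    "compiler 1": {"A": 5e9, "B": 1e9, "C": 1e9},
    "compiler 2": {"A": 10e9, "B": 1e9, "C": 1e9},
}

for name, mix in compilers.items():
    instrs = sum(mix.values())
    cycles = sum(mix[c] * cpi[c] for c in mix)
    exec_time = cycles / clock_rate                  # seconds
    mips = instrs / (exec_time * 1e6)                # million instructions / second
    print(f"{name}: execution time = {exec_time:.0f} s, MIPS = {mips:.0f}")

# compiler 2 looks better by MIPS (400 vs 350), but compiler 1 produces the
# faster code by execution time (20 s vs 30 s).
```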
Example
Benchmarks

• Performance is best determined by running a real application
  • use programs typical of the expected workload
  • or, typical of the expected class of applications, e.g., compilers/editors, scientific applications, graphics, etc.

• Benchmark suites
  • Each vendor announces a SPEC rating for their system
  • a measure of execution time for a fixed collection of programs
  • it is a function of a specific CPU, memory system, I/O system, operating system, and compiler, which enables easy comparison of different systems

• The key is coming up with a collection of relevant programs

SPEC (Standard Performance Evaluation Corporation)
• Sponsored by industry but independent and self-managed – trusted by
code developers and machine vendors
• Clear guides for testing, see www.spec.org
• Regular updates (benchmarks are dropped and new ones added
periodically according to relevance)
• Specialized benchmarks for particular classes of applications
SPEC CPU

• The 2006 version includes 12 integer and 17 floating-point applications

• The SPEC rating specifies how much faster a system is, compared to
a baseline machine – a system with SPEC rating 600 is 1.5 times
faster than a system with SPEC rating 400

• Note that this rating incorporates the behavior of all 29 programs – this
may not necessarily predict performance for your favorite program!

Summary

• Performance is specific to a particular program
  • total execution time is a consistent summary of performance
• For a given architecture, performance increases come from:
  • increases in clock rate (without adverse CPI effects)
  • improvements in processor organization that lower CPI
  • compiler enhancements that lower CPI and/or instruction count
Important Trends

• Running out of ideas to improve single thread performance

• Power wall makes it harder to add complex features

• Power wall makes it harder to increase frequency

Power Wall

• The energy of a pulse during the logic transition 0 → 1 → 0 or 1 → 0 → 1:
  Energy ∝ Capacitive load × Voltage²
• The energy of a single transition:
  Energy ∝ ½ × Capacitive load × Voltage²
• The power required per transistor:
  Power ∝ ½ × Capacitive load × Voltage² × Frequency switched
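
A small sketch of how the last proportionality behaves; the capacitance, voltage, and frequency values are illustrative only, not real process parameters.

```python
def dynamic_power(capacitive_load, voltage, frequency):
    """Dynamic switching power per transistor: ~ 1/2 x C x V^2 x f."""
    return 0.5 * capacitive_load * voltage ** 2 * frequency

# Illustrative values only
p_old = dynamic_power(capacitive_load=1.0, voltage=1.2, frequency=2.0e9)
p_new = dynamic_power(capacitive_load=1.0, voltage=1.0, frequency=3.0e9)

# Raising frequency 1.5x while lowering voltage from 1.2 V to 1.0 V
# leaves the power almost unchanged -- one reason voltage scaling mattered.
print(f"power ratio new/old = {p_new / p_old:.2f}")   # ~1.04
```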
Power Wall

Amdahl's Law

■ Gene Amdahl [AMDA67]
■ Deals with the potential speedup of a program using multiple processors compared to a single processor
■ Illustrates the problems facing industry in the development of multi-core machines
■ Software must be adapted to a highly parallel execution environment to exploit the power of parallel processing
■ Can be generalized to evaluate and design technical improvements in a computer system
Amdahl's Law

Amdahl's law states that in parallelization:
❑ P is the proportion of a system or program that can be made parallel
❑ 1 − P is the proportion that remains serial
❑ Then the maximum speedup that can be achieved using N processors is 1 / ((1 − P) + P/N)

If N tends to infinity, the maximum speedup tends to 1 / (1 − P).

If a program needs 20 hours using a single processor core, and a particular part of the program which takes one hour to execute cannot be parallelized, and there is no limit on the number of processors, calculate the minimum execution time of the program.
Amdahl's Law

If a program needs 20 hours using a single processor core, and a particular part of the program which takes one hour to execute cannot be parallelized, and there is no limit on the number of processors, calculate the minimum execution time of the program.

Answer:
The one-hour sequential part cannot be sped up, while the remaining 19 hours (P = 0.95) of execution time can be parallelized. Regardless of how many processors are devoted to the parallelized execution of this program, the minimum execution time cannot be less than that critical one hour. Hence the theoretical speedup is limited to at most 20 times (1/(1 − P) = 20).
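A minimal Python sketch of this example, following the formula from the previous slide; the function and variable names are mine.

```python
def amdahl_speedup(p, n):
    """Speedup with parallel fraction p on n processors: 1 / ((1 - p) + p / n)."""
    return 1.0 / ((1.0 - p) + p / n)

total_hours = 20.0
serial_hours = 1.0
p = (total_hours - serial_hours) / total_hours     # 0.95

for n in (2, 10, 100, 10_000):
    s = amdahl_speedup(p, n)
    print(f"n = {n:>6}: speedup = {s:5.2f}, time = {total_hours / s:5.2f} h")

# As n grows, the run time approaches the 1-hour serial part and the speedup
# approaches its limit of 1 / (1 - p) = 20.
```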
Amdahl's Law
Little's Law
■ Fundamental and simple relation with broad applications
■ Can be applied to almost any system that is statistically in steady state, and in which there is no leakage
■ Queuing system
  ■ If the server is idle, an arriving item is served immediately; otherwise it joins a queue
  ■ There can be a single queue for a single server or for multiple servers, or multiple queues, one for each of multiple servers
Little's Law
■ The average number of items in a queuing system equals the average rate at which items arrive multiplied by the average time an item spends in the system
■ The relationship requires very few assumptions
■ Because of its simplicity and generality it is extremely useful

Number of items in the system (L) = the rate at which items enter and leave the system (A: arrival rate / departure rate / throughput / λ) × the average amount of time an item spends in the system (W: lead time)

L = A × W

Little's Law

You are estimating the number of threads your server needs to execute client requests efficiently, and you initially start 4 threads on the server. The request arrival rate is 4 requests/sec, and each request takes a fixed amount of time to complete, as listed below. The arrival rate is fixed and every new request has a fixed service time.

Request_1 = 0.2 sec; Request_2 = 0.9 sec; Request_3 = 0.6 sec; Request_4 = 0.5 sec

What improvement factor should you consider to maximize your thread utilization?

Little's Law

Since each request is assigned to its own thread, no request waits, so the average response time is (0.2 + 0.9 + 0.6 + 0.5)/4 = 0.55 sec.

The request arrival rate is 4 requests/sec.

Applying Little's law, the number of threads required on the server to serve the requests is:

Required threads on server = request arrival rate × average response time = 4 × 0.55 = 2.2

So only about 2 threads are needed for an arrival rate of 4 req/sec with the given service times. With only 2 threads, 2 of the requests in each cycle of arrivals would have to wait, because there are only 2 resources (threads) for 4 arriving requests, and this would increase the response time. Conversely, with the 4 threads originally started, roughly 2 threads on the server will always remain idle.
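
A quick check of this arithmetic in Python, using only the numbers from the problem statement.

```python
arrival_rate = 4.0                           # requests per second
service_times = [0.2, 0.9, 0.6, 0.5]         # seconds per request (no waiting)
avg_response = sum(service_times) / len(service_times)   # 0.55 s

# Little's law: items in the system = arrival rate x time in the system
threads_busy = arrival_rate * avg_response
print(f"threads busy on average: {threads_busy:.1f}")     # 2.2

started_threads = 4
print(f"threads idle on average: {started_threads - threads_busy:.1f}")  # ~1.8
```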
