Complete Group A - CMP 408 Assignment
What are the strategies that can be used for the measurement techniques of
performance evaluation?
Several articles have been published on the performance evaluation of IEEE 802.11n
WLANs. Some of the performance metrics used in these articles include:
Throughput: The rate at which data is successfully delivered over the network, typically
measured in bits per second (bps) or packets per second (pps).
Data rate: The nominal rate at which the physical layer transmits data, typically measured in
megabits per second (Mbps).
Signal-to-noise ratio (SNR): The ratio of the signal strength to the noise level, which affects
the quality of the wireless signal.
Packet error rate (PER): The percentage of packets that are lost or corrupted during
transmission.
Latency: The time it takes for a packet to be transmitted from the sender to the receiver,
typically measured in milliseconds (ms).
Channel utilization: The percentage of time that the wireless channel is in use, which affects
the network's capacity.
Coverage area: The physical area over which the wireless signal can be received, typically
measured in square meters.
Interference: The level of interference caused by other wireless networks or devices, which
can affect the performance of the WLAN.
Mobility: The ability of the network to maintain a stable connection as a user moves around
the coverage area.
Several articles have been published on the performance evaluation of the Cray XC50. Some
of the performance metrics used in these articles include:
Throughput: The rate at which the system processes data, typically measured in operations
per second or floating-point operations per second (FLOPS).
Latency: The time it takes for a request to be processed by the system, typically measured in
microseconds.
Scalability: The ability of the system to maintain or improve performance as the number of
processors or nodes increases.
Memory bandwidth: The rate at which data can be transferred to or from the system's
memory, typically measured in bytes per second.
I/O performance: The rate at which data can be transferred to or from external storage
devices, typically measured in bytes per second.
Communication overhead: The time and resources needed for processors to communicate
with each other.
Load balance: The distribution of workload across processors, to ensure that no processor is
overworked or underutilized.
These performance metrics were used in different publications to evaluate the performance of
the Cray XC50 under different workloads and configurations, in order to understand the
system's capabilities and limitations.
Hardware monitors are useful for obtaining accurate measurements of a system's real
behavior because they are attached directly to the system's hardware and record
performance metrics at the hardware level. Because they consume little or none of the
measured system's own resources, they avoid the distortion that software-based
measurements can suffer from the system's software and operating system.
While it is possible to replace the hardware monitor with a software monitor, there are
advantages and disadvantages to doing so. The advantage of using a software monitor is that
it can be easily implemented and can provide measurements on a wide range of systems,
regardless of the hardware. However, the disadvantage of using a software monitor is that it
can be affected by the system's software and operating system, which can lead to inaccurate
measurements.
In the case of this study, measuring the performance of an InfiniBand network, hardware
counters were essential to measure the packet rate and packet loss with high accuracy.
Software-based monitoring systems may not have been able to provide such accurate
measurements, as they would have been affected by the system's software and operating
system.
In conclusion, hardware monitors are useful for providing accurate measurements of a
system's behavior, particularly for high-speed networks or specialized systems. Software
monitors can also provide useful measurements, although they may not be as accurate as
hardware monitors. The choice between hardware and software monitors depends on the
requirements of the measurement study and the system being evaluated.
Answer
a. To calculate the effective CPI of the machine, we need to know the number of clock cycles
it takes to execute each instruction, including memory access time.
Each memory access takes one clock cycle. The 30% of instructions that require one
memory access therefore contribute an average of 0.3 * 1 = 0.3 memory cycles per
instruction.
The 5% of instructions that require two memory accesses contribute an average of
0.05 * 2 = 0.1 memory cycles per instruction.
To find the total number of clock cycles per instruction, we add the clock cycles spent on
memory access to the number of clock cycles spent on instruction execution, which is
assumed to be 1.
Therefore, the effective CPI of the machine is 1 + 0.3 + 0.1 = 1.4.
b. To calculate the performance of the upgraded processor, we need to know the number of
clock cycles per instruction and the clock frequency of the processor.
The clock frequency of the upgraded processor is 1000 MHz, which is twice as fast as the
original processor. However, the memory subsystem remains unchanged, so each memory
access now takes two clock cycles instead of one.
The 30% of instructions that require one memory access now contribute an average of
0.3 * 2 = 0.6 memory cycles per instruction.
The 5% of instructions that require two memory accesses, at two cycles each, contribute an
average of 0.05 * 4 = 0.2 memory cycles per instruction.
To find the total number of clock cycles per instruction, we add the clock cycles spent on
memory access to the number of clock cycles spent on instruction execution, which is
assumed to be 1.
Therefore, the effective CPI of the upgraded machine is 1 + 0.6 + 0.2 = 1.8.
To calculate the performance of the upgraded processor, we divide the clock frequency of the
processor by the effective CPI.
Therefore, the performance of the upgraded processor is 1000 MHz / 1.8 = 555.56 MIPS.
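The arithmetic in parts a and b can be checked with a short script. This is a minimal sketch: the function names are illustrative, and the base CPI of 1 is the same assumption made in the answer.

```python
def effective_cpi(base_cpi, access_mix, cycles_per_access):
    """Average cycles per instruction, including memory access cycles.

    access_mix maps "memory accesses per instruction" to the fraction
    of instructions that need that many accesses.
    """
    memory_cycles = sum(
        fraction * accesses * cycles_per_access
        for accesses, fraction in access_mix.items()
    )
    return base_cpi + memory_cycles


def mips(clock_mhz, cpi):
    """Million instructions per second for a given clock rate and CPI."""
    return clock_mhz / cpi


# 30% of instructions need one memory access, 5% need two.
mix = {1: 0.30, 2: 0.05}

cpi_original = effective_cpi(1, mix, cycles_per_access=1)
cpi_upgraded = effective_cpi(1, mix, cycles_per_access=2)

print(f"original CPI  = {cpi_original:.2f}")                # 1.40
print(f"upgraded CPI  = {cpi_upgraded:.2f}")                # 1.80
print(f"upgraded MIPS = {mips(1000, cpi_upgraded):.2f}")    # 555.56
```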
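The time needed by an 8-stage pipeline processor to execute the same task is approximated
here as the number of operands divided by the number of stages, divided by the CPU clock
frequency. This idealization ignores the cycles needed to fill the pipeline.
Therefore, the time needed by an 8-stage pipeline processor to execute the task is:
(600 / 8) / 1.5 GHz = 0.05 us.
The speedup factor, Sk, is the ratio of the time needed by a one-stage pipeline processor to
execute the task to the time needed by an 8-stage pipeline processor to execute the same task.
Therefore, the speedup factor, Sk, for an 8-stage pipeline processor is:
Sk = T1/Tk = 0.4 us / 0.05 us = 8
If the fill time is included, n operands take k + n - 1 stage times on a k-stage pipeline and
n * k stage times without pipelining, so Sk = (600 * 8) / (8 + 600 - 1) = 4800 / 607, or about
7.91, which approaches the ideal value of 8 as n grows.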
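How the speedup depends on the number of operands can be sketched with the standard pipeline timing model. The sketch assumes every stage takes exactly one clock period, so the clock frequency cancels out of the ratio; the function name is illustrative.

```python
def pipeline_speedup(n_operands, k_stages):
    """Speedup of a k-stage pipeline over a non-pipelined unit.

    Assumes each stage takes one clock period.
    """
    cycles_without_pipeline = n_operands * k_stages
    # k cycles to fill the pipeline, then one result per cycle.
    cycles_with_pipeline = k_stages + n_operands - 1
    return cycles_without_pipeline / cycles_with_pipeline


# The speedup climbs toward k = 8 as the number of operands grows.
for n in (8, 600, 100_000):
    print(n, round(pipeline_speedup(n, 8), 3))
```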
10. Devise an experiment to find out the following performance metrics for an IEEE 802.3
local area network (LAN):
a. The throughput of the network as a function of the number of nodes in the LAN.
b. The average packet delay as a function of the number of nodes in the LAN.
c. The throughput-delay relationship.
Answer
One way to devise an experiment to find out the performance metrics for an IEEE 802.3 LAN
would be as follows:
a. To measure the throughput of the network as a function of the number of nodes in the
LAN, we can use a network traffic generator to send a large amount of data packets to the
LAN. We can then measure the number of packets that are successfully received by the
destination node, and divide that by the time it took to send the packets. We can repeat this
process for different numbers of nodes in the LAN, and plot the results to see how the
throughput changes with the number of nodes.
b. To measure the average packet delay as a function of the number of nodes in the LAN, we
can use a network traffic generator to send a large number of packets to the LAN. We can
then measure the time it takes for each packet to be successfully received by the destination
node. We can calculate the average delay for each number of nodes in the LAN, and plot the
results to see how the delay changes with the number of nodes.
c. To measure the throughput-delay relationship, we can use the data obtained from the
previous two steps and plot them on a graph with throughput on the x-axis and delay on the
y-axis. This will show us how the delay changes as the throughput of the network increases.
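The post-processing for steps a and b can be sketched as follows. The trace format, the function name, and the sample values are all hypothetical; a real experiment would take the timestamps from the traffic generator or a packet capture.

```python
def summarize(records):
    """Return (throughput in bits/s, mean delay in s) for one run.

    records: list of (send_time_s, receive_time_s, size_bytes) tuples,
    one per packet that reached the destination node.
    """
    if not records:
        raise ValueError("no delivered packets")
    start = min(send for send, _, _ in records)
    end = max(recv for _, recv, _ in records)
    total_bits = 8 * sum(size for _, _, size in records)
    throughput_bps = total_bits / (end - start)
    mean_delay_s = sum(recv - send for send, recv, _ in records) / len(records)
    return throughput_bps, mean_delay_s


# Hypothetical trace for one run (made-up values, for illustration only).
trace = [
    (0.000, 0.002, 1500),
    (0.001, 0.004, 1500),
    (0.002, 0.005, 1500),
    (0.003, 0.010, 1500),
]
throughput, delay = summarize(trace)
print(f"throughput = {throughput:.0f} bps, mean delay = {delay * 1000:.2f} ms")
```

Running this once per node count gives one (throughput, delay) pair for each configuration, which is the data needed for all three plots.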
CHAPTER 9 QUESTIONS
1. Describe what you think would be the most effective way to study
each of the following systems:
a. A wireless local area network that consists of 100 nodes.
b. A 1000-processor massively parallel computer system.
c. The performance of an Asynchronous Transfer Mode (ATM) based
local area network (LAN) system.
d. The operation of a simple bank branch in a town.
Answer
a. To study the performance of a wireless local area network with 100 nodes, it would be
most effective to set up a test environment that mimics the network's real-world conditions as
closely as possible. This could include using similar hardware and software, and simulating
the same types of traffic and usage patterns. Testing tools such as network analyzers and
performance monitoring software could then be used to measure key performance indicators
such as throughput, latency, and packet loss.
d. To study the operation of a simple bank branch in a town, it would be most effective to use
a combination of observational and analytical methods. This could include conducting
interviews with bank employees and customers, observing branch operations firsthand, and
analyzing data such as transaction logs and customer surveys. Additionally, it would be
useful to compare the branch's performance to other branches in the area, and to industry
benchmarks.
2. For each of the systems in problem 1, assume that it has been decided to make a study via a
simulation model. Discuss whether the simulation should be static or dynamic, deterministic
or stochastic, and continuous or discrete.
Answer
a. A wireless local area network that consists of 100 nodes: The simulation of this system
should be dynamic, stochastic, and discrete. The dynamic simulation would be useful in order
to capture the time-varying nature of the network traffic and usage patterns. Stochastic
simulation would be useful to capture the randomness and uncertainty of the network traffic,
such as the arrival and departure of nodes. Discrete simulation would be useful to model the
discrete events that occur in the network, such as packet transmissions and receptions.
c. The performance of an Asynchronous Transfer Mode (ATM) based local area network
(LAN) system: The simulation of this system should be dynamic, stochastic, and discrete. The
dynamic simulation would be useful in order to capture the time-varying nature of the
network traffic and usage patterns. Stochastic simulation would be useful to capture the
randomness and uncertainty of the network traffic, such as cell arrival times.
Discrete simulation would be useful to model the discrete events that occur in the network,
such as the transmission and reception of fixed-size ATM cells.
d. The operation of a simple bank branch in a town: The simulation of this system should be
dynamic, stochastic, and discrete. A dynamic simulation is needed because the state of the
branch, such as the number of customers waiting and the number of busy tellers, changes
over time. Stochastic simulation would be useful to capture the randomness and uncertainty of
the customer behavior, such as customer arrival times and service times. Discrete simulation
would be useful to model the discrete events that occur in the branch, such as customer
arrivals, transactions, and departures.
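A single-teller bank queue illustrates what a stochastic, discrete model looks like in practice. This is a minimal sketch: the exponential interarrival and service times, the rates, and the function name are assumptions chosen for illustration, not values from the problem.

```python
import random


def simulate_teller(n_customers, arrival_rate, service_rate, seed=1):
    """Single-teller FIFO queue; returns each customer's waiting time.

    Interarrival and service times are drawn from exponential
    distributions (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    waits = []
    wait = 0.0  # the first customer finds the teller idle
    for _ in range(n_customers):
        waits.append(wait)
        service = rng.expovariate(service_rate)
        gap = rng.expovariate(arrival_rate)  # time until the next arrival
        # The next customer waits for whatever work is still unfinished.
        wait = max(0.0, wait + service - gap)
    return waits


waits = simulate_teller(10_000, arrival_rate=0.8, service_rate=1.0)
print(f"average wait: {sum(waits) / len(waits):.2f} time units")
```

The state changes only at customer arrivals, and each run with a different seed gives a different sample path.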
3. The technique for producing an exponential random variate with mean interarrival time of
1/λ uses the formula -(1/λ) ln U, where U is a uniformly distributed random variate between 0
and 1, U(0, 1). This approach could correctly be modified to return -(1/λ) ln (1 - U). Explain
why this is possible.
Answer
This modification is possible because if U is uniformly distributed on (0, 1), then 1 - U is
also uniformly distributed on (0, 1). The inverse-transform method sets
U = F(x) = 1 - e^(-λx) and solves for x, which gives x = -(1/λ) ln (1 - U). Since U and 1 - U
have the same distribution, replacing one with the other does not change the distribution of
the result. The exponential distribution is defined by its rate parameter λ, the inverse of
the mean interarrival time 1/λ, so both -(1/λ) ln U and -(1/λ) ln (1 - U) are exponentially
distributed with the same mean interarrival time.
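The equivalence of the two formulas can be checked empirically. In this minimal sketch the rate λ = 2, the sample size, and the function names are arbitrary choices; both sample means should come out close to 1/λ = 0.5.

```python
import math
import random


def uniform_open(rng):
    """Draw U strictly inside (0, 1) so ln U and ln(1 - U) are both finite."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


def exp_from_u(rate, rng):
    """Exponential variate computed as -(1/rate) * ln U."""
    return -math.log(uniform_open(rng)) / rate


def exp_from_one_minus_u(rate, rng):
    """Exponential variate computed as -(1/rate) * ln(1 - U)."""
    return -math.log(1.0 - uniform_open(rng)) / rate


rng = random.Random(42)
rate = 2.0  # lambda, so the mean interarrival time is 1/lambda = 0.5
n = 100_000
mean_a = sum(exp_from_u(rate, rng) for _ in range(n)) / n
mean_b = sum(exp_from_one_minus_u(rate, rng) for _ in range(n)) / n
print(f"mean using ln U:       {mean_a:.3f}")
print(f"mean using ln (1 - U): {mean_b:.3f}")
```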
4. Which type of simulation would you use for the following problems:
a. To model traffic in a wireless cell network given that the traffic is bursty.
b. To model scheduling in a multiprocessor computer system given that the request arrivals
have a geometric distribution.
c. To verify the value of π, which is defined as the ratio of a circle’s circumference to its
diameter, C/D.
Answer
a. Discrete event simulation
b. Discrete event simulation
c. Monte Carlo simulation
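For part c, one common Monte Carlo approach estimates π from areas rather than from C/D directly: random points are drawn in the unit square, and the fraction that falls inside the quarter circle approximates π/4. This is a minimal sketch; the function name, seed, and sample size are illustrative choices.

```python
import random


def estimate_pi(n_points, seed=0):
    """Monte Carlo estimate of pi from points in the unit square."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_points):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # point lies inside the quarter circle
            inside += 1
    return 4.0 * inside / n_points


print(estimate_pi(200_000))
```

The estimate is static and stochastic: there is no notion of time, only repeated random sampling.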
5. Using the multiplicative congruential method, find the period of the generator for a = 17,
m = 26, and X0 = 1, 2, 3, and 4. Comment on the produced numbers and resulting periods.
Answer
The Multiplicative Congruential Method is a method for generating pseudo-random numbers
using the following formula:
Xn = (a*Xn-1) mod m
Where Xn is the nth generated number, Xn-1 is the previous generated number, a is the
multiplier, and m is the modulus.
X0 = 1:
X1 = (17 * 1) mod 26 = 17
X2 = (17 * 17) mod 26 = 3
X3 = (17 * 3) mod 26 = 25
X4 = (17 * 25) mod 26 = 9
X5 = (17 * 9) mod 26 = 23
X6 = (17 * 23) mod 26 = 1
The period of the generator for X0 = 1 is 6. It means that there are 6 unique numbers
generated before the sequence repeats.
X0 = 2:
X1 = (17 * 2) mod 26 = 8
X2 = (17 * 8) mod 26 = 6
X3 = (17 * 6) mod 26 = 24
X4 = (17 * 24) mod 26 = 18
X5 = (17 * 18) mod 26 = 20
X6 = (17 * 20) mod 26 = 2
The period of the generator for X0 = 2 is 6.
X0 = 3:
X1 = (17 * 3) mod 26 = 25
X2 = (17 * 25) mod 26 = 9
X3 = (17 * 9) mod 26 = 23
X4 = (17 * 23) mod 26 = 1
X5 = (17 * 1) mod 26 = 17
X6 = (17 * 17) mod 26 = 3
The period of the generator for X0 = 3 is 6. This is the same cycle as for X0 = 1, entered
at a different point.
X0 = 4:
X1 = (17 * 4) mod 26 = 16
X2 = (17 * 16) mod 26 = 12
X3 = (17 * 12) mod 26 = 22
X4 = (17 * 22) mod 26 = 10
X5 = (17 * 10) mod 26 = 14
X6 = (17 * 14) mod 26 = 4
The period of the generator for X0 = 4 is 6.
For this multiplier and modulus, every seed tested gives a period of 6, far short of the 25
nonzero values below the modulus. Odd seeds produce only odd numbers and even seeds
produce only even numbers. Because the period depends on the choice of the multiplier,
modulus, and initial value, these should be carefully considered when choosing a generator
for a specific application.
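The period for each seed can be checked mechanically with a short script. This is a minimal sketch with an illustrative function name; it iterates Xn = (a * Xn-1) mod m until a value repeats and reports the length of the cycle.

```python
def mcg_sequence(a, m, seed):
    """Iterate x_n = (a * x_{n-1}) mod m until a value repeats.

    Returns the list of distinct values (starting with the seed)
    and the length of the cycle that the sequence falls into.
    """
    seen = [seed]
    x = seed
    while True:
        x = (a * x) % m
        if x in seen:
            return seen, len(seen) - seen.index(x)
        seen.append(x)


for x0 in (1, 2, 3, 4):
    values, period = mcg_sequence(17, 26, x0)
    print(f"X0 = {x0}: {values} period = {period}")
```

For a = 17 and m = 26, each of the four seeds yields a cycle of length 6.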
6. Generate five 6-bit numbers using the Tausworthe method for the characteristic
polynomial x^6 + x + 1, starting with a seed of X0 = (0.111111)2.
Answer
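The characteristic polynomial x^6 + x + 1 gives the bit recurrence
b(n+6) = b(n+1) XOR b(n).
Starting from the seed bits 111111, the generated bit stream is:
111111 000001 000011 000101 001111 010001
Assuming each number is formed from the next six non-overlapping bits after the seed:
X1 = (0.000001)2
X2 = (0.000011)2
X3 = (0.000101)2
X4 = (0.001111)2
X5 = (0.010001)2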
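A Tausworthe bit stream for x^6 + x + 1 can be generated with a few lines of code. This sketch assumes the recurrence b(n+6) = b(n+1) XOR b(n) and forms each number from six consecutive, non-overlapping bits after the seed; other textbooks use different word-forming conventions, and the function name is illustrative.

```python
def tausworthe_words(seed_bits, n_words, word_len=6):
    """Generate n_words binary words following the seed word.

    Bits follow b[n+6] = b[n+1] XOR b[n], the recurrence for the
    characteristic polynomial x^6 + x + 1.
    """
    bits = [int(c) for c in seed_bits]
    needed = word_len * (n_words + 1)  # seed word plus n_words new words
    while len(bits) < needed:
        n = len(bits) - 6
        bits.append(bits[n + 1] ^ bits[n])
    words = [
        "".join(str(b) for b in bits[i:i + word_len])
        for i in range(0, needed, word_len)
    ]
    return words[1:]  # drop the seed word itself


for i, word in enumerate(tausworthe_words("111111", 5), start=1):
    # Interpret each word as a binary fraction 0.b1b2...b6.
    print(f"X{i} = (0.{word})2 = {int(word, 2) / 64}")
```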