Engineering Data Analysis Part 1 23241stsem Notes
PART 1
STATISTICS
POPULATION
• It refers to the totality of objects under consideration.
• Characteristics of a population are measured and are referred
to as parameters.
SAMPLE
• It is a subset of a population.
• Characteristics of a sample are called statistics, and these
characteristics are gathered through sampling.
DATA
PRIMARY DATA
• Primary Data are data collected directly from the data source.
• Primary data are also the first-hand data gathered by
the researchers themselves.
• Sources of Primary Data:
• Interviews
• Questionnaires
• Experiments
• Observations
• Documents
SECONDARY DATA
QUALITATIVE DATA
QUANTITATIVE DATA
MEASURES OF CENTRAL TENDENCY
MEAN
• It refers to the value obtained by adding all the numbers in
the data set and then dividing the sum by the number of values
in that set.
• Population mean is denoted by μₓ and sample mean is denoted
by x̄.
$$\mu_x = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
• Find the mean of 4, 8, 12, 0, -2, 10, 14.
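A minimal Python sketch of the mean formula above, applied to the example data. The helper name `mean` is illustrative, not something defined in these notes; the same expression gives μₓ or x̄ depending on whether the list holds a population or a sample.

```python
def mean(values):
    """Arithmetic mean: add all the values, then divide by how many there are."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)

data = [4, 8, 12, 0, -2, 10, 14]
print(mean(data))  # the sum is 46 and n = 7, so the mean is 46/7
```

For this data set the mean is 46/7, or about 6.57.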
MEDIAN
• Median is the middle number in a data set after
arranging the data set from the lowest number to the
highest number.
• If the number of terms in the data set is an odd
number, then the median is the middle number of the
data set.
• If the number of terms in the data set is an even
number, then the median is the average of the two
middle numbers.
• Find the median of 4, 8, 12, 0, -2, 10, 7.
• Find the median of 25, 28, 22, 20, 18, 23, 30, 24.
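The odd/even rule for the median can be sketched in Python as follows; the function name `median` is our own illustrative helper (Python's standard `statistics.median` follows the same rule).

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values when n is even."""
    ordered = sorted(values)          # arrange from lowest to highest
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one value")
    mid = n // 2
    if n % 2 == 1:                    # odd count: a single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average the two middle values

print(median([4, 8, 12, 0, -2, 10, 7]))           # 7
print(median([25, 28, 22, 20, 18, 23, 30, 24]))   # 23.5
```

The first set sorts to -2, 0, 4, 7, 8, 10, 12 (middle value 7); the second sorts to 18, 20, 22, 23, 24, 25, 28, 30, so the median is (23 + 24)/2.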
MODE
• The mode of a data set is the value (or values) that occurs most
often, i.e., the value with the greatest frequency.
Examples:
• The mode of 14, 8, 16, 10, 16, 20, 14, 16 is 16 since it occurs
most often (3 times).
• The modes of 9, 9, 27, 26, 26, 37, 47, 58 are 9 and 26 since
they occur more often than the other entries (each occurs
twice).
• The data set 3.3, 1.2, 5.6, 7.8, 9.2, 6.5 has no mode since no
number occurs more than once.
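A small sketch of the mode rule, including the "no mode" case. The helper `modes` is hypothetical; it is written by hand because Python's `statistics.multimode` returns every value when no value repeats, whereas these notes say such a data set has no mode.

```python
from collections import Counter

def modes(values):
    """Return every value with the highest frequency, or [] if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:                      # every value occurs once: no mode
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(modes([14, 8, 16, 10, 16, 20, 14, 16]))   # [16]
print(modes([9, 9, 27, 26, 26, 37, 47, 58]))    # [9, 26]
print(modes([3.3, 1.2, 5.6, 7.8, 9.2, 6.5]))    # []
```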
MEASURES OF DISPERSION
• These are characteristics of data that show how the values in a
data set differ relative to the average.
• RANGE
• AVERAGE DEVIATION
• VARIANCE
• STANDARD DEVIATION
• SKEWNESS
RANGE: R = HV − LV
Where:
HV – highest number in the data
LV – lowest number in the data
Example:
Find the range of the given data set:
37, 19, 31, 29, 21, 26, 33, 36
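The range formula translates directly into code; `data_range` is an illustrative name, not one from the notes.

```python
def data_range(values):
    """R = HV - LV: highest value minus lowest value."""
    return max(values) - min(values)

print(data_range([37, 19, 31, 29, 21, 26, 33, 36]))  # 37 - 19 = 18
```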
AVERAGE DEVIATION
$$AD = \frac{\sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert}{n}$$
Where:
xᵢ – ith entry of the data set
x̄ – mean of the data set
n – number of data
• Find the average deviation of 23, 30, 31, 15 and 46.
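A minimal sketch of the average deviation formula; the function name is our own. The absolute value matters: without it the deviations from the mean always sum to zero.

```python
def average_deviation(values):
    """AD = sum(|x_i - mean|) / n."""
    n = len(values)
    xbar = sum(values) / n
    return sum(abs(x - xbar) for x in values) / n

print(average_deviation([23, 30, 31, 15, 46]))  # 8.0
```

Here the mean is 145/5 = 29, the absolute deviations are 6, 1, 2, 14 and 17 (total 40), and 40/5 = 8.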
VARIANCE
A measurement of how far each number in a data set is from the
mean, and thus from every other number in the set.
$$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n} \qquad s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$
Where:
σ² – population variance
s² – sample variance
xᵢ – ith data
x̄ – mean of the data set
n – number of data
STANDARD DEVIATION
It is the square root of variance.
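A sketch of the two variance formulas and the standard deviation. The helper names are illustrative, and the data reuse the average-deviation example purely as a convenient test case.

```python
import math

def variance(values, sample=False):
    """Population variance (divide by n) or sample variance (divide by n - 1)."""
    n = len(values)
    xbar = sum(values) / n
    ss = sum((x - xbar) ** 2 for x in values)   # sum of squared deviations
    return ss / (n - 1) if sample else ss / n

def std_dev(values, sample=False):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(variance(values, sample))

data = [23, 30, 31, 15, 46]
print(variance(data), variance(data, sample=True))
print(std_dev(data), std_dev(data, sample=True))
```

The squared deviations from the mean 29 sum to 526, giving σ² = 526/5 = 105.2 and s² = 526/4 = 131.5.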
SKEWNESS
It is a measure of the asymmetry, or distortion, of a distribution
relative to a symmetric one. In particular, it measures how far a
distribution deviates from the (symmetric) normal distribution.
• If the skewness is equal to 0, then the distribution is symmetric.
This means that the data is evenly distributed on both sides of
the mean. In addition, the mean, median and mode are all equal.
• If the skewness is greater than 0, then the distribution is right-
skewed or positively-skewed. In this distribution, the mean is on
the right side of the median and the mode is on the left of the
median.
• If the skewness is less than 0, then the distribution is left-skewed
or negatively-skewed. In this distribution, the mean is
on the left side of the median and the mode is on the right of
the median.
A data set has the following sample values: 1, 2, 4, 5, 9, 25, 1, 3, 4, 1.
Compute its skewness.
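These notes do not reproduce a skewness formula, so this sketch assumes the moment coefficient of skewness g₁ = m₃ / m₂^(3/2), where mₖ is the kth central moment averaged over n. Other textbook formulas (for example Pearson's coefficient or the sample-adjusted G₁) give different numbers, so check which one your course expects.

```python
def moment_skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (central moments divided by n)."""
    n = len(values)
    xbar = sum(values) / n
    m2 = sum((x - xbar) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - xbar) ** 3 for x in values) / n   # third central moment
    return m3 / m2 ** 1.5

data = [1, 2, 4, 5, 9, 25, 1, 3, 4, 1]
g1 = moment_skewness(data)
print(round(g1, 3))  # positive, so the data are right-skewed
```

Under this assumption g₁ ≈ 2.16, and the mean (5.5) exceeds the median (3.5), which agrees with the description of a positively-skewed distribution above.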
MEASURES OF LOCATION
• These are measures that aid in subdividing a population or
sample into equal subgroups
• QUARTILE
• DECILE
• PERCENTILE
QUARTILE
• These are any of the three values that divide the items of a
frequency distribution into four classes, with each containing
one fourth or 25% of the total population.
$$q_j = \frac{j(n+1)}{4}$$
Where:
qⱼ = location of the jth quartile in the data set
j = quartile number
n = number of values in the data set
• Find the third quartile of the data 20, 30, 25, 23, 22, 32, 36
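A sketch of the location formula qⱼ = j(n+1)/4 applied to sorted data. The helper `location_value` is hypothetical, and the linear interpolation used when the location is fractional is an assumption; some courses round the location instead.

```python
def location_value(values, j, k):
    """Value at the 1-based location j(n+1)/k of the sorted data.

    k = 4 for quartiles, 10 for deciles, 100 for percentiles. A fractional
    location is linearly interpolated between its two neighbouring values.
    """
    ordered = sorted(values)
    n = len(ordered)
    pos = min(max(j * (n + 1) / k, 1), n)  # clamp the location to [1, n]
    lower = int(pos)                       # whole part of the location
    frac = pos - lower                     # fractional part
    if lower == n:
        return ordered[-1]
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])

data = [20, 30, 25, 23, 22, 32, 36]
q3 = location_value(data, 3, 4)  # location 3(7 + 1)/4 = 6, the 6th sorted value
print(q3)  # 32.0
```

The sorted data are 20, 22, 23, 25, 30, 32, 36, so the 6th value, 32, is the third quartile.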
DECILE
• These are any of the nine values that divide the items of a
frequency distribution into 10 classes, with each containing one
tenth or 10% of the total population.
$$d_j = \frac{j(n+1)}{10}$$
Where:
dⱼ = location of the jth decile in the data set
j = decile number
n = number of values in the data set
• The following are the numbers of defective items produced by a machine in each of the
last 24 months: 45, 30, 36, 26, 16, 21, 33, 40, 32, 14, 10, 29, 23, 39, 17, 11, 18, 34, 19,
24, 21, 35, 42, 37. Find the 5th decile.
PERCENTILE
• These are any of the 99 values that divide the items of a
frequency distribution into 100 classes, with each containing 1%
of the total population.
$$p_j = \frac{j(n+1)}{100}$$
Where:
pⱼ = location of the jth percentile in the data set
j = percentile number
n = number of values in the data set
• The following are the numbers of defective items produced by a machine in each of the
last 24 months: 45, 30, 36, 26, 16, 21, 33, 40, 32, 14, 10, 29, 23, 39, 17, 11, 18, 34, 19,
24, 21, 35, 42, 37. Find the 50th percentile.
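The decile and percentile formulas differ from the quartile formula only in the divisor, so one hypothetical helper covers both examples. As before, interpolating between neighbours at a fractional location is an assumption, not a rule stated in these notes.

```python
def location_value(values, j, k):
    """Value at the 1-based location j(n+1)/k of the sorted data, interpolating linearly."""
    ordered = sorted(values)
    n = len(ordered)
    pos = min(max(j * (n + 1) / k, 1), n)  # clamp the location to [1, n]
    lower = int(pos)                       # whole part of the location
    frac = pos - lower                     # fractional part
    if lower == n:
        return ordered[-1]
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])

defects = [45, 30, 36, 26, 16, 21, 33, 40, 32, 14, 10, 29,
           23, 39, 17, 11, 18, 34, 19, 24, 21, 35, 42, 37]
d5 = location_value(defects, 5, 10)     # location 5(25)/10 = 12.5
p50 = location_value(defects, 50, 100)  # location 50(25)/100 = 12.5
print(d5, p50)  # 27.5 27.5
```

Both locations are 12.5, halfway between the 12th (26) and 13th (29) sorted values, so D₅ = P₅₀ = 27.5, which is also the median of the 24 values.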
PROBABILITY DISTRIBUTION
PERMUTATION AND COMBINATION
ADDITION PRINCIPLE
“If a first event can be performed in “n1” different ways and a second
event can be performed in “n2” different ways…
• EXAMPLE 1:
Suppose there are 3 beef dishes and 4 fish dishes. How many
selections does a customer have if only 1 dish will be bought?
• EXAMPLE 2:
A manufacturer is studying the effects of cooking temperature,
cooking time and type of cooking oil on making potato chips. There
are 3 different temperatures, 4 different cooking times, and 3
different oils to be used. How many combinations will be used?
• EXAMPLE 3:
How many 4-letter words can be formed from the letters A, B, C,
D, E and F if each letter is to be used only once in each word?
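A quick check of Examples 1 to 3 in Python. Example 3 is counted by brute force with `itertools.permutations`, which lists every ordered selection of 4 distinct letters; the variable names are illustrative.

```python
import itertools

# Example 1 (addition principle): 1 dish from 3 beef OR 4 fish dishes.
example1 = 3 + 4

# Example 2 (multiplication principle): 3 temperatures x 4 cooking times x 3 oils.
example2 = 3 * 4 * 3

# Example 3: ordered 4-letter "words" from A-F with no letter repeated.
example3 = sum(1 for _ in itertools.permutations("ABCDEF", 4))

print(example1, example2, example3)  # 7 36 360
```

The brute-force count for Example 3 agrees with the multiplication principle: 6 × 5 × 4 × 3 = 360.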
PERMUTATION
An arrangement of a set of n objects in a SPECIFIED ORDER.
Permutation of n objects taken r at a time is
$$P(n, r) = {}_{n}P_{r} = \frac{n!}{(n-r)!}$$
COMBINATION
A selection of objects from a set of n objects where ORDER DOES
NOT COUNT.
Combination of n objects taken r at a time is
$$C(n, r) = \binom{n}{r} = \frac{n!}{(n-r)!\,r!}$$
PERMUTATION VS COMBINATION
Choose 2 balls from 3 balls colored red, green and blue. In how many
ways can we do this?
PERMUTATION (order counts): nPr = 3P2 = 6
1. red, green    2. green, red
3. blue, red     4. red, blue
5. blue, green   6. green, blue
COMBINATION (order does not count): nCr = 3C2 = 3
1. red, green OR green, red
2. blue, red OR red, blue
3. blue, green OR green, blue
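The ball example can be reproduced with Python's standard library: `itertools.permutations` and `itertools.combinations` enumerate the lists above, and `math.perm` / `math.comb` (Python 3.8 or later) evaluate the nPr and nCr formulas directly.

```python
import itertools
import math

balls = ["red", "green", "blue"]

ordered_pairs = list(itertools.permutations(balls, 2))    # order counts
unordered_pairs = list(itertools.combinations(balls, 2))  # order does not count

print(len(ordered_pairs), math.perm(3, 2))    # 6 6
print(len(unordered_pairs), math.comb(3, 2))  # 3 3
```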
• EXAMPLE 4:
There are 3 boys and 4 girls. In how many ways can you seat them
in a row such that:
(a) no particular order is observed
(b) the girls are seated together
(c) the boys are not seated together
• EXAMPLE 5:
How many permutations can be made out of the letters of the word
ENGINEERING?
• EXAMPLE 6:
From a group of 6 women and 8 men, a committee of 5 is to be
formed. In how many ways can this be done if the committee
(a) consists of exactly 3 men?
(b) consists of at least 3 men?
(c) consists of at most 2 men?
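A sketch that evaluates Example 4 (a) and (b), Example 5 and Example 6 with the standard `math` module. The helper `committees` is our own; Example 5 uses the usual formula for permutations with repeated letters, n!/(n₁! n₂! ⋯), which these notes do not state explicitly.

```python
import math
from collections import Counter

# Example 4(a): 7 people in a row with no restriction.
ex4a = math.factorial(7)
# Example 4(b): treat the 4 girls as one block, arrange 4 units (block + 3 boys),
# then arrange the girls inside the block.
ex4b = math.factorial(4) * math.factorial(4)

# Example 5: divide 11! by the factorial of each repeated letter's count.
word = "ENGINEERING"
repeats = math.prod(math.factorial(c) for c in Counter(word).values())
ex5 = math.factorial(len(word)) // repeats

# Example 6: committees of 5 from 6 women and 8 men.
WOMEN, MEN, SIZE = 6, 8, 5

def committees(num_men):
    """Number of committees with exactly num_men men and the rest women."""
    return math.comb(MEN, num_men) * math.comb(WOMEN, SIZE - num_men)

ex6a = committees(3)
ex6b = sum(committees(k) for k in range(3, 6))   # 3, 4 or 5 men
ex6c = sum(committees(k) for k in range(0, 3))   # 0, 1 or 2 men

print(ex4a, ex4b, ex5, ex6a, ex6b, ex6c)  # 5040 576 277200 840 1316 686
```

As a consistency check, (b) and (c) together cover every possible committee: 1316 + 686 = C(14, 5) = 2002.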
PROBABILITY
SAMPLE SPACE
The set of all possible outcomes of a statistical experiment.
Each outcome in a sample space is called an ELEMENT or a MEMBER
OF THE SAMPLE SPACE or simply a SAMPLE POINT.
PROBABILITY
If an experiment can result in any one of N different equally likely
outcomes, and if exactly n of these correspond to event A, then the
probability of event A is
$$P(A) = \frac{n}{N}$$
What is the probability of getting a heart from a standard deck of
cards?
What is the probability of getting a red ball from a jar containing 2
red balls, 4 green balls and 5 blue balls?
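The classical definition P(A) = n/N maps onto Python's `fractions.Fraction`, which keeps answers exact; `classical_probability` is an illustrative name.

```python
from fractions import Fraction

def classical_probability(favorable, total):
    """P(A) = n / N for N equally likely outcomes, n of which belong to A."""
    return Fraction(favorable, total)

p_heart = classical_probability(13, 52)      # 13 hearts in a 52-card deck
p_red = classical_probability(2, 2 + 4 + 5)  # 2 red balls out of 11
print(p_heart, p_red)  # 1/4 2/11
```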
PROPERTIES OF PROBABILITY
1. For every event A, 0 ≤ P(A) ≤ 1. This means that the probability
of an event can never be less than 0 or greater than 1.
2. An event that will surely happen (called a CERTAIN EVENT) has a
probability of 1.
3. An event that will surely NOT happen has a probability of 0.
4. If A and B are any two events, then
$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
or, equivalently,
$$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$$
Find the probability of getting a heart card or a face card in a
standard deck of cards.
5. If A and B are any two mutually exclusive (disjoint) events, then
$$P(A \cup B) = P(A) + P(B)$$
or, equivalently,
$$P(A \text{ or } B) = P(A) + P(B)$$
Find the probability of getting a spade or a club (clover) in a
standard deck of cards.
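A sketch that builds a 52-card deck and checks properties 4 and 5 by counting. The deck representation and the predicate names are illustrative choices, not anything defined in these notes.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "spades", "clubs"]
DECK = list(product(RANKS, SUITS))  # 52 (rank, suit) pairs

def prob(event):
    """Classical probability: fraction of the deck for which event(card) is True."""
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))

def is_heart(card):
    return card[1] == "hearts"

def is_face(card):
    return card[0] in {"J", "Q", "K"}

def is_spade(card):
    return card[1] == "spades"

def is_club(card):
    return card[1] == "clubs"

# Property 4: hearts and face cards overlap (J, Q, K of hearts), so subtract the overlap.
p_heart_or_face = prob(is_heart) + prob(is_face) - prob(lambda c: is_heart(c) and is_face(c))
# Property 5: spades and clubs are mutually exclusive, so the probabilities simply add.
p_spade_or_club = prob(is_spade) + prob(is_club)

print(p_heart_or_face, p_spade_or_club)  # 11/26 1/2
```

The first result is 13/52 + 12/52 − 3/52 = 22/52 = 11/26; the second is 13/52 + 13/52 = 1/2.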
6. If P(Aᶜ) is the probability that A will not happen (Aᶜ is the
complement of A), then P(Aᶜ) = 1 − P(A).
The probabilities that an automobile mechanic will service 3, 4, 5, 6, 7,
or 8 or more cars on any given workday are, respectively, 0.12, 0.19,
0.28, 0.24, 0.10, and 0.07. What is the probability that he will service
at least 5 cars on his next day at work?
The probabilities that John and Bill pass the exam are 2/5 and 2/3,
respectively. What is the probability that both of them fail?
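Both exercises use the complement rule P(Aᶜ) = 1 − P(A). The John and Bill calculation additionally assumes that their results are independent, which the problem does not state; under that assumption the two failure probabilities multiply.

```python
from fractions import Fraction

# Mechanic: 3 and 4 cars are the only listed outcomes below 5.
p_fewer_than_5 = 0.12 + 0.19
p_at_least_5 = 1 - p_fewer_than_5    # complement rule
print(round(p_at_least_5, 2))        # 0.69

# John and Bill: ASSUMES independent results (not stated in the problem).
p_john_fails = 1 - Fraction(2, 5)
p_bill_fails = 1 - Fraction(2, 3)
p_both_fail = p_john_fails * p_bill_fails
print(p_both_fail)  # 1/5
```

The mechanic result matches adding the listed probabilities directly: 0.28 + 0.24 + 0.10 + 0.07 = 0.69.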
7. Let A1, A2, …, An be mutually exclusive events whose union is the
entire sample space. The following are true:
a. P(A1 ∪ A2 ∪ ⋯ ∪ An) = P(A1) + P(A2) + ⋯ + P(An)
b. P(A1) + P(A2) + P(A3) + ⋯ + P(An) = 1
There are 2 red balls, 4 blue balls and 5 green balls in an urn. If we are
going to randomly choose a ball, find:
(a) The probability of getting a red ball.
(b) The probability of getting a blue ball.
(c) The probability of getting a green ball.
• EXAMPLE 7:
Roll a pair of dice one time. What is the probability of getting a
sum of 5 or 9?
• EXAMPLE 8:
A bag contains 3 white balls and 5 black balls. If two balls are
drawn in succession without replacement, what is the probability
that both balls are white?
• EXAMPLE 9:
Three light bulbs are chosen at random from 15 bulbs, of which 5 are
defective. Find the probability that exactly one is defective.
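A sketch of Examples 7 to 9 with exact fractions: Example 7 enumerates all 36 dice outcomes, Example 8 multiplies the probabilities of successive draws without replacement, and Example 9 counts samples with combinations.

```python
import math
from fractions import Fraction
from itertools import product

# Example 7: enumerate all 36 equally likely outcomes of two dice.
rolls = list(product(range(1, 7), repeat=2))
p_sum_5_or_9 = Fraction(sum(1 for a, b in rolls if a + b in (5, 9)), len(rolls))

# Example 8: first white is 3/8; then 2 whites remain among 7 balls.
p_both_white = Fraction(3, 8) * Fraction(2, 7)

# Example 9: 1 of 5 defective and 2 of 10 good bulbs, out of C(15, 3) samples.
p_one_defective = Fraction(math.comb(5, 1) * math.comb(10, 2), math.comb(15, 3))

print(p_sum_5_or_9, p_both_white, p_one_defective)  # 2/9 3/28 45/91
```

Sums of 5 and 9 each arise in 4 ways, giving 8/36 = 2/9; Example 9 is 5 × 45 / 455 = 45/91.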
CONDITIONAL PROBABILITY
Formula:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
Where:
P(A | B) = probability that A occurs given that B occurs
P(A ∩ B) = probability that both A and B occur
P(B) = probability that B occurs
• EXAMPLE 10:
A random sample of 200 people is classified by gender and their
highest level of education attained. The following data were obtained:
• EXAMPLE 11:
Given P(A) = 2/5 and P(A ∩ B) = 2/15, find P(B | A).
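Example 11 applies the conditional-probability formula with the roles of A and B swapped: P(B | A) = P(A ∩ B) / P(A). The helper `conditional` is illustrative; it refuses a zero denominator because the formula is undefined there.

```python
from fractions import Fraction

def conditional(p_joint, p_given):
    """P(X | Y) = P(X and Y) / P(Y); undefined when P(Y) = 0."""
    if p_given == 0:
        raise ValueError("the conditioning event must have positive probability")
    return p_joint / p_given

# Example 11: P(B | A) = P(A ∩ B) / P(A)
p_b_given_a = conditional(Fraction(2, 15), Fraction(2, 5))
print(p_b_given_a)  # 1/3
```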
INDEPENDENT EVENTS
Two events A and B are independent if and only if P(A | B) = P(A)
and P(B | A) = P(B). Consequently, P(A ∩ B) = P(A) · P(B).
• EXAMPLE 12:
Let A and B be events. What is the probability P(B | A) if A and B
are independent events?
RANDOM VARIABLES
Examples of discrete random variables:
• number of heads in a series of coin tosses
• number of cars passing through the skyway
• number of successful trials
Examples of continuous random variables:
• height of a student
• waiting time
$$\operatorname{Var}(X) = \sigma^2 = E(X^2) - [E(X)]^2$$
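A sketch of the mean and variance of a discrete random variable using Var(X) = E(X²) − [E(X)]². Because the table for Example 13 is not reproduced in these notes, the usage line uses a fair six-sided die purely as an illustration; the helper name is our own.

```python
from fractions import Fraction

def mean_and_variance(pmf):
    """E(X) and Var(X) = E(X^2) - [E(X)]^2 for a pmf given as {x: P(X = x)}."""
    if abs(sum(pmf.values()) - 1) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    ex = sum(x * p for x, p in pmf.items())        # E(X)
    ex2 = sum(x * x * p for x, p in pmf.items())   # E(X^2)
    return ex, ex2 - ex ** 2

# Illustration only (not the Example 13 table): a fair six-sided die.
die = {x: Fraction(1, 6) for x in range(1, 7)}
mu, var = mean_and_variance(die)
print(mu, var)  # 7/2 35/12
```

The standard deviation is the square root of the variance. For the die, the results also match the discrete uniform formulas μ = (n + 1)/2 and σ² = (n² − 1)/12 with n = 6.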
• EXAMPLE 13:
The discrete random variable X has the following probability distribution.
Compute the mean, the variance and the standard deviation.
• EXAMPLE 14:
What is the expected value of the continuous random variable X given its
probability distribution
$$f(x) = \begin{cases} 1/x^4, & x \ge 1 \\ 0, & \text{otherwise} \end{cases}$$
DISCRETE PROBABILITY DISTRIBUTION
Uniform Distribution:
• A random variable X follows a uniform distribution on the integers
1, 2, 3, . . . , n, denoted X ∼ U[1...n], if the pmf of X is defined by
$$f(x) = P(X = x) = \frac{1}{n}$$
If X ∼ U[1...n], then
$$\mu = \frac{n+1}{2}, \qquad \sigma^2 = \frac{n^2 - 1}{12}$$
Bernoulli Distribution:
• Consider an experiment with two mutually exclusive outcomes, say
a success and a failure.
• A random variable X follows a Bernoulli distribution, denoted by
X ∼ Be(p), if the pmf of X is given by
$$f(x) = p^x q^{1-x}, \quad x = 0, 1, \quad q = 1 - p$$
If X ∼ Be(p), then
$$E[X] = p, \qquad \operatorname{Var}[X] = pq$$
• EXAMPLE 15:
In a toss of a biased coin that is twice as likely to come up heads
as tails, what is the probability of getting a head?
Binomial Distribution:
• Consider an experiment where a sequence of n independent Bernoulli
trials is performed. The probability of success, p, remains constant from
trial to trial. The resulting experiment gives rise to counting the number
of successes (with probability p) out of the n trials.
• A random variable X follows a Binomial distribution, denoted by
X ∼ Bi(n, p), if the pmf of X is given by
$$f(x) = \binom{n}{x} p^x q^{n-x}$$
If X ∼ Bi(n, p), then
$$E[X] = np, \qquad \operatorname{Var}[X] = npq$$
• EXAMPLE 16:
It is reported that in the Republican group, 60% are in favor of the
Reproductive Health (RH) Bill. If a group of 8 Republicans is chosen
at random, find the probability that at least 6 are in favor of the
RH Bill.
• EXAMPLE 17:
Find the probability that a 3 appears at most once in five tosses of a
fair die.
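A sketch of the binomial pmf f(x) = C(n, x) pˣ qⁿ⁻ˣ, summed over the outcomes each example asks about; `binomial_pmf` is an illustrative name built on `math.comb`.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) p^x q^(n - x), with q = 1 - p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# Example 16: n = 8, p = 0.6, P(X >= 6) = P(6) + P(7) + P(8)
ex16 = sum(binomial_pmf(x, 8, 0.6) for x in range(6, 9))
# Example 17: n = 5, p = 1/6 (rolling a 3), P(X <= 1) = P(0) + P(1)
ex17 = sum(binomial_pmf(x, 5, 1 / 6) for x in range(0, 2))
print(round(ex16, 4), round(ex17, 4))  # 0.3154 0.8038
```

Example 17 works out exactly to 6250/7776 = 3125/3888.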
Hypergeometric Distribution:
• Consider a random experiment of selecting n objects without replacement from a set
of M objects, where K are of one kind and M − K are of another. The random variable X is
defined as the number of successes (objects of the first kind) in a random sample of n ≤ M elements.
• A random variable X follows a Hypergeometric distribution, denoted by X ∼ HG(n, M,
K), if the pmf is given by
$$f(x) = \frac{\binom{K}{x}\binom{M-K}{n-x}}{\binom{M}{n}}$$
If X ∼ HG(n, M, K), then
$$E[X] = \frac{nK}{M}, \qquad \operatorname{Var}[X] = \frac{nK}{M} \cdot \frac{M-K}{M} \cdot \frac{M-n}{M-1}$$
• EXAMPLE 18:
A bin of 10 lightbulbs contains 6 that are nondefective. If 3 bulbs are
chosen without replacement from the bin, what is the probability that
exactly 2 of the bulbs in the sample are nondefective?
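A sketch of the hypergeometric pmf using the same parameter names as the notes (n drawn, M in total, K of the "success" kind); the function name is illustrative.

```python
import math

def hypergeom_pmf(x, n, M, K):
    """P(X = x) = C(K, x) C(M - K, n - x) / C(M, n)."""
    return math.comb(K, x) * math.comb(M - K, n - x) / math.comb(M, n)

# Example 18: M = 10 bulbs, K = 6 nondefective, n = 3 drawn, x = 2 nondefective
ex18 = hypergeom_pmf(2, 3, 10, 6)
print(ex18)  # 0.5
```

The count is C(6, 2) · C(4, 1) / C(10, 3) = 15 · 4 / 120 = 0.5.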
Poisson Distribution:
• Consider an experiment that counts the number of times a certain event
occurs during a given unit of time (or other unit of measurement). Any random
phenomenon in which a count of some sort is of interest is a candidate
for what is called a Poisson distribution.
• A random variable X follows a Poisson distribution, denoted by
X ∼ Po(λ), λ > 0, if the pmf of X is given by
$$f(x) = \frac{\lambda^x e^{-\lambda}}{x!}$$
If X ∼ Po(λ), then
$$E[X] = \lambda, \qquad \operatorname{Var}[X] = \lambda$$
• EXAMPLE 19:
The average number of homes sold by the KUBO Company is 2 homes per
day. What is the probability that exactly 3 homes will be sold tomorrow?
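A sketch of the Poisson pmf; `poisson_pmf` is an illustrative helper, with λ taken as the stated daily average.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) = lam^x e^(-lam) / x!"""
    return lam ** x * math.exp(-lam) / math.factorial(x)

# Example 19: lambda = 2 homes per day, exactly 3 homes tomorrow
ex19 = poisson_pmf(3, 2)
print(round(ex19, 4))  # 0.1804
```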
Geometric Distribution:
• Consider an experiment where independent Bernoulli trials are performed. A random
variable X counting the number of trials before the first success (that is, the number
of failures) is called a geometric random variable.
• A random variable X follows a Geometric distribution, denoted by
X ∼ Ge(p), if the pmf is given by
$$f(x) = q^x p$$
If X ∼ Ge(p), then
$$E[X] = \frac{q}{p}, \qquad \operatorname{Var}[X] = \frac{q}{p^2}$$
• EXAMPLE 20:
An oil company has determined that the probability of striking oil on
any particular drilling is 0.2. What is the probability that 4 dry wells
are drilled before striking oil on the 5th drilling?
• EXAMPLE 21:
Robert is a football player. His success rate of goal hitting is 70%.
What is the probability that Robert hits his third goal on his fifth
attempt?
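Example 20 fits the geometric pmf f(x) = qˣp directly. Example 21 asks for the third success rather than the first, so this sketch swaps in the negative binomial formula C(k − 1, r − 1) pʳ qᵏ⁻ʳ, a technique these notes do not present; both helper names are illustrative.

```python
import math

def geometric_pmf(x, p):
    """P(X = x) = q^x p: exactly x failures before the first success."""
    return (1 - p) ** x * p

def nth_success_on_trial(r, k, p):
    """Negative binomial: probability that the r-th success occurs exactly on trial k."""
    return math.comb(k - 1, r - 1) * p ** r * (1 - p) ** (k - r)

ex20 = geometric_pmf(4, 0.2)               # 4 dry wells, then oil on the 5th drilling
ex21 = nth_success_on_trial(3, 5, 0.7)     # 3rd goal on the 5th attempt
print(round(ex20, 5), round(ex21, 5))      # 0.08192 0.18522
```

With r = 1 the negative binomial formula reduces to the geometric one, which is a quick way to see how the two are related.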
CONTINUOUS PROBABILITY DISTRIBUTION
Exponential Distribution:
• The waiting times between successive occurrences are continuous
positive random variables. In particular, the waiting time for the first
occurrence can be represented by an exponential random variable.
• A random variable X follows an exponential distribution, denoted by
X ∼ Exp(λ), λ > 0, if its pdf is given by
$$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0$$
If X ∼ Exp(λ), then
$$E[X] = \frac{1}{\lambda}, \qquad \operatorname{Var}[X] = \frac{1}{\lambda^2}$$
• EXAMPLE 22:
On average, a certain computer has a lifetime of 10 years. If the life of
the computer is exponentially distributed, what is the probability that a
computer has a life of less than 7 years?
• EXAMPLE 23:
A postal clerk spends an average of 4 minutes with each customer. Find
the probability that the clerk spends four to five minutes with a randomly
selected customer.
• EXAMPLE 24:
The number of miles that a particular car can run before its battery wears
out is exponentially distributed with an average of 10,000 miles. The owner
of the car needs to take a 5000-mile trip. What is the probability that he
will be able to complete the trip without having to replace the car battery?
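Integrating the exponential pdf gives the cumulative probability P(X ≤ x) = 1 − e^(−λx), and E[X] = 1/λ means λ = 1/mean. This sketch uses those two facts for Examples 22 to 24; `expo_cdf` is an illustrative name.

```python
import math

def expo_cdf(x, mean):
    """P(X <= x) = 1 - e^(-x / mean) for an exponential variable (lambda = 1 / mean)."""
    return 1 - math.exp(-x / mean)

ex22 = expo_cdf(7, 10)                     # P(X < 7), mean life 10 years
ex23 = expo_cdf(5, 4) - expo_cdf(4, 4)     # P(4 < X < 5), mean 4 minutes
ex24 = 1 - expo_cdf(5000, 10_000)          # P(X > 5000), mean 10,000 miles
print(round(ex22, 4), round(ex23, 4), round(ex24, 4))  # 0.5034 0.0814 0.6065
```

For a continuous variable, P(X < 7) and P(X ≤ 7) are the same, so the cumulative formula answers Example 22 directly.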
• EXPONENTIAL (continuous):
A postal clerk spends an average of 4 minutes with each customer. Find
the probability that a customer will spend at least an additional three
minutes with the postal clerk.
• POISSON (discrete):
A postal clerk serves 4 customers per minute. Find the probability that
he or she will serve at least three customers in the next minute.
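A side-by-side sketch of the two postal-clerk questions. The exponential answer relies on the memoryless property (the chance of at least 3 more minutes does not depend on the time already spent), which is a standard fact about the exponential distribution but is not stated in these notes; the Poisson answer uses the complement rule.

```python
import math

# Exponential, mean 4 minutes (lambda = 1/4): P(X >= 3) = e^(-3/4).
p_exp = math.exp(-3 / 4)

# Poisson, lambda = 4 customers per minute: P(X >= 3) = 1 - [P(0) + P(1) + P(2)].
lam = 4
p_pois = 1 - sum(lam ** k * math.exp(-lam) / math.factorial(k) for k in range(3))

print(round(p_exp, 4), round(p_pois, 4))  # 0.4724 0.7619
```

The Poisson result simplifies to 1 − 13e⁻⁴, since 1 + 4 + 4²/2 = 13.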
Normal Distribution:
• A random variable X follows a normal distribution with parameters μ, σ ∈
ℝ, σ > 0, denoted X ∼ N(μ, σ²), if the pdf is given by
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$
Z-Score:
• This transforms the values of a normal distribution into a standard score z:
$$Z = \frac{x - \mu}{\sigma}$$
• EXAMPLE 25:
It was found that the mean length of 100 parts produced by a lathe was
20.05 mm with a standard deviation of 0.02 mm. Find the probability that
a part selected at random would have a length less than 20.01 mm.
• EXAMPLE 26:
It was found that the mean length of 100 parts produced by a lathe was
20.05 mm with a standard deviation of 0.02 mm. Find the probability that
a part selected at random would have a length greater than 20.09 mm.
• EXAMPLE 27:
It was found that the mean length of 100 parts produced by a lathe was
20.05 mm with a standard deviation of 0.02 mm. Find the probability that
a part selected at random would have a length between 20.03 mm and
20.08 mm.
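A sketch of Examples 25 to 27 that follows the z-score route: convert each length to Z = (x − μ)/σ, then evaluate the standard normal cumulative probability Φ(z). In place of a printed z-table, Φ is computed from `math.erf` using Φ(z) = ½[1 + erf(z/√2)]; the helper names are illustrative.

```python
import math

MU, SIGMA = 20.05, 0.02   # mean and standard deviation of part length, mm

def z_score(x):
    """Z = (x - mu) / sigma."""
    return (x - MU) / SIGMA

def standard_normal_cdf(z):
    """Phi(z) = P(Z <= z), computed with the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

ex25 = standard_normal_cdf(z_score(20.01))                 # z = -2
ex26 = 1 - standard_normal_cdf(z_score(20.09))             # z = 2
ex27 = (standard_normal_cdf(z_score(20.08))                # z = 1.5
        - standard_normal_cdf(z_score(20.03)))             # z = -1
print(round(ex25, 4), round(ex26, 4), round(ex27, 4))  # 0.0228 0.0228 0.7745
```

Examples 25 and 26 give the same answer because z = −2 and z = 2 cut off equal tails of the symmetric normal curve.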