Stat Notes-Week 1-9
of Random Variables
RANDOM VARIABLE
– a function that associates a real number to each element in the sample space.
– a variable whose values are determined by chance.
DISCRETE VS. CONTINUOUS
Discrete
• Possible outcomes are countable
• Presented by countable data
• There is a limit
Examples:
1. The number of voters favoring a candidate
2. The number of deaths per year attributed to lung cancer
Continuous
• Possible outcomes are on a continuous scale (height, weight, or temperature)
• There is no limit
Examples:
1. The average amount of electricity consumed per household per month
2. The weight of newborns each year in a hospital
3. The speed of a bus
4. X       1     5     7     8     9
   P(X)    1/5   1/5   1/5   1/5   1/5
example problem
Suppose that two coins are tossed at the same time. Let Y be the random variable representing
the number of heads that occur. Find the values of the random variable Y. Construct the
probability distribution of the random variable Y and its histogram. Solve the mean, variance,
and standard deviation.
STEP 1:
Determine the sample space. Let H represent head and T represent tail.
[Tree diagram: first toss (T or H) branching to second toss (T or H), giving the outcomes TT, TH, HT, HH]
SAMPLE SPACE: S = {TT, TH, HT, HH}
STEP 2:
Count the number of heads in each outcome in the sample space and assign this number to this
outcome.
Number of Heads Y    Probability P(Y)
0                    1/4
1                    1/2
2                    1/4
[Histogram: Probability P(Y) (vertical axis, 0 to 0.4) against Number of Heads Y (0, 1, 2)]
Number of Heads Y    Probability P(Y)    Y • P(Y)
0                    1/4                 (0)(1/4) = 0
1                    1/2                 (1)(1/2) = 1/2
2                    1/4                 (2)(1/4) = 2/4
μ = Σ Y • P(Y) = 0 + 1/2 + 2/4 = 1
VARIANCE of a Discrete Probability Distribution
The variance of a random variable with a discrete probability distribution is given by:
σ² = Σ (x − μ)² • P(x)
Where
x = value of the random variable
P(x) = probability of the random variable X
μ = mean of the probability distribution
STEP 5:
Find the variance.
Y    P(Y)   Y • P(Y)          Y − μ         (Y − μ)²     (Y − μ)² • P(Y)
0    1/4    (0)(1/4) = 0      0 − 1 = −1    (−1)² = 1    (1)(1/4) = 0.25
1    1/2    (1)(1/2) = 1/2    1 − 1 = 0     (0)² = 0     (0)(1/2) = 0
2    1/4    (2)(1/4) = 2/4    2 − 1 = 1     (1)² = 1     (1)(1/4) = 0.25
μ = Σ Y • P(Y) = 0 + 1/2 + 2/4 = 1
σ² = Σ (Y − μ)² • P(Y) = 0.25 + 0 + 0.25 = 0.5
σ = √σ² = √0.5 ≈ 0.71
The probability distribution of the number of heads when two coins are tossed has a mean,
variance, and standard deviation of 1, 0.5, and 0.71, respectively. The results show that the
expected average outcome over many tosses is 1 head. In addition, the small variance and
standard deviation indicate that the number of heads stays close to the mean.
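The mean, variance, and standard deviation steps can be checked with a short Python sketch. The function name mean_var_sd and the dictionary layout are illustrative choices, not part of the notes; exact fractions are used so the intermediate values match the hand computation.

```python
from fractions import Fraction
from math import sqrt

def mean_var_sd(pmf):
    """Return (mean, variance, standard deviation) of a discrete distribution.

    pmf maps each value of the random variable to its probability.
    """
    mean = sum(value * prob for value, prob in pmf.items())
    variance = sum((value - mean) ** 2 * prob for value, prob in pmf.items())
    return mean, variance, sqrt(variance)

# Number of heads Y when two coins are tossed: P(0) = 1/4, P(1) = 1/2, P(2) = 1/4.
two_coin_pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
mean, variance, sd = mean_var_sd(two_coin_pmf)
print(mean, variance, round(sd, 2))  # 1 1/2 0.71
```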
example problem
Suppose that three coins are tossed at the same time. Let X be the random variable
representing the number of tails that occur. Find the values of the random variable X.
STEP 1:
Determine the sample space. Let H represent head and T represent tail.
[Tree diagram: three successive tosses, each branching to H or T, giving eight outcomes]
SAMPLE SPACE: S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
STEP 2:
Count the number of tails in each outcome in the sample space and assign this number to this
outcome.
STEP 3:
Identify the probability of each value of the variable X and create a Histogram to represent the
data.
Number of Tails X    Probability P(X)
0                    1/8
1                    3/8
2                    3/8
3                    1/8
[Histogram: Probability P(X) (vertical axis, 0 to 0.4) against Number of Tails X (0, 1, 2, 3)]
STEP 4:
Find the mean.
μ = Σ X • P(X)
Number of Tails X    Probability P(X)    X • P(X)
0                    1/8                 (0)(1/8) = 0
1                    3/8                 (1)(3/8) = 3/8
2                    3/8                 (2)(3/8) = 6/8
3                    1/8                 (3)(1/8) = 3/8
μ = 0 + 3/8 + 6/8 + 3/8 = 12/8 = 1.5
STEP 5:
Find the variance.
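X    P(X)   X • P(X)          X − μ               (X − μ)²          (X − μ)² • P(X)
0    1/8    (0)(1/8) = 0      0 − 1.5 = −1.5      (−1.5)² = 2.25    (2.25)(1/8) = 0.28125
1    3/8    (1)(3/8) = 3/8    1 − 1.5 = −0.5      (−0.5)² = 0.25    (0.25)(3/8) = 0.09375
2    3/8    (2)(3/8) = 6/8    2 − 1.5 = 0.5       (0.5)² = 0.25     (0.25)(3/8) = 0.09375
3    1/8    (3)(1/8) = 3/8    3 − 1.5 = 1.5       (1.5)² = 2.25     (2.25)(1/8) = 0.28125
σ² = Σ (X − μ)² • P(X) = 0.28125 + 0.09375 + 0.09375 + 0.28125 = 0.75
σ = √0.75 ≈ 0.87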
The probability distribution of the number of tails when three coins are tossed has a mean,
variance, and standard deviation of 1.5, 0.75, and 0.87, respectively. The results show that the
expected average outcome over many tosses is 1.5 tails. In addition, the small variance and
standard deviation indicate that the number of tails stays close to the mean.
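Steps 1 to 3 (list the sample space, count the tails, assign probabilities) can also be done by enumeration. The sketch below assumes fair coins, so every outcome is equally likely; the function name tails_distribution is a hypothetical helper, not something defined in the notes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def tails_distribution(n_coins):
    """Probability distribution of the number of tails in n_coins fair tosses."""
    # Step 1: the sample space is every H/T sequence of length n_coins.
    outcomes = list(product("HT", repeat=n_coins))
    # Step 2: count the tails in each outcome.
    counts = Counter(outcome.count("T") for outcome in outcomes)
    # Step 3: each outcome is equally likely, so P(X = x) = count / total.
    return {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

pmf = tails_distribution(3)
mean = sum(x * p for x, p in pmf.items())
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
print(pmf)             # probabilities 1/8, 3/8, 3/8, 1/8 for X = 0, 1, 2, 3
print(mean, variance)  # 3/2 3/4
```

Changing the argument to 2 reproduces the two-coin distribution of the first example, with tails in place of heads.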
Normal Distribution
– also called the normal curve, is the distribution of data where the mean, median, and mode are
equal
– the distribution is clustered at the center
– the graph is a bell-shaped curve, and symmetrical
[Figure: bell-shaped normal curve; horizontal axis in standard deviations from −4 to 4]
THE Z-SCORE
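– also called the z-value; it tells how many standard deviations a measurement lies above or
below the mean and is used to look up areas under the normal curve
z = (X − μ) / σ
Where:
X = given measurement
μ = population mean
σ = standard deviation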
example problems
Given the mean μ = 50 and the standard deviation σ = 4 of a population of Reading scores,
find the z-value that corresponds to a score X = 58.
z = (X − μ) / σ = (58 − 50) / 4 = 2
[Figure: normal curve with z-values −3 to 3 aligned with the scores 38, 42, 46, 50, 54, 58, 62]
Find the z-value of the following set of data. Tell whether the score is above or below the mean.
1. μ = 45, σ = 6, X = 39      z = −1      Below
2. μ = 40, σ = 8, X = 52      z = 1.5     Above
3. μ = 75, σ = 15, X = 82     z = 0.47    Above
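The z-score exercises can be verified with a few lines of Python. This is a minimal sketch; the function name z_score and the tuple layout are illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

# (mean, standard deviation, X) for the three exercises above.
exercises = [(45, 6, 39), (40, 8, 52), (75, 15, 82)]
for mean, sd, x in exercises:
    z = z_score(x, mean, sd)
    position = "above" if z > 0 else "below" if z < 0 else "at"
    print(f"X = {x}: z = {z:.2f}, {position} the mean")
```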
TABLE OF AREAS UNDER THE NORMAL CURVE
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
For values of z above 3.09, use 0.4999 for the area.
Find the area under the standard normal curve between z = 0 and the following z-scores.
a. z = 0.96 or P(0 < z < 0.96)
Step 1: Express the z-value as a three-digit number.
z = 0.96
Step 2: In the table, find the first two digits in the row headings.
z = 0.9
Step 3: Match the third digit with the appropriate column heading.
z = 0.06
Step 4: Read the area (or probability) at the intersection of the row and the column.
A = 0.3315
Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
c. z = 2.38 or P(0 < z < 2.38)
Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
A = 0.4913
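The table entries can be reproduced numerically, which is a handy check on a lookup. The sketch below relies on the standard identity that the area between 0 and z equals 0.5 · erf(z / √2); the function name area_from_zero is an illustrative choice.

```python
from math import erf, sqrt

def area_from_zero(z):
    """Area under the standard normal curve between 0 and z, for z >= 0."""
    return 0.5 * erf(z / sqrt(2))

for z in (0.96, 2.38):
    print(f"z = {z}: A = {area_from_zero(z):.4f}")
# z = 0.96: A = 0.3315
# z = 2.38: A = 0.4913
```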
Find the area under the standard normal curve bounded by the following pairs of z-scores.
a. z = 1 and z = 2 or P(1 < z < 2)
Z      0.00
1.0    0.3413
2.0    0.4772
The area of the region in between: 0.4772 − 0.3413 = 0.1359
For comparison, the area to the left of z = 1, or P(z < 1), adds half of the curve:
z = 1 → A = 0.3413
Half of the Normal Curve → A = 0.5
A = 0.3413 + 0.5 = 0.8413
b. To the left of z = -1.5 or P(z < -1.5)
Z      0.00
1.5    0.4332
The area of the region: 0.5 − 0.4332 = 0.0668
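Areas between two z-scores, or to the left of one, can be checked with the cumulative distribution function in Python's standard library (statistics.NormalDist, available from Python 3.8). The helper names area_between and area_left_of are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def area_between(z_low, z_high):
    """P(z_low < z < z_high) under the standard normal curve."""
    return standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

def area_left_of(z):
    """P(z < given value) under the standard normal curve."""
    return standard_normal.cdf(z)

print(round(area_between(1, 2), 4))   # 0.1359
print(round(area_left_of(-1.5), 4))   # 0.0668
```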
Percentile
– a measure of relative standing.
– a descriptive measure of the relationship of a measurement to the rest of the data
– divides a set of data into 100 equal parts
– always refers to quantities below or less than the percentile rank
example problems
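a. What percentile does the z-score 2.34 represent?
Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
From the table, A = 0.4904, so the area below z = 2.34 is 0.5 + 0.4904 = 0.9904.
The z-score 2.34 is at about the 99th percentile.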
[Figures: standard normal curves with z-axis from −3 to 3; one shaded area labeled 0.4842; axis labels 73M and 80M]
2. Fifty job applicants took an IQ test and their scores are normally distributed with a mean of 100.
a. How many applicants obtained a score between 74 and 126 if the standard deviation is 20?
Given:
n = 50
μ = 100
σ = 20
X1 = 74
X2 = 126
z1 = (X1 − μ) / σ = (74 − 100) / 20 = −1.3
z2 = (X2 − μ) / σ = (126 − 100) / 20 = 1.3
Number of applicants
= [Area of P(−1.3 < z < 1.3)](n)
= (0.4032 + 0.4032)(50)
= 40.32
≈ 40 applicants
Therefore, about 40 applicants obtained a score between 74 and 126 if the standard deviation is 20.
[Figure: normal curve with an area of 0.4032 on each side of the mean; z = −1.3, 0, 1.3 aligned with the scores 74, 100, 126]
b. The management decided not to hire the lowest 20% of the applicants. What score must an
applicant obtain to get hired if the standard deviation is 20?
Given:
n = 50
μ = 100
σ = 20
A = 30% below the mean (50% − 20%)
Table value near 0.30: A = 0.3023, so z = −0.85 (negative because the score is below the mean)
z = (X − μ) / σ, so X = zσ + μ
X = (−0.85)(20) + 100
X = 83
[Figure: normal curve centered at 100 with the lowest 20% shaded and the 30% between the cutoff and the mean]
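Both parts of the IQ problem can be cross-checked without the table. The sketch below uses statistics.NormalDist from the Python standard library; the variable names are illustrative. Because it does not round z to two decimals, the cutoff comes out as about 83.2 rather than exactly 83.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=20)
n_applicants = 50

# Part a: expected number of applicants scoring between 74 and 126.
share_between = iq.cdf(126) - iq.cdf(74)
expected_count = share_between * n_applicants
print(round(expected_count, 2))  # 40.32, about 40 applicants

# Part b: score that cuts off the lowest 20% of applicants.
cutoff_score = iq.inv_cdf(0.20)
print(round(cutoff_score, 1))    # 83.2
```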
Random Sampling
– a part of the sampling technique in which each sample has an equal probability of being
chosen. A sample chosen randomly is meant to be an unbiased representation of the total
population.
PARAMETERS
– are descriptive measures computed from a population
STATISTICS
– are descriptive measures computed from a sample
SAMPLING DISTRIBUTION OF SAMPLE MEANS
– is a frequency distribution using the means computed from all possible random samples of a
specific size
FINITE POPULATION
– is one that consists of a finite or fixed number of elements, measurements, or observations
INFINITE POPULATION
– contains, hypothetically, at least, an infinite number of elements
example problems
Data      Average
3,4,6     4.33
3,4,7     4.67
3,4,9     5.33
3,6,7     5.33
3,6,9     6.00
3,7,9     6.33
4,6,7     5.67
4,6,9     6.33
4,7,9     6.67
6,7,9     7.33

Sample Mean X̄    Frequency    Probability P(X̄)
4.33             1            0.10
4.67             1            0.10
5.33             2            0.20
5.67             1            0.10
6.00             1            0.10
6.33             2            0.20
6.67             1            0.10
7.33             1            0.10
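A sampling distribution like the one tabulated can be built by listing every possible sample. The sketch below assumes the population is {3, 4, 6, 7, 9} and the sample size is 3, drawn without replacement; both are inferred from the listed samples, since the problem statement itself is not shown.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [3, 4, 6, 7, 9]  # assumption: inferred from the listed samples
sample_size = 3               # assumption: every listed sample has 3 values

# All possible samples without replacement, and the mean of each.
samples = list(combinations(population, sample_size))
means = [Fraction(sum(s), sample_size) for s in samples]

# Frequency and probability of each distinct sample mean.
frequency = Counter(means)
for mean_value in sorted(frequency):
    count = frequency[mean_value]
    print(f"{float(mean_value):.2f}  {count}  {count / len(samples):.2f}")
```

Exact fractions are used for the means so that equal means such as 16/3 are grouped together instead of being split by floating-point rounding.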
example problems
a. The average time it takes a group of college students to complete a certain examination
is 46.2 minutes. The standard deviation is 8 minutes. Assume that the variable is normally
distributed. If 50 randomly selected college students take the examination, what is the
probability that the mean time it takes the group to complete the test will be less than 43
minutes?
Given:
μ = 46.2
σ = 8
X̄ = 43
n = 50
The problem is dealing with data about the sample mean, so the z-formula for sample means
is used:
z = (X̄ − μ) / (σ / √n) = (43 − 46.2) / (8 / √50) = −2.83
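The z-value for a sample mean divides by the standard error σ/√n rather than by σ. The sketch below recomputes z for this problem and then the requested probability; from the table, the area for z = 2.83 is 0.4977, so P(z < −2.83) = 0.5 − 0.4977 = 0.0023. The function name z_for_sample_mean is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def z_for_sample_mean(sample_mean, pop_mean, pop_sd, n):
    """z-value of a sample mean, using the standard error pop_sd / sqrt(n)."""
    standard_error = pop_sd / sqrt(n)
    return (sample_mean - pop_mean) / standard_error

z = z_for_sample_mean(43, 46.2, 8, 50)
probability = NormalDist().cdf(z)  # P(sample mean < 43)
print(round(z, 2), round(probability, 4))  # -2.83 0.0023
```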
Point Estimate
ESTIMATION
– the process of finding an approximate value of some parameter—such as the mean—of a
population from random samples of the population
– population parameters are usually unknown fixed values, but there are 2 ways to estimate
and report them:
1. Report a single number that describes the average. This number is called the point estimate.
• A point estimate is a specific numerical value that estimates a population parameter. The
sample mean X̄ is the best point estimate of the population mean.
• The mean is the best estimator because any change in value affects the result.
2. Report a range of values that contains the number that truly describes the data. This
range is called the interval estimate.
• An interval estimate is a range of values that may contain the parameter of a
population.
example problems
1. Mr. Santiago’s company sells bottles of coconut juice. He claims that a bottle contains 500ml of
such juice. A consumer group wanted to know if his claim is true. They took six random samples of
10 such bottles and obtained the capacity, in ml, of each bottle. The result is shown as follows.
Sample 1 500 498 497 503 499 497 497 497 497 495
Sample 2 500 500 495 494 498 500 500 500 500 497
Sample 3 497 497 502 496 497 497 497 497 497 495
Sample 4 501 495 500 497 497 500 500 495 497 497
Sample 5 502 497 497 499 496 497 497 499 500 500
Sample 6 496 497 496 495 497 497 500 500 496 497
Assuming that the measurements were carefully obtained and that the only kind of error present is
the sampling error, what is the point estimate of the population mean?
The point estimate of the population mean μ is also known as the mean of the means or the
overall mean. To find its value, find the sum of the sample means and divide this sum by the
total number of sample means.
SOLUTION 1: Sample Row Mean
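The row means and the overall mean can be computed directly from the six samples. This is a sketch of the mean-of-the-means procedure described above; the variable names are illustrative, and the figures are computed from the listed data.

```python
# Capacity in ml of each bottle, one list per sample (from the table above).
samples = [
    [500, 498, 497, 503, 499, 497, 497, 497, 497, 495],
    [500, 500, 495, 494, 498, 500, 500, 500, 500, 497],
    [497, 497, 502, 496, 497, 497, 497, 497, 497, 495],
    [501, 495, 500, 497, 497, 500, 500, 495, 497, 497],
    [502, 497, 497, 499, 496, 497, 497, 499, 500, 500],
    [496, 497, 496, 495, 497, 497, 500, 500, 496, 497],
]

# Mean of each sample (row), then the mean of those means.
row_means = [sum(row) / len(row) for row in samples]
overall_mean = sum(row_means) / len(row_means)

for number, row_mean in enumerate(row_means, start=1):
    print(f"Sample {number}: {row_mean:.1f}")
print(f"Point estimate of the population mean: {overall_mean:.2f} ml")
```

The row means come out as 498.0, 498.4, 497.2, 497.9, 498.4, and 497.1, giving an overall mean of about 497.83 ml.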
2. The US Census Bureau publishes annual price figures for new mobile homes in Manufactured
Housing Statistics. The figures are obtained from sampling, not from a census. A simple random
sample of 36 new mobile homes yielded the prices, in thousands of dollars, shown in the Table
below. Use the data to estimate the population mean price, μ, of all new mobile homes.
67.8 68.4 59.2 56.9 63.9 62.2 55.6 72.9 62.6
67.1 73.4 63.7 57.7 66.7 61.7 55.5 49.3 72.9
49.9 56.5 71.2 59.1 64.3 64.0 55.9 51.3 53.7
56.0 76.7 76.8 60.6 74.5 57.9 70.4 63.8 77.9
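With a single sample, the point estimate of μ is the sample mean. A minimal sketch using the 36 prices listed above (the variable names are illustrative):

```python
from statistics import mean

# Prices of the 36 sampled mobile homes, in thousands of dollars.
prices = [
    67.8, 68.4, 59.2, 56.9, 63.9, 62.2, 55.6, 72.9, 62.6,
    67.1, 73.4, 63.7, 57.7, 66.7, 61.7, 55.5, 49.3, 72.9,
    49.9, 56.5, 71.2, 59.1, 64.3, 64.0, 55.9, 51.3, 53.7,
    56.0, 76.7, 76.8, 60.6, 74.5, 57.9, 70.4, 63.8, 77.9,
]

point_estimate = mean(prices)
print(f"{point_estimate:.2f}")  # 63.28, i.e. about $63,280
```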
Family    Expenses
1         Php 14,200
2         Php 15,500
3         Php 16,800
4         Php 17,500
5         Php 20,000
6         Php 27,000
Mean      Php 18,500
The mean consumption of the 6 families in one month is Php 18,500.
example problems
1. Calculate the 95% confidence interval for 150 employees receiving a mean monthly salary of
Php 15,000 with a standard deviation of Php 2,500.
Given:
X̄ = 15,000
σ = 2,500
n = 150
z = 1.96 (95% confidence interval)
(X̄ − z • σ/√n, X̄ + z • σ/√n)
= (15,000 − 1.96 • 2,500/√150, 15,000 + 1.96 • 2,500/√150)
= (14,599.92, 15,400.08) or (14,600, 15,400)
With 95% confidence, the true mean monthly salary of the employees is between Php 14,600
and Php 15,400.
2. A marketing officer wishes to select female receptionists from 300 employees with an average
height of 170 cm and a sample standard deviation of 25 cm. What is the 99% confidence interval
of their height?
Given:
X̄ = 170
σ = 25
n = 300
z = 2.576 (99% confidence interval)
(X̄ − z • σ/√n, X̄ + z • σ/√n)
= (170 − 2.576 • 25/√300, 170 + 2.576 • 25/√300)
= (166.28, 173.72) or (166, 174)
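Both confidence intervals follow the same pattern: sample mean plus or minus z times σ/√n. The sketch below wraps that in a small function; the name confidence_interval and its parameter names are illustrative, and the z-values 1.96 and 2.576 are the ones given in the problems.

```python
from math import sqrt

def confidence_interval(sample_mean, sd, n, z):
    """Return (lower, upper) limits: sample_mean ± z * sd / sqrt(n)."""
    margin = z * sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin

salary_low, salary_high = confidence_interval(15_000, 2_500, 150, 1.96)
height_low, height_high = confidence_interval(170, 25, 300, 2.576)
print(round(salary_low, 2), round(salary_high, 2))  # 14599.92 15400.08
print(round(height_low, 2), round(height_high, 2))  # 166.28 173.72
```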