
Module 2 MIDTERM StatProb

This document discusses measures of central tendency including the mean, median, mode and weighted mean. It provides examples and properties of each measure. The mean is the average value where all data points are equally weighted. The median is the middle value when data is arranged in order. The mode is the most frequent value. The weighted mean accounts for different contributions by weighting each value.

MajMath 7 – Elementary Statistics and Probability

Prepared by: NORLY F. POLO

STATISTICS
Introduction:
Whatever exists at all exists in some amount... and whatever
exists in some amount can be measured.
Edward Thorndike

Learning Objectives:

1. Compare the forms (textual, tabular, and graphical) of data.


2. Identify the essential parts of a table and describe the different kinds of graphs for data
presentation.
3. Draw the graph/table to present the data.
4. Analyze and interpret the data presented in a graph/table.
5. Discuss the uses, characteristics, advantages, and disadvantages of measures of dispersion.

MEASURES OF CENTRAL TENDENCY


Measure of central tendency (commonly referred to as an average)
- is a single value that represents a data set
- its purpose is to locate the center of a data set

A. Mean
Arithmetic mean – the only common measure in which all values play an equal role; that is, computing it
requires every value in the data set. It is appropriate for measuring the central tendency of interval
or ratio data.

The symbol $\bar{x}$, called "x bar", is used to represent the mean of a sample, and the symbol $\mu$,
called "mu", is used to denote the mean of a population.

Properties of Mean
1. A set of data has only one mean.
2. Mean can be applied for interval and ratio data.
3. All values in the data set are included in computing the mean.
4. The mean is very useful in comparing two or more data sets.
5. The mean is affected by extremely small or large values in a data set.
6. The mean is most appropriate for symmetrical data.

$$\text{Mean} = \frac{\text{Sum of all values}}{\text{Number of values}}$$

Sample mean: $\bar{x} = \dfrac{\sum x}{n}$          Population mean: $\mu = \dfrac{\sum x}{N}$

where: $\bar{x}$ = sample mean (read "x bar")
       $\mu$ = population mean (read "mu")
       $x$ = the value of any particular observation or measurement
       $\sum x$ = sum of all $x$'s
       $n$ = total number of values in the sample
       $N$ = total number of values in the population

Example 1: The daily salaries of a sample of eight employees at GMS Inc. are P550, P420, P560, P500,
P700, P670, P860, P480. Find the mean daily rate of employees.

Solution:

$$\bar{x} = \frac{\sum x}{n} = \frac{x_1+x_2+x_3+x_4+x_5+x_6+x_7+x_8}{n}$$

$$\bar{x} = \frac{550+420+560+500+700+670+860+480}{8} = \frac{4{,}740}{8} = 592.50$$

The sample mean daily salary of employees is P592.50.

Example 2: Find the population mean of the ages of 9 middle-management employees of a certain company.
The ages are 53, 45, 59, 48, 54, 46, 51, 58, and 55.

Solution:

$$\mu = \frac{\sum x}{N} = \frac{x_1+x_2+x_3+x_4+x_5+x_6+x_7+x_8+x_9}{N}$$

$$\mu = \frac{53+45+59+48+54+46+51+58+55}{9} = \frac{469}{9} = 52.11$$

The population mean age of the middle-management employees is 52.11 years.
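
The two mean calculations above can be checked with a short Python sketch. The variable names are illustrative only; the data are copied from Examples 1 and 2.

```python
from statistics import mean

# Data copied from Examples 1 and 2 above.
daily_salaries = [550, 420, 560, 500, 700, 670, 860, 480]  # sample, n = 8
ages = [53, 45, 59, 48, 54, 46, 51, 58, 55]                # population, N = 9

# Mean = sum of all values / number of values
sample_mean = sum(daily_salaries) / len(daily_salaries)

# statistics.mean performs the same arithmetic
population_mean = mean(ages)

print(sample_mean)                # 592.5
print(round(population_mean, 2))  # 52.11
```

The arithmetic is identical for a sample and a population; only the symbol ($\bar{x}$ or $\mu$) and the interpretation differ.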

A. Median
- Is the midpoint of the data array.
- Is an appropriate measure of central tendency for data that are ordinal or above, but is more
valuable in an ordinal type of data.

Properties of Median
1. The median is unique; there is only one median for a set of data.
2. The median is found by arranging the data from lowest to highest (or highest to lowest) and
taking the value of the middle observation.
3. The median is not affected by extremely small or large values.
4. The median can be applied to ordinal, interval, and ratio data.
5. The median is most appropriate for skewed data.

To determine the median of ungrouped data, we need to consider two rules:
1. If n is odd, the median is the middle-ranked value.
2. If n is even, the median is the average of the two middle-ranked values.

$$\text{Median (rank value)} = \frac{n+1}{2}$$

Note that n is the population/sample size.

Example 1: Find the median of the ages of 9 middle-management employees of a certain company. The
ages are 53, 45, 59, 48, 54, 46, 51, 58, and 55.

Solution:
Step 1: Arrange the data in order.

45, 46, 48, 51, 53, 54, 55, 58, 59


Step 2: Select the middle rank value.

$$\text{Median (rank value)} = \frac{n+1}{2} = \frac{9+1}{2} = \frac{10}{2} = 5$$

Step 3: Identify the median in the data set.

45, 46, 48, 51, 53, 54, 55, 58, 59

The 5th value in the ordered array is 53.

Hence, the median age is 53 years.

Example 2: The daily rates of a sample of eight employees at GMS Inc. are P550, P420, P560, P500, P700,
P670, P860, P480. Find the median daily rate of the employees.

Solution:
Step 1: Arrange the data in order.

P420, P480, P500, P550, P560, P670, P700, P860

Step 2: Select the middle rank value.

$$\text{Median (rank value)} = \frac{n+1}{2} = \frac{8+1}{2} = \frac{9}{2} = 4.5$$

Step 3: Identify the median in the data set.

P420, P480, P500, P550, P560, P670, P700, P860

Since the 4.5th position falls between the 4th value (P550) and the 5th value (P560), we can determine
the median of the data set by getting the average of the two values.

$$\text{Median} = \frac{550+560}{2} = \frac{1{,}110}{2} = 555$$

Therefore, the median daily rate is P555.
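
The rank rule used in both median examples can be sketched in Python as follows. The function name `median_by_rank` is a hypothetical helper introduced only for illustration, and the sketch assumes a non-empty list of numbers.

```python
def median_by_rank(values):
    """Median of ungrouped data using the (n + 1) / 2 rank rule."""
    ordered = sorted(values)           # Step 1: arrange the data in order
    n = len(ordered)
    rank = (n + 1) / 2                 # Step 2: 1-based rank, may end in .5
    lower = ordered[int(rank) - 1]     # value at the whole-number part of the rank
    if rank == int(rank):              # n is odd: the rank is a whole number
        return lower
    upper = ordered[int(rank)]         # n is even: the next value up
    return (lower + upper) / 2         # average of the two middle values

ages = [53, 45, 59, 48, 54, 46, 51, 58, 55]
daily_rates = [550, 420, 560, 500, 700, 670, 860, 480]

print(median_by_rank(ages))         # 53
print(median_by_rank(daily_rates))  # 555.0
```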

B. Mode
- is the value in a data set that appears most frequently. Like the median and unlike the mean,
extreme values in a data set do not affect the mode.

Unimodal – a data set in which only one value occurs with the greatest frequency.
Bimodal – a data set in which two values occur with the same greatest frequency; both values are
considered modes.
Multimodal – a data set that has more than two modes.
No mode – when all values in a data set occur with the same frequency.

Properties of Mode
1. The mode is found by locating the most frequently occurring value.
2. The mode is the easiest average to compute.
3. There can be more than one mode or even no mode in any given data set.
4. Mode is not affected by extremely small or large values.
5. Mode can be applied for nominal, ordinal, interval and ratio data.
Example 1: The following data represents the total unit sales for Smartphones from a sample of 10
Communication Centers for the month of August: 15, 17, 10, 12, 13, 10, 14, 10, 8, and 9. Find the mode.

Solution:
The ordered array for these data is 8, 9, 10, 10, 10, 12, 13, 14, 15, 17.

Because 10 appears three times, more often than any other value, the mode is 10.

Example 2: An operations manager in charge of a company's manufacturing keeps track of the number of
LED televisions manufactured in a day. The following data represent the number of LED televisions
manufactured over the past three weeks: 20, 18, 19, 25, 20, 21, 20, 25, 30, 29, 28, 29, 25, 25, 27,
26, 22, and 20. Find the mode of the given data set.

Solution:
The ordered array for these data is 18, 19, 20, 20, 20, 20, 21, 22, 25, 25, 25, 25, 26, 27, 28, 29, 29, 30.

There are two modes, 20 and 25, since each of these values occurs four times.

Example 3: Find the mode of the ages of 9 middle-management employees of a certain company. The ages
are 53, 45, 59, 48, 54, 46, 51, 58, and 55.

Solution:
The ordered array for these data is 45, 46, 48, 51, 53, 54, 55, 58, 59.

There is no mode since every value in the data set occurs with the same frequency (once).
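
The three mode examples can be reproduced with a small frequency count. This is a minimal sketch: the helper name `modes` is hypothetical, and it follows this module's convention that a data set whose values all share the same frequency has no mode.

```python
from collections import Counter

def modes(values):
    """Return the sorted list of modes; an empty list means 'no mode'."""
    counts = Counter(values)
    top = max(counts.values())
    # Convention from the text: if every value has the same frequency, there is no mode.
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)

phone_sales = [15, 17, 10, 12, 13, 10, 14, 10, 8, 9]
tv_output = [20, 18, 19, 25, 20, 21, 20, 25, 30, 29, 28, 29, 25, 25, 27, 26, 22, 20]
ages = [53, 45, 59, 48, 54, 46, 51, 58, 55]

print(modes(phone_sales))  # [10]      (unimodal)
print(modes(tv_output))    # [20, 25]  (bimodal)
print(modes(ages))         # []        (no mode)
```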

C. Weighted Mean
- Is particularly useful when various classes or groups contribute differently to the total.
- Is found by multiplying each value by its corresponding weight, adding the products, and dividing
by the sum of the weights.

$$\bar{x}_w = \frac{x_1 w_1 + x_2 w_2 + x_3 w_3 + \dots + x_n w_n}{w_1 + w_2 + w_3 + \dots + w_n}$$

where: $\bar{x}_w$ = weighted mean
       $w_i$ = corresponding weight
       $x_i$ = the value of any particular observation or measurement

Example 1: At the Mathematics Department of San Sebastian College there are 18 instructors, 12 assistant
professors, 7 associate professors, and 3 professors. Their monthly salaries are P30,500, P33,700, P38,600,
and P45,000, respectively. What is the weighted mean salary?

Solution:
Let $w_1 = 18$, $w_2 = 12$, $w_3 = 7$, $w_4 = 3$
    $x_1 = 30{,}500$, $x_2 = 33{,}700$, $x_3 = 38{,}600$, $x_4 = 45{,}000$

$$\bar{x}_w = \frac{30{,}500(18) + 33{,}700(12) + 38{,}600(7) + 45{,}000(3)}{18+12+7+3} = \frac{1{,}358{,}600}{40} = 33{,}965$$

The weighted mean salary is P33,965.


Example 2: Riana's first quarter grades are shown in the table below. Use the weighted mean formula to find
Riana's GPA for the first quarter.

Subjects   English   Mathematics   Filipino   Science   P.E.   Religion
Grade      90        87            88         93        95     96
Units      3         3             3          3         2      1

Solution:
Let $w_1 = 3$, $w_2 = 3$, $w_3 = 3$, $w_4 = 3$, $w_5 = 2$, $w_6 = 1$
    $x_1 = 90$, $x_2 = 87$, $x_3 = 88$, $x_4 = 93$, $x_5 = 95$, $x_6 = 96$

$$\bar{x}_w = \frac{x_1 w_1 + x_2 w_2 + x_3 w_3 + x_4 w_4 + x_5 w_5 + x_6 w_6}{w_1 + w_2 + w_3 + w_4 + w_5 + w_6}$$

$$\bar{x}_w = \frac{90(3) + 87(3) + 88(3) + 93(3) + 95(2) + 96(1)}{3+3+3+3+2+1} = \frac{1{,}360}{15} = 90.67$$

The weighted mean of Riana's grades (her GPA) for the first quarter is 90.67.

Example 3: A certain subdivision in Laguna consists of 50 homes. The table shows the frequency distribution
of homes with respect to the number of bedrooms each has. Find the mean number of bedrooms for the 50
homes.

No. of Bedrooms   2    3    4    5   6
No. of Homes      13   21   10   4   2

Solution:
Here the values are the numbers of bedrooms and the weights are the numbers of homes (the frequencies).
Let $x_1 = 2$, $x_2 = 3$, $x_3 = 4$, $x_4 = 5$, $x_5 = 6$
    $w_1 = 13$, $w_2 = 21$, $w_3 = 10$, $w_4 = 4$, $w_5 = 2$

$$\bar{x}_w = \frac{x_1 w_1 + x_2 w_2 + x_3 w_3 + x_4 w_4 + x_5 w_5}{w_1 + w_2 + w_3 + w_4 + w_5}$$

$$\bar{x}_w = \frac{2(13) + 3(21) + 4(10) + 5(4) + 6(2)}{13+21+10+4+2} = \frac{161}{50} = 3.22$$

The weighted mean number of bedrooms per home is 3.22.
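
As a rough check of the weighted mean formula, the sketch below applies it to the salary data of Example 1 and to all six subjects in the grade table of Example 2. The function name `weighted_mean` is illustrative, not a library call.

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("each value needs exactly one weight")
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

# Example 1: salaries weighted by the number of faculty at each rank
salaries = [30_500, 33_700, 38_600, 45_000]
faculty_counts = [18, 12, 7, 3]

# Example 2: grades weighted by units (all six subjects in the table)
grades = [90, 87, 88, 93, 95, 96]
units = [3, 3, 3, 3, 2, 1]

print(weighted_mean(salaries, faculty_counts))  # 33965.0
print(round(weighted_mean(grades, units), 2))   # 90.67
```

The same function handles a frequency distribution: pass the distinct values as `values` and their frequencies as `weights`.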

MEASURES OF DISPERSION
Standard deviation – is a statistical term that provides a good indication of volatility. It measures how widely
values are dispersed from the average.

Dispersion – is the difference between the actual value and the average value.

A. Range
- Is the difference between the highest value and the lowest value in the data set.

Advantages:
a. It is easy to compute.
b. It is easy to understand.

Disadvantages:
a. It can be distorted by a single extreme value (or outlier).
b. Only two values are used in the calculation.
Example 1: The daily rates of a sample of eight employees at GMS Inc. are P550, P420, P560, P500, P700,
P670, P860, P480. Find the range.

Solution:
Step 1: Determine the highest value and lowest value in the data set.
Highest Value (HV) = P860 Lowest Value (LV) = P420

Step 2: Solve for the range.


Range = Highest Value (HV) – Lowest Value (LV) = P860 – P420 = P440

The range of the daily rates is P440.
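
The range computation is a one-liner in Python; the short sketch below mirrors the two steps of the solution (variable names are illustrative).

```python
daily_rates = [550, 420, 560, 500, 700, 670, 860, 480]

# Step 1: highest and lowest values
highest, lowest = max(daily_rates), min(daily_rates)

# Step 2: range = highest value - lowest value
data_range = highest - lowest

print(highest, lowest, data_range)  # 860 420 440
```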

B. Variance and Standard Deviation

Standard deviation
- is calculated as the square root of variance.
- In finance, it is applied to the annual rate of return of an investment to measure the investment’s
volatility.
- Is also known as historical volatility and is used by investors as a gauge for the amount of expected
volatility.

Variance
- Is a mathematical expectation of the average squared deviations from the mean.

Volatility
- Is a measure of risk, so this statistic can help determine the risk an investor might take on when
purchasing a specific security.

Sample Variance and Sample Standard Deviation for Ungrouped Data

$$s^2 = \frac{\sum (x-\bar{x})^2}{n-1} \qquad s = \sqrt{\frac{\sum (x-\bar{x})^2}{n-1}}$$

$$s^2 = \frac{\sum x^2 - \dfrac{(\sum x)^2}{n}}{n-1} \qquad s = \sqrt{\frac{\sum x^2 - \dfrac{(\sum x)^2}{n}}{n-1}}$$

where:
$s^2$ = sample variance.
$s$ = sample standard deviation.
$x$ = the value of any particular observation or measurement.
$\bar{x}$ = sample mean.
$n$ = sample size.

Example 2: The daily rates of a sample of eight employees at GMS Inc. are P550, P420, P560, P500, P700,
P670, P860, P480. Find the variance and standard deviation.

Solution:
Step 1: Compute the mean of the data set.

$$\bar{x} = \frac{\sum x}{n} = \frac{550+420+560+500+700+670+860+480}{8} = \frac{4{,}740}{8} = 592.50$$

Step 2: Subtract the mean from each value in the data set.

$x$                  $x - \bar{x}$
550                  −42.5
420                  −172.5
560                  −32.5
500                  −92.5
700                  107.5
670                  77.5
860                  267.5
480                  −112.5
$\sum x = 4{,}740$   $\sum (x - \bar{x}) = 0$

Step 3: Square each $x - \bar{x}$, then get the sum. For example, $(-42.5)^2 = 1{,}806.25$.

$x$                  $x - \bar{x}$              $(x - \bar{x})^2$
550                  −42.5                      1,806.25
420                  −172.5                     29,756.25
560                  −32.5                      1,056.25
500                  −92.5                      8,556.25
700                  107.5                      11,556.25
670                  77.5                       6,006.25
860                  267.5                      71,556.25
480                  −112.5                     12,656.25
$\sum x = 4{,}740$   $\sum (x - \bar{x}) = 0$   $\sum (x - \bar{x})^2 = 142{,}950$

Step 4: Solve for the variance and the standard deviation. We can also obtain the standard deviation by
simply extracting the square root of the variance.

$$s^2 = \frac{\sum (x-\bar{x})^2}{n-1} = \frac{142{,}950}{8-1} = 20{,}421.43$$

$$s = \sqrt{\frac{\sum (x-\bar{x})^2}{n-1}} = \sqrt{\frac{142{,}950}{8-1}} = \sqrt{20{,}421.43} = 142.90$$

Hence, the variance is 20,421.43 (in squared pesos) and the standard deviation is P142.90.

ALTERNATIVE SOLUTION: (Using the other formulas).

Step 1: Get the sum of the data set.

$x$
550
420
560
500
700
670
860
480
$\sum x = 4{,}740$

Step 2: Square the values in the data set and get the sum.

$x$                  $x^2$
550                  302,500
420                  176,400
560                  313,600
500                  250,000
700                  490,000
670                  448,900
860                  739,600
480                  230,400
$\sum x = 4{,}740$   $\sum x^2 = 2{,}951{,}400$

Step 3: Solve for the values of the variance and standard deviation.

$$s^2 = \frac{\sum x^2 - \dfrac{(\sum x)^2}{n}}{n-1} = \frac{2{,}951{,}400 - \dfrac{(4{,}740)^2}{8}}{8-1} = \frac{2{,}951{,}400 - 2{,}808{,}450}{7} = 20{,}421.43$$

$$s = \sqrt{\frac{\sum x^2 - \dfrac{(\sum x)^2}{n}}{n-1}} = \sqrt{\frac{2{,}951{,}400 - 2{,}808{,}450}{7}} = \sqrt{20{,}421.43} = 142.90$$

Thus, the variance is 20,421.43 (in squared pesos) and the standard deviation is P142.90.
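
Both routes to the sample variance (the deviation formula and the alternative sum-of-squares formula) can be compared in a short Python sketch. Variable names are illustrative; `statistics.variance` is the standard-library sample variance, which also divides by n − 1.

```python
from math import sqrt, isclose
from statistics import variance

daily_rates = [550, 420, 560, 500, 700, 670, 860, 480]
n = len(daily_rates)
x_bar = sum(daily_rates) / n                       # 592.5

# Deviation formula: sum of (x - x_bar)^2, divided by n - 1
sum_sq_dev = sum((x - x_bar) ** 2 for x in daily_rates)
s2_deviation = sum_sq_dev / (n - 1)

# Alternative formula: (sum of x^2 - (sum of x)^2 / n), divided by n - 1
sum_x = sum(daily_rates)
sum_x2 = sum(x * x for x in daily_rates)
s2_shortcut = (sum_x2 - sum_x ** 2 / n) / (n - 1)

s = sqrt(s2_deviation)

print(sum_sq_dev)               # 142950.0
print(round(s2_deviation, 2))   # 20421.43
print(round(s, 2))              # 142.9
print(isclose(s2_deviation, s2_shortcut))            # True
print(isclose(s2_deviation, variance(daily_rates)))  # True
```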

Population Variance and Population Standard Deviation

$$\sigma^2 = \frac{\sum (x-\mu)^2}{N} \qquad \sigma = \sqrt{\frac{\sum (x-\mu)^2}{N}}$$

where: $\sigma^2$ = population variance.
       $\sigma$ = population standard deviation.
       $x$ = the value of any particular observation or measurement.
       $\mu$ = population mean.
       $N$ = population size.

Example 3: The monthly incomes of the five research directors of Recoletos schools are: P55,000, P59,500,
P62,500, P57,000, and P61,000. Find the variance and standard deviation.

Solution:
Step 1: Compute the mean of the data set.

$$\mu = \frac{\sum x}{N} = \frac{55{,}000+59{,}500+62{,}500+57{,}000+61{,}000}{5} = \frac{295{,}000}{5} = 59{,}000$$

Step 2: Subtract the population mean from each value in the data set.

$x$      $x - \mu$
55,000   −4,000
59,500   500
62,500   3,500
57,000   −2,000
61,000   2,000

Step 3: Get the square of $x - \mu$, then get the sum.

$x$                    $x - \mu$              $(x - \mu)^2$
55,000                 −4,000                 16,000,000
59,500                 500                    250,000
62,500                 3,500                  12,250,000
57,000                 −2,000                 4,000,000
61,000                 2,000                  4,000,000
$\sum x = 295{,}000$   $\sum (x - \mu) = 0$   $\sum (x - \mu)^2 = 36{,}500{,}000$

Step 4: Solve for the population variance and population standard deviation.

$$\sigma^2 = \frac{\sum (x-\mu)^2}{N} = \frac{36{,}500{,}000}{5} = 7{,}300{,}000$$

$$\sigma = \sqrt{\frac{\sum (x-\mu)^2}{N}} = \sqrt{7{,}300{,}000} = 2{,}701.85$$

Hence, the population variance is 7,300,000 and the population standard deviation is P2,701.85.
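
The population formulas differ from the sample formulas only in the divisor (N instead of n − 1). A minimal Python sketch for the research directors' incomes follows; the squared deviations sum to 36,500,000, so dividing by N = 5 gives 7,300,000. Variable names are illustrative.

```python
from math import sqrt, isclose
from statistics import pvariance

incomes = [55_000, 59_500, 62_500, 57_000, 61_000]
N = len(incomes)
mu = sum(incomes) / N                              # 59000.0

sum_sq_dev = sum((x - mu) ** 2 for x in incomes)   # 36,500,000
sigma2 = sum_sq_dev / N                            # divide by N, not N - 1
sigma = sqrt(sigma2)

print(sum_sq_dev)        # 36500000.0
print(sigma2)            # 7300000.0
print(round(sigma, 2))   # 2701.85
print(isclose(sigma2, pvariance(incomes)))  # True
```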

MEASURES OF RELATIVE POSITION


*When presenting or analyzing a data set it is sometimes helpful to group subjects into several equal groups.
For example, to create four equal groups we need the values that split the data such that 25% of the
observations are in each group. The cut off points are called quartiles, and there are 3 of them (the middle
one also being called the median). The general term for such cut off points is quantiles; other values likely
to be encountered are deciles, which split data into 10 parts, and percentiles, which split the data into 100
parts (also called centiles). Values such as quartiles can also be expressed as percentiles; for example, the
lowest quartile is also the 25th percentile and the median is the 50th percentile or the 5th decile.

A. Quartiles

$$Q_k = \frac{k(N+1)}{4}$$

where: $Q_k$ = position (rank) of the $k$th quartile in the ordered data.
       $N$ = number of observations.
       $k$ = quartile number (1, 2, or 3).

Example 1: Find the first, second, and third quartiles of the ages of 9 middle-management employees of a
certain company. The ages are 53, 45, 59, 48, 54, 46, 51, 58, and 55.

Solution:
Step 1: Arrange the data in order.

45, 46, 48, 51, 53, 54, 55, 58, 59

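Step 2: Find the positions of the first, second, and third quartiles using the quartile formula.

$$Q_1 = \frac{1(N+1)}{4} = \frac{1(9+1)}{4} = \frac{10}{4} = 2.5$$

$$Q_2 = \frac{2(N+1)}{4} = \frac{2(9+1)}{4} = \frac{2(10)}{4} = 5$$

$$Q_3 = \frac{3(N+1)}{4} = \frac{3(9+1)}{4} = \frac{3(10)}{4} = 7.5$$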

Step 3: Identify the first, second, and third quartile values in the data set.

45, 46, 48, 51, 53, 54, 55, 58, 59

The 5th value is 53. Since the 2.5th position falls between 46 and 48, and the 7.5th position falls between
55 and 58, we can determine the first and third quartiles of the data set by getting the average of the
two values.

$$Q_1 = \frac{46+48}{2} = \frac{94}{2} = 47 \qquad Q_3 = \frac{55+58}{2} = \frac{113}{2} = 56.5$$

Therefore, $Q_1 = 47$, $Q_2 = 53$, and $Q_3 = 56.5$.
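
The quartile procedure above can be sketched as a small Python function. The name `quartile` is hypothetical. One assumption goes beyond the text: when the position is fractional, the sketch interpolates linearly between the two neighboring values, which coincides with the averaging used above whenever the position ends in .5. It also assumes the data set is large enough that the position falls inside the ordered array.

```python
def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) using the k(N + 1)/4 position rule."""
    ordered = sorted(values)
    position = k * (len(ordered) + 1) / 4     # 1-based position, may be fractional
    lower_rank = int(position)
    fraction = position - lower_rank
    lower = ordered[lower_rank - 1]
    if fraction == 0:                         # position is a whole number
        return lower
    upper = ordered[lower_rank]
    # Assumption: linear interpolation; equals the plain average when fraction == 0.5
    return lower + fraction * (upper - lower)

ages = [53, 45, 59, 48, 54, 46, 51, 58, 55]
q1, q2, q3 = (quartile(ages, k) for k in (1, 2, 3))
print(q1, q2, q3)  # 47.0 53 56.5
```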

B. z-Score
- is used to know the position of one observation relative to others in a set of data.
- measures the distance between an observation and the mean, measured in units of standard
deviation.

The following formulas show how to compute the z-score for a data value x in a population and in a sample.

$$z = \frac{x-\mu}{\sigma} \text{ (for population)} \qquad z = \frac{x-\bar{x}}{s} \text{ (for sample)}$$

Example 1: The monthly expenditures of a large group of households are normally distributed with a mean
of P48,700 and a standard deviation of P10,400. What are the z-values of monthly expenditures of P59,400 and
P38,300?

Solution:
Let $\mu = 48{,}700$ and $\sigma = 10{,}400$.

Using the z formula, the z-values for the two x values (P59,400 and P38,300) are computed as
follows:

For $x = 59{,}400$: $z = \dfrac{x-\mu}{\sigma} = \dfrac{59{,}400-48{,}700}{10{,}400} = \dfrac{10{,}700}{10{,}400} \approx 1.03$

For $x = 38{,}300$: $z = \dfrac{x-\mu}{\sigma} = \dfrac{38{,}300-48{,}700}{10{,}400} = \dfrac{-10{,}400}{10{,}400} = -1.00$

*The z of about 1.03 indicates that a monthly expenditure of P59,400 is slightly more than one standard
deviation above the mean, and a z of −1.00 shows that a monthly expenditure of P38,300 is exactly one standard
deviation below the mean. Note that P59,400 lies P10,700 above the mean, while P38,300 lies P10,400 (one
standard deviation) below it.

Example 2: A normal curve has a mean of 650 and a standard deviation of 40. An analyst is interested in
a value of 575 and wants to find its equivalent z-score.

Solution:
Given: $\bar{x} = 650$, $s = 40$, $x = 575$

Substitute the given values into the z-score formula.

$$z = \frac{x-\bar{x}}{s} = \frac{575-650}{40} = -1.875$$

The value 575 has a z-score of −1.875.

Example 3: A time study report indicates that an assembly line task should be finished in an average of
5.64 minutes, with a standard deviation of 0.97 minutes. One particular item had a z-score of 1.53. What was
the completion time of this item?

Solution:
Given: $\bar{x} = 5.64$, $s = 0.97$, $z = 1.53$

Solving the z-score formula for $x$, we get

$$z = \frac{x-\bar{x}}{s} \quad\Rightarrow\quad x = \bar{x} + zs$$

$$x = \bar{x} + zs = 5.64 + (1.53)(0.97) = 5.64 + 1.4841 = 7.1241 \approx 7.12 \text{ minutes}$$

The item had an assembly time of 7.12 minutes.

Example 4: The salary of junior executives in a large corporation in the Ortigas area is normally distributed with
a standard deviation of P15,600. A cutback is pending, at which time those who earn less than P85,000 will be
discharged. If this cutoff corresponds to a z-score of −1.28, what is the mean salary of
the group of junior executives?

Solution:
Given: $s = 15{,}600$, $x = 85{,}000$, $z = -1.28$

Solving the z-score formula for $\bar{x}$, we get

$$z = \frac{x-\bar{x}}{s} \quad\Rightarrow\quad \bar{x} = x - zs$$

$$\bar{x} = x - zs = 85{,}000 - (-1.28)(15{,}600) = 85{,}000 + 19{,}968 = 104{,}968$$

Thus, the mean salary of junior executives is P104,968.
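
The z-score formula and its two rearrangements used in Examples 2 to 4 can be sketched as three small Python functions. The function names are hypothetical, chosen only for this illustration.

```python
def z_score(x, mean, sd):
    """z = (x - mean) / sd"""
    return (x - mean) / sd

def value_from_z(z, mean, sd):
    """Solve the z formula for x: x = mean + z * sd"""
    return mean + z * sd

def mean_from_z(x, z, sd):
    """Solve the z formula for the mean: mean = x - z * sd"""
    return x - z * sd

print(z_score(575, 650, 40))                        # -1.875   (Example 2)
print(round(value_from_z(1.53, 5.64, 0.97), 2))     # 7.12     (Example 3)
print(round(mean_from_z(85_000, -1.28, 15_600)))    # 104968   (Example 4)
```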

C. Box-and-Whisker Plot
- Introduced by John Wilder Tukey (1915-2000) in the 1970s.
- A boxplot (or box-and-whisker plot) is a graph of a data set obtained by drawing a horizontal line
from the minimum data value to the first quartile ($Q_1$), drawing a horizontal line from the third quartile
($Q_3$) to the maximum data value, and drawing a box whose vertical sides pass through $Q_1$ and $Q_3$, with a
vertical line inside the box passing through the median or second quartile ($Q_2$).

The boxplot will give the following information:


1. If the median is near the center of the box, the distribution is approximately symmetric.
2. If the median falls to the right of the center of the box, the distribution is negatively skewed.
3. If the median falls to the left of the center of the box, the distribution is positively skewed.
4. If the lines are about the same length, the distribution is approximately symmetric.
5. If the left line is longer than the right line, the distribution is negatively skewed.
6. If the right line is longer than the left line, the distribution is positively skewed.

Figure 4.10: Boxplot (figure not reproduced; it marks the lowest X, the quartiles with $Q_2$ = median, and the highest X on a scale from 0 to 60)

Example 2: Construct a boxplot for the data set of the ages of 9 middle-management employees of a
certain company. The ages are 53, 45, 59, 48, 54, 46, 51, 58, and 55. What can we say about the
distribution of the data set?

Solution:
Step 1: Determine the 𝑄1, Median, and 𝑄3 of the given data set.
Recall that 𝑄1 = 47, Median= 53, and 𝑄3 = 56.5.

Step 2: Locate the lowest value, 𝑄1, the median, 𝑄3, and the highest value on the scale.

Step 3: Draw a box around 𝑄1 and 𝑄3, draw a vertical line through the median, and connect the upper
and lower values, as shown in Figure 4.11.

Figure 4.11: Boxplot for the Middle-Management Employees' Ages (figure not reproduced; lowest value = 45, $Q_1 = 47$, median = 53, $Q_3 = 56.5$, highest value = 59, on a scale from 40 to 60)

The distribution of the data set is negatively skewed, since the median (53) falls to the right of the center
of the box (51.75).

PROBABILITIES AND NORMAL DISTRIBUTION

Normal distribution or Gaussian distribution


- Is a continuous probability distribution that describes data that clusters around a mean.

Gaussian function or bell curve


- The graph of the associated probability density function is bell-shaped, with a peak at the mean.

Normal curve
- Was developed mathematically in 1733 by Abraham de Moivre (1667-1754) as an approximation
to the binomial distribution.
- Is often called the Gaussian distribution

Abraham de Moivre (1667-1754)


- Developed the normal curve in 1733 but his paper was not discovered until 1924 by Karl Pearson
(1857-1936).

Pierre-Simon Laplace (1749-1827)


- Used the normal curve in 1783 to describe the distribution of errors.

Carl Friedrich Gauss (1777-1855)


- Used the normal curve to analyze astronomical data in 1809.
Example: If a research investigator selects a random sample of 100 adult males, measures their heights, and
constructs a histogram, the researcher gets a graph similar to the one presented in Figure 4.12(a). Now if
the investigator increases the sample size and decreases the width of the classes, the histogram will look
like the ones presented in Figure 4.12(b) and (c). Lastly, if it were possible to measure the heights of all adult
males in the Philippines, the histogram would come close to what is called a normal distribution, presented in
Figure 4.12(d).

Figure 4.12: Histogram for the Distribution of Heights of Adult Males in the Philippines
(a) Random sample of 100 males
(b) Sample size increased and class width decreased
(c) Sample size increased and class width decreased further
(d) Normal distribution for the population

Figure 4.13: Normal Distribution Curve


Normal distribution
- is a continuous, symmetric, bell-shaped distribution of a variable.

Properties of Normal Distribution:


1. The distribution is bell-shaped.
2. The mean, median, and mode are equal and are located at the center of the distribution.
3. The normal distribution is unimodal.
4. The normal distribution curve is symmetric about the mean.
5. The normal distribution is continuous.
6. The normal curve is asymptotic (it never touches the x-axis).
7. The total area under the normal distribution curve is 1.00 or 100%.
8. The area under the part of a normal curve that lies within 1 standard deviation of the mean is about 68%;
within 2 standard deviations, about 95%; and within 3 standard deviations, about 99.7%. See Figure
4.13, which shows the area in each region.

A. Standard Normal Distribution

A normal distribution can be converted into a standard normal distribution by obtaining the z value.

z value – is the signed distance between a selected value, designated $x$, and the mean, $\mu$, divided by the
standard deviation, $\sigma$.
- also called the z score, the z statistic, the standard normal deviate, or the standard normal
value.

Standard normal value: $z = \dfrac{x-\mu}{\sigma}$

where: $z$ = z value
       $x$ = the value of any particular observation or measurement.
       $\mu$ = the mean of the distribution.
       $\sigma$ = standard deviation of the distribution.

Example 1: Determine the area under the standard normal distribution curve between 𝑧 = 0 and 𝑧 = 1.85.

Solution:
Draw the figure and represent the area as shown in the figure below.

Since Table A gives the area between 0 and any z value to the right of 0, we only need to look up the z value
in the table. Find 1.8 in the left column and 0.05 in the top row. The value where the row and column meet in
the table is the answer, 0.4678.

Table A: Standardized Normal Distribution

z      0.00     0.01     ...   0.05     ...
0.0    0.0000   0.0040   ...   0.0199   ...
0.1    0.0398   0.0438   ...   0.0596   ...
0.2    0.0793   0.0832   ...   0.0987   ...
:      :        :              :
1.8    0.4641   0.4649   ...   0.4678   ...
:      :        :              :

$P(0 < z < 1.85) = 0.4678$

Hence, the area is 0.4678 or 46.78%.

Example 2: Determine the area under the standard normal distribution curve between 𝑧 = 0 and 𝑧 = −1.15.

Solution:
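The desired area lies between $z = -1.15$ and $z = 0$ (figure not reproduced). Because the normal curve is
symmetric about 0, this area equals the area between $z = 0$ and $z = 1.15$ given in the table.

The area between $z = 0$ and $z = -1.15$, or $P(-1.15 < z < 0)$, is 0.3749. Therefore, the area is 0.3749 or
37.49%.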
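
The table lookups in these two examples can be cross-checked with Python's standard library. This is a sketch, not a replacement for Table A: `statistics.NormalDist` returns the cumulative area to the left of z, so the area between 0 and z is obtained by subtracting 0.5. The helper name `area_from_zero` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()   # standard normal: mean 0, standard deviation 1

def area_from_zero(z):
    """Area under the standard normal curve between 0 and z (same for +z and -z)."""
    return abs(std_normal.cdf(z) - 0.5)

print(round(area_from_zero(1.85), 4))    # 0.4678
print(round(area_from_zero(-1.15), 4))   # 0.3749
```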

B. Application of Normal Distribution

$$z = \frac{x-\mu}{\sigma}$$

where: $z$ = z value
       $x$ = the value of any particular observation or measurement.
       $\mu$ = population mean
       $\sigma$ = population standard deviation

The formula is used to gain information about an individual data value when the variable is normally
distributed.

Example 1: The average Pag-ibig salary loan for RFS Pharmacy Inc. employees is P23,000. If the debt is
normally distributed with a standard deviation of P2,500, find the probability that an employee owes less
than P18,500.

Solution:
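Step 1: Draw a figure and represent the area. (Figure not reproduced: a normal curve centered at 23,000 with
the region $P(x < 18{,}500)$ shaded to the left of 18,500.)

Step 2: Find the z value for P18,500.

$$z = \frac{x-\mu}{\sigma} = \frac{18{,}500-23{,}000}{2{,}500} = \frac{-4{,}500}{2{,}500} = -1.80$$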

Step 3: Find the appropriate area. The area obtained in the Standardized Normal Distribution Table (kindly
download your own copy from the internet) is 0.4641, which corresponds to the area between 𝑧 = 0 and 𝑧 =
−1.80.

𝑃(−1.80 < 𝑧 < 0) = 0.4641

Step 4: Subtract 0.4641 from 0.5000.

$$P(x < 18{,}500) = P(z < -1.80) = 0.5000 - P(-1.80 < z < 0) = 0.5000 - 0.4641 = 0.0359$$

Hence, the probability that an employee owes less than P18,500 in Pag-ibig salary loan is 0.0359 or 3.59%.

Example 2: Consider an investment in stock market whose return is normally distributed with a mean of
20% and standard deviation of 8%. Determine the probability of earning money.

Solution:
Note that the investment earns money when the return is positive. Thus, we can represent the return in terms
of P(x>0).

Step 1: Represent x and 0 in a standard probability statement.

$$P(x > 0) = P\left(\frac{x-\mu}{\sigma} > \frac{0-20}{8}\right) = P(z > -2.50)$$

Step 2: Find the appropriate area for $z = -2.50$.

$$P(-2.50 < z < 0) = 0.4938$$

Step 3: Add 0.4938 to 0.5000.

$$P(x > 0) = P(z > -2.50) = P(-2.50 < z < 0) + 0.5000 = 0.4938 + 0.5000 = 0.9938$$

Hence, the probability of earning money is 0.9938 or 99.38%.
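
Both application examples can be verified without a printed table. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative, and the means and standard deviations are those given in Examples 1 and 2.

```python
from statistics import NormalDist

# Example 1: Pag-ibig salary loan, mean P23,000, standard deviation P2,500
loans = NormalDist(mu=23_000, sigma=2_500)
p_owes_less = loans.cdf(18_500)       # P(x < 18,500): area to the left

# Example 2: investment return (in percent), mean 20, standard deviation 8
returns = NormalDist(mu=20, sigma=8)
p_earns_money = 1 - returns.cdf(0)    # P(x > 0): area to the right

print(round(p_owes_less, 4))     # 0.0359
print(round(p_earns_money, 4))   # 0.9938
```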

CORRELATION AND LINEAR REGRESSION


Correlation – is a statistical method used to determine whether a relationship between variables exists.

Regression analysis – is a statistical method used to describe the nature of the relationship between
variables, that is, either positive or negative, linear or nonlinear.
Two types of relationships
a. Simple
- There are 2 variables: (a) an independent variable (or explanatory variable or predictor variable)
and (b) a dependent variable (or response variable).
- Can be positive or negative: (a) a positive relationship exists when both variables increase at the
same time or both decrease at the same time; in contrast, in (b) a negative relationship,
as one variable increases, the other variable decreases, or vice versa.
b. Multiple

A. Pearson Product-Moment Correlation


- Is the most widely used statistic for measuring the degree of the relationship between linearly
related variables.

Correlation – refers to the departure of two random variables from independence.

Correlation coefficient – is defined as the covariance divided by the standard deviations of the variables.

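The following formula is used to calculate the Pearson $r$ correlation:

$$r = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\left[\sum (x-\bar{x})^2\right]\left[\sum (y-\bar{y})^2\right]}}$$

or

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n(\sum x^2) - (\sum x)^2\right]\left[n(\sum y^2) - (\sum y)^2\right]}}$$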

Pearson’s product-moment correlation coefficient or simply correlation coefficient (or Pearson’s 𝑟)


- Is a measure of the strength of the linear association between two variables.
- Developed by Karl Pearson.

*The value of the correlation coefficient varies between +1 and −1. When the value of the correlation
coefficient equals ±1, there is said to be a perfect degree of association between the two variables. As
the value of the correlation coefficient moves closer to zero, the relationship between the two variables
becomes weaker. This information is summarized in the charts below.

(Charts not reproduced; the panels show: Perfect Positive Correlation ($r = 1.00$), Perfect Negative Correlation
($r = -1.00$), Positive Correlation ($r = 0.80$), Negative Correlation ($r = -0.80$), Zero Correlation
($r = 0.00$), and Non-linear Correlation.)

The following summarizes the correlation coefficient and strength of relationships:

0.00             - no correlation, no relationship
±0.01 to ±0.20   - very low correlation, almost negligible relationship
±0.21 to ±0.40   - slight correlation, definite but small relationship
±0.41 to ±0.70   - moderate correlation, substantial relationship
±0.71 to ±0.90   - high correlation, marked relationship
±0.91 to ±0.99   - very high correlation, very dependable relationship
±1.00            - perfect correlation, perfect relationship

*A test of significance for the coefficient of correlation may be used to find out whether the computed
Pearson's $r$ could have occurred in a population in which the two variables are not related. The test
statistic follows the $t$ distribution with $n - 2$ degrees of freedom. The significance is computed using the
formula of the $t$ test as shown below:

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

where: $t$ = t-test for correlation coefficient
       $r$ = correlation coefficient
       $n$ = number of paired samples

Assumptions in Pearson Product-Moment Correlation test:


1. Subjects are randomly selected.
2. Both populations are normally distributed.
Procedure for Pearson Product-Moment Correlation test:
1. Set up the hypotheses:
$H_0: \rho = 0$ (The correlation in the population is zero.)
$H_1: \rho \neq 0$ for a two-tailed test, or $\rho > 0$ or $\rho < 0$ for a one-tailed test (The correlation in the population is different from zero.)
where: $\rho$ = correlation in the population.
2. Set the level of significance.
3. Calculate the degrees of freedom ($df = n - 2$) and determine the critical value of $t$.
4. Calculate the value of Pearson's $r$.
5. Calculate the value of $t$ and determine the statistical decision for hypothesis testing (for a two-tailed test, compare the absolute value of the computed $t$):
If $|t_{computed}| < t_{critical}$, do not reject $H_0$.
If $|t_{computed}| \geq t_{critical}$, reject $H_0$.
6. State the conclusion.

Figure 4.14: Testing the Hypothesis of Correlation Coefficient at 0.05 Significance Level (figure not reproduced: a rejection region labeled "There is correlation" in each tail and a non-rejection region labeled "No correlation in population" in the center)

When the null hypothesis has been rejected for a specific significance level, there are several possible
relationships between the x and y variables.

1. There is a direct cause-and-effect relationship between the two variables.
2. There is a reverse cause-and-effect relationship between the two variables.
3. The relationship between the two variables may be caused by a third variable.
4. There may be a complexity of interrelationships among many variables.
5. The relationship between the two variables may be coincidental.

Example 1: The owner of a chain of fruit shake stores would like to study the correlation between atmospheric
temperature and sales during the summer season. A random sample of 12 days is selected with the results
given as follows:

Day                   1     2     3     4     5     6     7     8     9     10    11    12
Temperature           79    76    78    84    90    83    93    94    97    85    88    82
Total Sales (Units)   147   143   147   168   206   155   192   211   209   187   200   150

Plot the data on a scatter diagram. Does it appear there is a relationship between atmospheric temperature
and sales? Compute the coefficient of correlation. Determine at the 0.05 significance level whether the
correlation in the population is greater than zero.
Solution:
Step 1: Graph the scatter plot.

[Scatter plot of the 12 observations, with temperature on the horizontal axis (0 to 120) and total sales on the vertical axis (0 to 250).]

Step 2: State the hypotheses.


𝐻0: 𝜌 = 0 (There is no correlation between atmospheric temperature and total sales of fruit shake.)
𝐻1: 𝜌 ≠ 0 (There is a correlation between atmospheric temperature and total sales of fruit shake.)

Step 3: The level of significance is 𝛼 = 0.05.

Step 4: Determine the degrees of freedom and the critical values of 𝑡.

𝑑𝑓 = 𝑛 − 2 = 12 − 2 = 10 and the two-tailed critical values are 𝑡 = ±2.228

Step 5: Compute for the value of 𝑟 (Pearson Product-Moment Correlation Coefficient).

Day        𝑥        𝑦       𝑥²        𝑦²        𝑥𝑦
1         79      147    6,241    21,609    11,613
2         76      143    5,776    20,449    10,868
3         78      147    6,084    21,609    11,466
4         84      168    7,056    28,224    14,112
5         90      206    8,100    42,436    18,540
6         83      155    6,889    24,025    12,865
7         93      192    8,649    36,864    17,856
8         94      211    8,836    44,521    19,834
9         97      209    9,409    43,681    20,273
10        85      187    7,225    34,969    15,895
11        88      200    7,744    40,000    17,600
12        82      150    6,724    22,500    12,300
Total  1,029    2,115   88,733   380,887   183,222

∑𝑥 = 1,029   ∑𝑦 = 2,115   ∑𝑥² = 88,733   ∑𝑦² = 380,887   ∑𝑥𝑦 = 183,222

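\[ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}} \]

\[ r = \frac{12(183{,}222) - (1{,}029)(2{,}115)}{\sqrt{[12(88{,}733) - (1{,}029)^2][12(380{,}887) - (2{,}115)^2]}} = \frac{22{,}329}{\sqrt{[5{,}955][97{,}419]}} = 0.9270572554 \approx 0.93 \]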

The coefficient of correlation, 𝑟 = 0.93, between the atmospheric temperature and total sales indicates a very
high positive correlation (very dependable relationship); that is, an increase in atmospheric temperature is
strongly associated with an increase in total sales of fruit shake.
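
The hand computation of 𝑟 can be cross-checked with a short script. The following is a minimal sketch in Python (the handout itself uses no programming language, so the function name pearson_r and the variable names are illustrative assumptions); it applies the same raw-score formula to the twelve observations.

```python
from math import sqrt

# Data from Example 1: temperature and total sales for the 12 sampled days.
temperature = [79, 76, 78, 84, 90, 83, 93, 94, 97, 85, 88, 82]
sales = [147, 143, 147, 168, 206, 155, 192, 211, 209, 187, 200, 150]


def pearson_r(x, y):
    """Pearson's r from the raw-score formula used in Step 5 (illustrative helper)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r(temperature, sales)
print(f"r = {r:.4f}")  # rounds to 0.93 at two decimals
```

Working from the raw data rather than the column totals also guards against arithmetic slips in the table.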

Step 6: Decision rule.


To decide whether the relationship is significant, we need to determine the value of 𝑡.

\[ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.93\sqrt{12-2}}{\sqrt{1-(0.93)^2}} = \frac{0.93(3.16227766)}{\sqrt{1-0.8649}} = \frac{2.940918224}{0.367559519} \approx 8.00 \]

Since the computed 𝑡 value of 8.00 is greater than the tabular value of 2.228 at the 0.05 level of significance,
we reject the null hypothesis.
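
The computed 𝑡 above uses 𝑟 rounded to 0.93. As a hedged illustration (the helper name t_statistic is an assumption, and the critical value is simply copied from the 𝑡 table rather than computed), the sketch below evaluates the same formula with both the rounded and the unrounded 𝑟 to show that the decision does not depend on the rounding.

```python
from math import sqrt


def t_statistic(r, n):
    """t value for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


n = 12
t_critical = 2.228  # two-tailed, alpha = 0.05, df = 10 (taken from the t table)

t_rounded = t_statistic(0.93, n)             # r rounded to two decimals, as in the text
t_unrounded = t_statistic(0.9270572554, n)   # r before rounding

for label, t in [("rounded r", t_rounded), ("unrounded r", t_unrounded)]:
    decision = "reject H0" if abs(t) >= t_critical else "do not reject H0"
    print(f"{label}: t = {t:.2f} -> {decision}")
```

With the unrounded 𝑟 the statistic comes out near 7.82 instead of 8.00; both are far above 2.228, so the null hypothesis is rejected either way.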

Step 7: Conclusion.
Since the null hypothesis has been rejected, we can conclude that there is evidence that shows significant
association between the atmospheric temperature and the total sales of fruit shake.

B. Simple Linear Regression Analysis


Regression analysis
- Is a simple statistical tool used to model the dependence of a variable on one (or more)
explanatory variables.

Simple linear regression


- Is the least squares estimator of a linear regression model with a single predictor (or one independent
variable).

Least squares model


- Determines a regression equation by minimizing the sum of squares of the vertical distances
between the actual y values and the predicted values of y.
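
Written out, the quantity being minimized is the sum of squared vertical distances. The subscript 𝑖 for the individual observations is notation assumed here for illustration; 𝑏0 and 𝑏1 are the intercept and slope of the fitted line.

```latex
S(b_0, b_1) = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
            = \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right)^2
```

The least squares line is the pair (𝑏0, 𝑏1) for which this sum is as small as possible.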

Residual – the difference between an observed value and its predicted value (𝑦 − 𝑦̂). For a least squares line with an intercept, the mean of the residuals is always zero.

Outliers – the points that fall outside the overall pattern of the other points.

Influential scores
- Scores whose removal greatly changes the regression line

\[ \hat{y} = b_1 x + b_0 \qquad b_1 = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2} \qquad b_0 = \bar{y} - b_1 \bar{x} \]

where: 𝑦̂ = predicted or fitted value of 𝑦.
𝑥 = the value of any particular observation of the independent variable.
𝑦 = the value of any particular observation of the dependent variable.
𝑏1 = slope of the regression line.
𝑏0 = intercept of the regression line.
𝑥̅ = mean of the independent variable.
𝑦̅ = mean of the dependent variable.

Example 2: Referring to Example 1 on atmospheric temperature and sales, determine the
regression equation, plot the regression line, and interpret it.

Solution:
Computation of the Simple Linear Regression Equation
Step 1: Obtain the sums of 𝑥, 𝑦, 𝑥², 𝑦², and 𝑥𝑦. (These values were already obtained in Example 1.)

∑𝑥 = 1,029   ∑𝑥² = 88,733   ∑𝑥𝑦 = 183,222
∑𝑦 = 2,115   ∑𝑦² = 380,887

Step 2: Compute the slope of the simple linear regression.

\[ b_1 = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2} \]

\[ b_1 = \frac{12(183{,}222) - (1{,}029)(2{,}115)}{12(88{,}733) - (1{,}029)^2} = \frac{2{,}198{,}664 - 2{,}176{,}335}{1{,}064{,}796 - 1{,}058{,}841} = \frac{22{,}329}{5{,}955} \approx 3.7496 \]

Step 3: Compute the mean values of 𝑥 and 𝑦.

\[ \bar{x} = \frac{\sum x}{n} = \frac{1{,}029}{12} = 85.75 \qquad \bar{y} = \frac{\sum y}{n} = \frac{2{,}115}{12} = 176.25 \]

Step 4: Compute the intercept of the simple linear regression.

𝑏0 = 𝑦̅ − 𝑏1𝑥̅ = 176.25 − 3.7496(85.75) = 176.25 − 321.5282 = −145.2782

Step 5: Substitute the slope and intercept into the general simple linear regression equation.

𝑦̂ = 𝑏1𝑥 + 𝑏0 (general equation for simple linear regression)

The simple linear regression equation is 𝑦̂ = 3.7496𝑥 − 145.2782
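
Steps 2 to 4 can be reproduced directly from the column totals. This is a minimal sketch in Python with illustrative variable names (not part of the original handout); it keeps full precision in the slope instead of rounding it to 3.7496 before computing the intercept.

```python
# Column totals from Example 1.
n = 12
sum_x, sum_y = 1029, 2115
sum_x2, sum_xy = 88733, 183222

# Slope: same formula as Step 2.
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# Intercept: mean of y minus slope times mean of x (Steps 3 and 4).
mean_x = sum_x / n
mean_y = sum_y / n
b0 = mean_y - b1 * mean_x

print(f"slope = {b1:.4f}, intercept = {b0:.4f}")
```

Because the slope is not rounded first, the intercept printed here is close to −145.28 but differs from −145.2782 in the later decimals; the difference is a rounding effect, not a different method.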

Step 6: Graph the least squares regression line.

[Graph of the regression line 𝑦̂ = 3.7496𝑥 − 145.2782]

Thus, the regression equation is 𝑦̂ = 3.7496𝑥 − 145.2782. The 𝑏1 value of 3.7496 indicates that for each additional
degree Fahrenheit in temperature, sales are expected to increase by about 3.75 units. The 𝑏0 value of −145.2782
indicates that the line crosses the y-axis below the origin. Read literally, it says that at a temperature of
zero degrees Fahrenheit, negative 145.2782 units would be sold. This has no practical meaning, because zero is
far outside the observed temperature range of 76 to 97 and the line should not be extrapolated that far.
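
To illustrate how the fitted equation is used, the sketch below (Python, with the hypothetical helper name predict) takes the rounded coefficients from the text, predicts sales at 90 degrees, a value inside the observed range, and checks that the residuals average out to zero.

```python
# Data from Example 1.
temperature = [79, 76, 78, 84, 90, 83, 93, 94, 97, 85, 88, 82]
sales = [147, 143, 147, 168, 206, 155, 192, 211, 209, 187, 200, 150]

# Rounded slope and intercept as reported in the text.
b1, b0 = 3.7496, -145.2782


def predict(x):
    """Predicted sales (units) at temperature x, using y-hat = b1*x + b0."""
    return b1 * x + b0


# Residual = observed value minus predicted value.
residuals = [y - predict(x) for x, y in zip(temperature, sales)]
mean_residual = sum(residuals) / len(residuals)

print(f"predicted sales at 90 degrees: {predict(90):.2f}")
print(f"mean residual: {mean_residual:.6f}")
```

The mean residual is zero up to floating-point error, which matches the property of residuals stated earlier in this section.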
POST - ASSESSMENT

Direction: Please read, understand and follow the instructions carefully.


(WRITE YOUR ANSWERS ON THE SPACE PROVIDED BELOW)

A. Find the mean, median, mode from the following data: (5pts each)
77 56 47 73 67 84 33 37 49 67

B. At the SM Department Store, there are 10 supervisors, 40 salesmen, 8 cashiers, and 12 baggers. Their
monthly salaries are P25,300, P18,100, P21,500, and P14,200, respectively. What is the weighted mean
salary? (5pts)

C. Find the value of the first, second, and third quartiles of the following data: (5pts each)
11, 13, 9, 16, 23, 31, 15, 49, 33, 17 and 52

D. The monthly expenditures of a school department are normally distributed with a mean of P13,800 and
a standard deviation of P3,300. What is the z-value of monthly expenditures of P21,200 and P7,400?
(5pts each)

E. A time study report indicates that an assembly line task should be finished in an average of 8.96 minutes,
with a standard deviation of 0.83 minutes. One particular item had a z-score of 2.12. What was the
completion time of this item? (5pts)

NOTE:
1. SUBMIT YOUR OUTPUT ON TIME. NO OUTPUT = NO ATTENDANCE!
