MMW Module 4

The document discusses data management and statistical concepts. It introduces organizing data through frequency distributions and calculating measures of central tendency, dispersion, and relative position. It also covers probabilities, normal distributions, linear regression, and correlation.
GE 1 Mathematics in Modern World

Data Management
OVERVIEW
When conducting statistical research, an investigation, or a study, the researcher
must gather data for the particular variable under investigation. To describe situations,
make conclusions, and draw inferences about events, the researcher must organize the
data gathered in some meaningful way. After organizing the data, the researcher's next
step is to present them so they can be understood easily by those who will
benefit from reading the study.
Any data set can be characterized by measuring its central tendency. This
module discusses three measures of central tendency: the mean, the median, and
the mode. Another important characteristic of a data set is how it is distributed, or how far each
element is from some measure of central tendency. There are several ways to measure
the variability of data. The most common and most important is the standard
deviation, which describes the typical distance of the elements from the mean, but several
others are also important. When presenting or analyzing a data set, it is sometimes
helpful to group subjects into several equal groups. For example, to create four equal
groups we need the values that split the data such that 25% of the observations are in
each group.
This module aims to improve your understanding of the population from which
data were obtained and to use that information to help you with decision making. The definitions
of statistical measures and how they are obtained will be presented. The significance of
statistical measures will also be discussed.

Lesson Content

1. Introduction to Data Management


2. Measures of Central tendency
3. Measures of Dispersion
4. Measures of Relative Position
5. Probabilities and Normal Distribution
6. Linear Regression and Correlation

Learning Competencies
After completing the module, the learner should be able to:

1. Compare the forms (textual, tabular, and graphical) of data.


2. Identify the essential parts of a table and describe the different kinds of
graphs for data presentation.
3. Draw the graph/table to present data.
4. Analyze and interpret the data presented in a graph or table.
5. Compute the different measures of central tendency for both grouped and
ungrouped data
6. Compute the different measures of dispersion for both grouped and
ungrouped data.
7. Discuss the uses, characteristics, advantages and disadvantages of
measures of dispersions.
8. Perform operations on mathematical expressions correctly.

Vision: A Premier technological institution in Agriculture and Allied Sciences in the Region
Mission: Advancing Agriculture, allied sciences and technological development through production, research, extension, management,
instruction and entrepreneurship for rural development. 48
For Instructional Purposes Only * 1st Semester AY 2023 - 2024

9. Analyze and interpret the data presented in the table using measure of
central tendency.
10. Advocate the use of statistical data in making important decisions.
11. Use a variety of statistical tools to process and manage numerical data.
12. Use linear regression to predict the value of a variable given certain
conditions.
13. Apply correlation to determine the relationship between two variables.
14. Perform operations on mathematical expressions correctly.
15. Articulate the importance of mathematics in one’s life.
16. Express appreciation for mathematics as a human endeavor.
17. Support the use of mathematics in various aspects and endeavors in life.

Motivation Questions:
What is the importance of Data Management in Mathematics?
Are statistics and data management the same?

Lesson 4.1 Introduction to Data Management

A. Organization of Data

The easiest and most widely used way of organizing data is to construct a frequency
distribution table. A frequency distribution is a grouping of the data into
categories showing the number of observations in each of the non-overlapping
classes.

After organizing data, the next step is to present the data so they can be
understood easily by the readers.

Definition of Terms:

(a) Raw Data – the data collected in original form.
(b) Range – the difference between the highest value and the lowest value in
the distribution.
(c) Frequency Distribution – the organization of data in tabular form, using
mutually exclusive classes and showing the number of observations in each.
(d) Class Limits – the highest and lowest values describing the class.
(e) Class Boundaries – the upper and lower values of a class in a grouped
frequency distribution; they have one more decimal place than
the class limits and end with the digit 5.
(f) Interval – the distance between the class lower boundary and the class upper
boundary, denoted by i.
(g) Frequency – the number of values in a specific class of a frequency
distribution.
(h) Cumulative Frequency – the sum of the frequencies accumulated up to the
upper boundary of a class in a frequency distribution.
(i) Midpoint – the point halfway between the class limits of each class; it is
representative of the data within that class.

A grouped frequency distribution is used when the range of the data set is
large; the data must be grouped into classes, whether they are categorical data or
interval data. For interval data, each class is more than one unit in width. The
procedure for constructing the frequency distribution is discussed in the
succeeding sections.


Categorical Frequency Distribution. A categorical frequency distribution is
used to organize nominal-level or ordinal-level data. Some examples where
we can apply this distribution are gender, business type, political affiliation, and
others.

Example 1:
Twenty applicants were given a performance evaluation appraisal. The data set is
Excellent Very Satisfactory Satisfactory Satisfactory
Excellent Satisfactory Very Satisfactory Satisfactory
Excellent Very Satisfactory Very Satisfactory Very Satisfactory
Satisfactory Very Satisfactory Excellent Excellent
Very Satisfactory Very Satisfactory Excellent Excellent

Construct a frequency distribution for the data.

Solution.

Step 1. Construct a table as shown below.

Classes Tally Frequency Percentage


Excellent
Very Satisfactory
Satisfactory

Step 2. Tally the raw data.

Classes Tally Frequency Percentage


Excellent |||| - ||
Very Satisfactory |||| - |||
Satisfactory ||||

Step 3. Convert the tallied data into numerical frequencies

Classes Tally Frequency Percentage


Excellent |||| - || 7
Very Satisfactory |||| - ||| 8
Satisfactory |||| 5

Step 4. Determine the percentage.


The percentage is computed using the formula:

$$ \text{Percentage} = \frac{f}{n} \times 100 $$

where f is the frequency of the class and n is the total number of values.

Classes Tally Frequency Percentage


Excellent |||| - || 7 35
Very Satisfactory |||| - ||| 8 40
Satisfactory |||| 5 25


For the sample, more applicants received a very satisfactory
performance rating.
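
The four steps above (set up the classes, tally, count, and convert to percentages) can be sketched in Python. This is only an illustrative sketch and not part of the original module; the function name `categorical_distribution` is hypothetical, and the ratings are typed in the same order as the data set of Example 1.

```python
from collections import Counter

E, VS, S = "Excellent", "Very Satisfactory", "Satisfactory"

# Raw data from Example 1, row by row (20 applicants).
ratings = [
    E, VS, S, S,
    E, S, VS, S,
    E, VS, VS, VS,
    S, VS, E, E,
    VS, VS, E, E,
]

def categorical_distribution(values):
    """Return {class: (frequency, percentage)} for nominal or ordinal data."""
    counts = Counter(values)          # tallying and counting in one step
    n = len(values)                   # total number of values
    return {label: (f, f * 100 / n) for label, f in counts.items()}

table = categorical_distribution(ratings)
for label in (E, VS, S):
    f, pct = table[label]
    print(f"{label:<18} {f:>3} {pct:>6.1f}")
```

The printed rows should agree with the Frequency and Percentage columns of the completed table.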

Determining Class Interval

Generally, the number of classes for a frequency distribution table varies from 5
to 20. The decision about the number of classes depends on the method used by
the researcher.

1. Rule 1. Take as the number of classes the smallest positive
integer k such that $2^k \geq n$, where n is the total number of observations.
The suggested class interval is then

$$ \text{Suggested Class Interval } (i) = \frac{\text{Range}}{\text{Number of Classes}} = \frac{HV - LV}{k} $$

Where: HV = highest value in the data set
LV = lowest value in the data set
k = number of classes
i = suggested class interval

2. Rule 2. Another way to determine the class interval is by applying the formula

$$ \text{Suggested Class Interval } (i) = \frac{\text{Range}}{1 + 3.322 \log n} $$

where $\log n$ is the common (base-10) logarithm of the total number of frequencies n.
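
Both rules are easy to check numerically. The sketch below is a hedged illustration rather than part of the module: the function names are hypothetical, the logarithm is assumed to be base 10, and the sample inputs (60 observations ranging from 18 to 57, and 40 observations ranging from 16 to 55) are chosen only for demonstration.

```python
import math

def classes_by_2k_rule(n):
    """Rule 1: smallest positive integer k such that 2**k >= n."""
    k = 1
    while 2 ** k < n:
        k += 1
    return k

def interval_rule1(high, low, n):
    """Rule 1: suggested class interval = range / number of classes."""
    return (high - low) / classes_by_2k_rule(n)

def interval_rule2(high, low, n):
    """Rule 2: suggested class interval = range / (1 + 3.322 log10 n)."""
    return (high - low) / (1 + 3.322 * math.log10(n))

print(classes_by_2k_rule(60))                # 6
print(interval_rule1(57, 18, 60))            # 6.5
print(round(interval_rule2(55, 16, 40), 4))  # 6.1689
```

In practice the suggested interval is rounded to a convenient whole number before the class limits are written down.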

Grouped Frequency Distribution

Example 1:
Suppose a researcher wished to do a study on the scores of students in an
entrance examination conducted by a certain high school. The researcher would
have to collect the data by obtaining the scores of the students. The data
collected are presented below.

19 44 24 43 33 29 26 25 29 23
31 33 38 18 33 33 39 33 37 32
36 37 40 24 40 37 57 48 39 48
26 39 42 32 24 30 30 39 35 28
34 45 39 49 46 43 40 34 41 45
32 21 32 33 22 43 33 29 29 19

Construct a frequency distribution using the $2^k$ Rule and determine the following:

a. Range
b. Interval
c. Class limits
d. Relative frequencies
e. Percentages
f. Cumulative frequencies
g. Midpoints
Solution:

Step 1: Determine the classes.


• Find the Range
Range = HV – LV = 57 – 18 = 39
• Determine the number of classes


The objective is to use just enough classes. We can determine the
number of classes (k) using the $2^k$ Rule, which selects
the smallest number k such that $2^k$ is
greater than or equal to the number of observations (n). In our example, n
= 60. If we apply k = 6, then $2^k = 2^6 = 64$, which is greater than n =
60, while k = 5 gives only $2^5 = 32$. Therefore, the recommended number of classes is 6.

• Determine the class interval

$$ \text{Class Interval } (i) = \frac{\text{Range}}{\text{Number of Classes}} = \frac{39}{6} = 6.5 \approx 7 $$

Step 2: Determine frequency for each class by tallying the data.

Class Limits Frequencies


18 – 24 9
25 – 31 11
32 – 38 19
39 – 45 15
46 – 52 5
53 – 59 1

Step 3: Determine the relative frequency. It can be found by dividing each
frequency by the total frequency.

Class Limits Frequencies Relative Frequency


18 – 24 9 0.15
25 – 31 11 0.18
32 – 38 19 0.32
39 – 45 15 0.25
46 – 52 5 0.08
53 – 59 1 0.02

Step 4: Determine the percentage. It can be found by multiplying each relative
frequency by 100.

Class Limits Frequencies Percentage


18 – 24 9 15
25 – 31 11 18
32 – 38 19 32
39 – 45 15 25
46 – 52 5 8
53 – 59 1 2

Step 5: Determine the cumulative frequencies. The cumulative frequency can be
found by adding the frequency in each class to the total frequencies of
the classes preceding that class.

Class Limits   Frequencies   Cumulative Frequency   Found by
18 – 24        9             9                      9
25 – 31        11            20                     9 + 11
32 – 38        19            39                     9 + 11 + 19
39 – 45        15            54                     9 + 11 + 19 + 15
46 – 52        5             59                     9 + 11 + 19 + 15 + 5
53 – 59        1             60                     9 + 11 + 19 + 15 + 5 + 1


Step 6: Determine the midpoints. The midpoint can be found by getting the
average of the upper and lower limits of each class.

Class Limits   Frequencies   Midpoints   Found by
18 – 24        9             21          ( 18 + 24 ) ÷ 2
25 – 31        11            28          ( 25 + 31 ) ÷ 2
32 – 38        19            35          ( 32 + 38 ) ÷ 2
39 – 45        15            42          ( 39 + 45 ) ÷ 2
46 – 52        5             49          ( 46 + 52 ) ÷ 2
53 – 59        1             56          ( 53 + 59 ) ÷ 2

Example 2:
MRE Travel and Tours, a local travel agency, offers special discounts during
summer. The owner wanted to find out the ages of the people who avail of the special
discounts. A random sample of 40 customers who traveled last summer
revealed these ages.
24 36 28 34 23 37 28 31 22 39
27 28 45 23 21 55 48 48 43 27
33 29 31 25 26 37 49 25 42 42
28 40 34 27 28 37 51 16 38 32

Construct a frequency distribution using Rule 2.

Solution:

Step 1: Determine the classes.


• Find the Range
Range = HV – LV = 55 – 16 = 39
• Determine the class interval

$$
\begin{aligned}
\text{Class Interval } (i) &= \frac{\text{Range}}{1 + 3.322 \log n} \\
&= \frac{39}{1 + 3.322 \log 40} \\
&= \frac{39}{1 + 3.322\,(1.60205999133)} \\
&= \frac{39}{6.3220432912} \\
&= 6.1688916389 \approx 6
\end{aligned}
$$
• Class Limits

Select a starting point for the lowest class limit. The starting
point can be the smallest data value or any convenient
number less than the smallest data value. In our case, 16 is
used.


Class Limits   Frequencies
16 – 21
22 – 27
28 – 33
34 – 39
40 – 45
46 – 51
52 – 57

A seventh class, 52 – 57, is needed because the highest value (55) lies above 51.

Step 2: Determine frequency for each class by tallying the data.

Class Limits Frequencies


16 – 21 2
22 – 27 10
28 – 33 10
34 – 39 8
40 – 45 5
46 – 51 4
52 – 57 1

Step 3: Determine the relative frequency. It can be found by dividing each
frequency by the total frequency.

Class Limits   Frequencies   Relative Frequency
16 – 21        2             0.05
22 – 27        10            0.25
28 – 33        10            0.25
34 – 39        8             0.2
40 – 45        5             0.125
46 – 51        4             0.1
52 – 57        1             0.025
Step 4: Determine the percentage. It can be found by multiplying each relative
frequency by 100.
Class Limits Frequencies Percentage
16 – 21 2 5
22 – 27 10 25
28 – 33 10 25
34 – 39 8 20
40 – 45 5 12.50
46 – 51 4 10
52 – 57 1 2.50

Step 5: Determine the cumulative frequencies. The cumulative frequency can be
found by adding the frequency in each class to the total frequencies of
the classes preceding that class.

Class Limits   Frequencies   Cumulative Frequency   Found by
16 – 21        2             2                      2
22 – 27        10            12                     2 + 10
28 – 33        10            22                     2 + 10 + 10
34 – 39        8             30                     2 + 10 + 10 + 8
40 – 45        5             35                     2 + 10 + 10 + 8 + 5
46 – 51        4             39                     2 + 10 + 10 + 8 + 5 + 4
52 – 57        1             40                     2 + 10 + 10 + 8 + 5 + 4 + 1


Step 6: Determine the midpoints. The midpoint can be found by getting the
average of the upper and lower limit in each class.

Class Limits Frequencies Midpoint Found by


16 – 21 2 18.5 ( 16 + 21 ) ÷ 2
22 – 27 10 24.5 ( 22 + 27 ) ÷ 2
28 – 33 10 30.5 ( 28 + 33 ) ÷ 2
34 – 39 8 36.5 ( 34 + 39 ) ÷ 2
40 – 45 5 42.5 ( 40 + 45 ) ÷ 2
46 – 51 4 48.5 ( 46 + 51 ) ÷ 2
52 – 57 1 54.5 ( 52 + 57 ) ÷ 2
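
The six steps of Example 2 can be reproduced with a short script. The sketch below is an illustration only: the helper `grouped_distribution` is a hypothetical name, it assumes integer data, and it takes the starting limit (16) and class width (6) chosen in the solution as given inputs.

```python
# Ages of the 40 sampled customers in Example 2, row by row.
ages = [
    24, 36, 28, 34, 23, 37, 28, 31, 22, 39,
    27, 28, 45, 23, 21, 55, 48, 48, 43, 27,
    33, 29, 31, 25, 26, 37, 49, 25, 42, 42,
    28, 40, 34, 27, 28, 37, 51, 16, 38, 32,
]

def grouped_distribution(data, start, width):
    """Return rows of (lower limit, upper limit, frequency,
    relative frequency, cumulative frequency, midpoint)."""
    n = len(data)
    rows, cumulative, lower = [], 0, start
    while lower <= max(data):              # stop once the highest value is covered
        upper = lower + width - 1          # inclusive upper class limit
        f = sum(lower <= x <= upper for x in data)
        cumulative += f
        rows.append((lower, upper, f, f / n, cumulative, (lower + upper) / 2))
        lower += width
    return rows

rows = grouped_distribution(ages, start=16, width=6)
for lower, upper, f, rel, cum, mid in rows:
    print(f"{lower} - {upper}   f={f:<3} rel={rel:<6} cum={cum:<3} mid={mid}")
```

Multiplying each relative frequency by 100 gives the percentage column.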

CHECK YOUR PROGRESS


A marketing research consultant conducted a survey of 40 persons who
visited fast-food chains one morning. The age of each person was recorded to the
nearest year as follows:

16 29 32 21 44 44 36 41 24 40
28 30 47 47 34 47 46 27 35 50
26 33 50 46 33 48 38 29 19 27
22 32 53 31 44 42 55 28 40 19

Prepare a frequency distribution by completing the table below using Rule 1 and
Rule 2.

Class Limits   Frequency   Relative Frequency   Percentage   Cumulative Frequency   Midpoints

Lesson 4.2 Measures of Central Tendency

A. Mean

The arithmetic mean, or simply the mean, is one of the measures of central tendency.
It is defined as the sum of all observations divided by the number
of observations. The symbol $\bar{x}$, read as “x bar”, is used to represent the mean of
a sample, and the Greek letter $\mu$ is used to denote the mean of a population.
Now, let us look at the properties of the arithmetic mean.

Some Properties of Arithmetic Mean


1) A set of data has only one mean.
2) Mean can be applied for interval and ratio data.
3) All values in the data are included in computing the mean.
4) The mean is very useful in comparing two or more sets.
5) Mean is affected by the extreme values on the set of data.
6) Mean is most appropriate in symmetrical data.


Sample Mean:

$$ \bar{x} = \frac{\sum x}{n} $$

Population Mean:

$$ \mu = \frac{\sum x}{N} $$

Where: $\bar{x}$ = sample mean (read as "x bar")
$\mu$ = population mean (read as "mu")
$x$ = value of any particular observation
$\sum x$ = sum of all x's
$n$ = total number of values in the sample
$N$ = total number of values in the population

Example 1:
The daily salaries of a sample of eight employees of Freedomlife Inc.
are: ₱650, ₱550, ₱470, ₱580, ₱500, ₱750, ₱700, ₱450. Find the mean
daily wage of the employees.

Solution:

$$ \bar{x} = \frac{\sum x}{n} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} $$

$$ \bar{x} = \frac{650 + 550 + 470 + 580 + 500 + 750 + 700 + 450}{8} = \frac{4650}{8} = 581.25 $$
The mean daily wage of the employees is ₱ 581.25.
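
As a quick cross-check, the same computation can be written in Python. This is a minimal sketch; `mean` comes from Python's standard `statistics` module, and the variable names are chosen here for illustration.

```python
from statistics import mean

# Daily salaries (in pesos) of the eight sampled employees.
salaries = [650, 550, 470, 580, 500, 750, 700, 450]

manual_mean = sum(salaries) / len(salaries)  # sum of x divided by n
library_mean = mean(salaries)                # same result from the standard library

print(manual_mean)   # 581.25
print(library_mean)  # 581.25
```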

B. Weighted Mean
The weighted mean is an average computed by giving different weights to some
of the individual values. If all the weights are equal, then the weighted mean
is the same as the arithmetic mean. The weighted mean is found by
multiplying each value by its corresponding weight, adding the products, and dividing by the sum of
the weights.

$$ \bar{x} = \frac{x_1 w_1 + x_2 w_2 + x_3 w_3 + \cdots + x_n w_n}{w_1 + w_2 + w_3 + \cdots + w_n} $$

Where: $\bar{x}$ = weighted mean
$w_i$ = weight corresponding to $x_i$
$x_i$ = the value of any particular observation

Example 1
Suppose that a marketing firm conducts a survey of 1,000 households to
determine the average number of Electric Fans each household owns. The
data show a large number of households with two or three electric fans and a
smaller number with one or four. Every household in the sample has at least
one electric fan and no household has more than four.


Here are the data for the survey:

Number of Electric Fans per Household   Number of Households
1                                       73
2                                       378
3                                       459
4                                       90

Solution:
Step 1. Assign a weight to each value in the data set.
𝑥1 = 1 𝑤1 = 73
𝑥2 = 2 𝑤2 = 378
𝑥3 = 3 𝑤3 = 459
𝑥4 = 4 𝑤4 = 90
Step 2. Compute the weighted mean using the formula.

$$
\begin{aligned}
\bar{x} &= \frac{x_1 w_1 + x_2 w_2 + x_3 w_3 + x_4 w_4}{w_1 + w_2 + w_3 + w_4} \\
&= \frac{(1)(73) + (2)(378) + (3)(459) + (4)(90)}{73 + 378 + 459 + 90} \\
&= \frac{73 + 756 + 1377 + 360}{1000} \\
&= \frac{2566}{1000} = 2.566
\end{aligned}
$$
The mean number of electric fans per household in this sample is 2.566.
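
The weighted-mean formula translates directly into code. The following is a small sketch under the assumption that values and weights are given as two lists of equal length; the function name `weighted_mean` is hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

fans = [1, 2, 3, 4]              # number of electric fans per household
households = [73, 378, 459, 90]  # number of households (the weights)

print(weighted_mean(fans, households))  # 2.566
```

With equal weights, for example `weighted_mean([1, 2, 3], [1, 1, 1])`, the function returns the ordinary arithmetic mean.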

C. Median
Whenever the data is arranged in ascending or descending order, it is called
a data array. The median is the midpoint of the data array.

Some Properties of Median


1) A set of data has only one median.
2) Median is not affected by extreme or large values.
3) Median can be applied for ordinal, interval or ratio data.
4) Median is most appropriate in skewed data.

To determine the value of the median for the ungrouped data, we consider
the following rules:
1) Arrange the data in ascending or descending order.
2) If n is odd, the median is the middle value.
3) If n is even, the median is the average of the two middle values.

The position (rank) of the median in the ordered data is given by

$$ \text{Median (Rank Value)} = \frac{n + 1}{2} $$

Note: n is the population or sample size.

Example 1
Find the median of the ages of 9 top management employees of Villar
Holdings Inc. The ages are 56, 49, 61, 58, 56, 53, 60, 59, and 48.

Solution:

Step 1. Arrange the data in order.


48, 49, 53, 56, 56, 58, 59, 60, 61
Step 2. Determine the middle rank value.

$$ \text{Median (Rank Value)} = \frac{n + 1}{2} = \frac{9 + 1}{2} = 5 $$

Step 3. Identify the median in the data set.
48, 49, 53, 56, [56], 58, 59, 60, 61
The 5th value of the ordered array (shown in brackets) is 56.

Hence, the median age is 56.

Example 2
The daily salaries of a sample of eight employees of Freedomlife Inc. are:
₱650, ₱550, ₱470, ₱580, ₱500, ₱750, ₱700, ₱450. Find the median daily
wage of the employees

Solution:

Step 1. Arrange the data in order.


₱450, ₱470, ₱500, ₱550, ₱580, ₱650, ₱700, ₱750
Step 2. Determine the middle rank value.

$$ \text{Median (Rank Value)} = \frac{n + 1}{2} = \frac{8 + 1}{2} = \frac{9}{2} = 4.5 $$

Step 3. Identify the median in the data set.
₱450, ₱470, ₱500, ₱550, ₱580, ₱650, ₱700, ₱750
The 4.5th position lies between the 4th value (₱550) and the 5th value (₱580).

Since the middle point falls between ₱550 and ₱580, we can determine the
median of the data set by getting the average of the two values.

$$ \text{Median} = \frac{550 + 580}{2} = \frac{1{,}130}{2} = 565 $$

Therefore, the median daily wage is ₱ 565.
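
The rank rule used in Examples 1 and 2 can be expressed as one short function. This is an illustrative sketch (the name `median_by_rank` is hypothetical): when the rank is a whole number the value at that position is returned, and when it ends in .5 the two neighbouring values are averaged.

```python
import math

def median_by_rank(data):
    """Median located with the rank (n + 1) / 2 of the ordered data."""
    if not data:
        raise ValueError("data must not be empty")
    ordered = sorted(data)
    rank = (len(ordered) + 1) / 2           # 5.0 for n = 9, 4.5 for n = 8
    below = ordered[math.floor(rank) - 1]   # value at or just below the rank
    above = ordered[math.ceil(rank) - 1]    # value at or just above the rank
    return (below + above) / 2              # equals the middle value when n is odd

ages = [56, 49, 61, 58, 56, 53, 60, 59, 48]
salaries = [650, 550, 470, 580, 500, 750, 700, 450]

print(median_by_rank(ages))      # 56.0
print(median_by_rank(salaries))  # 565.0
```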


D. Mode
The mode is the value that appears most frequently in a data set. A set of
numbers may have one mode (unimodal), two modes (bimodal), more
than two modes (multimodal), or no mode at all.

Some Properties of Mode


1) The mode is the easiest average to compute.
2) Mode is not affected by extreme values in the data set.
3) Mode can be applied for nominal, ordinal, interval and ratio data.

Example 1
The following data represent the total unit sales of brand-new cars from a
sample of 10 car dealer shops in Region XII for the 1st Quarter of 2019: 13,
14, 8, 10, 11, 13, 10, 8, 10, and 9. Find the mode.

Solution:
The ordered array of the data set is 8, 8, 9, 10, 10, 10, 11, 13, 13, 14.

Since 10 appears 3 times, more often than any other value, the mode is
10.

Example 2
Find the mode of the ages of 9 top management employees of Villar
Holdings Inc. The ages are 56, 49, 61, 58, 56, 53, 60, 59, and 48.

Solution:
The ordered array of data is 48, 49, 53, 56, 56, 58, 59, 60, 61

Since 56 appears twice and every other value appears only once, the mode is 56.

Example 3
In a crash test, 11 cars were tested to determine what impact speed was
required to obtain minimal bumper damage. Find the mode of the speeds
given in miles per hour below.

24, 15, 18, 20, 18, 22, 24, 26, 18, 26, 24

Solution:
The ordered array of data is 15, 18, 18, 18, 20, 22, 24, 24, 24, 26, 26

Since both 18 and 24 occur 3 times in the data set, we have two modes and
the data set is considered bimodal.
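
Finding the mode amounts to counting how often each value occurs. The sketch below is one possible way to do this in Python; the function name `modes` is hypothetical, and it adopts the convention (an assumption made here) that a data set in which every value occurs equally often has no mode.

```python
from collections import Counter

def modes(data):
    """Return the most frequent value(s) in ascending order,
    or an empty list when all values occur equally often."""
    if not data:
        return []
    counts = Counter(data)
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []                      # no value stands out: no mode
    return sorted(v for v, c in counts.items() if c == top)

car_sales = [13, 14, 8, 10, 11, 13, 10, 8, 10, 9]
speeds = [24, 15, 18, 20, 18, 22, 24, 26, 18, 26, 24]

print(modes(car_sales))  # [10]      one mode: unimodal
print(modes(speeds))     # [18, 24]  two modes: bimodal
print(modes([1, 2, 3]))  # []        no mode
```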


Lesson 4.3 Measures of Dispersion

Dispersion describes how far the actual values lie from the average value.
A measure of dispersion shows the scatter of the data. It tells how the values
vary from one another and gives a clear idea about the distribution of the
data. A measure of dispersion also shows the homogeneity or the heterogeneity of
the distribution of the observations.

A. Range

The range is the most common and most easily understood measure of dispersion. It
is the difference between the two extreme observations of the data set. Its advantages
are that it is (i) easy to compute and (ii) easy to understand. Its
disadvantages are that (i) it is affected by extreme values and (ii) only two values are
used in the calculation.

Example 1
The daily salaries of a sample of eight employees of Freedomlife Inc. are: ₱650,
₱550, ₱470, ₱580, ₱500, ₱750, ₱700, ₱450. Find the range.

Solution:

Step 1: Identify the highest value and the lowest value in the data set.
HV = ₱750 LV = ₱450

Step 2: Solve for range.


Range = HV – LV = ₱750 - ₱450 = ₱300

The range in daily salary is ₱300.

B. Variance and Standard Deviation

Variance is a measurement of the spread between numbers in a data set. That
is, it measures how far each number in the set is from the mean and therefore
from every other number in the set.

The standard deviation is a statistic that measures the dispersion of a data set
relative to its mean. It is calculated as the square root of the variance, which
is in turn determined from each data point's deviation from the mean. If the data points are further
from the mean, there is a higher deviation within the data set; thus, the more
spread out the data, the higher the standard deviation.

Sample Variance and Sample Standard Deviation for Ungrouped Data

Variance:

$$ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} $$

Standard Deviation:

$$ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} $$

Equivalent formulas that use the sums of $x$ and $x^2$ directly:

$$ s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} \qquad s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}} $$

Where:
$s^2$ = sample variance
$s$ = sample standard deviation
$x$ = the value of any particular observation
$\bar{x}$ = sample mean
$n$ = sample size
Example 2

The daily salaries of a sample of eight employees of Freedomlife Inc. are: ₱650,
₱550, ₱470, ₱580, ₱500, ₱750, ₱700, ₱450. Find the variance and standard
deviation.

Solution:

Step 1: Compute the mean of the data set.

$$ \bar{x} = \frac{\sum x}{n} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} $$

$$ \bar{x} = \frac{650 + 550 + 470 + 580 + 500 + 750 + 700 + 450}{8} = \frac{4650}{8} = 581.25 $$

Step 2: Subtract the mean from each of the value in the data set.

𝑥 𝑥 − 𝑥̅
650 68.75
550 -31.25
470 -111.25
580 -1.25
500 -81.25
750 168.75
700 118.75
450 -131.25
∑ 𝑥 = 4650 ∑(𝑥 − 𝑥̅ ) = 0

Step 3: Square each deviation $x - \bar{x}$, then get the sum. For example, $(68.75)^2 = 4{,}726.5625$.

$x$     $x - \bar{x}$   $(x - \bar{x})^2$
650     68.75           4,726.5625
550     -31.25          976.5625
470     -111.25         12,376.5625
580     -1.25           1.5625
500     -81.25          6,601.5625
750     168.75          28,476.5625
700     118.75          14,101.5625
450     -131.25         17,226.5625

$\sum x = 4650$     $\sum (x - \bar{x}) = 0$     $\sum (x - \bar{x})^2 = 84{,}487.5$

Step 4: Solve for the variance and standard deviation. We can obtain the
standard deviation by taking the square root of the variance.

Variance:

$$ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{84{,}487.5}{8 - 1} = \frac{84{,}487.5}{7} = 12{,}069.64 $$

Standard Deviation:

$$ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{84{,}487.5}{7}} = \sqrt{12{,}069.64} = 109.86 $$

Hence, the variance is 12,069.64 (in squared pesos) and the standard deviation is ₱109.86.
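
The sample variance and standard deviation of the salary data can be verified with a few lines of Python. The sketch below computes the variance both from the squared deviations and from the sums of $x$ and $x^2$, then compares the result with `statistics.variance` from the standard library; the variable names are illustrative.

```python
import math
from statistics import variance

salaries = [650, 550, 470, 580, 500, 750, 700, 450]
n = len(salaries)
x_bar = sum(salaries) / n

# Formula based on squared deviations from the mean.
s2_deviations = sum((x - x_bar) ** 2 for x in salaries) / (n - 1)

# Equivalent formula based on the sums of x and x squared.
s2_sums = (sum(x * x for x in salaries) - sum(salaries) ** 2 / n) / (n - 1)

s = math.sqrt(s2_deviations)

print(round(s2_deviations, 2))       # 12069.64
print(round(s2_sums, 2))             # 12069.64
print(round(s, 2))                   # 109.86
print(round(variance(salaries), 2))  # 12069.64
```

Both formulas divide by n - 1 because the eight salaries are treated as a sample.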

Alternative Solution: The same results can be obtained using the second set of
formulas given above.

Step 1: Get the sum of the values in the data set, then square each value and
get the sum of the squares.

𝑥 𝑥2
650 422500
550 302500
470 220900
580 336400
500 250000
750 562500
700 490000
450 202500
∑ 𝑥 = 4650 ∑ 𝑥 2 = 2787300


Step 2: Solve for the variance and standard deviation.

Variance:

$$
\begin{aligned}
s^2 &= \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} \\
&= \frac{2{,}787{,}300 - \frac{(4650)^2}{8}}{8 - 1} \\
&= \frac{2{,}787{,}300 - \frac{21{,}622{,}500}{8}}{7} \\
&= \frac{2{,}787{,}300 - 2{,}702{,}812.5}{7} \\
&= \frac{84{,}487.5}{7} = 12{,}069.64
\end{aligned}
$$

Standard Deviation:

$$ s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}} = \sqrt{\frac{2{,}787{,}300 - 2{,}702{,}812.5}{7}} = \sqrt{12{,}069.64} = 109.86 $$

Population Variance and Population Standard Deviation for Ungrouped Data

Variance:

$$ \sigma^2 = \frac{\sum (x - \mu)^2}{N} $$

Standard Deviation:

$$ \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} $$

Where:
$\sigma^2$ = population variance
$\sigma$ = population standard deviation
$x$ = the value of any particular observation
$\mu$ = population mean
$N$ = population size

Example 3
The monthly incomes of five research directors of Recoletos schools are:
₱55,000, ₱59,500, ₱62,500, ₱57,000, and ₱61,000. Treating the five directors as a
population, find the variance and standard deviation.

Solution:

Step 1: Compute the mean of the data set.

$$ \mu = \frac{\sum x}{N} = \frac{55{,}000 + 59{,}500 + 62{,}500 + 57{,}000 + 61{,}000}{5} = \frac{295{,}000}{5} = 59{,}000 $$


Step 2: Subtract the population mean from each of the value in the data set.

𝑥 𝑥− 𝜇
55,000 -4,000
59,500 500
62,500 3,500
57,000 -2,000
61,000 2,000

Step 3: Get the square of 𝑥 − 𝜇 and get the sum of the squares.

𝑥 𝑥− 𝜇 (𝑥 − 𝜇)2
55,000 -4,000 16,000,000
59,500 500 250,000
62,500 3,500 12,250,000
57,000 -2,000 4,000,000
61,000 2,000 4,000,000
∑ 𝑥 = 295,000 ∑ (𝑥 − 𝜇 ) = 0 ∑(𝑥 − 𝜇)2 = 36,500,000

Step 4: Solve for the population variance and population standard deviation.

Variance:

$$ \sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{36{,}500{,}000}{5} = 7{,}300{,}000 $$

Standard Deviation:

$$ \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} = \sqrt{\frac{36{,}500{,}000}{5}} = \sqrt{7{,}300{,}000} = 2{,}701.85 $$

Hence, the population variance is 7,300,000 and the population standard deviation
is 2,701.85.
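
The population formulas differ from the sample formulas only in the divisor (N instead of n - 1). The following sketch checks Example 3 numerically, where 36,500,000 divided by 5 gives 7,300,000; `pvariance` is the population variance function of Python's standard `statistics` module, and the variable names are illustrative.

```python
from statistics import pvariance

# Monthly incomes (in pesos) of the five research directors.
incomes = [55_000, 59_500, 62_500, 57_000, 61_000]
N = len(incomes)
mu = sum(incomes) / N

sum_sq_dev = sum((x - mu) ** 2 for x in incomes)  # sum of squared deviations
sigma2 = sum_sq_dev / N                           # population variance
sigma = sigma2 ** 0.5                             # population standard deviation

print(mu)                  # 59000.0
print(sum_sq_dev)          # 36500000.0
print(sigma2)              # 7300000.0
print(round(sigma, 2))     # 2701.85
print(pvariance(incomes))  # library cross-check of the variance
```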

CHECK YOUR PROGRESS


A time – study analyst observed a packaging operation and collected the
following times (in seconds) required for the operation to fill packages of a fixed
volume box: 11, 12, 15, 18, 13, 18, 16, 14, 12, and 17. Find the range, variance
and standard deviation.

Lesson 4.4 Measures of Relative Position

A. Quartiles
A quartile is a statistical term describing a division of observations into four
defined intervals based upon the values of the data and how they compare to
the entire set of observations. Three points (a lower quartile, the
median, and an upper quartile) divide the data set into four groups. The lower
quartile or first quartile is denoted as Q1 and is the middle number that falls
between the smallest value of the data set and the median. The second
quartile, Q2, is also the median. The upper or third quartile, denoted as Q3, is
the central point that lies between the median and the highest number of the
distribution.


There are four groups formed from the quartiles. The first group of values
contains the smallest number up to Q1; the second group includes Q1 to the
median; the third set is the median to Q3; the fourth category comprises Q3 to
the highest data point of the entire set.

Each quartile contains 25% of the total observations. Generally, the data is
arranged from smallest to largest:

1. First quartile: the lowest 25% of numbers
2. Second quartile: above 25% and up to 50% (up to the median)
3. Third quartile: above 50% and up to 75% (above the median)
4. Fourth quartile: the highest 25% of numbers

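The position of each quartile in the ordered data is found with

$$Q_k = \frac{k(N+1)}{4}$$

Where: $Q_k$ = position of the k-th quartile in the ordered data
$N$ = number of observations
$k$ = quartile number (1, 2, or 3)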

Example 1
Find the first, second and third quartile of the ages of 9 top management
employees of Villar Holdings Inc. The ages are 56, 49, 61, 58, 56, 53, 60, 59,
and 48.

Solution:

Step 1: Arrange the data in order.


48, 49, 53, 56, 56, 58, 59, 60, 61

Step 2: Select the first, second and third quartile values using the formula.

$$Q_1 = \frac{1(N+1)}{4} = \frac{1(9+1)}{4} = \frac{10}{4} = 2.5$$

$$Q_2 = \frac{2(N+1)}{4} = \frac{2(9+1)}{4} = \frac{20}{4} = 5$$

$$Q_3 = \frac{3(N+1)}{4} = \frac{3(9+1)}{4} = \frac{30}{4} = 7.5$$

Step 3: Identify the first, second and third quartile values in the data set.
48, 49, 53, 56, 56, 58, 59, 60, 61
(2.5th position: between 49 and 53; 5th position: 56; 7.5th position: between 59 and 60)

Since the 2.5th position falls between 49 and 53, and the 7.5th position falls
between 59 and 60, we can determine the first and third quartiles of the data set
by getting the average of the two values in each case.


$$Q_1 = \frac{49+53}{2} = \frac{102}{2} = 51 \qquad Q_3 = \frac{59+60}{2} = 59.5$$

Therefore, $Q_1 = 51$, $Q_2 = 56$, and $Q_3 = 59.5$.
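The position rule used in this example can be sketched in Python. The helper `quartile` below is a hypothetical name introduced for illustration; it assumes at least three observations and interpolates linearly when the position falls between two values, which for a position ending in .5 is the same as averaging the two neighbours. The standard library's `statistics.quantiles`, with its default "exclusive" method, uses the same (N + 1) positions and serves as a cross-check.

```python
import math
import statistics

def quartile(data, k):
    """Return the k-th quartile (k = 1, 2, 3) using the k(N + 1)/4 position rule."""
    if k not in (1, 2, 3):
        raise ValueError("k must be 1, 2, or 3")
    if len(data) < 3:
        raise ValueError("this sketch assumes at least three observations")
    ordered = sorted(data)                      # Step 1: arrange the data in order
    position = k * (len(ordered) + 1) / 4       # Step 2: 1-based position of the quartile
    lower = math.floor(position)
    fraction = position - lower
    if fraction == 0:
        return ordered[lower - 1]               # the position lands exactly on a value
    # Step 3: the position falls between two values, so interpolate between them
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

ages = [56, 49, 61, 58, 56, 53, 60, 59, 48]
q1, q2, q3 = (quartile(ages, k) for k in (1, 2, 3))
print(q1, q2, q3)

# Cross-check against the standard library
print(statistics.quantiles(ages, n=4))
```

Other textbooks and software packages use slightly different position rules, so quartiles reported elsewhere may differ a little from these values.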

B. z – scores
A z-score is a numerical measurement that describes a value's relationship to
the mean of a group of values. A z-score is measured in terms of standard
deviations from the mean. If a z-score is 0, it indicates that the data point's
score is identical to the mean score. A z-score of 1.0 would indicate a value
that is one standard deviation above the mean. z-scores may be positive or
negative, with a positive value indicating the score is above the mean and a
negative score indicating it is below the mean.

A z-score measures the distance between an observation and the mean,
measured in units of the standard deviation. The formulas show how to compute
the z-score for a data value x in a population and in a sample.

$$z = \frac{x - \mu}{\sigma} \ \text{(for population)} \qquad z = \frac{x - \bar{x}}{s} \ \text{(for sample)}$$

Example 1
Books in the library are found to have average length of 350 pages with
standard deviation of 100 pages. What is the z-score corresponding to a book
of length 80 pages?

Solution:
Let $\mu = 350$, $\sigma = 100$, $x = 80$

$$z = \frac{x - \mu}{\sigma} = \frac{80 - 350}{100} = -2.7$$
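As a small illustration, the population formula can be written as a function. The name `z_score` is a hypothetical helper introduced here, not something defined in the module.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std_dev

# Library example: mean 350 pages, standard deviation 100 pages, book of 80 pages
book_z = z_score(80, 350, 100)
print(book_z)
```

The negative sign says that the 80-page book lies below the mean, 2.7 standard deviations away from it.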

C. Box – and – Whisker Plot


A box and whisker plot (sometimes called a boxplot) is a graph that presents
information from a five-number summary. It is especially useful for indicating
whether a distribution is skewed and whether there are potential unusual
observations (outliers) in the data set. Box and whisker plots are also very
useful when large numbers of observations are involved and when two or more
data sets are being compared.

The boxplot will give the following information:


1. If the median is near the center of the box, the distribution is
approximately symmetric.
2. If the median falls to the right of the center of the box, the distribution
is negatively skewed.
3. If the median falls to the left of the center of the box, the distribution is
positively skewed.
4. If the lines are about the same length, the distribution is
approximately symmetric.
5. If the left line is longer than the right line, the distribution is negatively
skewed.


6. If the right line is longer than the left line, the distribution is positively
skewed.

Example 1
Construct a box plot for the following data:
12, 5, 22, 30, 7, 36, 14, 42, 15, 53, 25
Solution:
Step 1: Arrange the data in ascending order.
5, 7, 12, 14, 15, 22, 25, 30, 36, 42, 53

Step 2: Determine the lowest value, Q1, Q2 (median), Q3, and highest value
of the given set of data.

$$Q_1 = \frac{1(N+1)}{4} = \frac{1(11+1)}{4} = 3$$

The 3rd value of 5, 7, 12, 14, 15, 22, 25, 30, 36, 42, 53 is 12, so $Q_1 = 12$.

$$Q_2 = \frac{2(N+1)}{4} = \frac{2(11+1)}{4} = 6$$

The 6th value of 5, 7, 12, 14, 15, 22, 25, 30, 36, 42, 53 is 22, so $Q_2 = 22$.

$$Q_3 = \frac{3(N+1)}{4} = \frac{3(11+1)}{4} = 9$$

The 9th value of 5, 7, 12, 14, 15, 22, 25, 30, 36, 42, 53 is 36, so $Q_3 = 36$.

Lowest value = 5     Highest value = 53

Step 3: Draw a number line that will include the smallest and the largest data.

Step 4: Draw three vertical lines at the lower quartile (12), median (22) and the
upper quartile (36), just above the number line.


Step 5: Join the lines for the lower quartile and the upper quartile to form a box.

Step 6: Draw a line from the smallest value (5) to the left side of the box and
draw a line from the right side of the box to the biggest value (53).

Based on the boxplot, the distribution is positively skewed.
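The five-number summary behind this box plot can be checked with a short Python sketch. It assumes the standard library's `statistics.quantiles`, whose default "exclusive" method places the quartiles at positions k(N + 1)/4; the helper `skew_from_box` is a hypothetical name, and it reads skewness only from where the median sits inside the box, which is a crude version of the rules listed earlier.

```python
import statistics

data = [12, 5, 22, 30, 7, 36, 14, 42, 15, 53, 25]

# statistics.quantiles sorts the data itself before locating the quartiles
q1, median, q3 = statistics.quantiles(data, n=4)
five_number_summary = (min(data), q1, median, q3, max(data))

def skew_from_box(q1, median, q3):
    """Rough skewness reading from the position of the median inside the box."""
    center = (q1 + q3) / 2
    if median < center:
        return "positively skewed"
    if median > center:
        return "negatively skewed"
    return "approximately symmetric"

shape = skew_from_box(q1, median, q3)

# Whisker lengths give a second reading of the same question
left_whisker = q1 - min(data)
right_whisker = max(data) - q3

print(five_number_summary, shape, left_whisker, right_whisker)
```

Both readings agree here: the median (22) lies left of the box center (24), and the right whisker (17) is longer than the left whisker (7).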

Lesson 4.5 Probabilities and Normal Distributions

Normal distribution, also known as the Gaussian distribution, is a probability
distribution that is symmetric about the mean, showing that data near the mean
are more frequent in occurrence than data far from the mean. In graph form,
normal distribution will appear as a bell curve.

The normal curve was developed mathematically in 1733 by Abraham de Moivre
(1667 – 1754) as an approximation to the binomial distribution. His paper was not
discovered until 1924, by Karl Pearson (1857 – 1936). Pierre-Simon Laplace
(1749 – 1827) used the normal curve in 1783 to describe the distribution of errors.
Subsequently, Carl Friedrich Gauss (1777 – 1855) used the normal curve to
analyse astronomical data in 1809. The normal curve is often called the Gaussian
distribution. The normal distribution can be used to describe, at least
approximately, any variable that tends to cluster around the mean.

Normal Distribution Curve


A normal distribution is a continuous, symmetric, bell-shaped distribution of a
variable. The known characteristics of the normal curve make it possible to
estimate the probability of occurrence of any value of a normally distributed
variable. The properties of the normal distribution are as follows:

1. The distribution is bell-shaped.
2. The mean, median and the mode are equal and are located at the center of
the distribution.
3. The normal distribution is unimodal.
4. The normal distribution curve is symmetric about the mean.
5. The normal distribution is continuous.
6. The normal curve is asymptotic to the horizontal axis.
7. The total area under the normal distribution is 1.00 or 100%.
8. The area under the part of a normal curve that lies within 1 standard
deviation of the mean is about 68%; within 2 standard deviations of the mean is
about 95%; and within 3 standard deviations is about 99.7%.
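Property 8 can be checked numerically. The sketch below assumes Python's `statistics.NormalDist` class (available from Python 3.8); `area_within` is a hypothetical helper name introduced for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Area under the standard normal curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

# Areas within 1, 2, and 3 standard deviations of the mean
areas = {k: round(area_within(k), 4) for k in (1, 2, 3)}
print(areas)
```

The computed areas are slightly more precise than the rounded 68%, 95%, and 99.7% quoted in the list, which is why those figures are stated as approximations.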

A. Standard Normal Distribution

A normal distribution can be converted into a standard normal distribution by
obtaining the z-value. The z-value is the signed distance between a selected
value, designated x, and the mean μ, divided by the standard deviation. It is
also called the z-score, the z statistic, the standard normal deviate, or the
standard normal value. In terms of formula:
$$z = \frac{x - \mu}{\sigma}$$

Where: $z$ = z-value
$x$ = the value of any particular observation or measurement
$\mu$ = mean of the distribution
$\sigma$ = standard deviation of the distribution

This property of the normal distribution allows a probability problem
concerning x to be converted into one concerning z. To determine the probability
that x lies in a given interval, convert the interval to the z scale and then compute
the probability by using the standard normal distribution table (see Appendix A).

Example 1
Determine the area under the standard normal distribution curve between z = 0
and z = 1.35.

Solution: Draw the figure and represent the area.

[Figure: standard normal curve, area between z = 0 and z = 1.35]

𝑃(0 < 𝑧 < 1.35) = 0.4115


Hence, the area is 0.4115 or 41.15%.

Example 2
Determine the area under the standard normal distribution curve between z = 0
and z = -1.85.

𝑃(−1.85 < 𝑧 < 0) = 0.4678

[Figure: standard normal curve, area between z = -1.85 and z = 0]

Hence, the area is 0.4678 or 46.78%.

Example 3
Find the area under the standard normal distribution to the right of 1.25.

[Figure: standard normal curve, area to the right of z = 1.25]

The required area is the right tail of the normal curve. Since Table A gives the area
between z = 0 and z = 1.25, first find that area.

𝑃(0 < 𝑧 < 1.25) = 0.3944.

Then subtract 𝑃(0 < 𝑧 < 1.25) = 0.3944 from 0.5000, since half of the area under
the curve is to the right of z = 0.
𝑃(𝑧 > 1.25) = 0.5000 − 𝑃(0 < 𝑧 < 1.25)
𝑃(𝑧 > 1.25) = 0.5000 − 0.3944
𝑃(𝑧 > 1.25) = 0.1056

The area to the right of z = 1.25 is 0.1056 or 10.56%.

Example 4
Determine the area under the standard normal distribution curve between z = 0.5
and z = 1.75.

Solution: Draw the figure and represent the area.


[Figure: standard normal curve, area of 0.2684 between z = 0.5 and z = 1.75]

𝑃(0 < 𝑧 < 1.75) = 0.4599 𝑃(0 < 𝑧 < 0.5) = 0.1915

𝑃(0.5 < 𝑧 < 1.75) = 𝑃(0 < 𝑧 < 1.75) − 𝑃(0 < 𝑧 < 0.5)
𝑃(0.5 < 𝑧 < 1.75) = 0.4599 − 0.1915
𝑃(0.5 < 𝑧 < 1.75) = 0.2684

Therefore, the area is 0.2684 or 26.84%

Example 5
Determine the area under the standard normal distribution curve between z = 1.25
and z = - 1.5.

Solution: Draw the figure and represent the area.

[Figure: standard normal curve, area between z = -1.5 and z = 1.25]

𝑃(−1.5 < 𝑧 < 0) = 0.4332 𝑃(0 < 𝑧 < 1.25) = 0.3944

Since the two areas are on the opposite sides of z = 0, we must find both areas
and add them.

𝑃(−1.5 < 𝑧 < 1.25) = 𝑃(−1.5 < 𝑧 < 0) + 𝑃(0 < 𝑧 < 1.25)
𝑃(−1.5 < 𝑧 < 1.25) = 0.4332 + 0.3944
𝑃(−1.5 < 𝑧 < 1.25) = 0.8276

Hence the total area is 0.8276 or 82.76%.
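The table lookups in Examples 1 to 5 can be cross-checked in Python. This is a sketch that assumes `statistics.NormalDist` from the standard library; `area_between` and `area_right_of` are hypothetical helper names. The cumulative function `cdf(z)` gives the area to the left of z, so areas measured from z = 0 are obtained by subtraction.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def area_between(z_low, z_high):
    """Area under the standard normal curve between two z values."""
    return Z.cdf(z_high) - Z.cdf(z_low)

def area_right_of(z):
    """Area under the standard normal curve to the right of z."""
    return 1 - Z.cdf(z)

example_1 = area_between(0, 1.35)      # table value 0.4115
example_2 = area_between(-1.85, 0)     # table value 0.4678
example_3 = area_right_of(1.25)        # table value 0.1056
example_4 = area_between(0.5, 1.75)    # table value 0.2684
example_5 = area_between(-1.5, 1.25)   # table value 0.8276

for name, area in [("Example 1", example_1), ("Example 2", example_2),
                   ("Example 3", example_3), ("Example 4", example_4),
                   ("Example 5", example_5)]:
    print(name, round(area, 4))
```

Because each table entry is already rounded to four decimal places, sums and differences of table values can differ from the directly computed area by about 0.0001.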

Example 6
Find the z value such that the area under the standard normal distribution curve
between 0 and z value is 0.4625.

Solution: Draw the figure. Locate the area 0.4625 in Table A. It lies in the row
labeled 1.7 (left column) and the column labeled 0.08 (top row); adding these two
values gives z = 1.78.


[Figure: standard normal curve, area of 0.4625 between 0 and z]

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633

1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
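The reverse lookup in Example 6 (from an area back to a z value) can also be sketched in Python, assuming the `inv_cdf` method of `statistics.NormalDist`. The helper name `z_for_area_from_zero` is hypothetical. Since the table gives areas measured from z = 0, the helper adds the 0.5 that lies to the left of the mean before inverting.

```python
from statistics import NormalDist

def z_for_area_from_zero(area):
    """Positive z such that the area between 0 and z equals `area`."""
    if not 0 <= area < 0.5:
        raise ValueError("the area between 0 and z must be at least 0 and below 0.5")
    # Area to the left of z = 0.5 (left half of the curve) + area between 0 and z
    return NormalDist().inv_cdf(0.5 + area)

z = z_for_area_from_zero(0.4625)
print(round(z, 2))
```

Rounded to two decimal places, the result matches the table reading of row 1.7 and column 0.08.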


B. Application of Standard Normal Distribution

Example 1
A radar unit is used to measure speeds of cars on a motorway. The speeds are
normally distributed with a mean of 90km/hr and a standard deviation of 10km/hr. What
is the probability that a car picked at random is travelling at more than 100 km/hr?

Solution:

Step 1. Draw the figure and represent the area.

[Figure: normal curve with mean 90, area P(x > 100) to the right of 100]

Step 2. Find the z value for 100.


$$z = \frac{x - \mu}{\sigma} = \frac{100 - 90}{10} = \frac{10}{10} = 1$$

Step 3. Find the area.

𝑃(𝑧 > 1) = 𝑡𝑜𝑡𝑎𝑙 𝑎𝑟𝑒𝑎 − 𝑎𝑟𝑒𝑎 𝑡𝑜 𝑡ℎ𝑒 𝑙𝑒𝑓𝑡 𝑜𝑓 𝑧 = 1


𝑃(𝑧 > 1) = 1 − (0.5000 + 0.3413)
𝑃(𝑧 > 1) = 1 − (0.8413)
𝑃(𝑧 > 1) = 0.1587

The probability that a car selected at random has a speed greater than 100
km/hr is equal to 0.1587 or 15.87%

Example 2

For certain types of computers, the length of time between battery charges is
normally distributed with a mean of 50 hrs and a standard deviation of 15 hrs.
John owns one of these computers and wants to know the probability that the
length of time will be between 50 and 70 hrs.

Solution:

Step 1. Draw the figure and represent the area.


[Figure: normal curve with mean 50, area P(50 < x < 70) between 50 and 70]

Step 2. Find the z value for 50 and 70.

a. x = 50

$$z = \frac{x - \mu}{\sigma} = \frac{50 - 50}{15} = 0$$

b. x = 70

$$z = \frac{x - \mu}{\sigma} = \frac{70 - 50}{15} = \frac{20}{15} = 1.33$$

Step 3. Find the area.

𝑃(0 < 𝑧 < 1.33) = 0.4082 or 40.82%

Hence, the probability that the length of time between charges of John’s computer
is between 50 and 70 hrs is 40.82%.

Example 3
Entry to a certain University is determined by a national test. The scores on this
test are normally distributed with a mean of 500 and a standard deviation of 100.
Tom wants to be admitted to this University and he knows that he must score
better than at least 70% of the students who took the test. Tom takes the test
and scores 585. Will he be admitted to this University?

Solution:

Step 1. Draw the figure and represent the area.

[Figure: normal curve with mean 500 and the score 585 marked]


Step 2. Find the z value for 585.


$$z = \frac{x - \mu}{\sigma} = \frac{585 - 500}{100} = \frac{85}{100} = 0.85$$

Step 3. Find the area z < 0.85.


𝑃(𝑧 < 0.85) = 0.5000 + 𝑃(0 < 𝑧 < 0.85)
𝑃(𝑧 < 0.85) = 0.5000 + 0.3023 = 0.8023 or 80.23%

Tom scored better than 80.23% of the students who took the test and he will be
admitted to the University.

Example 4
The length of life of an instrument produced by a machine has a normal
distribution with a mean of 12 months and standard deviation of 2 months. Find
the probability that an instrument produced by this machine will last

(a) less than 7 months


(b) between 7 and 12 months

Solution (a):

Step 1. Draw the figure and represent the area.

[Figure: normal curve with mean 12, area to the left of 7]

Step 2. Find the z value for 7.


$$z = \frac{7 - 12}{2} = \frac{-5}{2} = -2.5$$

Step 3. Find the area z < -2.5


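By symmetry, the area between z = −2.5 and z = 0 equals the area between z = 0
and z = 2.5, which is 0.4938.

𝑃(𝑧 < −2.5) = 1 − (0.5000 + 𝑃(−2.5 < 𝑧 < 0))
𝑃(𝑧 < −2.5) = 1 − (0.5000 + 0.4938)
𝑃(𝑧 < −2.5) = 1 − 0.9938
𝑃(𝑧 < −2.5) = 0.0062

Hence, the probability that an instrument produced by this machine will last
less than 7 months is 0.0062 or 0.62%.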

Solution (b):


Step 1. Draw the figure and represent the area.

[Figure: normal curve with mean 12, area between 7 and 12]

Step 2. Find the z value for 7.


$$z = \frac{7 - 12}{2} = \frac{-5}{2} = -2.5$$

Step 3. Find the z value for 12.


$$z = \frac{12 - 12}{2} = \frac{0}{2} = 0$$

Step 4. Find the 𝑃(−2.5 < 𝑧 < 0)

𝑃(−2.5 < 𝑧 < 0) = .4938 = 49.38%

Hence the probability that an instrument produced by this machine will last
between 7 and 12 months is 49.38%.

Example 5

The lengths of similar components produced by a company are approximated by
a normal distribution model with a mean of 5 cm and a standard deviation of
0.02 cm. If a component is chosen at random

a) what is the probability that the length of this component is between 4.98 and
5.02 cm?

b) what is the probability that the length of this component is between 4.96 and
5.04cm?

Solution (a):
Step 1. Draw the figure and represent the area.

[Figure: normal curve with mean 5, area between 4.98 and 5.02]

Step 2. Find the z value for 4.98 and 5.02.

For x = 4.98:


$$z = \frac{4.98 - 5.0}{0.02} = \frac{-0.02}{0.02} = -1$$

For x = 5.02:

$$z = \frac{5.02 - 5.0}{0.02} = \frac{0.02}{0.02} = 1$$

Step 3. Find the 𝑃(−1 < 𝑧 < 1)

𝑃(−1 < 𝑧 < 1) = 0.3413 + 0.3413 = 0.6826 = 68.26%

Hence, the probability that the length of the component is between 4.98 and
5.02 is 68.26%

Solution (b):
Step 1. Draw the figure and represent the area.

[Figure: normal curve with mean 5, area between 4.96 and 5.04]

Step 2. Find the z value for 4.96 and 5.04.

For x = 4.96:

$$z = \frac{4.96 - 5.0}{0.02} = \frac{-0.04}{0.02} = -2$$

For x = 5.04:

$$z = \frac{5.04 - 5.0}{0.02} = \frac{0.04}{0.02} = 2$$

Step 3. Find the 𝑃(−2 < 𝑧 < 2)

𝑃(−2 < 𝑧 < 2) = 0.4772 + 0.4772 = 0.9544 = 95.44%

Hence, the probability that the length of the component is between 4.96 and
5.04 is 95.44%
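The application examples in this lesson can be cross-checked without converting to z by hand. The sketch below assumes `statistics.NormalDist` from the Python standard library, which accepts the mean and standard deviation directly; `prob_between` is a hypothetical helper name, and passing `None` for a bound means the interval is open on that side.

```python
from statistics import NormalDist

def prob_between(mean, std_dev, low=None, high=None):
    """P(low < X < high) for a normal variable X; None means no bound on that side."""
    dist = NormalDist(mean, std_dev)
    lower_area = 0.0 if low is None else dist.cdf(low)
    upper_area = 1.0 if high is None else dist.cdf(high)
    return upper_area - lower_area

car_over_100 = prob_between(90, 10, low=100)           # Example 1: about 0.1587
battery_50_to_70 = prob_between(50, 15, 50, 70)        # Example 2: about 0.4082
tom_percentile = prob_between(500, 100, high=585)      # Example 3: about 0.8023
instrument_under_7 = prob_between(12, 2, high=7)       # Example 4(a): about 0.0062
instrument_7_to_12 = prob_between(12, 2, 7, 12)        # Example 4(b): about 0.4938
component_a = prob_between(5, 0.02, 4.98, 5.02)        # Example 5(a): about 0.6826
component_b = prob_between(5, 0.02, 4.96, 5.04)        # Example 5(b): about 0.9544

tom_admitted = tom_percentile > 0.70
print(round(car_over_100, 4), round(battery_50_to_70, 4), tom_admitted)
```

The battery result comes out near 0.4088 rather than 0.4082 because the table method rounds z = 20/15 to 1.33 before the lookup; the other results agree with the table values to within about 0.0001.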


Lesson 4.6 Linear Regression and Correlation

Correlation shows the strength of a relationship between two variables and is
expressed numerically by the correlation coefficient. A variable here is the
characteristic of the population being observed or measured. Correlation refers to
the departure of two random variables from independence.

A. Pearson Product Moment Correlation

The Pearson Product Moment Correlation coefficient, or simply the correlation
coefficient, is the most widely used statistic for measuring the degree of relationship
between linearly related variables. It is a measure of the association between two
variables. It was developed by Karl Pearson. The Pearson r correlation requires that
both variables be normally distributed. The correlation coefficient's values range
between -1.0 and 1.0. A perfect positive correlation means that the correlation
coefficient is exactly 1. A perfect negative correlation means that the correlation
coefficient is exactly -1, so the two variables move in opposite directions,
while a zero correlation implies no linear relationship at all. As the value of the
correlation coefficient goes closer to zero, the relationship between the two variables
will be weaker.

The correlation coefficient is defined as the covariance divided by the product of
the standard deviations of the two variables. Either of the following formulas is used
to calculate the Pearson r correlation:

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{[\sum (x - \bar{x})^2][\sum (y - \bar{y})^2]}} \quad \text{or} \quad r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}}$$

The following summarizes the correlation coefficient and the strength of
relationships:

0.00 → no correlation, no relationship
± 0.01 to ± 0.20 → very low correlation, almost negligible relationship
± 0.21 to ± 0.40 → slight correlation, definite but small relationship
± 0.41 to ± 0.70 → moderate correlation, substantial relationship
± 0.71 to ± 0.90 → high correlation, marked relationship
± 0.91 to ± 0.99 → very high correlation, very dependable relationship
± 1.00 → perfect correlation, perfect relationship
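The scale above can be expressed as a small lookup. This is a sketch only: `describe_correlation` is a hypothetical helper name, and rounding r to two decimal places before the lookup is an assumption made here, because the bands in the scale are stated to two decimal places.

```python
def describe_correlation(r):
    """Map a correlation coefficient to the verbal description in the scale above."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    size = round(abs(r), 2)   # the sign gives direction only; strength uses |r|
    if size == 0:
        return "no correlation, no relationship"
    if size <= 0.20:
        return "very low correlation, almost negligible relationship"
    if size <= 0.40:
        return "slight correlation, definite but small relationship"
    if size <= 0.70:
        return "moderate correlation, substantial relationship"
    if size <= 0.90:
        return "high correlation, marked relationship"
    if size <= 0.99:
        return "very high correlation, very dependable relationship"
    return "perfect correlation, perfect relationship"

print(describe_correlation(0.89))
print(describe_correlation(-0.35))
```

The sign of r is ignored in the lookup because it describes the direction of the relationship (positive or negative), not its strength.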


A test of significance for the coefficient of correlation may be used to find
out whether the computed Pearson’s r could have occurred in a population in which
the two variables are unrelated. The test statistic follows the t distribution with n – 2
degrees of freedom. The significance is computed using the formula of t test as
shown below:

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

where: $t$ = t-test for correlation coefficient
$r$ = correlation coefficient
$n$ = number of paired samples

Assumptions:
1) Samples are randomly selected.
2) Both populations are normally distributed.

Procedure for Pearson Product Moment Correlation test:

1) Set up the hypotheses:
H0: ρ = 0 (The correlation in the population is zero.)
H1: ρ ≠ 0, ρ > 0, or ρ < 0 (The correlation in the population is different
from, greater than, or less than zero.) where ρ = correlation in the population.
2) Set the level of significance.
3) Calculate the degrees of freedom (df = n – 2) and determine the
critical value of t.
4) Calculate the value of Pearson’s r.
5) Calculate the value of t and determine the statistical decision for
hypothesis testing (two-tailed test):
If |tcomputed| < tcritical, do not reject H0.
If |tcomputed| ≥ tcritical, reject H0.
6) State the conclusion.

The test for correlation coefficient is two – tailed; the rejection region is
divided into two equal parts. The figure below illustrates the rejection and non-
rejection region of the test of hypothesis of correlation coefficient.

When the null hypothesis has been rejected for a specific significance level,
there are several possible relationships between the x and y variables:

1) There is a direct cause – and – effect relationship between the two variables.
2) There is a reverse cause – and – effect relationship between the two
variables.
3) The relationship between the two variables may be caused by the third
variable.
4) There may be a complexity of interrelationship among many variables.


5) The relationship between the two variables may be coincidental.

Example 1
A mathematics instructor at a university would like to examine the relationship (if any)
between the number of optional homework problems students do during the semester
and their final grade. She randomly selected 12 students for study and asked them to keep
track of the number of these problems completed during the semester. At
the end of the class, each student’s total is recorded along with their final grade. The
data are in the table below.

Student 1 2 3 4 5 6 7 8 9 10 11 12
# of Problems 51 58 62 65 68 76 77 78 78 84 85 91
Final Grade 62 68 66 66 67 72 73 72 78 73 76 75

Plot the data on a scatter diagram. Does it appear that there is a relationship
between the number of optional problems students do and their final grade?
Compute the correlation coefficient. Determine at the 0.05 significance level
whether the correlation in the population is different from zero.

Solution:

Step 1. Graph the scatter plot.

[Figure: scatter plot of Final Grade against No. of Problems]

Step 2. State the Hypotheses

H0: ρ = 0 (There is no correlation between the number of optional problems
students do and their final grades.)
H1: ρ ≠ 0 (There is a correlation between the number of optional problems
students do and their final grades.)

Step 3. Determine the level of significance.


The level of significance is α = 0.05


Step 4: Determine the degrees of freedom and the critical value of t based on the
table of critical values (t-distribution table).
df = n – 2 = 12 – 2 = 10, and the critical value is t = ±2.228

Step 5: Compute the value of r (Pearson Product Moment Correlation
Coefficient). Here x is the number of optional problems and y is the final grade.

Student number    x     y     x²      y²      xy
1                 51    62    2601    3844    3162
2                 58    68    3364    4624    3944
3                 62    66    3844    4356    4092
4                 65    66    4225    4356    4290
5                 68    67    4624    4489    4556
6                 76    72    5776    5184    5472
7                 77    73    5929    5329    5621
8                 78    72    6084    5184    5616
9                 78    78    6084    6084    6084
10                84    73    7056    5329    6132
11                85    76    7225    5776    6460
12                91    75    8281    5625    6825
Total             873   848   65093   60180   62254

That is, ∑x = 873, ∑y = 848, ∑x² = 65093, ∑y² = 60180, and ∑xy = 62254.

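$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}}$$

$$r = \frac{12(62254) - (873)(848)}{\sqrt{[12(65093) - (873)^2][12(60180) - (848)^2]}}$$

$$r = \frac{747048 - 740304}{\sqrt{[781116 - 762129][722160 - 719104]}}$$

$$r = \frac{6744}{\sqrt{[18987][3056]}}$$

$$r = \frac{6744}{\sqrt{58024272}}$$

$$r = \frac{6744}{7617.37}$$

$$r = 0.8853449$$

$$r \approx 0.89$$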

The coefficient of correlation, r = 0.89, between the number of optional problems a
student does and the student’s final grade in the course is a high positive correlation;
that is, an increase in the number of optional problems a student does is highly
associated with an increase in the final grade in the course.
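The computation of r in Step 5 can be sketched in Python using the same sums-based formula. The function name `pearson_r` and the list names are chosen here for illustration; the data are the 12 pairs from the example.

```python
import math

problems = [51, 58, 62, 65, 68, 76, 77, 78, 78, 84, 85, 91]   # x
grades = [62, 68, 66, 66, 67, 72, 73, 72, 78, 73, 76, 75]     # y

def pearson_r(x, y):
    """Pearson correlation coefficient from the sums of x, y, x^2, y^2, and xy."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = pearson_r(problems, grades)
print(round(r, 4), round(r, 2))
```

Working with integer sums until the final division keeps the intermediate values identical to those in the worked table.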

Step 6. Decision Rule:

To decide whether the relationship is significant, we need to determine
the value of t.

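$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

$$t = \frac{0.89\sqrt{12-2}}{\sqrt{1-(0.89)^2}}$$

$$t = \frac{0.89\sqrt{10}}{\sqrt{1-0.7921}}$$

$$t = \frac{0.89(3.16)}{\sqrt{0.2079}}$$

$$t = \frac{2.8124}{0.4560}$$

$$t = 6.1675$$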

Since the computed value of 6.1675 is greater than the tabular value of 2.228 at α =
0.05, we reject the null hypothesis.

Step 7: Conclusion:

Since the null hypothesis has been rejected, we can conclude that there is evidence
of a significant association between the number of optional problems a student does
and the final grade in the course.
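The decision step can be sketched in Python as well. The function name `correlation_t` is hypothetical, and the critical value 2.228 is simply copied from the example (two-tailed, α = 0.05, df = 10) because the Python standard library has no t-distribution table.

```python
import math

def correlation_t(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("at least three paired observations are needed")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

T_CRITICAL = 2.228   # taken from the example: two-tailed, alpha = 0.05, df = 10

t_rounded_r = correlation_t(0.89, 12)        # r rounded to two decimals, as in the example
t_full_r = correlation_t(0.8853449, 12)      # r before rounding

reject_h0 = abs(t_rounded_r) >= T_CRITICAL
print(round(t_rounded_r, 2), round(t_full_r, 2), reject_h0)
```

The t value depends noticeably on how r is rounded (about 6.17 with r = 0.89, about 6.02 with the unrounded r), but both are far beyond the critical value, so the decision to reject H0 is the same.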

B. Simple Linear Regression Analysis

Regression analysis is a simple statistical tool used to model the dependence
of a variable on one (or more) explanatory variables. The functional relationship
may then be formally stated as an equation, with associated statistical
values that describe how well this equation fits the data.

Simple linear regression is a linear regression model with a single explanatory
variable. That is, it concerns two-dimensional sample points with one
independent variable and one dependent variable and finds a linear function (a
non-vertical straight line) that, as accurately as possible, predicts the dependent
variable values as a function of the independent variable.


The least squares method determines the regression equation by minimizing the
sum of squares of the vertical distances between the actual y values and the
predicted values of y. This method gives what is generally known as the “best
fitting” line. The difference between an observed and predicted value is called
the residual. The mean of the residuals is always zero. Points that fall
outside the overall pattern of the other points are called outliers.
In a scatterplot, scores whose removal greatly changes the regression
line are called influential scores. In some cases, these scores are
restricted to points with extreme x-values. Some influential scores may have a
small residual but still have a greater effect on the regression line than scores
with possibly larger residuals but average x-values.

$$\hat{y} = b_1 x + b_0 \qquad b_1 = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2} \qquad b_0 = \bar{y} - b_1\bar{x}$$

Where: $\hat{y}$ = predicted or fitted value of y
$x$ = the value of any particular observation of the independent variable
$y$ = the value of any particular observation of the dependent variable
$b_1$ = the slope of the regression line
$b_0$ = the intercept of the regression line
$\bar{x}$ = mean of the independent variable
$\bar{y}$ = mean of the dependent variable

Example 2

Referring to Example 1 involving the optional problems students do and their
final grade in the course, determine the regression equation. Plot the regression
line and interpret it.
Solution:
Computation of simple linear regression equation
Step 1. Obtain the sum of x, y, x², y², and xy. (Recall that we have already obtained
these values.)

∑ 𝑥 = 873 ∑ 𝑦 = 848 ∑ 𝑥 2 = 65093 ∑ 𝑦 2 = 60180 ∑ 𝑥𝑦 = 62254


Step 2. Compute the slope of the simple linear regression.

$$b_1 = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$$

$$b_1 = \frac{12(62254) - (873)(848)}{12(65093) - (873)^2}$$

$$b_1 = \frac{747048 - 740304}{781116 - 762129}$$

$$b_1 = \frac{6744}{18987} = 0.35519$$
Step 3. Compute the mean values of x and y.

$$\bar{x} = \frac{\sum x}{n} = \frac{873}{12} = 72.75 \qquad \bar{y} = \frac{\sum y}{n} = \frac{848}{12} = 70.67$$

Step 4. Compute the intercept of the simple linear regression.

$$b_0 = \bar{y} - b_1\bar{x}$$

$$b_0 = 70.67 - (0.35519)(72.75)$$

$$b_0 = 70.67 - 25.84$$

$$b_0 = 44.83$$
Step 5. Substitute the slope and intercept in the general simple linear regression
equation.

$$\hat{y} = b_1 x + b_0$$

$$\hat{y} = 0.35519x + 44.83$$

Thus, the regression equation is ŷ = 0.35519x + 44.83. The b1 of 0.35519
indicates that for each additional optional problem done, the final grade
is expected to increase by 0.35519 units. The b0 value of 44.83 indicates that if
the number of problems done by the student is zero, the predicted final grade would be 44.83.
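The regression steps can be sketched in Python with the same formulas for the slope and intercept. The function name `fit_line` is hypothetical, and the prediction at x = 80 is an illustrative input chosen here (it lies inside the observed range of 51 to 91 problems); it does not appear in the example.

```python
problems = [51, 58, 62, 65, 68, 76, 77, 78, 78, 84, 85, 91]   # x
grades = [62, 68, 66, 66, 67, 72, 73, 72, 78, 73, 76, 75]     # y

def fit_line(x, y):
    """Least squares slope b1 and intercept b0 for the line y-hat = b1 * x + b0."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = sum_y / n - slope * (sum_x / n)
    return slope, intercept

b1, b0 = fit_line(problems, grades)

# Predicted final grade for a hypothetical student who does 80 optional problems
predicted_80 = b1 * 80 + b0

# Residuals (observed minus predicted) sum to zero, up to floating-point rounding
residuals = [y - (b1 * x + b0) for x, y in zip(problems, grades)]

print(round(b1, 5), round(b0, 2), round(predicted_80, 1), round(sum(residuals), 6))
```

The intercept describes a student who does zero optional problems, which is far outside the observed data, so predictions are more trustworthy for x values within the range that was actually measured.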

CHECK YOUR PROGRESS

The National Housing Authority (NHA) wants to investigate the relationship between the size of houses and the rent paid by tenants in General Santos City. The NHA collected the following information on the sizes (in hundreds of square feet) of eight houses and the monthly rents (in thousands of pesos) paid by the tenants. Construct a scatter diagram for these data. (a) Determine whether a relationship exists between the sizes of the houses and the monthly rents at the 0.05 level of significance. (b) Find the regression line.

Size of House    35   40   50   60   28   34   45   25
Monthly Rent     11   17   18   20    6   10   19    5

For Instructional Purposes Only * 1st Semester AY 2023 - 2024

Module 4 SUMMARY

I. Introduction to Data Management


The easiest and most widely used way to organize data is to construct a frequency distribution table. A frequency distribution is a grouping of the data into categories showing the number of observations in each of the non-overlapping classes.
A grouped frequency distribution is used when the range of the data set is large; the data must be grouped into classes, whether the data are categorical or interval.
A categorical frequency distribution is used to organize nominal-level or ordinal-level data.
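
As an illustration of a categorical frequency distribution, here is a minimal Python sketch. The blood-type data are hypothetical and do not come from this module; the sketch simply counts the observations in each non-overlapping class.

```python
from collections import Counter

# Hypothetical nominal-level data (not from the module): blood types of 10 people.
blood_types = ["A", "B", "O", "O", "AB", "A", "O", "B", "O", "A"]

freq = Counter(blood_types)   # class -> number of observations
n = len(blood_types)

print(f"{'Class':<6}{'Frequency':>10}{'Percent':>9}")
for category, count in sorted(freq.items()):
    print(f"{category:<6}{count:>10}{100 * count / n:>8.1f}%")
```

Every observation falls in exactly one class, so the frequencies add up to the number of observations.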
II. Measures of Central Tendency
The arithmetic mean, or simply the mean, is a measure of central tendency defined as the sum of all observations divided by the number of observations. The weighted mean is an average computed by giving different weights to some of the individual values.

Data arranged in ascending or descending order are called a data array. The median is the midpoint of the data array.

The mode is the value that appears most frequently in a data set. A data set may have one mode (unimodal), two modes (bimodal), more than two modes (multimodal), or no mode at all.
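
The three measures, together with the weighted mean, can be sketched in Python with the standard `statistics` module. The scores, grades, and unit weights below are hypothetical values chosen for illustration, not data from this module.

```python
import statistics

# Hypothetical quiz scores (not from the module).
scores = [7, 8, 8, 9, 10, 6, 8]

mean = statistics.mean(scores)        # sum of observations / number of observations
median = statistics.median(scores)    # midpoint of the sorted data array
modes = statistics.multimode(scores)  # list of every most-frequent value

# Weighted mean: hypothetical grades weighted by hypothetical course units.
grades = [85, 90, 78]
units = [3, 2, 5]
weighted_mean = sum(g * w for g, w in zip(grades, units)) / sum(units)

print(mean, median, modes, weighted_mean)
```

`statistics.multimode` returns a list, so a bimodal or multimodal data set yields more than one value.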

III. Measures of Dispersion


The range is the most common and most easily understood measure of dispersion. It is the difference between the two extreme observations of the data set. The advantages of the range are that it is (i) easy to compute and (ii) easy to understand.

Variance is a measure of the spread between numbers in a data set. That is, it measures how far each number in the set is from the mean and therefore from every other number in the set.

The standard deviation is a statistic that measures the dispersion of a data set relative to its mean and is calculated as the square root of the variance.
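
A minimal Python sketch of these three measures follows. The data set is hypothetical. Both the population form (divide by n) and the sample form (divide by n - 1) of the variance are shown, since which one applies depends on whether the data are a whole population or a sample.

```python
import statistics

# Hypothetical data set (not from the module).
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)       # difference between the two extremes
pop_var = statistics.pvariance(data)     # population variance: divides by n
sample_var = statistics.variance(data)   # sample variance: divides by n - 1
pop_sd = statistics.pstdev(data)         # square root of the population variance
sample_sd = statistics.stdev(data)       # square root of the sample variance

print(data_range, pop_var, pop_sd)
print(round(sample_var, 4), round(sample_sd, 4))
```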

IV. Measures of Relative Position


A quartile is a statistical term describing a division of observations into four defined intervals based upon the values of the data and how they compare to the entire set of observations. Three points (the lower quartile, the median, and the upper quartile) divide the data set into four groups.

A z-score is a numerical measurement that describes a value's relationship to the mean of a group of values. It measures the distance between an observation and the mean, expressed in units of the standard deviation.
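
A minimal Python sketch of the z-score computation is given below. The helper name `z_score` and the exam figures are hypothetical and used only for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Hypothetical exam with mean 75 and standard deviation 5 (not from the module).
print(z_score(85, 75, 5))  # 2.0  -> two standard deviations above the mean
print(z_score(70, 75, 5))  # -1.0 -> one standard deviation below the mean
```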

A box and whisker plot (sometimes called a boxplot) is a graph that presents information from a five-number summary. It is especially useful for indicating whether a distribution is skewed and whether there are potential unusual observations (outliers) in the data set.
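
The five-number summary behind a boxplot can be sketched in Python as follows. Two assumptions should be noted: textbooks differ in how they locate quartiles, and this sketch uses the default ("exclusive") method of `statistics.quantiles`, which may not match the rule used elsewhere in this module; and the 1.5 × IQR rule for flagging outliers is a common convention that is assumed here rather than taken from the module. The data are hypothetical.

```python
import statistics

# Hypothetical data set (not from the module).
data = sorted([2, 4, 4, 5, 6, 7, 8, 9, 10, 12, 30])

# Quartiles using the default "exclusive" method of statistics.quantiles.
q1, q2, q3 = statistics.quantiles(data, n=4)

five_number = (data[0], q1, q2, q3, data[-1])  # min, Q1, median, Q3, max

# Assumed convention: points beyond 1.5 * IQR from the box are potential outliers.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(five_number)
print(outliers)
```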
V. Probabilities and Normal Distributions
The normal distribution, also known as the Gaussian distribution, is a probability distribution that is symmetric about the mean, showing that data near the mean occur more frequently than data far from the mean. In graph form, the normal distribution appears as a bell curve.

A normal distribution can be converted into a standard normal distribution by obtaining the z-value. The z-value is the signed distance between a selected value, designated x, and the mean μ, divided by the standard deviation σ; that is, z = (x − μ)/σ.
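
The conversion to a z-value, and the probability read from the standard normal distribution, can be sketched with Python's `statistics.NormalDist`. The mean, standard deviation, and selected value below are hypothetical.

```python
from statistics import NormalDist

# Hypothetical normal variable (not from the module): mean 100, standard deviation 15.
mu, sigma = 100, 15
x = 115

z = (x - mu) / sigma            # signed distance from the mean in standard deviations
p_below = NormalDist().cdf(z)   # area under the standard normal curve to the left of z
p_above = 1 - p_below           # area to the right of z

print(z, round(p_below, 4), round(p_above, 4))
```

Working directly with `NormalDist(mu, sigma).cdf(x)` gives the same area, which is what standardizing to z is meant to guarantee.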

VI. Linear Regression and Correlation


Correlation shows the strength of a relationship between two variables and is
expressed numerically by the correlation coefficient. A variable here is the
characteristic of the population being observed or measured. Correlation refers
to the departure of two random variables from independence.

The Pearson product-moment correlation coefficient, or simply the correlation coefficient, is the most widely used statistic for measuring the degree of relationship between linearly related variables. It is a measure of the association between two variables. It was developed by Karl Pearson.

A test of significance for the coefficient of correlation may be used to determine whether the computed Pearson's r could have occurred by chance in a sample drawn from a population in which the two variables are unrelated.
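
To make the correlation coefficient and its significance test concrete, the sketch below computes Pearson's r from the sums listed in Example 2 (n = 12) and then a t statistic. The computational formula for r and the statistic t = r·sqrt((n − 2)/(1 − r²)) with n − 2 degrees of freedom are the standard forms; that this module uses exactly this form of the test is an assumption, and the function names are hypothetical.

```python
import math


def pearson_r_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Pearson's r from summary sums (standard computational formula)."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Sums from Step 1 of Example 2.
n = 12
r = pearson_r_from_sums(n, 873, 848, 65093, 60180, 62254)
t = t_statistic(r, n)
print(f"r = {r:.4f}, t = {t:.2f}, df = {n - 2}")
```

The computed t would then be compared with the critical value from a t table at the chosen level of significance and n − 2 degrees of freedom.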

Regression analysis is a simple statistical tool used to model the dependence of a variable on one (or more) explanatory variables.

Simple linear regression is a linear regression model with a single explanatory variable.

The least squares method determines the regression equation by minimizing the sum of the squares of the vertical distances between the actual y values and the predicted values of y.
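
The minimizing property can be seen numerically. The sketch below fits a line to a small hypothetical data set with the slope and intercept formulas from this module, then shows that nudging the slope or the intercept away from the fitted values increases the sum of squared vertical distances. The function names are hypothetical.

```python
def sse(xs, ys, b1, b0):
    """Sum of squared vertical distances between actual and predicted y."""
    return sum((y - (b1 * x + b0)) ** 2 for x, y in zip(xs, ys))


def least_squares(xs, ys):
    """Slope and intercept from the summary-sum formulas."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    b1 = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
    b0 = sy / n - b1 * sx / n
    return b1, b0


# Hypothetical data (not from the module).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b1, b0 = least_squares(xs, ys)
best = sse(xs, ys, b1, b0)
print(f"fitted line: y = {b1:.2f}x + {b0:.2f}, SSE = {best:.2f}")

# Any other line tried here has a larger sum of squares.
for db1, db0 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.5), (0.0, -0.5)]:
    other = sse(xs, ys, b1 + db1, b0 + db0)
    print(f"slope {b1 + db1:.2f}, intercept {b0 + db0:.2f}: SSE = {other:.2f}")
```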


Module 4 Review Test

1. The following are the amounts on customers' meal checks at a diner for one
day's lunches:
355 427 304 404 279 583 40 590 286 495
255 262 280 353 310 186 530 474 583 600
300 445 187 216 278 290 635 536 404 680
290 316 364 358 275 184 640 570 310 470
Tabulate into frequency distribution using Rule 1.
2. A garment company has declared bankruptcy. As an accountant, you wish to
clarify the company's accounts payable. The following are the amounts owed
by the garment company (in hundred thousands).
395 252 250 268 285 305 304 375 400 306
320 341 355 340 278 312 278 265 408 324
325 372 286 359 305 416 312 286 311 378
238 313 290 263 314 325 278 401 314 371
Construct the frequency distribution table using Rule 2.
3. A week’s records of a bus company show the amounts (in pesos) spent on
gasoline by each of its 16 buses.
₱ 10,780   ₱ 12,790   ₱ 18,100   ₱ 13,480
₱ 17,740   ₱ 12,780   ₱ 19,120   ₱ 19,200
₱ 14,380   ₱ 14,712   ₱ 16,745   ₱ 13,725
₱ 15,145   ₱ 15,314   ₱ 14,314   ₱ 17,189
Find the mean, median and mode of the expenses incurred for gasoline.
4. The monthly salaries (in thousand pesos) of the top executives of
telecommunication companies in the Philippines are: ₱ 380, ₱ 275, ₱ 477,
₱ 315, ₱ 415, ₱ 340, ₱ 415, ₱ 425, ₱ 376, ₱ 352, ₱ 285, ₱ 296, ₱ 338, ₱ 412,
and ₱ 349. Determine the range, variance, and standard deviation.

5. Determine the first, second, and third quartiles of the data in problem #4.

6. The average cholesterol content of a certain duck egg is 210 mg, and the
standard deviation is 16 mg. Assume the variable is normally distributed. If a
single egg is selected at random, find the probability that the cholesterol
content will be greater than 205 mg.
7. A random sample of nine (9) cities gave the following figures for annual per
capita cigarette consumption and annual death rate from lung cancer.

City                        1    2    3    4    5    6    7    8    9
Cigarette Consumption (x)   350  370  250  260  255  300  400  330  240
Death Rate (y)              21   24   17   18   17   19   25   20   16

a. Calculate the sample correlation coefficient r. At the 0.01 level of significance, test whether cigarette consumption and lung cancer are unrelated.

b. Determine the regression line.


