C04 Data Management


4.

TECHNOLOGICAL UNIVERSITY OF THE PHILIPPINES

Data Management

Prepared by: Ron Eric B. Legaspi, RChT


Data Management

• Development, execution, and supervision of plans, policies, programs, and practices that control, protect, deliver, and enhance the value of data and information assets.
• Administrative process by which the required data are acquired, validated, stored, protected, and processed, and by which their accessibility, reliability, and timeliness are ensured to satisfy the needs of the data users.

Page 2
For instructional purposes only
Data

• Individual pieces of information recorded and used for the purpose of analysis.
• The raw information from which statistics are created.

Statistics

• The results of data analysis, its interpretation and presentation.
• Involves the collection, organization, summarization, presentation, and interpretation of data.
• Descriptive statistics
• Inferential statistics

• One of the most basic statistical concepts involves finding the measures of central tendency of a set of numerical data.

4.1

Measures of
Central Tendency



Measure of Central Tendency

E.g., Ella is a member of the graduating class and plans to start a career. A survey of five engineers from last year’s class shows that they received job offers with the following salaries:

43,750 39,500 38,000 41,250 44,000

Before her interview, she wishes to determine an average of these five salaries.

MEAN MEDIAN MODE


Measure of Central Tendency
Arithmetic Mean

The arithmetic mean is the most commonly used measure of central tendency.

The mean of $n$ numbers is the sum of the numbers divided by $n$.

$$\text{mean} = \bar{x} = \frac{\sum x_i}{n}$$

For a sample of $n$ values the mean is written $\bar{x}$; for a population of $N$ values it is written $\mu$:

$$\bar{x} = \frac{\sum x_i}{n} \qquad \mu = \frac{\sum x_i}{N}$$
Measure of Central Tendency
Arithmetic Mean

$$\text{mean} = \bar{x} = \frac{\sum x_i}{n}$$

Example 1: The mean of 43,750; 39,500; 38,000; 41,250; 44,000.

Example 2: The mean of 92, 84, 65, 76, 88 and 90.
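
Examples 1 and 2 can be checked with a few lines of Python. This is a minimal sketch; the helper name `arithmetic_mean` is an illustrative choice and is not part of the lecture.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by how many values there are."""
    return sum(values) / len(values)


salaries = [43750, 39500, 38000, 41250, 44000]  # Example 1
scores = [92, 84, 65, 76, 88, 90]               # Example 2

print(arithmetic_mean(salaries))  # 41300.0
print(arithmetic_mean(scores))    # 82.5
```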

Measure of Central Tendency
Arithmetic Mean

Properties:
1. The sum of the deviations of the observations from the mean is zero, where each deviation is $d_i = x_i - \mu$.
• e.g., for the observed values 3, 8, 4, the mean is 5; the deviations are −2, 3, and −1, which sum to 0.

2. The mean reflects the magnitude of every observation, since every observation contributes to the value of the mean.
3. The mean can be easily affected by the presence of an extreme value. Hence, it is not a good measure of central tendency when an extreme value occurs.
• e.g., the mean of 3, 8, 4, and 50 is 16.25.

Measure of Central Tendency
Median

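The median is the middle number or the mean of the two middle numbers.
The median, $\tilde{x}$, of a ranked list of $n$ numbers is:
• The middle number if $n$ is odd
• The mean of the two middle numbers if $n$ is even

$$\tilde{x} = \begin{cases} x_{\frac{n+1}{2}} & \text{if } n \text{ is odd} \\[4pt] \dfrac{x_{\frac{n}{2}} + x_{\frac{n}{2}+1}}{2} & \text{if } n \text{ is even} \end{cases}$$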

Measure of Central Tendency
Median

Example 3: Find the median of each of the following lists.

• 4, 8, 1, 14, 9, 21, 12
• 46, 23, 92, 89, 77, 108
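
The two cases of the median definition (odd and even $n$) can be sketched as follows. The function name `median` is an assumption made for illustration; the lists are those of Example 3.

```python
def median(values):
    """Middle value of the ranked list, or the mean of the two middle values."""
    ranked = sorted(values)          # the definition requires a ranked list
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:                   # odd n: single middle number
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2  # even n: mean of the two middle numbers


print(median([4, 8, 1, 14, 9, 21, 12]))    # 9
print(median([46, 23, 92, 89, 77, 108]))   # 83.0
```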

Measure of Central Tendency
Mode

The mode, $\hat{x}$, of a list of numbers is the number that occurs most frequently.

Example 4: Find the mode of each list.

• 18, 15, 21, 16, 15, 14, 15, 21
• 2, 5, 8, 9, 11, 4, 7, 23
• 1, 4, 3, 4, 5, 6, 5, 4, 5, 7
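
A short sketch for Example 4 follows. It assumes two conventions that the slide does not state explicitly: a list in which every value occurs once is treated as having no mode, and ties for the highest count are all reported. The helper name `modes` is hypothetical.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list means no mode (assumed convention)."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:                     # every value occurs once: no mode
        return []
    return sorted(v for v, c in counts.items() if c == top)


print(modes([18, 15, 21, 16, 15, 14, 15, 21]))  # [15]
print(modes([2, 5, 8, 9, 11, 4, 7, 23]))        # []
print(modes([1, 4, 3, 4, 5, 6, 5, 4, 5, 7]))    # [4, 5]
```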

Measure of Central Tendency

The mean, median, and mode are all averages; however, they are generally not equal.
The mean is the most sensitive of the three: a single extreme value can change it substantially.

E.g.,
Salaries: 150,000 60,000 36,000 20,000 20,000
Here the mean is 57,200, the median is 36,000, and the mode is 20,000.

Measure of Central Tendency
Weighted Mean

The weighted mean, $\mu_w$ or $\bar{x}_w$, is often used when some data are more important than others.

The weighted mean of the $n$ numbers $x_1, x_2, x_3, \ldots, x_n$ with the respective assigned weights $w_1, w_2, w_3, \ldots, w_n$ is

$$\mu_w = \frac{\sum (x_i \cdot w_i)}{\sum w_i}$$

Measure of Central Tendency
Weighted Mean

Example 5: Compute the weighted mean to determine whether the student’s term grade will be eligible for the Dean’s list.

Subject      Grade   Units
Subject A    2.25    3
Subject B    1.50    3
Subject C    1.75    3
Subject D    2.00    3
Subject E    1.50    2
Subject F    1.25    1
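
The weighted mean of Example 5 can be sketched as below, with the units acting as weights. The function name `weighted_mean` is illustrative. The slide does not give the Dean's list cutoff, so the sketch stops at the weighted mean itself.

```python
def weighted_mean(values, weights):
    """Sum of value-times-weight divided by the sum of the weights."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)


grades = [2.25, 1.50, 1.75, 2.00, 1.50, 1.25]  # Subjects A to F
units = [3, 3, 3, 3, 2, 1]

term_grade = weighted_mean(grades, units)       # 26.75 / 15
print(round(term_grade, 2))                     # 1.78
```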

Frequency Distribution

Data that have not been organized or manipulated in any manner are
called raw data.

A frequency distribution is a table that lists observed events and the frequency of occurrence of each event.
It is often used to organize raw data.

Frequency Distribution

E.g., consider the following table, which lists the number of laptop computers owned by families in each of 40 homes in a subdivision.

2 0 3 1 2 1 0 4
2 1 1 7 2 0 1 1
0 2 2 1 3 2 2 1
1 4 2 5 2 3 1 2
2 1 2 1 5 0 2 5

Frequency Distribution

Number of laptop computers, x    Number of households, f, with x laptop computers
0                                5
1                                12
2                                14
3                                3
4                                2
5                                3
6                                0
7                                1
Total                            40

Frequency Distribution

The formula for a weighted mean can be used to find the mean of the grouped data in a frequency distribution, with each frequency $f_i$ acting as the weight of the value $x_i$:

$$\bar{x}_G = \frac{\sum (x_i \cdot f_i)}{\sum f_i}$$

Frequency Distribution

Example 6:
Find the mean of the data in the frequency table of laptop computers per household.
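
Example 6 can be worked with the grouped-mean formula, treating each frequency as a weight. The variable names below are illustrative; the values are those of the laptop frequency table.

```python
laptops = [0, 1, 2, 3, 4, 5, 6, 7]           # x: laptops per household
households = [5, 12, 14, 3, 2, 3, 0, 1]      # f: households with x laptops

total_households = sum(households)                              # sum of f
weighted_sum = sum(x * f for x, f in zip(laptops, households))  # sum of x * f
grouped_mean = weighted_sum / total_households

print(total_households)  # 40
print(weighted_sum)      # 79
print(grouped_mean)      # 1.975
```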

4.2

Measures of
Dispersion



Dispersion

Machines 1 and 2 are soft-drink dispensing machines that should dispense 8 oz into a cup.

Machine 1    Machine 2
9.52         8.01
6.41         7.99
10.07        7.95
5.85         8.03
8.15         8.02
$\bar{x} = 8.0$    $\bar{x} = 8.0$

This example shows that average values do not reflect the spread or dispersion of data.

To measure the spread or dispersion of data, we must introduce statistical values known as the range and the standard deviation.

Dispersion
Range

The range of a set of data values is the difference between the greatest data value and the least data value.

$$R = H_{val} - L_{val}$$

Example 7:
Find the range of the numbers of ounces dispensed by both machines.

Dispersion
Range

Properties:
1. It is a quick but rough measure of dispersion.
2. The larger the value of the range, the more dispersed are the
observations.
3. It only considers the lowest and the highest value in the data set.
4. It can be easily affected by the presence of an extreme value.

Dispersion
Mean Absolute Deviation - Variance

The variance is a simple way of measuring how far the items in a series spread out from the point of central tendency.
The variance considers the position of each observation relative to the mean.

The population variance:

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$

Dispersion
Mean Absolute Deviation - Variance

Example 8:
The following data represent the quiz scores of all 7 students of a math special class:

Student    Score
1          4
2          7
3          8
4          2
5          2
6          9
7          3

Computational formula:

$$\sigma^2 = \frac{\sum x_i^2}{N} - \mu^2$$
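
Both forms of the population variance can be compared on the Example 8 scores. This is a sketch; the function names are illustrative and not taken from the lecture.

```python
def population_variance(values):
    """Definition: mean of the squared deviations from the population mean."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def population_variance_shortcut(values):
    """Computational formula: mean of the squares minus the square of the mean."""
    n = len(values)
    mu = sum(values) / n
    return sum(x * x for x in values) / n - mu ** 2


quiz_scores = [4, 7, 8, 2, 2, 9, 3]   # all 7 students, so a population

print(round(population_variance(quiz_scores), 4))           # 7.4286
print(round(population_variance_shortcut(quiz_scores), 4))  # 7.4286
```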
Dispersion
Mean Absolute Deviation - Variance

The sample variance:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{n \sum x_i^2 - \left(\sum x_i\right)^2}{n(n - 1)}$$

Dispersion
Mean Absolute Deviation - Variance

Example 9:
Find the variance of the following random data.

Sample    Score
1         4
2         7
3         8
4         2
5         2
6         8
7         9
8         2
9         5
10        7
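
For Example 9 the data are a sample, so the divisor is $n - 1$. The sketch below uses the computational form of the sample variance and cross-checks it against `statistics.variance` from the Python standard library; the function name `sample_variance` is illustrative.

```python
from statistics import variance


def sample_variance(values):
    """Computational form: (n * sum(x^2) - (sum x)^2) / (n * (n - 1))."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))


sample_scores = [4, 7, 8, 2, 2, 8, 9, 2, 5, 7]

print(sample_variance(sample_scores))   # 7.6
print(variance(sample_scores))          # library result for comparison
```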

Dispersion
Mean Absolute Deviation - Variance

Properties:
1. It is always non-negative.
2. The larger the value of the variance, the more dispersed are the
observations.
3. Each observation contributes to the magnitude of the variance.
4. The unit of measure of the variance is the square of the unit of
measure of the original data set.

Dispersion
Standard Deviation

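Standard deviation is based on the deviations of all the scores in a series. It is always computed from the mean.

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} = \sqrt{\frac{\sum x_i^2}{N} - \mu^2}$$

$$s = \sqrt{s^2} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{n \sum x_i^2 - \left(\sum x_i\right)^2}{n(n - 1)}}$$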

Dispersion
Standard Deviation

Example 10:
A consumer group has tested a sample of 8 batteries from each of 2 companies. Both samples have a mean of 7.0 hours. According to these tests, which company produces batteries for which the values representing hours of constant use have the smaller standard deviation?

Company         Hours of constant use per battery
EverSoBright    6.2 6.4 7.1 5.9 8.3 5.3 7.5 9.3
Dependable      6.8 6.2 7.2 5.9 7.0 7.4 7.3 8.2
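
Example 10 can be sketched with the standard library. Because each list is a sample of 8 batteries, the sketch assumes the sample standard deviation (`statistics.stdev`, divisor $n - 1$) is the intended one.

```python
from statistics import mean, stdev

eversobright = [6.2, 6.4, 7.1, 5.9, 8.3, 5.3, 7.5, 9.3]
dependable = [6.8, 6.2, 7.2, 5.9, 7.0, 7.4, 7.3, 8.2]

s_ever = stdev(eversobright)   # sample standard deviation, about 1.33 h
s_dep = stdev(dependable)      # sample standard deviation, about 0.72 h

print(round(mean(eversobright), 1), round(s_ever, 3))
print(round(mean(dependable), 1), round(s_dep, 3))

# The company with the smaller standard deviation has the more consistent batteries.
more_consistent = "Dependable" if s_dep < s_ever else "EverSoBright"
print(more_consistent)  # Dependable
```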

Dispersion
Standard Deviation

Properties:
1. It is always non-negative.
2. The larger the value of the standard deviation, the more dispersed
are the observations.
3. Each observation contributes to the magnitude of the SD.
4. The unit of measure of SD is the same as the unit of measure of the
original data set.

4.3

Measures of
Relative Position



Relative Position

A measure of position identifies the rank that a data value occupies within the array of data collected.

Relative Position
z-Score

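Movie download time, in minutes: $\bar{x} = 12$, $s = 4$

• A download time of 6 min is 6 min below the mean, which is 1.5 standard deviations below the mean.
• A download time of 20 min is 8 min above the mean, which is 2 standard deviations above the mean.

The number of standard deviations between a data value and the mean is known as the data value’s $z$-score or standard score.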

Relative Position
z-Score

The $z$-score for a given data value $x$ is the number of standard deviations that $x$ is above or below the mean of the data. The following formulas show how to calculate the $z$-score for a data value $x$ in a population and a sample.

Population: $$z_x = \frac{x - \mu}{\sigma}$$

Sample: $$z_x = \frac{x - \bar{x}}{s}$$

Relative Position
z-Score

Example 11:
Paul has taken two tests in his class. He scored 72 on his first test, for which the mean of all scores was 65 and the standard deviation was 8. He received a 60 on a second test, for which the mean of all scores was 45 and the standard deviation was 12.
In comparison to the other students, did Paul do better on the first test or the second test?

Relative Position
z-Score

Example 12:
A consumer group tested a sample of 100 light bulbs. It found that the mean life expectancy of the bulbs was 842 h, with a standard deviation of 90 h. One particular light bulb from the Durabright Company had a $z$-score of 1.2. What was the life span of this bulb?
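
Examples 11 and 12 use the $z$-score formula in both directions: from a data value to its $z$-score, and from a $z$-score back to the data value. A minimal sketch, with illustrative function names:

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def value_from_z(z, mean, sd):
    """Invert the z-score formula: x = mean + z * sd."""
    return mean + z * sd


# Example 11: the test with the larger z-score is the better relative result.
z_first = z_score(72, 65, 8)
z_second = z_score(60, 45, 12)
print(z_first, z_second)          # 0.875 1.25

# Example 12: life span of the bulb whose z-score is 1.2.
life_span = value_from_z(1.2, 842, 90)
print(round(life_span, 1))        # 950.0
```

Since 1.25 is greater than 0.875, Paul did better on the second test relative to his classmates.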

Relative Position
Percentile

Most standardized examinations provide scores in terms of percentiles, which are defined as follows:
• A value $x$ is called the $p$th percentile of a data set provided $p\%$ of the data values are less than $x$.
• Percentiles are values that divide a set of observations into 100 equal parts.
• The position occupied by each score in the collected data is expressed in hundredths when the scores are arranged from highest to lowest or vice versa.

$$\text{Observation No.} = \frac{P}{100} \cdot n$$
Relative Position
Percentile

Example 13:
The scores of 17 students in a 50-point quiz are as follows:

44 37 39 29 33
35 37 31 43 29
43 42 42 37 37
29 41

Find the $P_{10}$, $P_{25}$, and $P_{50}$.

Relative Position
Percentile

Ranked from lowest to highest, the 17 scores of Example 13 are:

Observation No.    Score
1                  29
2                  29
3                  29
4                  31
5                  33
6                  35
7                  37
8                  37
9                  37
10                 37
11                 39
12                 41
13                 42
14                 42
15                 43
16                 43
17                 44
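
The observation-number formula can be sketched for Example 13 as follows. The slides do not say how to treat a fractional observation number, so this sketch assumes it is rounded up to the next whole observation; other conventions (such as interpolating between neighbors) would give different answers. The helper name `percentile_value` is hypothetical.

```python
import math

quiz = [44, 37, 39, 29, 33, 35, 37, 31, 43, 29, 43, 42, 42, 37, 37, 29, 41]
ranked = sorted(quiz)  # lowest to highest


def percentile_value(ranked_values, p):
    """Return (position, observation number, score) for the p-th percentile."""
    position = p * len(ranked_values) / 100   # Observation No. = (P / 100) * n
    observation = math.ceil(position)         # assumption: round fractions up
    return position, observation, ranked_values[observation - 1]


for p in (10, 25, 50):
    print(p, percentile_value(ranked, p))
# 10 (1.7, 2, 29)
# 25 (4.25, 5, 33)
# 50 (8.5, 9, 37)
```

Under this rounding assumption, $P_{50}$ coincides with the median of the 17 scores.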

Relative Position
Percentile

Given a set of data and a data value $x$,

$$\text{percentile of score } x = \frac{\text{number of data values less than } x}{\text{total number of data values}} \times 100$$

Example 14:
On a reading examination given to 900 students, Elaine’s score of 602 was higher than
the scores of 576 of the students who took the examination. What is the percentile for
Elaine’s score?

Relative Position
Quartile

The three numbers $Q_1$, $Q_2$, and $Q_3$ that partition a ranked data set into four approximately equal groups are called the quartiles of the data.
For instance, for the data set below, the values $Q_1 = 11$, $Q_2 = 29$, and $Q_3 = 104$ are the quartiles of the data.

2, 5, 5, 8, 11, 12, 19, 22, 23, 29, 31, 45, 83, 91, 104, 159, 181, 312, 354

$Q_1$ is the first quartile
$Q_2$ is the second quartile, and is the median
$Q_3$ is the third quartile
Relative Position
Quartile

The median procedure for finding quartiles


• Rank the data
• Find the median of the data. This is the second quartile, 𝑄2
• The first quartile, 𝑄1 , is the median of the data values less than 𝑄2
• The third quartile, 𝑄3 , is the median of the data values greater than 𝑄2

Relative Position
Quartile

Example 15:
Use medians to find the quartiles of a data set.
The table below lists the calories per 100mL of 25 popular sodas. Find the quartiles for
the data.
Calories, per 100ml of selected sodas
43 37 42 40 53 62 36 32 50 49
26 53 73 48 45 39 45 48 40 56
41 36 58 42 39


Ranked from least to greatest, the data are:
26 32 36 36 37 39 39 40 40 41
42 42 43 45 45 48 48 49 50 53
53 56 58 62 73
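
The median procedure can be sketched as below for Example 15. The sketch assumes that "the data values less than $Q_2$" means the lower half of the ranked list by position (excluding the middle element when $n$ is odd), and likewise for the upper half. The function names are illustrative.

```python
def median_of_ranked(ranked):
    """Median of an already ranked list."""
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median procedure."""
    ranked = sorted(values)
    n = len(ranked)
    q2 = median_of_ranked(ranked)
    lower = ranked[: n // 2]          # positions below the median (assumed reading)
    upper = ranked[(n + 1) // 2:]     # positions above the median
    return median_of_ranked(lower), q2, median_of_ranked(upper)


calories = [43, 37, 42, 40, 53, 62, 36, 32, 50, 49,
            26, 53, 73, 48, 45, 39, 45, 48, 40, 56,
            41, 36, 58, 42, 39]

print(quartiles(calories))  # (39.0, 43, 51.5)
```

Applied to the 19-value data set used to introduce quartiles, the same function returns 11, 29, and 104.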

Q04: Data Management
The table shows the set of your class scores in GEC4-UT01.

46 48 45 47 48
41 47 45 44 43
43 43 45 43 37
48 47 48 48 42
48 45 36 48 39
38 41 46 32 47
43 38 48 48 43

I. Using these raw data, find the following:
• The mean, median, and mode/s (3 pts)
• Range, variance, and standard deviation (3 pts)
• Quartiles (3 pts)
II. Using your score, compute the following:
• Deviation from the mean (1 pt)
• z-score (1 pt)
• Percentile (1 pt)

BSCESEP-3B
4.4

Normal
Distribution



Frequency Distribution

Large sets of data are often displayed using a grouped frequency distribution.
A grouped frequency distribution table categorizes the numerical data into intervals or classes.
• Class – mutually exclusive categories defining the lower limit and the upper limit with equal intervals
• Class frequency – the number of observations in each class
• Cumulative frequency – the sum of the frequencies of all classes up to and including a particular class of interest
• Relative frequency – the percentage of observations in a particular class of interest

Frequency Distribution

An ISP has installed new computers. To estimate the new download times its subscribers will experience, the ISP surveyed 1000 of its subscribers to determine the time required for each subscriber to download a particular file from an internet site.

Download times, s    Subscribers
0-5                  6
5-10                 17
10-15                43
15-20                92
20-25                151
25-30                192
30-35                190
35-40                149
40-45                90
45-50                45
50-55                15
55-60                10

Frequency Distribution

Each row of the download-time table is a class. For the class 10-15, the lower class boundary is 10 and the upper class boundary is 15. The table has 12 classes.

Frequency Distribution

Example 16:
Use the relative frequency distribution to determine the:
• Percent of subscribers who required at least 25 s to download the file
• Probability that a subscriber chosen at random will require at least 5 s but less than 20 s to download the file

Download times, s    Subscribers, %
0-5                  0.6
5-10                 1.7
10-15                4.3
15-20                9.2
20-25                15.1
25-30                19.2
30-35                19.0
35-40                14.9
40-45                9.0
45-50                4.5
50-55                1.5
55-60                1.0
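
Example 16 amounts to adding the relative frequencies of the classes that satisfy each condition. A small sketch, storing each class as a hypothetical `(lower, upper, percent)` tuple:

```python
# (lower boundary, upper boundary, percent of subscribers)
classes = [
    (0, 5, 0.6), (5, 10, 1.7), (10, 15, 4.3), (15, 20, 9.2),
    (20, 25, 15.1), (25, 30, 19.2), (30, 35, 19.0), (35, 40, 14.9),
    (40, 45, 9.0), (45, 50, 4.5), (50, 55, 1.5), (55, 60, 1.0),
]

# Percent needing at least 25 s: every class whose lower boundary is 25 or more.
at_least_25 = sum(pct for lo, hi, pct in classes if lo >= 25)

# At least 5 s but less than 20 s: classes 5-10, 10-15, and 15-20.
from_5_to_20 = sum(pct for lo, hi, pct in classes if lo >= 5 and hi <= 20)

print(round(at_least_25, 1))          # 69.1 (percent)
print(round(from_5_to_20 / 100, 3))   # 0.152 (probability)
```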

Frequency Distribution

Constructing a frequency distribution with equal class size

1. Determine the range, $R$, of the numerical data.
2. Determine the number of classes, $K$, into which the data are to be grouped using Sturges’ approximation, where $N$ is the number of observations:

$$K = 1 + 3.322 \log N$$

3. Determine the class size, $C = R/K$.
4. Determine the lower limit of the first class.
5. Construct the class intervals and determine the class frequencies.

Frequency Distribution

Example 17:
Construct a frequency distribution table for the following raw scores of 50 students in a 200-item test:

144 112 156 122 168 172 141 159 127 154

156 145 134 137 123 149 144 160 136 139

142 138 159 151 147 150 126 152 147 136

135 132 146 133 150 122 139 149 152 129

131 155 116 140 145 135 160 125 172 163


Ranked from least to greatest, the scores are:

112 116 122 122 123 125 126 127 129 131

132 133 134 135 135 136 136 137 138 139

139 140 141 142 144 144 145 145 146 147

147 149 149 150 150 151 152 152 154 155

156 156 159 159 160 160 163 168 172 172


Class Interval    Class Boundary    Frequency
112 – 120         111.5 – 120.5     2
121 – 129         120.5 – 129.5     7
130 – 138         129.5 – 138.5     10
139 – 147         138.5 – 147.5     12
148 – 156         147.5 – 156.5     11
157 – 165         156.5 – 165.5     5
166 – 174         165.5 – 174.5     3
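
The five construction steps can be sketched in Python for the Example 17 scores. Two choices here are assumptions rather than statements from the slides: the logarithm is taken as base 10, and both $K$ and $C$ are rounded up to whole numbers. With those choices the sketch reproduces the class intervals and frequencies of the table.

```python
import math

scores = [
    144, 112, 156, 122, 168, 172, 141, 159, 127, 154,
    156, 145, 134, 137, 123, 149, 144, 160, 136, 139,
    142, 138, 159, 151, 147, 150, 126, 152, 147, 136,
    135, 132, 146, 133, 150, 122, 139, 149, 152, 129,
    131, 155, 116, 140, 145, 135, 160, 125, 172, 163,
]

n = len(scores)
data_range = max(scores) - min(scores)        # step 1: R = 172 - 112
k = math.ceil(1 + 3.322 * math.log10(n))      # step 2: Sturges, rounded up (assumed)
c = math.ceil(data_range / k)                 # step 3: class size, rounded up (assumed)
first_lower = min(scores)                     # step 4: lower limit of the first class

table = []                                    # step 5: intervals, boundaries, frequencies
for i in range(k):
    lo = first_lower + i * c
    hi = lo + c - 1
    freq = sum(lo <= s <= hi for s in scores)
    table.append((lo, hi, lo - 0.5, hi + 0.5, freq))

print(data_range, k, c)   # 60 7 9
for row in table:
    print(row)            # first row: (112, 120, 111.5, 120.5, 2)
```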
Frequency Distribution
Histograms

Collected data can also be represented in graphical form.

A histogram is a graphical representation that organizes a group of data points into class boundary ranges.

Frequency Distribution
Histograms

[Figure: histogram of the download-time frequency distribution, classes 0-5 through 55-60. Horizontal axis: download time (seconds); vertical axis: number of subscribers.]

Frequency Distribution
Histograms

[Figure: histogram of the Example 17 scores using the class boundaries 111.5 – 120.5 through 165.5 – 174.5. Horizontal axis: scores (over 200); vertical axis: number of students.]

ACT04: Frequency Distribution
Using your Unit Test 1 scores:

46 48 45 47 48
41 47 45 44 43
43 43 45 43 37
48 47 48 48 42
48 45 36 48 39
38 41 46 32 47
43 38 48 48 43

I. Construct a frequency distribution table
• Determine the range, R
• Determine the number of classes, K
• Determine the class size, C
• Construct the table with the following parameters:
• Class interval
• Class boundary
• Frequency
• Relative frequency
II. Make a histogram from the constructed FDT.
III. Answer the following guide questions:

BSCESEP-3B

1. What percentage of the class had scores below 44?
2. How many students are there in the last two classes?
3. What is the lower class boundary of the second class?
4. What is the upper class boundary of the third class?
5. To which classes do your individual scores belong?

BSCESEP-3A
Normal Distribution

A normal distribution forms a bell-shaped curve that is symmetric about a vertical line through the mean of the data.

Properties:
• The mean, median, and mode are equal
• The y-value of each point of the curve is the percent of the data at the corresponding x-value
• Areas under the curve that are symmetric about the mean are equal
• The total area under the curve is 1

Normal Distribution

E.g., In the normal distribution shown below, the area of the shaded
region is 0.159 units.

Normal Distribution
Empirical Rule

In a normal distribution, approximately
• 68% of the data lie within 1 standard deviation of the mean
• 95% of the data lie within 2 standard deviations of the mean
• 99.7% of the data lie within 3 standard deviations of the mean

Normal Distribution
Empirical Rule
Example 18:
A survey of 1000 gas stations found that the price charged for a liter of regular gas could
be closely approximated by a normal distribution with a mean of P38.75 and a standard
deviation of P2.25. How many of the gas stations charge
• Between P34.25 to P43.25 for a liter of regular gas?
• Less than P41.00 for a liter of regular gas?
• More than P43.25 for a liter of regular gas?
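
In Example 18 the three prices line up with whole standard deviations from the mean (P34.25 and P43.25 are 2 standard deviations away, P41.00 is 1 above), so the empirical rule applies directly. A sketch using the rule's approximate percentages and the symmetry of the curve; the helper name `station_count` is illustrative:

```python
STATIONS = 1000
WITHIN_1_SD = 68.0   # percent within 1 standard deviation (empirical rule)
WITHIN_2_SD = 95.0   # percent within 2 standard deviations (empirical rule)

between_pct = WITHIN_2_SD              # P34.25 to P43.25: within 2 sd of the mean
below_pct = 50 + WITHIN_1_SD / 2       # below P41.00: lower half plus half of the 68%
above_pct = (100 - WITHIN_2_SD) / 2    # above P43.25: one of the two 2-sd tails


def station_count(percent):
    """Convert a percentage of the 1000 surveyed stations into a count."""
    return round(percent * STATIONS / 100)


print(station_count(between_pct))  # 950
print(station_count(below_pct))    # 840
print(station_count(above_pct))    # 25
```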

Standard Normal Distribution

It is often helpful to convert data values $x$ to $z$-scores, using the formulas

Population: $$z_x = \frac{x - \mu}{\sigma}$$

Sample: $$z_x = \frac{x - \bar{x}}{s}$$

If the original distribution of $x$ values is a normal distribution, then the corresponding distribution of $z$-scores will also be a normal distribution. This normal distribution of $z$-scores is called the standard normal distribution.

Standard Normal Distribution

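The standard normal distribution is the normal distribution that has a mean of 0 and a standard deviation of 1.

$$y = \frac{e^{-\frac{(x - \mu)^2}{2\sigma^2}}}{\sigma\sqrt{2\pi}}$$

$$A = \int_{x_1}^{x_2} y \, dx = \int_{x_1}^{x_2} \frac{e^{-\frac{(x - \mu)^2}{2\sigma^2}}}{\sigma\sqrt{2\pi}} \, dx$$

Or we can simply refer to a $z$-Score–Area table ☺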


Standard Normal Distribution
z-Score–Area Table

Standard Normal Distribution

Example 19:
Find the area of the standard normal distribution between $z = -1.44$ and $z = 0$.

Example 20: Find the area of the tail region
Find the area of the standard normal distribution to the right of $z = 0.82$.
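
As an alternative to reading a printed table, the areas in Examples 19 and 20 can be computed from the standard normal cumulative distribution function, which can be written with the error function as $\Phi(z) = \tfrac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$. This is a sketch of that approach, not the table lookup the lecture uses; the function name `phi` is illustrative.

```python
import math


def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


area_19 = phi(0) - phi(-1.44)   # between z = -1.44 and z = 0
area_20 = 1 - phi(0.82)         # tail to the right of z = 0.82

print(round(area_19, 4))  # 0.4251
print(round(area_20, 4))  # 0.2061
```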

Standard Normal Distribution
Example 21:
A soda machine dispenses soda into 12-ounce cups. Tests show that the actual amount of soda
dispensed is normally distributed, with a mean of 11.5 oz and a standard deviation of 0.2 oz.
• What percent of cups will receive less than 11.25 oz of soda?
• What percent of cups will receive between 11.20 oz and 11.55 oz of soda?
• If a cup is filled at random, what is the probability that the machine will overflow the cup?
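
Example 21 can be sketched with `statistics.NormalDist` from the Python standard library (Python 3.8 or later), which evaluates normal areas directly instead of going through a $z$-score table. The sketch takes "overflow" to mean that more than 12 oz is dispensed, following the cup size given in the problem.

```python
from statistics import NormalDist

fill = NormalDist(mu=11.5, sigma=0.2)   # ounces dispensed per cup

less_than_11_25 = fill.cdf(11.25)                 # about 0.106
between = fill.cdf(11.55) - fill.cdf(11.20)       # about 0.532
overflow = 1 - fill.cdf(12.0)                     # about 0.006

print(f"{less_than_11_25:.1%}")
print(f"{between:.1%}")
print(f"{overflow:.4f}")
```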

Standard Normal Distribution

A cut-off score is a score that separates data into two groups such
that the data in one group satisfy a certain requirement and the data
in the other group do not satisfy the requirement.

Example 22:
The OnTheGo company manufactures laptop computers. A study indicates that the life spans of its computers are normally distributed, with a mean of 4.0 years and a standard deviation of 1.2 years. How long a warranty period should the company offer if the company wishes less than 4% of the computers to fail during the warranty period?
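
The cut-off score in Example 22 is the life span below which 4% of the distribution lies. A sketch using `NormalDist.inv_cdf` (Python 3.8 or later) in place of a reverse table lookup:

```python
from statistics import NormalDist

life = NormalDist(mu=4.0, sigma=1.2)    # laptop life span in years

cutoff_years = life.inv_cdf(0.04)       # life span with 4% of the area to its left
cutoff_z = (cutoff_years - 4.0) / 1.2   # the matching z-score, about -1.75

print(round(cutoff_z, 2), round(cutoff_years, 1))
```

A warranty slightly shorter than this cut-off (about 1.9 years) keeps the expected failure rate during the warranty period under 4%.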

4.5

Linear Regression
and Correlation



Linear Regression

When performing research studies, scientists often wish to know whether two variables are related. If the variables are determined to be related, a scientist may then wish to find an equation that can be used to model the relationship.
Data involving two variables are called bivariate data.
e.g.,

Time between eruptions (in seconds), x    272 227 237 238 203 270 218 226 250 245
Duration of eruptions (in seconds), y     89 79 83 82 81 85 78 81 85 79

[Figure: scatter diagram of the eruption data. Horizontal axis: interval (s); vertical axis: duration (s).]
Linear Regression

The least-squares regression line for a set of bivariate data is the line that minimizes the sum of the squares of the vertical deviations from each data point to the line.
The equation of the least-squares line for the $n$ ordered pairs $(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_n, y_n)$ is $\hat{y} = ax + b$, where

$$a = \frac{n \sum xy - \left(\sum x\right)\left(\sum y\right)}{n \sum x^2 - \left(\sum x\right)^2} \quad \text{and} \quad b = \bar{y} - a\bar{x}$$

Linear Regression

Example 23:
Apply the formula to the following data.

Time between eruptions (in seconds), x    272 227 237 238 203 270 218 226 250 245
Duration of eruptions (in seconds), y     89 79 83 82 81 85 78 81 85 79
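
The slope and intercept formulas can be applied to the eruption data as sketched below. The function name `least_squares_line` and the variable names are illustrative, not taken from the lecture.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a*x + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - a * (sum_x / n)      # b = y-bar - a * x-bar
    return a, b


interval = [272, 227, 237, 238, 203, 270, 218, 226, 250, 245]   # x, seconds
duration = [89, 79, 83, 82, 81, 85, 78, 81, 85, 79]             # y, seconds

a, b = least_squares_line(interval, duration)
print(round(a, 3), round(b, 3))   # 0.119 53.817
```

Once `a` and `b` are known, an estimate for any interval `x` is simply `a * x + b`.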

[Figure: scatter diagram of the eruption data with the fitted least-squares line y = 0.119x + 53.817. Horizontal axis: interval (s); vertical axis: duration (s).]

Linear Regression

Example 24:
Using the equation in example 23, if the time between eruptions is 200 seconds, what is
the estimated duration of the eruption?

Linear Correlation

To determine the strength of a linear relationship between two variables, statisticians use a statistic called the linear correlation coefficient, which is denoted by the variable $r$ and is defined as follows:
For the $n$ ordered pairs $(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_n, y_n)$, the linear correlation coefficient $r$ is given by

$$r = \frac{n \sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\left(\sum x^2\right) - \left(\sum x\right)^2} \cdot \sqrt{n\left(\sum y^2\right) - \left(\sum y\right)^2}}$$

* The closer $|r|$ is to 1, the stronger the linear relationship between the variables.


Linear Correlation

Properties of the linear correlation coefficient

• The linear correlation coefficient is always a real number between −1 and 1, inclusive. In the case in which
• All the ordered pairs lie on a line with positive slope, $r = 1$
• All the ordered pairs lie on a line with negative slope, $r = -1$
• For any set of ordered pairs, the linear correlation coefficient and the slope of the least-squares line both have the same sign.
• Interchanging the variables in the ordered pairs does not change the value of $r$. Thus, the value of $r$ for the ordered pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ is the same as the value of $r$ for the ordered pairs $(y_1, x_1), (y_2, x_2), \ldots, (y_n, x_n)$.
• The value of $r$ does not depend on the units used. You can change the units of a variable from, for example, feet to inches, and the value of $r$ will remain the same.
Linear Correlation

Example 25:
Find the linear correlation coefficient of the data in example 23.
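
The correlation coefficient for the eruption data can be computed directly from its defining sums, as sketched below. The function name `linear_correlation` is illustrative. The last line also checks the property that interchanging the variables leaves $r$ unchanged.

```python
import math


def linear_correlation(xs, ys):
    """Linear correlation coefficient r computed from the defining sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


interval = [272, 227, 237, 238, 203, 270, 218, 226, 250, 245]   # x, seconds
duration = [89, 79, 83, 82, 81, 85, 78, 81, 85, 79]             # y, seconds

r = linear_correlation(interval, duration)
print(round(r, 3))                                              # 0.763
print(math.isclose(r, linear_correlation(duration, interval)))  # True
```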

