C04 Data Management
Data Management
Page 2
For instructional purposes only
Data
Statistics
4.1 Measures of Central Tendency
E.g., Ella is a member of the graduating class and plans to start a career. A
survey of five engineers from last year’s class shows that they received
job offers with the ff. salaries:
$$\text{mean} = \bar{x} = \frac{\sum x_i}{n}$$
Measure of Central Tendency
Arithmetic Mean
Sample: $$\text{mean} = \bar{x} = \frac{\sum x_i}{n}$$
Population: $$\text{mean} = \mu = \frac{\sum x_i}{N}$$
Properties:
1. The sum of the deviations of the observations from the mean is zero.
$$d_i = x_i - \mu, \qquad \sum d_i = 0$$
• e.g. given the ff. observed values 3, 8, 4; the mean is 5, so the deviations are −2, 3, and −1, which sum to 0.
2. The mean reflects the magnitude of every observation, since every
observation contributes to the value of the mean.
3. The mean is easily affected by an extreme value. Hence, it is not a
good measure of central tendency when an extreme value occurs.
• e.g. The mean of 3, 8, 4, and 50 is 16.25, which is larger than three of the four observations.
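The properties above can be checked numerically. The following is a minimal Python sketch using the observations from the slides; the variable names are illustrative only.

```python
from statistics import mean

values = [3, 8, 4]
mu = mean(values)                       # 5

# Property 1: the deviations from the mean sum to zero.
deviations = [x - mu for x in values]   # [-2, 3, -1]
total_deviation = sum(deviations)

# Property 3: a single extreme value pulls the mean away from most of the data.
with_outlier = values + [50]
mu_outlier = mean(with_outlier)         # 16.25

print(mu, total_deviation, mu_outlier)
```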
Median
The median is the middle number or the mean of the two middle
numbers.
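The median, $\tilde{x}$, of a ranked list of $n$ numbers is:
• The middle number if $n$ is odd
• The mean of the two middle numbers if $n$ is even
$$\tilde{x} = \begin{cases} x_{\frac{n+1}{2}} & \text{if } n \text{ is odd} \\[4pt] \dfrac{x_{\frac{n}{2}} + x_{\frac{n}{2}+1}}{2} & \text{if } n \text{ is even} \end{cases}$$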
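The two cases of the median rule can be sketched in Python as follows. The function name `median_ranked` is hypothetical, and the sample lists reuse the observations from the arithmetic mean slides.

```python
def median_ranked(data):
    """Median by the ranked-list rule: middle value, or mean of the two middle values."""
    ranked = sorted(data)
    n = len(ranked)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]                      # position (n + 1) / 2, counting from 1
    return (ranked[mid - 1] + ranked[mid]) / 2  # positions n / 2 and n / 2 + 1

odd_case = median_ranked([3, 8, 4])        # ranked: 3, 4, 8
even_case = median_ranked([3, 8, 4, 50])   # ranked: 3, 4, 8, 50
print(odd_case, even_case)
```

Unlike the mean, the median of the second list stays close to the bulk of the data even though 50 is an extreme value.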
Mode
The mean, median, and mode are all averages; however, they are
generally not equal.
The mean is the most sensitive of the three averages: a single extreme
value can change it substantially.
E.g.,
Salaries: 150,000 60,000 36,000 20,000 20,000
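As a rough illustration, the three averages of the salaries listed above can be computed with Python's standard `statistics` module. Dropping the largest salary is an extra step added here only to show how sensitive the mean is.

```python
from statistics import mean, median, mode

salaries = [150_000, 60_000, 36_000, 20_000, 20_000]

salary_mean = mean(salaries)      # 57200
salary_median = median(salaries)  # 36000
salary_mode = mode(salaries)      # 20000

# Illustration only: remove the extreme value and recompute.
without_top = [s for s in salaries if s != max(salaries)]
mean_without_top = mean(without_top)      # 34000
median_without_top = median(without_top)  # 28000

print(salary_mean, salary_median, salary_mode)
print(mean_without_top, median_without_top)
```

The mean drops by 23,200 when the top salary is removed, while the median drops by only 8,000.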
Weighted Mean
The weighted mean, $\mu_w$ or $\bar{x}_w$, is often used when some data are more
important than others.
$$\mu_w = \frac{\sum (x_i \cdot w_i)}{\sum w_i}$$
Example 5: Compute the weighted mean to determine whether the student’s term grade
is eligible for the Dean’s list.
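The grade table for this example is not reproduced in the text, so the sketch below uses made-up grades and credit units purely to show how the weighted mean formula is applied. The function name `weighted_mean` and all values are hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(x_i * w_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(x * w for x, w in zip(values, weights)) / total_weight

# Hypothetical grades and credit units (NOT the data from Example 5).
grades = [90, 85, 92]
units = [3, 5, 2]

term_grade = weighted_mean(grades, units)  # (270 + 425 + 184) / 10
print(term_grade)
```

Here the 5-unit course pulls the result toward 85, which is why the weighted mean (87.9) is lower than the plain mean of the three grades (89).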
Frequency Distribution
Data that have not been organized or manipulated in any manner are
called raw data.
E.g., Consider the ff. table which lists the number of laptop computers
owned by families in each of 40 homes in a subdivision.
2 0 3 1 2 1 0 4
2 1 1 7 2 0 1 1
0 2 2 1 3 2 2 1
1 4 2 5 2 3 1 2
2 1 2 1 5 0 2 5
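One way to organize these raw data is to tally how often each value occurs. The sketch below does this with `collections.Counter`; the variable names are illustrative.

```python
from collections import Counter

laptops = [
    2, 0, 3, 1, 2, 1, 0, 4,
    2, 1, 1, 7, 2, 0, 1, 1,
    0, 2, 2, 1, 3, 2, 2, 1,
    1, 4, 2, 5, 2, 3, 1, 2,
    2, 1, 2, 1, 5, 0, 2, 5,
]

frequency = Counter(laptops)

# Print a simple frequency table: number of laptops, number of homes.
for value in sorted(frequency):
    print(value, frequency[value])
```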
4.2 Measures of Dispersion
Machines 1 and 2 are soft-drink dispensing machines that should dispense 8 oz into a
cup.
This example shows that average values do not reflect the spread or dispersion of data.
            Machine 1   Machine 2
            9.52        8.01
            6.41        7.99
            10.07       7.95
            5.85        8.03
            8.15        8.02
Mean        8.0         8.0
Dispersion
Range
The range of a set of data values is the difference between the greatest data value
and the least data value.
            Machine 1   Machine 2
            9.52        8.01
            6.41        7.99
            10.07       7.95
            5.85        8.03
            8.15        8.02
Properties:
1. It is a quick but rough measure of dispersion.
2. The larger the value of the range, the more dispersed are the
observations.
3. It only considers the lowest and the highest value in the data set.
4. It can be easily affected by the presence of an extreme value.
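As a small illustration, the range of the two soft-drink machines from the start of this section can be computed as below. The helper name `data_range` is hypothetical.

```python
machine_1 = [9.52, 6.41, 10.07, 5.85, 8.15]
machine_2 = [8.01, 7.99, 7.95, 8.03, 8.02]

def data_range(data):
    """Range: greatest value minus least value."""
    return max(data) - min(data)

range_1 = data_range(machine_1)  # about 4.22 oz
range_2 = data_range(machine_2)  # about 0.08 oz
print(round(range_1, 2), round(range_2, 2))
```

Both machines have the same mean of 8.0 oz, but the ranges show that Machine 1 is far less consistent.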
Mean Absolute Deviation - Variance
Population variance:
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$
Computational formula:
$$\sigma^2 = \frac{\sum x_i^2}{N} - \mu^2$$
Sample variance:
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{n \sum x_i^2 - \left(\sum x_i\right)^2}{n(n - 1)}$$
Properties:
1. It is always non-negative.
2. The larger the value of the variance, the more dispersed are the
observations.
3. Each observation contributes to the magnitude of the variance.
4. The unit of measure of the variance is the square of the unit of
measure of the original data set.
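The definitional and computational formulas for the variance should agree. The sketch below checks this on the small data set 3, 8, 4 from the arithmetic mean slides and compares against Python's `statistics` module; the variable names are illustrative.

```python
from statistics import pvariance, variance

data = [3, 8, 4]
n = len(data)
mu = sum(data) / n  # 5.0 (the sample mean has the same value)

# Population variance: definition, then computational formula.
pop_var_definition = sum((x - mu) ** 2 for x in data) / n
pop_var_shortcut = sum(x ** 2 for x in data) / n - mu ** 2

# Sample variance: definition, then computational formula.
sample_var_definition = sum((x - mu) ** 2 for x in data) / (n - 1)
sample_var_shortcut = (n * sum(x ** 2 for x in data) - sum(data) ** 2) / (n * (n - 1))

# Cross-check with the standard library.
library_pop = pvariance(data)
library_sample = variance(data)

print(pop_var_definition, pop_var_shortcut, sample_var_definition, sample_var_shortcut)
```

Dividing by n − 1 instead of N makes the sample variance (7) larger than the population variance (about 4.67) for the same three numbers.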
Standard Deviation
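Population:
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} = \sqrt{\frac{\sum x_i^2}{N} - \mu^2}$$
Sample:
$$s = \sqrt{s^2} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{n \sum x_i^2 - \left(\sum x_i\right)^2}{n(n - 1)}}$$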
Example 10:
A consumer group has tested a sample of 8 batteries from each of 2 companies. Both
samples have a mean of 7.0 hours. According to these tests, which company produces batteries
for which the values representing hours of constant use have the smaller standard
deviation?
Example 10 (data):
Company        Hours of constant use per battery
EverSoBright   6.2 6.4 7.1 5.9 8.3 5.3 7.5 9.3
Dependable     6.8 6.2 7.2 5.9 7.0 7.4 7.3 8.2
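A possible way to work Example 10 is sketched below. It treats each row as a sample, so it uses the sample standard deviation (`statistics.stdev`, which divides by n − 1); the variable names are illustrative.

```python
from statistics import mean, stdev

eversobright = [6.2, 6.4, 7.1, 5.9, 8.3, 5.3, 7.5, 9.3]
dependable = [6.8, 6.2, 7.2, 5.9, 7.0, 7.4, 7.3, 8.2]

# Both samples have (approximately, in floating point) a mean of 7.0 hours.
mean_ever = mean(eversobright)
mean_dep = mean(dependable)

# Sample standard deviations.
s_ever = stdev(eversobright)  # about 1.33 hours
s_dep = stdev(dependable)     # about 0.72 hours

smaller = "Dependable" if s_dep < s_ever else "EverSoBright"
print(round(s_ever, 2), round(s_dep, 2), smaller)
```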
Properties:
1. It is always non-negative.
2. The larger the value of the standard deviation, the more dispersed
are the observations.
3. Each observation contributes to the magnitude of the SD.
4. The unit of measure of SD is the same as the unit of measure of the
original data set.
4.3 Measures of Relative Position
Relative Position
z-Score
[Figure: number line with $\bar{x} = 12$ and $s = 4$; marked values 6, 12, and 20.]
Population: $$z_x = \frac{x - \mu}{\sigma}$$
Sample: $$z_x = \frac{x - \bar{x}}{s}$$
Example 11:
Paul has taken two tests in his class. He scored 72 on his first test, for which the mean of
all scores was 65 and the standard deviation was 8. He received a 60 on a second
test, for which the mean of all scores was 45 and the standard deviation was 12.
In comparison to the other students, did Paul do better on the first test or the second
test?
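Example 11 can be approached by comparing the z-scores of the two results. The sketch below is one way to do that; the function name `z_score` is illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z_first = z_score(72, 65, 8)    # 0.875
z_second = z_score(60, 45, 12)  # 1.25

better_test = "second" if z_second > z_first else "first"
print(z_first, z_second, better_test)
```

Although the raw score on the first test is higher, the second score sits further above its class mean in standard-deviation units.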
Example 12:
A consumer group tested a sample of 100 light bulbs. It found that the mean life
expectancy of the bulbs was 842 h, with a standard deviation of 90 h. One particular
light bulb from the Durabright Company had a 𝑧-score of 1.2. What was the life span of
this bulb?
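Example 12 runs the z-score formula backwards: solving z = (x − mean) / s for x gives x = mean + z · s. A minimal sketch, with an illustrative function name:

```python
def value_from_z(z, mean, sd):
    """Invert the z-score formula: x = mean + z * sd."""
    return mean + z * sd

life_span = value_from_z(1.2, 842, 90)  # hours
print(round(life_span, 1))
```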
Percentile
Example 13: 44 37 39 29 33
29 41
Example 14:
On a reading examination given to 900 students, Elaine’s score of 602 was higher than
the scores of 576 of the students who took the examination. What is the percentile for
Elaine’s score?
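The percentile formula itself is not reproduced in the text above. The sketch below assumes the common textbook definition, percentile = (number of data values less than x) / (total number of data values) × 100, and applies it to Example 14.

```python
def percentile_of(count_below, total):
    """Percentile under the assumed definition: share of values below the score, times 100."""
    return 100 * count_below / total

elaine_percentile = percentile_of(576, 900)
print(elaine_percentile)
```

Under that definition, Elaine's score is at the 64th percentile.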
Quartile
The three numbers $Q_1$, $Q_2$, and $Q_3$ that partition a ranked data set into
four approximately equal groups are called the quartiles of the data.
For instance, for the data set below, the values $Q_1 = 11$, $Q_2 = 29$, and $Q_3 = 104$ are the
quartiles of the data.
2, 5, 5, 8, 11 ($Q_1$), 12, 19, 22, 23, 29 ($Q_2$), 31, 45, 83, 91, 104 ($Q_3$), 159, 181, 312, 354
Example 15:
Use medians to find the quartiles of a data set.
The table below lists the calories per 100mL of 25 popular sodas. Find the quartiles for
the data.
Calories, per 100ml of selected sodas
43 37 42 40 53 62 36 32 50 49
26 53 73 48 45 39 45 48 40 56
41 36 58 42 39
The same data ranked from smallest to largest:
26 32 36 36 37 39 39 40 40 41
42 42 43 45 45 48 48 49 50 53
53 56 58 62 73
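The median method for quartiles can be sketched as follows: find the median of the ranked data, then take the median of the values below it and the median of the values above it. When the number of values is odd, this sketch leaves the middle value out of both halves, which matches the 19-value example earlier in this section. The function names are hypothetical.

```python
def median_ranked(ranked):
    """Median of an already ranked list."""
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2

def quartiles(data):
    """Quartiles by the median method (middle value excluded from both halves when n is odd)."""
    ranked = sorted(data)
    n = len(ranked)
    half = n // 2
    lower = ranked[:half]
    upper = ranked[half + 1:] if n % 2 == 1 else ranked[half:]
    return median_ranked(lower), median_ranked(ranked), median_ranked(upper)

calories = [
    43, 37, 42, 40, 53, 62, 36, 32, 50, 49,
    26, 53, 73, 48, 45, 39, 45, 48, 40, 56,
    41, 36, 58, 42, 39,
]

q1, q2, q3 = quartiles(calories)
print(q1, q2, q3)
```

Other quartile conventions exist and can give slightly different values for the same data.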
Q04: Data Management
The table shows the set of your class scores in GEC4-UT01.
BSCESEP-3B
46 48 45 47 48
41 47 45 44 43
43 43 45 43 37
48 47 48 48 42
48 45 36 48 39
38 41 46 32 47
43 38 48 48 43
I. Using these raw data, find the ff:
• The mean, median and mode/s (3pts)
• Range, Variance and Standard Deviation (3pts)
• Quartiles (3pts)
II. Using your score, compute for the ff:
• Deviation from mean (1pt)
• z-Score (1pt)
• Percentile (1pt)
4.4 Normal Distribution
Frequency Distribution
12 Classes
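1. Determine the range 𝑹, the difference between the highest and the lowest value.
2. Determine the number of classes 𝑲, where 𝑲 = 1 + 3.322 log 𝑵
3. Determine the class size 𝑪, where 𝑪 = 𝑹/𝑲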
4. Determine the lower limit of the first class.
5. Construct the class intervals and determine the class frequencies.
Example 17:
Construct a frequency distribution table for the ff.
raw scores of 50 students on a 200-item test:
144 112 156 122 168 172 141 159 127 154
156 145 134 137 123 149 144 160 136 139
142 138 159 151 147 150 126 152 147 136
135 132 146 133 150 122 139 149 152 129
131 155 116 140 145 135 160 125 172 163
The same scores ranked from lowest to highest:
112 116 122 122 123 125 126 127 129 131
132 133 134 135 135 136 136 137 138 139
139 140 141 142 144 144 145 145 146 147
147 149 149 150 150 151 152 152 154 155
156 156 159 159 160 160 163 168 172 172
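The construction steps can be sketched in Python for these 50 scores. Two choices below are assumptions rather than rules stated in the slides: K and C are both rounded up to whole numbers, and the first class starts at the lowest score.

```python
import math

scores = [
    112, 116, 122, 122, 123, 125, 126, 127, 129, 131,
    132, 133, 134, 135, 135, 136, 136, 137, 138, 139,
    139, 140, 141, 142, 144, 144, 145, 145, 146, 147,
    147, 149, 149, 150, 150, 151, 152, 152, 154, 155,
    156, 156, 159, 159, 160, 160, 163, 168, 172, 172,
]

n = len(scores)
data_range = max(scores) - min(scores)         # R = 172 - 112 = 60
k = math.ceil(1 + 3.322 * math.log10(n))       # K = 6.64..., rounded up to 7 (assumption)
c = math.ceil(data_range / k)                  # C = 8.57..., rounded up to 9 (assumption)
lower_limit = min(scores)                      # first class starts at 112 (assumption)

table = []
for i in range(k):
    low = lower_limit + i * c
    high = low + c - 1                         # class limits are inclusive
    frequency = sum(low <= s <= high for s in scores)
    table.append((low, high, frequency))

for low, high, frequency in table:
    # Class boundaries extend half a unit beyond the class limits.
    print(f"{low}-{high}  ({low - 0.5}-{high + 0.5})  {frequency}")
```

With these choices the class boundaries come out as 111.5–120.5, 120.5–129.5, and so on, with frequencies 2, 7, 10, 12, 11, 5, and 3.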
Histograms
Download times, s   Subscribers
0-5                 6
5-10                17
10-15               43
15-20               92
20-25               151
25-30               192
30-35               190
35-40               149
40-45               90
45-50               45
50-55               15
55-60               10
[Figure: histogram of the table above; vertical axis "Number of subscribers", horizontal axis download time in 5-second classes from 0 to 60.]
Scores              Students
111.5 – 120.5       2
120.5 – 129.5       7
129.5 – 138.5       10
138.5 – 147.5       12
147.5 – 156.5       11
156.5 – 165.5       5
165.5 – 174.5       3
[Figure: histogram of the table above; vertical axis "Number of students", horizontal axis score classes from 111.5 to 174.5.]
ACT04: Frequency Distribution
Using your Unit Test 1 scores:
BSCESEP-3B
46 48 45 47 48
41 47 45 44 43
43 43 45 43 37
48 47 48 48 42
48 45 36 48 39
38 41 46 32 47
43 38 48 48 43
I. Construct a frequency distribution table
• Determine the range, R
• Determine the number of classes K
• Determine the class size C
• Construct the table with the ff. parameters:
  • Class interval
  • Class Boundary
  • Frequency
  • Relative Frequency
II. Make a histogram from the constructed FDT.
III. Answer the ff. guide questions:
ACT04: Frequency Distribution
III. Guide questions:
BSCESEP-3A
Normal Distribution
Properties:
• The mean, median and mode are equal
• The y-value of each point on the curve is
the percent of the data at the
corresponding x-value
• Areas under the curve that are symmetric
about the mean are equal
• The total area under the curve is 1
E.g., In the normal distribution shown below, the area of the shaded
region is 0.159 units.
Empirical Rule
Example 18:
A survey of 1000 gas stations found that the price charged for a liter of regular gas could
be closely approximated by a normal distribution with a mean of P38.75 and a standard
deviation of P2.25. How many of the gas stations charge
• Between P34.25 and P43.25 for a liter of regular gas?
• Less than P41.00 for a liter of regular gas?
• More than P43.25 for a liter of regular gas?
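The empirical rule itself is not reproduced in the text above. The sketch below assumes its usual statement, that about 68% of normally distributed data lie within 1 standard deviation of the mean and about 95% within 2, and applies it to Example 18. The constant names are illustrative.

```python
mean_price = 38.75
sd_price = 2.25
stations = 1000

# Empirical rule (approximate shares, assumed here).
WITHIN_1_SD = 0.68
WITHIN_2_SD = 0.95

# Express each price limit in standard deviations from the mean.
z_low = (34.25 - mean_price) / sd_price   # -2.0
z_high = (43.25 - mean_price) / sd_price  #  2.0
z_41 = (41.00 - mean_price) / sd_price    #  1.0

# Between -2 sd and +2 sd: about 95% of stations.
between = round(WITHIN_2_SD * stations)
# Below +1 sd: the lower half (50%) plus half of the 68% band.
less_than_41 = round((0.5 + WITHIN_1_SD / 2) * stations)
# Above +2 sd: half of what lies outside the 95% band.
more_than_43_25 = round((1 - WITHIN_2_SD) / 2 * stations)

print(between, less_than_41, more_than_43_25)
```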
Standard Normal Distribution
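$$y = \frac{e^{-\frac{(x - \mu)^2}{2\sigma^2}}}{\sigma\sqrt{2\pi}}$$
$$A = \int_{x_1}^{x_2} y \, dx = \int_{x_1}^{x_2} \frac{e^{-\frac{(x - \mu)^2}{2\sigma^2}}}{\sigma\sqrt{2\pi}} \, dx$$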
Example 19:
Find the area of the standard normal distribution between 𝑧 = −1.44 and 𝑧 = 0.
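In place of a printed z-table, the area in Example 19 can be approximated with `statistics.NormalDist` from the Python standard library, whose `cdf` method returns the area to the left of a value. This is a sketch of one possible check, not the table-lookup method the slides may have used.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Area between z = -1.44 and z = 0 is the difference of two left-tail areas.
area = standard_normal.cdf(0) - standard_normal.cdf(-1.44)
print(round(area, 4))
```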
Example 21:
A soda machine dispenses soda into 12-ounce cups. Tests show that the actual amount of soda
dispensed is normally distributed, with a mean of 11.5 oz and a standard deviation of 0.2 oz.
• What percent of cups will receive less than 11.25 oz of soda?
• What percent of cups will receive between 11.20 oz and 11.55 oz of soda?
• If a cup is filled at random, what is the probability that the machine will overflow the cup?
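The three parts of Example 21 can be sketched with `statistics.NormalDist`, assuming that "overflow" means more than 12 oz is dispensed into the 12-ounce cup. The variable names are illustrative.

```python
from statistics import NormalDist

fill = NormalDist(mu=11.5, sigma=0.2)  # amount dispensed, in ounces

p_less_than_11_25 = fill.cdf(11.25)             # about 0.106
p_between = fill.cdf(11.55) - fill.cdf(11.20)   # about 0.532
p_overflow = 1 - fill.cdf(12.0)                 # about 0.006

print(round(p_less_than_11_25, 4), round(p_between, 4), round(p_overflow, 4))
```

Multiplying the first two results by 100 gives the requested percentages.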
A cut-off score is a score that separates data into two groups such
that the data in one group satisfy a certain requirement and the data
in the other group do not satisfy the requirement.
Example 22:
The OnTheGo company manufactures laptop computers. A study indicates that the
life spans of its computers are normally distributed, with a mean of 4.0 years and a
standard deviation of 1.2 years. How long a warranty period should the company
offer if the company wishes less than 4% of the computers to fail during the warranty
period?
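Example 22 asks for a cut-off score: the life span below which 4% of the computers fall. One way to estimate it is with the inverse of the cumulative distribution, `NormalDist.inv_cdf`, as sketched below.

```python
from statistics import NormalDist

life_span = NormalDist(mu=4.0, sigma=1.2)  # years

# z-score with 4% of the area to its left, then the matching life span.
z_cutoff = NormalDist().inv_cdf(0.04)      # about -1.75
cutoff_years = life_span.inv_cdf(0.04)     # about 1.9 years

print(round(z_cutoff, 2), round(cutoff_years, 2))
```

A warranty slightly shorter than this cut-off, about 1.9 years, keeps the share of failures during the warranty period under 4%.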
4.5 Linear Regression and Correlation
Linear Regression
Time between eruptions (in seconds), x:   272 227 237 238 203 270 218 226 250 245
Duration of eruptions (in seconds), y:     89  79  83  82  81  85  78  81  85  79
[Figure: Scatter Diagram; horizontal axis "Interval, (s)" from 0 to 300, vertical axis "Duration, (s)" from 76 to 90.]
The least-squares regression line for a set of bivariate data is the line that
minimizes the sum of the squares of the vertical deviations from each
data point to the line.
The equation of the least-squares line for the $n$ ordered pairs
$(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_n, y_n)$
is $\hat{y} = ax + b$, where
$$a = \frac{n \sum xy - (\sum x)(\sum y)}{n \sum x^2 - (\sum x)^2} \quad \text{and} \quad b = \bar{y} - a\bar{x}$$
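The formulas for a and b can be applied directly in code. The sketch below uses the eruption data from the table in this section; the function names `least_squares_line` and `predict` are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a*x + b."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - a * (sum_x / n)
    return a, b

def predict(a, b, x):
    """Estimated y for a given x on the fitted line."""
    return a * x + b

interval = [272, 227, 237, 238, 203, 270, 218, 226, 250, 245]  # x, seconds
duration = [89, 79, 83, 82, 81, 85, 78, 81, 85, 79]            # y, seconds

a, b = least_squares_line(interval, duration)
print(round(a, 3), round(b, 3))           # slope and intercept

estimate = predict(a, b, 200)             # estimated duration for a 200-second interval
print(round(estimate, 1))
```

For these data the slope is about 0.119 and the intercept about 53.817, so a 200-second interval gives an estimated duration of about 77.6 seconds.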
Example 23:
Apply the formula to the ff. data:
Time between eruptions (in seconds), x:   272 227 237 238 203 270 218 226 250 245
Duration of eruptions (in seconds), y:     89  79  83  82  81  85  78  81  85  79
[Figure: Scatter Diagram with the fitted line y = 0.119x + 53.817; horizontal axis "Interval, (s)" from 0 to 300, vertical axis "Duration, (s)" from 76 to 90.]
Example 24:
Using the equation in Example 23, if the time between eruptions is 200 seconds, what is
the estimated duration of the eruption?
Linear Correlation
$$r = \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{n\left(\sum x^2\right) - \left(\sum x\right)^2} \cdot \sqrt{n\left(\sum y^2\right) - \left(\sum y\right)^2}}$$
Example 25:
Find the linear correlation coefficient of the data in example 23.
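Example 25 can be worked by evaluating the formula for r term by term. The sketch below does so for the eruption data; the function name `correlation` is hypothetical.

```python
import math

def correlation(xs, ys):
    """Linear correlation coefficient r from the sums in the formula."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

interval = [272, 227, 237, 238, 203, 270, 218, 226, 250, 245]  # x, seconds
duration = [89, 79, 83, 82, 81, 85, 78, 81, 85, 79]            # y, seconds

r = correlation(interval, duration)
print(round(r, 3))
```

The result, about 0.763, indicates a positive linear relationship between the interval and the duration.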
Mathematics in the Modern World