Math
Republic of the Philippines
CAVITE STATE UNIVERSITY
Cavite City Campus
Brgy. 8, Pulo II, Dalahican, Cavite City

CvSU Vision
The premier university in historic Cavite recognized for excellence in the development of globally competitive and morally upright individuals.

CvSU Mission
Cavite State University shall provide excellent, equitable and relevant educational opportunities in the arts, science and technology through quality instruction and relevant research and development activities. It shall produce professional, skilled and morally upright individuals for global competitiveness.
CHAPTER 4
DATA MANAGEMENT
Objectives:
After the completion of the chapter, students should be able to:
• Use a variety of statistical tools to process and manage numerical data;
• Use methods of linear regression and correlation to predict the value of a variable given certain
conditions; and
• Advocate the use of statistical data in making important decisions.
EVALUATION REQUIREMENTS:
• Problem Sets and Exercises
• Quiz
• Quantitative Research Proposal (FINAL PROJECT)
SAMPLE: You want the university to offer an online enrolment system to improve the enrolment
process. CSG asks your team to present hard data that will convince the administration. Prepare a
proposal on how you will do this task.
Statistical tools derived from mathematics are useful in processing and managing numerical data
in order to describe a phenomenon and predict values.
NATURE OF STATISTICS
General Uses of Statistics
a. Statistics aids in decision making
• provides comparison
• explains actions that have taken place
• justifies a claim or assertion
• predicts future outcomes
• estimates unknown quantities
b. Statistics summarizes data for public use
FIELDS OF STATISTICS
a. Statistical Methods or Applied Statistics – refers to the procedures and techniques used in the
collection, presentation, analysis and interpretation of data.
• Descriptive statistics
- methods concerned with the collection, description and analysis of a set of data
without drawing conclusions or inferences about a larger set.
- the main concern is simply to describe the set of data.
• Inferential Statistics
- methods concerned with making predictions or inferences about a larger set of data
using only the information gathered from a subset of this larger set.
- the main concern is not merely to describe but actually to predict and make inferences
based on the information gathered.
b. Statistical Theory or Mathematical Statistics – deals with the development and exposition of
theories that serve as bases of statistical methods.
CLASSIFICATION OF VARIABLES
1. Discrete vs. Continuous
Discrete – a variable that can assume a finite or countable number of values; usually measured by
counting or enumeration.
Continuous – a variable that can assume infinitely many values corresponding to the points on a line
interval.
2. Qualitative vs. Quantitative
Qualitative – a variable that yields a categorical response.
Example: Occupation, Marital Status
Quantitative – a variable that takes on numerical values representing an amount or quantity.
Example: Weight, Height, Age, Number of cars
LEVEL OF MEASUREMENT
1. Nominal Level – the nominal level or classificatory scale is the weakest level of measurement where
numbers or symbols are used simply for categorizing subjects into different groups.
Examples: Sex: M-Male F-Female
Marital Status: 1-Single 2-Married 3-Widowed 4-Separated
2. Ordinal Level – the ordinal level of measurement contains the properties of the nominal level, and in
addition, the numbers assigned to categories of any variables may be ranked or ordered in some
low-to-high manner.
Examples: Teaching Ratings 1-poor 2-fair 3-good 4-excellent
Year Level 1-1st year 2-2nd year 3-3rd year 4-4th year
3. Interval Level – the interval level is one in which the distances between any two numbers on the
scale are of known size.
Example: IQ level, Temperature
4. Ratio Level – the ratio level of measurement contains all the properties of the interval level, and in
addition, it has a “true zero” point.
Example: Number of correct answers in an exam.
CLASSIFICATION OF DATA
1. Primary vs. Secondary
a. Primary Source – data measured by the researcher/agency that published it.
b. Secondary Source – any republication of data by another agency.
Example: Publications of the National Statistics Office (NSO) are primary sources, and all
subsequent republications by other agencies are secondary sources.
2. External vs. Internal
a. Internal Data – information that relates to the operations and functions of the organization
collecting the data.
b. External Data – information that relates to some activity outside the organization collecting
the data.
Example: The sales data of SM is internal data for SM but external data for any other
organization such as Robinson’s.
2. A study to be conducted by an NGO would determine Filipinos’ awareness of the war against
Iraq.
Population: _________________________________________________________________________
Variable: ___________________________________________________________________________
Type of Variable: ____________________________________________________________________
SLOVIN’S FORMULA
$$n = \frac{N}{1 + Ne^2}$$
Where:
n = sample size
N = population size
e = margin of error (0.05 or 0.01)
Example:
1. Solve for the sample size of 350 patients from Cavite Medical Center.
$$n = \frac{N}{1 + Ne^2} = \frac{350}{1 + (350)(0.05)^2} = \frac{350}{1 + (350)(0.0025)} = \frac{350}{1.875} = 186.67 \approx 187$$
NOTE: Sample size, when computed, must be rounded up to its nearest whole number.
2. 12,345
3. 1000
4. 1203
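The Cavite Medical Center computation can be checked with a short Python sketch. The function name `slovin` is hypothetical; it applies n = N / (1 + Ne²) and rounds up, as the NOTE requires. The `round(raw, 6)` step is an added assumption whose only job is to strip floating-point noise so that an exact whole-number result is not pushed up by one.

```python
import math

def slovin(N, e=0.05):
    """Slovin's formula n = N / (1 + N e^2), rounded up to a whole number."""
    raw = N / (1 + N * e ** 2)
    # Assumption: trim float noise (e.g. 100.0000000001) before rounding up.
    return math.ceil(round(raw, 6))

print(slovin(350))        # 187, as in Example 1
print(slovin(350, 0.01))  # 339, with a 1% margin of error
```

The same call works for any population size; pass `e=0.01` when the stricter margin of error is wanted.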
2. Observation method – makes possible the recording of behavior but only at the time of occurrence.
3. Experimental method – a method designed for collecting data under controlled conditions. An
experiment is an operation in which there is actual human interference with the conditions
that can affect the variable under study.
4. Use of existing studies – e.g., census, health statistics, and weather bureau reports.
Two types:
• Documentary sources – published or written reports, periodicals, unpublished documents,
etc.
• Field sources – researchers who have done studies on the area of interest are asked
personally or directly for information needed.
5. Registration method – e.g., car registration, student registration and hospital admission.
Advantages:
• It gives emphasis to significant figures and comparisons.
• It is the simplest and most appropriate approach when there are only a few numbers to be
presented.
Disadvantages:
• When a large mass of quantitative data is included in a text or paragraph, the presentation
becomes almost incomprehensible.
• Paragraphs can be tiresome to read, especially if the same words are repeated many times.
2. Box head – the portion of the table that contains the column heads, which describe the data in each
column.
3. Stub – the portion of the table usually comprising the first column on the left. Each row caption is a
descriptive title of the data on the given line.
4. Field – main part of the table; contains the substance or the figures of one’s data.
5. Source note – an exact citation of the source of data presented in the table (should always be
placed when the figures are not original).
6. Footnote – any statement or note inserted at the bottom of the table.
Crime              Number   Rate   Number   Rate   Number   Rate
Index crimes       77,261   124    67,354   106    58,684   90
  Murder            8,707    14     8,293    13     7,758   12
  Homicide          8,069    13     7,912    12     7,123   11
  Physical injury  29,862    35    20,462    32    18,722   29
  Robbery          13,817    22    11,164    18     9,856   15
  Theft            22,780    37    17,374    27    12,940   20
  Rape              2,026     3     2,149     3     2,285    4
Nonindex crimes    44,065    71    37,365    59    38,002   58
Source: Philippine National Police
(The crime labels in the first column form the stub; the figures form the field.)
Graphical Presentation – a graph or chart is a device for showing numerical values or relationships in
pictorial form.
Advantages:
• Main features and implications of a body of data can be grasped at a glance.
• Can attract attention and hold the reader’s interest.
• Simplifies concepts that would otherwise have been expressed in so many words.
• Can readily clarify data; frequently brings out hidden facts and relationships.
50 50 50 50 50 50 51 52 53 53 57
59 59 60 60 60 62 62 62 62 63 65
66 66 68 68 68 68 68 69 69 69 69
69 70 71 71 71 71 72 72 72 72 72
73 73 73 73 74 74 74 75 75 75 75
75 76 76 76 76 77 77 77 77 78 79
79 79 79 79 80 80 80 81 81 81 81
82 82 82 82 82 82 83 83 84 84 84
84 84 84 84 85 85 86 86 87 87 87
87 87 87 88 89 89 91 92 94 94 96
• Classes – these are mutually exclusive categories, each defined by a lower limit and an upper limit,
with equal intervals. (C – class size; R – range; K – number of classes or class intervals)
$$R = \text{Highest Value} - \text{Lowest Value} = 96 - 50 = 46$$
$$K = 1 + 3.322 \log N = 1 + 3.322 \log 110 = 7.78 \approx 8$$
$$C = \frac{R}{K} = \frac{46}{8} = 5.75 \approx 6$$
• Class Frequency – the number of observations falling in the class
• Class interval – the numbers defining the class
• Class limits – the end numbers of the class
• Class boundaries – the true class limits; the lower class boundary (LCB) is usually defined as
halfway between the lower class limit of the class and the upper class limit of the preceding
class, while the upper class boundary (UCB) is usually defined as halfway between the
upper class limit of the class and the lower class limit of the next class.
• Class size – the difference between the upper class boundary of a class and the upper class
boundary of the preceding class
• Class mark – midpoint of a class interval
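The R, K and C computations above can be sketched in Python. This is a minimal sketch with a hypothetical function name; it uses Sturges' rule K = 1 + 3.322 log₁₀ N as written above, and it assumes both K and C are rounded up (in this example, rounding up and rounding to the nearest whole number give the same 8 and 6).

```python
import math

def class_setup(highest, lowest, n):
    """Return the range R, the unrounded K, the rounded K, and the class size C."""
    R = highest - lowest                # range
    k_raw = 1 + 3.322 * math.log10(n)   # Sturges' rule: K = 1 + 3.322 log N
    K = math.ceil(k_raw)                # assumption: round the number of classes up
    C = math.ceil(R / K)                # assumption: round the class size up
    return R, k_raw, K, C

R, k_raw, K, C = class_setup(96, 50, 110)
print(R, round(k_raw, 2), K, C)   # 46 7.78 8 6
```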
3. Determine the lowest class limit. The first class must include the smallest value in the data set.
In our example, 50 is the lowest class limit.
4. Determine all the class limits by adding the class size to the limits of the previous class. (In our
example, the series of lower limits will be 50, 56, 62, … up to 92, each 6 more than the one before.
Each upper limit is one less than the lower limit of the next class.)
5. Tally the frequencies for each class. Sum the frequencies and check against the total number
of observations.
6. Determine the lower class boundaries by subtracting 0.5 from the lower limits.
7. Determine the upper class boundaries by adding 0.5 to the upper limits.
8. Determine the class mark by getting the average of the lower and upper limits.
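Steps 3 to 8 can be sketched in Python for the 110 grades listed earlier. This is a minimal sketch assuming whole-number data, so each upper limit is one less than the next lower limit and each boundary sits 0.5 away from its limit; the function and key names are hypothetical.

```python
# The 110 final grades from the data set above.
grades = [
    50, 50, 50, 50, 50, 50, 51, 52, 53, 53, 57,
    59, 59, 60, 60, 60, 62, 62, 62, 62, 63, 65,
    66, 66, 68, 68, 68, 68, 68, 69, 69, 69, 69,
    69, 70, 71, 71, 71, 71, 72, 72, 72, 72, 72,
    73, 73, 73, 73, 74, 74, 74, 75, 75, 75, 75,
    75, 76, 76, 76, 76, 77, 77, 77, 77, 78, 79,
    79, 79, 79, 79, 80, 80, 80, 81, 81, 81, 81,
    82, 82, 82, 82, 82, 82, 83, 83, 84, 84, 84,
    84, 84, 84, 84, 85, 85, 86, 86, 87, 87, 87,
    87, 87, 87, 88, 89, 89, 91, 92, 94, 94, 96,
]

def frequency_table(data, lowest_limit, size, k):
    rows = []
    for i in range(k):
        ll = lowest_limit + i * size           # steps 3-4: lower limits 50, 56, 62, ...
        ul = ll + size - 1                     # upper limit: one less than the next lower limit
        f = sum(ll <= x <= ul for x in data)   # step 5: tally the class
        rows.append({"LL": ll, "UL": ul, "f": f,
                     "LCB": ll - 0.5,          # step 6
                     "UCB": ul + 0.5,          # step 7
                     "x": (ll + ul) / 2})      # step 8: class mark
    # Step 5 check: the frequencies must add up to the number of observations.
    assert sum(r["f"] for r in rows) == len(data)
    return rows

for r in frequency_table(grades, 50, 6, 8):
    print(f'{r["LL"]}-{r["UL"]}  f={r["f"]:>2}  x={r["x"]}  LCB={r["LCB"]}  UCB={r["UCB"]}')
```

Tallied this way, the raw grades put 24 observations in both the 68 – 73 and the 80 – 85 classes; the worked examples that follow use 25 and 23 for those two classes, as the note in the mode example explains.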
NOTE: A frequency distribution can be extended with cumulative frequencies and relative
frequencies.
• <CF, or less than cumulative frequency, is the accumulated number of observations below the
upper class boundary of each class.
• >CF, or greater than cumulative frequency, is the number of observations above the lower class
boundary of each class.
• Relative frequency (expressed as a percentage when multiplied by 100) is the fraction of all
observations that fall in the class.
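The extra columns can be computed from the class frequencies alone. A minimal sketch, using the frequencies of the Stat final-grades table in the median example (10, 6, 8, 25, 22, 23, 12, 4); `extend_table` is a hypothetical name, and rounding the percentages to two decimals is an assumption.

```python
def extend_table(freqs):
    """Return <CF, >CF, relative frequencies and percentages for a list of class frequencies."""
    n = sum(freqs)
    less_cf, running = [], 0
    for f in freqs:
        running += f
        less_cf.append(running)                  # <CF: observations below this UCB
    # >CF: observations above this LCB = n minus everything in earlier classes
    greater_cf = [n - c + f for c, f in zip(less_cf, freqs)]
    rf = [f / n for f in freqs]                  # relative frequency
    rfp = [round(100 * r, 2) for r in rf]        # relative frequency in percent
    return less_cf, greater_cf, rf, rfp

less_cf, greater_cf, rf, rfp = extend_table([10, 6, 8, 25, 22, 23, 12, 4])
print(less_cf)     # [10, 16, 24, 49, 71, 94, 106, 110]
print(greater_cf)  # [110, 100, 94, 86, 61, 39, 16, 4]
print(rfp)
```

A quick check on any completed table: the last <CF and the first >CF both equal N, and the relative frequencies add up to 1.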
144 112 156 122 168 172 141 159 127 154
156 145 134 137 123 149 144 160 136 139
142 138 159 151 147 150 126 152 147 136
135 132 146 133 150 122 139 149 152 129
131 155 116 140 145 135 160 125 172 163
Class Limits   Frequency   Class Mark   Class Boundaries   Cumulative Frequency   Relative Frequency
LL    UL       (f)         (x)          LCB     UCB        <CF     >CF            RF      RFP (%)
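b. Median
Formula: $$\tilde{x} = LCB_{md} + C\left[\frac{\frac{n}{2} - {<}cf_p}{f_{md}}\right]$$
Where: 𝑥̃ = Median
LCB_md = LCB of the median class
<cf_p = less than cumulative frequency of the class preceding the median class
f_md = frequency of the median class
C = class size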
Example:
Final grades of Stat 101 students arranged in an array. Solve for the median.
Solution:
1. Locate the median position by dividing the total number of observations by 2.
$$\frac{n}{2} = \frac{110}{2} = 55$$
2. Go over the entries in the less than cumulative frequency column. The first class whose <cf is
equal to or greater than the result of step 1 is the median class.
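3. Substitute the values of the median class (74 – 79) into the formula.
Class     Frequency   LCB    <cf
50 – 55   10          49.5   10
56 – 61    6          55.5   16
62 – 67    8          61.5   24
68 – 73   25          67.5   49
74 – 79   22          73.5   71     (median class)
80 – 85   23          79.5   94
86 – 91   12          85.5   106
92 – 97    4          91.5   110
          N = 110
$$\tilde{x} = LCB_{md} + C\left[\frac{\frac{n}{2} - {<}cf_p}{f_{md}}\right] = 73.5 + 6\left[\frac{\frac{110}{2} - 49}{22}\right] = 73.5 + 6\left(\frac{6}{22}\right) = 75.14$$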
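The three steps above can be sketched as one Python function. This is a minimal sketch; the function name is hypothetical, and it expects the class boundaries and frequencies listed in ascending order.

```python
def grouped_median(lcbs, freqs, size):
    """Median of grouped data: LCB_md + C * (n/2 - <cf_p) / f_md."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("frequencies must not all be zero")
    half = n / 2                      # step 1: n / 2
    cf_before = 0                     # <cf of the class preceding the current class
    for lcb, f in zip(lcbs, freqs):
        if cf_before + f >= half:     # step 2: first class whose <cf reaches n / 2
            return lcb + size * (half - cf_before) / f   # step 3
        cf_before += f

lcbs = [49.5, 55.5, 61.5, 67.5, 73.5, 79.5, 85.5, 91.5]
freqs = [10, 6, 8, 25, 22, 23, 12, 4]
print(round(grouped_median(lcbs, freqs, 6), 2))   # 75.14
```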
c. Mode
Formula: $$\hat{x} = LCB_m + C\left(\frac{f_m - f_1}{2f_m - f_1 - f_2}\right)$$
Where: 𝑥̂ = Mode
𝐿𝐶𝐵𝑚 = LCB of the modal class
𝑓𝑚 = Frequency of the modal class
𝑓1 = frequency of the class below the modal class
𝑓2 = frequency of the class above the modal class
Example:
Final grades of Stat 110 students arranged in an array. Solve for the mode.
Solution:
1. Determine the modal class by identifying the class with the highest frequency. (NOTE: The raw data
are actually bimodal, since the 68 – 73 and 80 – 85 classes each contain 24 observations. In this
example, the author altered the frequencies of these two classes to 25 and 23 to make the
distribution unimodal.)
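2. Substitute the values of the modal class (68 – 73) into the formula, with f_m = 25, f_1 = 8 and f_2 = 22.
Class     Frequency   LCB    <cf
50 – 55   10          49.5   10
56 – 61    6          55.5   16
62 – 67    8          61.5   24
68 – 73   25          67.5   49     (modal class)
74 – 79   22          73.5   71
80 – 85   23          79.5   94
86 – 91   12          85.5   106
92 – 97    4          91.5   110
$$\hat{x} = LCB_m + C\left(\frac{f_m - f_1}{2f_m - f_1 - f_2}\right) = 67.5 + 6\left(\frac{25 - 8}{2(25) - 8 - 22}\right) = 67.5 + 6\left(\frac{17}{20}\right) = 72.6$$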
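The modal-class formula can be sketched in Python as well. This is a minimal sketch with a hypothetical function name; two behaviours are assumptions not stated in the module: if several classes tie for the highest frequency, the first one is taken, and a missing neighbour of an end class counts as frequency 0. The formula itself breaks down (division by zero) when f_m, f_1 and f_2 are all equal.

```python
def grouped_mode(lcbs, freqs, size):
    """Mode of grouped data: LCB_m + C * (f_m - f1) / (2 f_m - f1 - f2)."""
    m = max(range(len(freqs)), key=freqs.__getitem__)   # modal class (first one if tied)
    fm = freqs[m]
    f1 = freqs[m - 1] if m > 0 else 0                   # class before the modal class
    f2 = freqs[m + 1] if m + 1 < len(freqs) else 0      # class after the modal class
    return lcbs[m] + size * (fm - f1) / (2 * fm - f1 - f2)

lcbs = [49.5, 55.5, 61.5, 67.5, 73.5, 79.5, 85.5, 91.5]
freqs = [10, 6, 8, 25, 22, 23, 12, 4]
print(round(grouped_mode(lcbs, freqs, 6), 2))   # 72.6
```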
2. Complete the Frequency Distribution Table to find the mean, median and mode of the data set
given:
Class F CM (x) fx LCB <CF
10-19 3
20-29 1
30-39 3
40-49 2
50-59 9
60-69 8
70-79 35
80-89 30
90-99 9
Example:
Below is the list of the scores of two groups of students in a grammar quiz.
Group A Group B
13 10
14 10
15 15
16 18
19 18
20 19
25 26
30 36
Solution:
1. Compute the mean.
$$\bar{x}_A = \frac{\sum x}{n} = \frac{152}{8} = 19 \qquad \bar{x}_B = \frac{\sum x}{n} = \frac{152}{8} = 19$$
2. Compute the deviations by subtracting the mean from each of the observations, and then
square the deviations.
Group A   𝑥 − 𝑥̅   (𝑥 − 𝑥̅)²     Group B   𝑥 − 𝑥̅   (𝑥 − 𝑥̅)²
13        -6       36           10        -9       81
14        -5       25           10        -9       81
15        -4       16           15        -4       16
16        -3        9           18        -1        1
19         0        0           18        -1        1
20         1        1           19         0        0
25         6       36           26         7       49
30        11      121           36        17      289
3. Sum the squared deviations, divide the sum by n − 1 to obtain the sample variance, then take the
square root of the sample variance.
$$s_A = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{244}{8 - 1}} = 5.90 \qquad s_B = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{518}{8 - 1}} = 8.60$$
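The three steps can be mirrored in Python and cross-checked against the standard library's sample standard deviation, `statistics.stdev`, which also divides by n − 1. A minimal sketch; the loop and variable names are illustrative only.

```python
from statistics import mean, stdev

group_a = [13, 14, 15, 16, 19, 20, 25, 30]
group_b = [10, 10, 15, 18, 18, 19, 26, 36]

for name, scores in (("A", group_a), ("B", group_b)):
    xbar = mean(scores)                          # step 1: mean
    ss = sum((x - xbar) ** 2 for x in scores)    # step 2: sum of squared deviations
    s = (ss / (len(scores) - 1)) ** 0.5          # step 3: divide by n - 1, take square root
    assert abs(s - stdev(scores)) < 1e-12        # agrees with the library's sample SD
    print(name, xbar, ss, round(s, 2))
```

Both groups share the mean 19, yet Group B's larger standard deviation shows its scores are more spread out.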
Example:
Final grades of Stat 110 students arranged in a frequency distribution table (FDT). Solve for the standard deviation.
Class     Frequency   CM (x)   𝑓𝑥   𝑥 − 𝑥̅   (𝑥 − 𝑥̅)²   𝑓(𝑥 − 𝑥̅)²
50 – 55 10
56 – 61 6
62 – 67 8
68 – 73 25
74 – 79 22
80 – 85 23
86 – 91 12
92 – 97 4
N= 110
$$s = \sqrt{\frac{\sum f(x - \bar{x})^2}{n - 1}} \qquad MD = \frac{\sum f\,|x - \bar{x}|}{N}$$
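The two formulas above can be sketched in Python for the Stat 110 table, using the class marks as x. A minimal sketch with a hypothetical function name; it also computes the grouped mean x̄ = Σfx / n, which is what the fx column is for, and treats MD as the mean deviation.

```python
import math

def grouped_stats(class_marks, freqs):
    """Grouped mean, sample standard deviation, and mean deviation."""
    n = sum(freqs)
    pairs = list(zip(class_marks, freqs))
    xbar = sum(f * x for x, f in pairs) / n                    # mean = sum(fx) / n
    ss = sum(f * (x - xbar) ** 2 for x, f in pairs)            # sum of f(x - mean)^2
    s = math.sqrt(ss / (n - 1))                                # sample standard deviation
    md = sum(f * abs(x - xbar) for x, f in pairs) / n          # mean deviation
    return xbar, s, md

marks = [52.5, 58.5, 64.5, 70.5, 76.5, 82.5, 88.5, 94.5]
freqs = [10, 6, 8, 25, 22, 23, 12, 4]
xbar, s, md = grouped_stats(marks, freqs)
print(round(xbar, 2), round(s, 2), round(md, 2))   # 74.32 10.96 8.86
```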
Complete the Frequency Distribution Table to find the standard deviation of the data set given:
Class F CM (x) 𝑓𝑥 𝑥 − 𝑥̅ (𝑥 − 𝑥̅ )2 𝑓(𝑥 − 𝑥̅ )2
10-19 3
20-29 1
30-39 3
40-49 2
50-59 9
60-69 8
70-79 35
80-89 30
90-99 9
a. The mean, median, and mode are all equal and are located at the center of the distribution.
b. The distribution is symmetric. The distribution depicts a bell-shaped curve where the left area
is a mirror image of the right area.
c. The total area under the normal curve is 1 or 100%.
d. The distribution is asymptotic: its tails approach the horizontal axis but never touch it.
e. The location of the distribution is determined by the mean and the standard deviation
determines dispersion of the distribution.
[Figure: normal curve with the horizontal axis marked at μ − 3σ, μ − 2σ, μ − σ, μ, μ + σ, μ + 2σ, μ + 3σ]
The mean and the standard deviation determine the shape of the distribution.
As previously stated, there are infinitely many normal curves, one for each combination of mean and
standard deviation. This may suggest that we need a different table for every mean and standard
deviation. We do not: instead, we standardize each observation. The standardized score is also called
the Z-value, Z statistic, standard deviate, standard normal value or simply normal value. The formula
is shown below.
$$Z = \frac{x - \mu}{\sigma}$$
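The formula translates directly into code. The numbers in the usage line are hypothetical and not from the module: a score of 85 on a test whose scores have mean 75 and standard deviation 5.

```python
def z_score(x, mu, sigma):
    """Standardize an observation: Z = (x - mu) / sigma."""
    return (x - mu) / sigma

# Hypothetical values: x = 85, mu = 75, sigma = 5.
print(z_score(85, 75, 5))   # 2.0, i.e. two standard deviations above the mean
```

A negative result means the observation lies below the mean.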
Z-value Rules (each area is the table area between z = 0 and the given z-value)
1. One z-value is positive and the other is negative: add the areas of the corresponding z-values.
2. Both z-values are positive, or both are negative: subtract the smaller area from the bigger area.
3. To the right of a positive z-value, or to the left of a negative z-value: subtract the area from 0.5.
4. To the right of a negative z-value, or to the left of a positive z-value: add the area to 0.5.
Examples:
Find the area under the normal distribution curve of the following z values:
1. 0 < z < 1.63
5. z > 1.63
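The four rules can be checked in Python. This sketch swaps the printed z-table for the normal CDF from the standard library's `statistics.NormalDist`, so computed areas may differ from a four-decimal table in the last digit; the function names are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()   # standard normal: mean 0, standard deviation 1

def table_area(z):
    """Area between 0 and z, the value a z-table lists."""
    return STD.cdf(abs(z)) - 0.5

def area_between(z1, z2):
    lo, hi = sorted((z1, z2))
    if lo < 0 < hi:                                  # rule 1: opposite signs, add
        return table_area(lo) + table_area(hi)
    return abs(table_area(hi) - table_area(lo))      # rule 2: same sign, subtract

def area_right(z):
    # rule 3 for a positive z, rule 4 for a negative z
    return 0.5 - table_area(z) if z >= 0 else 0.5 + table_area(z)

def area_left(z):
    # rule 3 for a negative z, rule 4 for a positive z
    return 0.5 - table_area(z) if z <= 0 else 0.5 + table_area(z)

print(round(area_between(0, 1.63), 4))   # Example 1: about 0.4484
print(round(area_right(1.63), 4))        # Example 5: about 0.0516
```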
2. Two-tailed test – a test in which the regions of rejection lie on both sides (tails) of the distribution.
The two-tailed test is used when the alternative hypothesis is non-directional.
Example:
A test was administered to two groups of students – the HRM student group and the
tourism student group. At the 0.05 significance level, is there a difference between the scores
obtained by the two groups of students?
H0: There is no significant difference between the scores obtained by the two groups of
students.
Ha: There is a significant difference between the scores obtained by the two groups of
students.
LEVEL OF SIGNIFICANCE
• It is the probability of rejecting a true null hypothesis.
• If the null hypothesis is true and is rejected, the error is called a TYPE I ERROR. If the null
hypothesis is false and is accepted, the error is called a TYPE II ERROR.
                    Decision
Null Hypothesis     Reject H0           Accept H0
H0 is true          Type I Error        Correct Decision
H0 is false         Correct Decision    Type II Error
CRITICAL VALUE
• The value that divides the area of rejection and the area of acceptance.
[Figure: two-tailed test with the region of acceptance between the critical values -1.701 and 1.701 and a region of rejection in each tail]
STEPS IN HYPOTHESIS TESTING
1. State the null hypothesis (H0) and the alternative hypothesis (Ha).
2. Set the desired level of significance.
3. Determine the appropriate test statistic and establish the critical region.
4. Compute the test statistic as a basis for decision.
5. Formulate the decision.
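Steps 2, 3 and 5 can be sketched in Python for a z test. This is a minimal sketch under stated assumptions: it covers only the standard normal (z) distribution, so critical values from other distributions, such as the t distribution, are out of scope; the function names and the `tail` labels are hypothetical; and the decision wording follows this module's "Reject H0 / Accept H0".

```python
from statistics import NormalDist

def critical_values(alpha, tail="two"):
    """Step 3: z critical value(s) that bound the region of rejection."""
    nd = NormalDist()
    if tail == "two":
        z = nd.inv_cdf(1 - alpha / 2)
        return (-z, z)
    if tail == "right":
        return (nd.inv_cdf(1 - alpha),)
    if tail == "left":
        return (nd.inv_cdf(alpha),)
    raise ValueError("tail must be 'two', 'left' or 'right'")

def decide(z, alpha, tail="two"):
    """Step 5: reject H0 when the test statistic falls in the region of rejection."""
    cv = critical_values(alpha, tail)
    if tail == "two":
        reject = z <= cv[0] or z >= cv[1]
    elif tail == "right":
        reject = z >= cv[0]
    else:
        reject = z <= cv[0]
    return "Reject H0" if reject else "Accept H0"

print([round(c, 2) for c in critical_values(0.05)])   # [-1.96, 1.96]
print(decide(2.5, 0.05))                              # Reject H0
```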
Examples:
For each of the problems below, do the following:
• Define the variable that you are going to use to represent information.
• Formulate the appropriate null hypothesis (H0) and the appropriate alternative hypothesis
(Ha).
1. The soft drink dispenser of a fast food center was just readjusted. The manager, wanting to
know if the dispenser is really in good condition, got a sample of 50 cups filled by the
dispenser. She would only classify the dispenser as “in good condition” (and therefore need
not be readjusted again) if the average fill per cup of the dispenser is 8 ounces.
Solution:
• Variable: The variable that will represent the information is –
X = fill per cup of the dispenser.
2. Jenny suspects that male CvSU-CCC students spend less time studying than their female
counterparts. She decided to conduct a study on the time that male and female CvSU-CCC
students spend doing their school work.
Solution:
• Variable: The variable that will represent the information is –
X = time spent by male CvSU-CCC student in doing school work.
Y = time spent by female CvSU-CCC student in doing school work
H0: _________________________________________________________________________________
H1: _________________________________________________________________________________
Test: _________________________________________________________________________________
2. A biologist believes that there has been an increase in the mean number of lakes infected
with milfoil, an invasive species, since the last study five years ago.
H0: _________________________________________________________________________________
H1: _________________________________________________________________________________
Test: _________________________________________________________________________________
3. A scientist’s research indicates that there has been a change in the proportion of people
who support certain environmental policies. He wants to test the claim that there has been
a reduction in the proportion of people who support these policies.
H0: _________________________________________________________________________________
H1: _________________________________________________________________________________
Test: _________________________________________________________________________________
4. For a shipment of cable, suppose that the specifications call for a mean breaking strength
of 2010 pounds. A sample of the breaking strength of 32 segments of cable has a mean of
1895 pounds with an associated standard deviation of 59 pounds. Using the 5% level, test the
significance of the difference found.
H0: _________________________________________________________________________________
H1: _________________________________________________________________________________
Test: _________________________________________________________________________________
5. An electrical company claimed that less than 2% of the parts which they supplied on a
government contract are defective. A sample of 642 parts was tested, and 17 did not meet
the specifications. Can we accept the company’s claim at a .05 level of significance?
H0: _________________________________________________________________________________
H1: _________________________________________________________________________________
Test: _________________________________________________________________________________
TEST OF DIFFERENCE
1. Z – Test of One Population Mean
FUNCTION: Parametric. Used to determine if a given sample mean was drawn from the
population with known parameters.
LEVEL OF MEASUREMENT: Interval/Ratio
SAMPLE DATA: SATT Scores, Average, Ratings, IQ, Budget, Gross Income
RESEARCH PROBLEM: Does the group of teenagers in Makati represent Metro Manila teenagers?;
Is there enough evidence to contradict the rental company’s claim that the mean time to
rent a car on their website is 60 seconds if the mean rental time of a random sample of 36
customers was 75 seconds?; Is there a significant difference between the mean score of the
2018 LET passers from CvSU and the mean score of all LET passers of CvSU?
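The Z-test of one population mean uses z = (x̄ − μ₀) / (σ / √n). A minimal Python sketch for the rental-car problem: the problem does not state the population standard deviation, so σ = 15 seconds below is a HYPOTHETICAL value chosen only for illustration, and the function name is hypothetical too. The two-tailed p-value comes from the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def z_test_one_mean(xbar, mu0, sigma, n):
    """Return z = (xbar - mu0) / (sigma / sqrt(n)) and its two-tailed p-value."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Rental-car claim: mu0 = 60 s; sample of n = 36 customers with xbar = 75 s.
# sigma = 15 s is HYPOTHETICAL; the problem does not give it.
z, p = z_test_one_mean(75, 60, 15, 36)
print(round(z, 2), p < 0.05)   # 6.0 True
```

With this assumed σ the statistic lies far beyond ±1.96, so H0 would be rejected at the 0.05 level; a different σ changes z proportionally.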
RESEARCH PROBLEM: Is there a significant difference in the number of students who are in favor of
Duterte’s war on drugs before and after the forum?; Is there a significant difference in
voters’ choice of candidate before and after the political debate?