Lesson 5: Data Management
Learning Outcomes
• Use a variety of statistical tools to process and manage numerical
data.
• Use the methods of linear regression and correlations to predict the
value of a variable given certain conditions.
• Advocate the use of statistical data in making important decisions.
Mean (average); median of 2, 3, 4, 5, 6 is 4; mode of 1, 1, 1, 3, 4 is 1
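The three measures noted above can be checked with a short Python sketch. It uses the standard `statistics` module and the two small data sets from the note; the variable names are illustrative only.

```python
from statistics import mean, median, mode

ordered = [2, 3, 4, 5, 6]      # data set from the median note above
repeated = [1, 1, 1, 3, 4]     # data set from the mode note above

average = mean(ordered)        # sum of the values divided by how many there are
middle = median(ordered)       # middle value once the data are sorted
most_common = mode(repeated)   # value that occurs most often

print(average, middle, most_common)
```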
Data Gathering and Organization
DATA
• are measurements or observations that are gathered from an event
under study.
STATISTICS and probability
• is the branch of mathematics that involves collecting, organizing,
summarizing, and presenting data and drawing general conclusions
from that data. Cardano (dice): faces 1-6, each with probability 1/6 ≈ 0.167
Populations and Samples
When statistical studies are performed, we usually begin by identifying
the population for the study.
A Population consists of all subjects under study (e.g., all colleges in the
Philippines). 3000 STUDENTS AT NUD (all)
More often than not, it’s not realistic to gather data from every
member of a population. (COMPARISON) (STATE OF CALAMITY)
A Sample is a representative subgroup or subset of a population.
HMA 221 (SAMPLE) (budget)
Sampling Method
Four Basic Sampling Methods:
1. In order to obtain a random sample, each subject of the population
must have an equal chance of being selected. (lottery)
2. A systematic sample is taken by numbering each member of the
population and then selecting every kth member, where k is a natural
number (e.g., every 7th or every 8th member).
When using systematic sampling, it’s important that the starting
number is selected at random.
3. When a population is divided into groups where the members of
each group have similar characteristics and members from each group
are chosen at random, the result is called a stratified sample.
Kpop(5), English movies(10), anime(5), horror(5), undecided(16)
hobbies
4. When an existing group of subjects that represent the population is
used for a sample, it is called a cluster sample.
PHILIPPINES –CITIES-MANILA, DASMA, ANGELES CITY, MUNTINLUPA
WORLD-ASIA, EUROPE, ETC.
Example 1: Random Sample
(a) Using the lottery method at work, the names of 25 employees out of 250 are
drawn from a hat.
Each of the 250 employees would be assigned a number between 1 and 250,
after which 25 of those numbers would be chosen at random.
RANDOM SAMPLE
(b) If you randomly select 1000 people from a town with a population of 100,000
residents, each person has a 1000/100000 = 0.01 probability of being selected.
That’s a simple calculation requiring no additional knowledge about the
population’s composition. Hence, simple random sampling.
RANDOM SAMPLING
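Both parts of Example 1 can be sketched in Python with the standard `random` module. The fixed seed is an assumption made only so the sketch gives the same draw each time; a real lottery would not fix it.

```python
import random

rng = random.Random(42)  # fixed seed is an assumption, used only for repeatability

# (a) Number the 250 employees 1..250 and draw 25 of them without replacement.
employee_ids = range(1, 251)
chosen = rng.sample(employee_ids, 25)

# (b) Chance of being selected when 1,000 people are drawn from 100,000 residents.
selection_probability = 1000 / 100000

print(sorted(chosen))
print(selection_probability)
```

Because `sample` draws without replacement, no employee can be picked twice, which matches drawing names from a hat.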
Example 2: Systematic Sample
(a) In a population of 10,000 people, a statistician selects every 100th person for
sampling.
The sampling interval can also be based on time, such as drawing a new
sample every 12 hours.
(b) If you wanted to select a random group of 1,000 people from a population of
50,000 using systematic sampling, all the potential participants must be placed
on a list and a starting point would be selected.
Once the list is formed, every 50th person on the list (starting the count at
the selected starting point) would be chosen as a participant, since 50,000/1,000
= 50.
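Example 2(b) can be sketched as follows. The function name `systematic_sample` is hypothetical, and the sketch assumes the population size divides evenly by the sample size and that the starting point is drawn at random from the first k positions on the list.

```python
import random

def systematic_sample(population_size, sample_size, rng):
    """Return 1-based list positions chosen by systematic sampling."""
    k = population_size // sample_size   # sampling interval, here 50,000 / 1,000 = 50
    start = rng.randint(1, k)            # random starting point among the first k
    return [start + i * k for i in range(sample_size)]

rng = random.Random(7)  # fixed seed is an assumption, used only for repeatability
positions = systematic_sample(50_000, 1_000, rng)

print(positions[:5])
```

Only the starting point is random; every later position is fixed by the interval.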
Example 3: Stratified Sample
(a) One might divide a sample of adults into subgroups by age, like 18–29, 30–39,
40–49, 50–59, and 60 and above.
To stratify this sample, the researcher would then randomly select
proportional amounts of people from each age group. This is an effective
sampling technique for studying how a trend or issue might differ across
subgroups.
Some of the most common strata used in stratified sampling include age,
gender, religion, race, educational attainment, socioeconomic status, nationality,
etc.
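A minimal sketch of proportional stratified sampling is shown below. The function name `stratified_sample` and the stratum sizes are hypothetical, made up for illustration; only the age bands come from Example 3.

```python
import random

def stratified_sample(strata, sample_size, rng):
    """Draw a proportional random sample from each stratum.

    strata maps a stratum label to the list of its members.
    """
    total = sum(len(members) for members in strata.values())
    sample = {}
    for label, members in strata.items():
        # Proportional allocation; rounding may not add up exactly in general.
        share = round(sample_size * len(members) / total)
        sample[label] = rng.sample(members, share)
    return sample

# Hypothetical population of 1,000 person IDs grouped by age band (sizes made up).
strata = {
    "18-29": list(range(0, 300)),
    "30-39": list(range(300, 550)),
    "40-49": list(range(550, 750)),
    "50-59": list(range(750, 900)),
    "60+": list(range(900, 1000)),
}
sample = stratified_sample(strata, 100, random.Random(1))
sizes = {label: len(chosen) for label, chosen in sample.items()}

print(sizes)
```

Each age band contributes the same share to the sample as it has in the population.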
Example 4: Cluster Sample
(a) A business owner wants to explore the performance of his/her plants that are
spread across various parts of the Philippines. The owner creates clusters of
the plants. He/she then selects random samples from these clusters to
conduct research.
CLUSTER SAMPLE
(b) An organization intends to conduct a survey to analyze the performance of
smartphones across the Philippines. They can divide the entire country’s
population into cities (clusters), select the cities with the highest population,
and filter for those using mobile devices. (DIFFERENT COURSES, PSY, TOU, ENG…)
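Cluster sampling in the spirit of Example 4(a) can be sketched as below. The region and plant names are hypothetical, and the sketch assumes a one-stage design in which every plant in a chosen cluster joins the sample.

```python
import random

# Hypothetical clusters: plants grouped by region (all names are made up).
clusters = {
    "Luzon North": ["plant-01", "plant-02", "plant-03"],
    "Metro Manila": ["plant-04", "plant-05"],
    "Visayas": ["plant-06", "plant-07", "plant-08"],
    "Mindanao": ["plant-09", "plant-10"],
}

rng = random.Random(3)  # fixed seed is an assumption, used only for repeatability
chosen_regions = rng.sample(sorted(clusters), 2)   # pick whole clusters at random

# Every plant in a chosen cluster joins the sample.
sampled_plants = [plant for region in chosen_regions for plant in clusters[region]]

print(chosen_regions)
print(sampled_plants)
```

The random choice is made among groups, not among individual plants, which is what separates cluster sampling from simple random sampling.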
Data Gathering and Organization
The data collected for a statistical study are called Raw Data.
Two methods used to organize raw data:
1. Frequency Distributions
2. Stem and Leaf Plots
1. Frequency Distributions
A Grouped Frequency Distribution is a type of Frequency Distribution
that is constructed from numerical data.
The data are divided into classes, following these guidelines:
1. Try to keep the number of classes between 5 and 15.
2. Make sure the classes do not overlap.
3. Don’t leave out any numbers between the lowest and highest, even
if nothing falls into a particular class.
4. Make sure the range of numbers included in a class is the same for
every class.
Example: Frequency Distributions
These data represent the record high temperatures for each of the 50
states in degrees Fahrenheit. Construct a grouped frequency
distribution for the data.
Example 1: SOLUTION
Step 1: Subtract the lowest value from the highest value:
134 – 100 = 34.
Step 2: If we use a range of 5 degrees per class, that will give us 7 classes,
since the entire range (34 degrees) divided by 5 is 6.8, which rounds up to 7.
Step 3: Start with the lowest value and add 5 to get the lower class
limits: 100, 105, 110, 115, 120, 125, 130.
CLASS
100
105
110
115
120
125
130
Step 4: Set up the classes by subtracting one from each lower class
limit except the first; each result is the upper limit of the class
before it. The last class ends 4 above its lower limit, at 134.
CLASS
100 - 104
105 - 109
110 - 114
115 - 119
120 - 124
125 - 129
130 - 134
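Steps 1 to 4 can be sketched in Python as follows. The function name `build_classes` is hypothetical, and the sketch assumes whole-number data. The class count is computed with integer division plus one, which gives the same 7 as rounding 6.8 up and also covers the case where the range is an exact multiple of the width.

```python
def build_classes(lowest, highest, width):
    """Return (lower, upper) class limits that cover lowest..highest."""
    value_range = highest - lowest             # Step 1: 134 - 100 = 34
    class_count = value_range // width + 1     # Step 2: 34 // 5 + 1 = 7
    lowers = [lowest + i * width for i in range(class_count)]   # Step 3
    return [(low, low + width - 1) for low in lowers]           # Step 4

classes = build_classes(100, 134, 5)

for low, high in classes:
    print(low, "-", high)
```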
Step 5: Tally the data and record the frequencies.
CLASS       FREQUENCY
100 - 104   2
105 - 109   8
110 - 114   18
115 - 119   13
120 - 124   7
125 - 129   1
130 - 134   1
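The tallying in Step 5 can be sketched as below. The 50 state temperatures are not reproduced in these notes, so the sketch uses a short hypothetical list in their place; the function name `grouped_frequencies` is also hypothetical. The last lines only check that the frequencies recorded in the table above add up to the 50 states.

```python
from collections import Counter

def grouped_frequencies(data, lowest, width, class_count):
    """Count how many whole-number values fall in each equal-width class."""
    counts = Counter((value - lowest) // width for value in data)
    table = []
    for i in range(class_count):
        low = lowest + i * width
        table.append((low, low + width - 1, counts.get(i, 0)))
    return table

# Hypothetical temperatures, NOT the 50-state data set.
sample_temps = [100, 104, 105, 112, 113, 118, 121, 134]
table = grouped_frequencies(sample_temps, 100, 5, 7)

for low, high, frequency in table:
    print(low, "-", high, frequency)

# Frequencies recorded in Step 5; they should total 50, one per state.
recorded = [2, 8, 18, 13, 7, 1, 1]
total = sum(recorded)
```

A class with no values still appears in the table with frequency 0, as guideline 3 requires.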
2. Stem and Leaf Plots
• Another way to organize data is to use a stem and leaf plot
(sometimes called a stem plot).
• Each data value or number is separated into two (2) parts:
• Example for a two-digit number: 53
- tens digit, 5, is called the stem
- ones digit, 3, is called the leaf
• Example for a three-digit number: 138
- first two digits, 13, are called the stem
- third digit, 8, is called the leaf
Example: Stem and Leaf Plots
The data below show the number of games won by the Chicago Cubs in
each of the 21 seasons from 1988 – 2008, with the exception of 1994,
which was a short season because of a player strike. Draw a stem and
leaf plot for the data. (Stems: 6, 7, 8, 9)
SOLUTION:
Notice that the tens digits range from 6 to 9, so we set up a table with
stems 6, 7, 8, 9.
STEM (TENS)   LEAVES (ONES)
6             6 7 5 7 8
7             9 6 3 8 7 7 7
8             5 9 8 8 4
9             7 0 3
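A stem and leaf plot can be built with a short Python sketch. The function name `stem_and_leaf` is hypothetical, and the win totals are read back from the plot above, since the original data table is not reproduced in these notes.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Split each value into a stem (all but the last digit) and a leaf (last digit)."""
    plot = defaultdict(list)
    for value in values:
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(sorted(plot.items()))

# Win totals read back from the stem and leaf plot above.
wins = [66, 67, 65, 67, 68, 79, 76, 73, 78, 77, 77, 77,
        85, 89, 88, 88, 84, 97, 90, 93]
plot = stem_and_leaf(wins)

for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

The same function handles three-digit values: 138 splits into stem 13 and leaf 8.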
Exercises: Choosing Sample
1. All employees of the company are listed in alphabetical order. From the first
10 numbers, you randomly select a starting point: number 6. From number 6
onwards, every 10th person on the list is selected (6, 16, 26, 36, and so on),
and you end up with a sample of 100 people.
2. The company has offices in 10 cities across the country (all with roughly the
same number of employees in similar roles). You don’t have the capacity to
travel to every office to collect your data, so you use random sampling to
select 3 offices.
3. You want to select a sample of 100 employees of Company X. You assign a
number to every employee in the company database from 1 to 1000, and use a
random number generator to select 100 numbers.
4. Your population list alternates between men (on the even numbers) and
women (on the odd numbers). You choose to sample every tenth individual,
which will therefore result in only men being included in your sample. This would
obviously be unrepresentative of the population.
5. The company has 800 female employees and 200 male employees. You want to
ensure that the sample reflects the gender balance of the company, so you sort
the population into two strata based on gender. Then you use random sampling
on each group, selecting 80 women and 20 men, which gives you a
representative sample of 100 people.
6. I am having a special dinner. I am trying to decide what to serve as the main
course: chicken, steak, or fish. I will call 7 people out of the 113 invited guests
listed to find out their preference.
Questions?
Clarifications?
Thank you for listening!