
module_7

Uploaded by

bkg46y4gjg
Copyright © All Rights Reserved

Data: Gathering, Organizing, Representing, Interpreting
Learning Target:
1. Define data and statistics.
2. Explain the difference between a population and a sample.
3. Describe four basic methods of sampling.
4. Construct a frequency distribution for a data set.
5. Draw a stem and leaf plot for a data set.
elementary statistical terms:

Data – measurements or observations that are gathered for an event under study.
Statistics – the branch of mathematics that involves collecting, organizing, summarizing, and presenting data and drawing general conclusions from that data.
Branches of statistics:
descriptive statistics – consists of methods for organizing, displaying, and describing data using tables, graphs, and summary measures.
Examples:
• Measures of Central Tendency: Mean, Median, Mode
• Frequency Distributions: Histograms, Bar charts
• Measures of Dispersion: Range, Variance, Standard Deviation
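As a rough illustration of the summary measures listed above, the sketch below computes each of them with Python's standard `statistics` module. The quiz scores are hypothetical values chosen only for the example. Note that `statistics.variance` and `statistics.stdev` compute the sample versions (dividing by n − 1); the module's `pvariance` and `pstdev` give the population versions.

```python
import statistics

# Hypothetical quiz scores, used only for illustration
scores = [12, 15, 15, 18, 20, 22, 15, 19]

# Measures of central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measures of dispersion
data_range = max(scores) - min(scores)
variance = statistics.variance(scores)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(scores)      # sample standard deviation

print(f"mean={mean}, median={median}, mode={mode}")
print(f"range={data_range}, variance={variance:.2f}, std dev={std_dev:.2f}")
```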
inferential statistics – consists of methods that use sample results to help make decisions or predictions about a population.
Examples:
• Hypothesis Testing: t-test, ANOVA, chi-square test
• Confidence Intervals
• Correlation and Regression

population

- consists of all subjects under study (e.g., individuals, items, or objects) whose characteristics are being studied.

- the population that is being studied is also called the target population.
A population may be:
real – all units actually exist (students of SSCT, daily production of bread, …)
hypothetical – the population is defined in general terms, but only a particular part of it actually exists (e.g., physical or chemical measurements).

More often than not, it’s not realistic to gather data from every member
of a population.

A unit is a single entity (usually a person or an object) whose characteristics are of interest.
sample
- is a representative subgroup or subset of a population. A sample from a statistical population is a portion (a subset) of the population selected for study.
A survey that includes every member of the population is called a census.
The technique of collecting information from a portion of the population is called a sample survey.
A sample that represents the characteristics of the population as closely as possible is called a representative sample.
A sample can be:
random – A sample drawn in such a way that each element of the population has a chance of being selected. If all samples of the same size selected from a population have the same chance of being selected, we call it simple random sampling. Such a sample is called a simple random sample.
non-random – The elements of the sample are not selected randomly but with a view of obtaining a representative sample.

A parameter is a numerical description of a population characteristic.
A statistic is a numerical description of a sample characteristic.

Example: Decide whether the numerical value describes a population parameter or a sample statistic.
1. A recent survey of a sample of 450 college students reported that the average weekly income for students is Php 325.00.
2. The average weekly income for all students is Php 405.00.
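The distinction between a parameter and a statistic can be sketched in a few lines of Python. The population of weekly incomes below is randomly generated and purely hypothetical. The population mean is a parameter because it is computed from every member, while the mean of a random sample of 450 drawn from it is a statistic that estimates that parameter.

```python
import random
import statistics

random.seed(7)  # fixed seed so the run is reproducible

# Hypothetical population: weekly incomes (Php) of 5,000 students
population = [round(random.uniform(200, 600), 2) for _ in range(5000)]

# Parameter: describes the whole population
population_mean = statistics.mean(population)

# Statistic: describes a simple random sample of 450 students
sample = random.sample(population, 450)
sample_mean = statistics.mean(sample)

print(f"population mean (parameter): {population_mean:.2f}")
print(f"sample mean (statistic):     {sample_mean:.2f}")
```

Rerunning with a different seed changes the sample mean but not the idea: the statistic varies from sample to sample, while the parameter is a fixed property of the population.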
Main Types of Data (variables)
A variable is a characteristic under study that assumes different
values for different elements.
> The value of a variable for an element is called an observation or measurement.

A data set is a collection of observations on one or more variables. The number of observations is called the sample size and is usually denoted 𝑛. We distinguish two basic types of data (variables):
qualitative or categorical data – A variable that cannot assume a numerical value but can be classified into two or more non-numeric categories is called a qualitative or categorical variable; the data collected on such a variable are called qualitative data. Examples: color of cars (black, red, green, . . . ), marital status of people (unmarried, married, divorced, widow–widower), sex (male, female), etc.
quantitative or numerical data – A variable that can be
measured numerically is called a quantitative variable. The
data collected on a quantitative variable are called
quantitative data.

o discrete variable – takes countable values, usually integers
Examples: number of typographical errors in newspapers, number of persons in a family, number of cars owned by families, etc.

o continuous variable – can take any real value within an interval
Examples: height, weight, survival time, etc.


Levels of Measurement
> The level of measurement determines which
statistical calculations are meaningful. The four levels
of measurement are: nominal, ordinal, interval, and
ratio.
1. Nominal Level

In this level of measurement, numbers (or labels) are used only to classify the data; words, letters, and alphanumeric symbols can be used. Suppose there are data about people belonging to three different gender categories. In this case, a person belonging to the female gender could be classified as F, a person belonging to the male gender as M, and a transgender person as T. This kind of classification is the nominal level of measurement.

Examples: City of birth, Gender, Ethnicity, Car brands, Marital status


2. Ordinal Level

This level of measurement depicts an ordered relationship among the variable's observations. Suppose a student scores the highest grade of 100 in the class. In this case, he would be assigned the first rank. Then, another classmate scores the second highest grade of 92; she would be assigned the second rank. A third student scores 81 and he would be assigned the third rank, and so on. The ordinal level of measurement indicates an ordering of the measurements, but not how far apart they are (the gap between ranks 1 and 2 is 8 points, while the gap between ranks 2 and 3 is 11).

Examples: Top 5 Olympic medallists, Language ability (e.g., beginner, intermediate, fluent), Likert-type questions (e.g., very dissatisfied to very satisfied)
3. Interval Level

The interval level of measurement not only classifies and orders the measurements, but also specifies that the distances between adjacent points are equal all along the scale, from low to high. For example, on an interval-level anxiety scale, the difference between scores of 10 and 11 is the same as the difference between scores of 40 and 41. A popular example of this level of measurement is temperature in degrees Celsius, where the distance between 94°C and 96°C is the same as the distance between 100°C and 102°C.

Examples: Test scores (e.g., IQ or exams), Personality inventories, Temperature in Fahrenheit or Celsius
4. Ratio Level

In this level of measurement, the observations, in addition to having equal intervals, have a true zero point: a value of zero means the complete absence of the quantity being measured. This true zero is what makes the ratio level unlike the other levels, although its other properties are similar to those of the interval level, since the divisions between the points on the scale are an equal distance apart. Because of the true zero, ratios are meaningful (a weight of 20 kg is twice a weight of 10 kg).

Examples: Height, Age, Weight, Temperature in Kelvin
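One way to see why the true zero matters is to compare two temperatures on the Celsius scale (interval level) and the Kelvin scale (ratio level). The two temperatures below are arbitrary example values; the conversion K = °C + 273.15 is the standard one. Differences agree on both scales, but only the Kelvin ratio is meaningful: 20°C is not "twice as hot" as 10°C.

```python
def celsius_to_kelvin(c: float) -> float:
    """Convert a Celsius temperature to Kelvin."""
    return c + 273.15

t1_c, t2_c = 10.0, 20.0

# Differences are meaningful on both interval and ratio scales
diff_c = t2_c - t1_c
diff_k = celsius_to_kelvin(t2_c) - celsius_to_kelvin(t1_c)

# Ratios are only meaningful on a ratio scale (true zero)
ratio_c = t2_c / t1_c                                        # 2.0, but not "twice as hot"
ratio_k = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)  # about 1.035

print(diff_c, round(diff_k, 6), ratio_c, round(ratio_k, 3))
```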


Sampling Methods
Probability Sampling Methods

1. Simple random sampling

In this case each individual is chosen entirely by chance and each member of the population has an equal chance, or probability, of being selected. One way of obtaining a random sample is to give each individual in a population a number, and then use a table of random numbers to decide which individuals to include. For example, if you have a sampling frame of 1000 individuals, labelled 0 to 999, use groups of three digits from the random number table to pick your sample. So, if the first three numbers from the random number table were 094, select the individual labelled “94”, and so on.
2. Systematic sampling

Individuals are selected at regular intervals from the sampling frame. The intervals are chosen to ensure an adequate sample size. If you need a sample of size 𝑛 from a population of size 𝑥, you should select every (𝑥/𝑛)th individual for the sample. For example, if you wanted a sample size of 100 from a population of 1000, select every 1000/100 = 10th member of the sampling frame.
3. Stratified sampling

In this method, the population is first divided into subgroups (or strata) whose members all share a similar characteristic. It is used when we might reasonably expect the measurement of interest to vary between the different subgroups, and we want to ensure representation from all the subgroups. For example, in a study of stroke outcomes, we may stratify the population by sex, to ensure equal representation of men and women. The study sample is then obtained by taking equal sample sizes from each stratum. In stratified sampling, it may also be appropriate to choose non-equal sample sizes from each stratum.
4. Clustered sampling

In a clustered sample, subgroups of the population are used as the sampling unit, rather than individuals. The population is divided into subgroups, known as clusters, which are randomly selected to be included in the study. Clusters are usually already defined, for example individual GP practices or towns could be identified as clusters. In single-stage cluster sampling, all members of the chosen clusters are then included in the study. In two-stage cluster sampling, a selection of individuals from each cluster is then randomly selected for inclusion. Clustering should be taken into account in the analysis.
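The four probability sampling methods above can be sketched with Python's `random` module. Everything here is hypothetical and chosen for illustration: a frame of 1000 labelled individuals, each tagged with a sex (used as the stratum) and one of 20 towns (used as the cluster), plus the sample sizes. A seeded generator stands in for the random number table. The random starting point in systematic sampling is a common refinement and an assumption here, since the text only fixes the interval.

```python
import random
from collections import defaultdict

rng = random.Random(42)  # seeded so the run is reproducible

# Hypothetical sampling frame: 1000 individuals labelled 0 to 999,
# each with a sex (stratum) and one of 20 towns (cluster)
frame = [{"id": i, "sex": "F" if i < 500 else "M", "town": i % 20} for i in range(1000)]

# 1. Simple random sampling: every individual has an equal chance
simple = rng.sample(frame, 100)

# 2. Systematic sampling: every k-th individual, k = population size / n
n = 100
k = len(frame) // n                 # 1000 / 100 = 10
start = rng.randrange(k)            # random start in the first interval (an assumption)
systematic = frame[start::k][:n]

# 3. Stratified sampling: equal-size simple random samples from each stratum
strata = defaultdict(list)
for person in frame:
    strata[person["sex"]].append(person)
stratified = [p for group in strata.values() for p in rng.sample(group, 50)]

# 4. Cluster sampling: randomly choose whole towns (clusters)
clusters = defaultdict(list)
for person in frame:
    clusters[person["town"]].append(person)
chosen_towns = rng.sample(sorted(clusters), 2)
single_stage = [p for t in chosen_towns for p in clusters[t]]               # everyone in chosen towns
two_stage = [p for t in chosen_towns for p in rng.sample(clusters[t], 10)]  # random subset per town

for name, s in [("simple", simple), ("systematic", systematic),
                ("stratified", stratified), ("single-stage cluster", single_stage),
                ("two-stage cluster", two_stage)]:
    print(f"{name:<22}{len(s):>4}")
```

Passing a different size per stratum to `rng.sample` would give the non-equal stratified variant mentioned above.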
Non-Probability Sampling Methods
1. Convenience sampling

Convenience sampling is perhaps the easiest method of sampling, because participants are selected based on availability and willingness to take part. Useful results can be obtained, but the results are prone to significant bias, because those who volunteer to take part may be different from those who choose not to (volunteer bias), and the sample may not be representative of other characteristics, such as age or sex. Note: volunteer bias is a risk of all non-probability sampling methods.
2. Quota sampling

This method of sampling is often used by market researchers. Interviewers are given a quota of subjects of a specified type to attempt to recruit. For example, an interviewer might be told to go out and select 20 adult men, 20 adult women, 10 teenage girls and 10 teenage boys so that they could interview them about their television viewing. Ideally the quotas chosen would proportionally represent the characteristics of the underlying population.

While this has the advantage of being relatively straightforward and potentially
representative, the chosen sample may not be representative of other
characteristics that weren’t considered (a consequence of the non-random
nature of sampling).
3. Judgment (or Purposive) Sampling

Also known as selective, or subjective, sampling, this technique relies on the judgment of the researcher when choosing who to ask to participate. Researchers may thus implicitly choose a “representative” sample to suit their needs, or specifically approach individuals with certain characteristics. This approach is often used by the media when canvassing the public for opinions and in qualitative research. Judgment sampling has the advantage of being time- and cost-effective to perform while resulting in a range of responses (particularly useful in qualitative research). However, in addition to volunteer bias, it is also prone to errors of judgment by the researcher, and the findings, whilst being potentially broad, will not necessarily be representative.
4. Snowball sampling

This method is commonly used in social sciences when investigating hard-to-reach groups. Existing subjects are asked to nominate further subjects known to them, so the sample increases in size like a rolling snowball. For example, when carrying out a survey of risk behaviors amongst intravenous drug users, participants may be asked to nominate other users to be interviewed.

Snowball sampling can be effective when a sampling frame is difficult to identify. However, by selecting friends and acquaintances of subjects already investigated, there is a significant risk of selection bias (choosing a large number of people with similar characteristics or views to the initial individual identified).
Bias in sampling

Sampling bias may be introduced when:

1. Any pre-agreed sampling rules are deviated from
2. People in hard-to-reach groups are omitted
3. Selected individuals are replaced with others, for example if they are difficult to contact
4. There are low response rates
5. An out-of-date list is used as the sample frame (for example, if it excludes people who have recently moved to an area)
Organizing and Graphing Data
Organizing and Graphing Categorical Data

After categorical data has been sampled, it should be summarized to provide the following information:

1. Which values have been observed? (e.g., red, green, blue, brown, orange, yellow)
2. How often did each value occur?

Categorical data is usually summarized in a table giving the following information:
• categories observed
• frequency, or number of measurements for each category
• relative frequency, or proportion of measurements for each category
• percentage of measurements for each category
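The summary table described above can be built with `collections.Counter` from Python's standard library. The observations below are a hypothetical sample of colours drawn from the categories listed earlier. Each printed row gives the category, its frequency, its relative frequency (frequency ÷ n) and its percentage.

```python
from collections import Counter

# Hypothetical sample of observed colours
observations = ["red", "green", "blue", "red", "brown", "orange",
                "yellow", "blue", "red", "brown", "blue", "red"]

n = len(observations)
counts = Counter(observations)

print(f"{'Category':<10}{'Frequency':>10}{'Rel. freq.':>12}{'Percent':>9}")
for category, freq in counts.most_common():
    rel_freq = freq / n
    print(f"{category:<10}{freq:>10}{rel_freq:>12.3f}{rel_freq * 100:>8.1f}%")
```

As a quick check, the frequencies should sum to n and the relative frequencies should sum to 1 (up to rounding).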
bar graph
A bar graph (bar chart) is used when you want to show a distribution of data points or compare metric values across different subgroups of your data. From a bar chart, we can see which groups are highest or most common, and how the other groups compare with one another.
pie chart
Pie charts provide an alternative kind of graph for categorical data. A pie chart is a circle divided into portions that represent the relative frequencies or percentages of a population or sample belonging to different categories.

How to create a pie chart:
• Draw a circle
• Calculate the slice size (angle) for each category: slice size = category relative frequency × 360° (the fraction of the circle for the category)
• Use a protractor to mark the angles
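The slice-size rule above can be checked numerically: the angles of all slices should add up to 360°. A short sketch, using hypothetical category frequencies:

```python
# Hypothetical frequencies for each category
frequencies = {"red": 4, "blue": 3, "brown": 2, "green": 1, "orange": 1, "yellow": 1}
total = sum(frequencies.values())

# slice size (degrees) = relative frequency x 360
angles = {cat: freq / total * 360 for cat, freq in frequencies.items()}

for cat, angle in angles.items():
    print(f"{cat:<8}{angle:6.1f} degrees")
print("sum:", round(sum(angles.values()), 6))
```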
Relative Frequency Histograms

The most common graph for describing continuous numerical data is the histogram. It visualizes the distribution of the underlying variable, that is, how many measurements fall in each part of the measurement scale. In a relative frequency histogram, the height of each bar is the relative frequency of its class (frequency ÷ n).
The first step in creating a histogram is finding the frequency distribution of the variable of interest.

Definition: A frequency distribution for quantitative data lists all the classes and the number of values that belong to each class.
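A minimal sketch of building such a frequency distribution, assuming equal-width classes. The exam scores, the lower limit of 40 and the class width of 10 are hypothetical choices for this example. Each row shows the class, its frequency, its relative frequency, and a row of asterisks as a rough text histogram.

```python
# Hypothetical exam scores
scores = [45, 52, 58, 61, 63, 67, 68, 70, 72, 74, 75, 77, 79, 81, 84, 88, 91, 95]

lower, width, n_classes = 40, 10, 6   # classes 40-49, 50-59, ..., 90-99

freq = [0] * n_classes
for x in scores:
    freq[(x - lower) // width] += 1   # index of the class containing x

n = len(scores)
for i, f in enumerate(freq):
    lo = lower + i * width
    hi = lo + width - 1
    print(f"{lo}-{hi}: {f:2d}  rel. freq. {f / n:.3f}  {'*' * f}")
```

Choosing the number and width of classes is a judgment call; too few classes hide the shape of the distribution, and too many leave most classes nearly empty.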
Features to check for in a histogram
1. Center: where is the “middle” of the data?
2. Range: between which values does the data fall?
3. Number of peaks: unimodal (just one peak), bimodal (two peaks; often occurs if you have observations from two groups, e.g., men and women), multimodal (more than two peaks)
4. Symmetry: if you can draw a vertical line so that the part to the left is a mirror image of the part to the right, then the histogram is symmetric.
5. Nonsymmetric graphs are skewed. If the upper tail of the histogram stretches out farther than the lower tail, then the histogram is positively skewed, or skewed to the right.
6. If the lower tail is longer than the upper tail, the histogram is negatively skewed, or skewed to the left.
7. Check for outliers.
stem and leaf plot
A stem and leaf plot, or stem plot, is a technique used to classify
either discrete or continuous variables. A stem and leaf plot is used
to organize data as they are collected.

A stem and leaf plot looks something like a bar graph. Each
number in the data is broken down into a stem and a leaf, thus the
name. The stem of the number includes all but the last digit. The
leaf of the number will always be a single digit.
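A minimal sketch of the stem/leaf split described above, assuming non-negative integer data (the values are hypothetical). `divmod(x, 10)` separates the last digit (the leaf) from all the digits before it (the stem). Empty stems between the smallest and largest are kept so that gaps in the data stay visible; that is a presentation choice, not part of the definition above.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return {stem: [leaves]} for non-negative integers, leaves sorted."""
    plot = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)   # stem = all but last digit, leaf = last digit
        plot[stem].append(leaf)
    # include empty stems between the smallest and largest so gaps stay visible
    return {s: plot.get(s, []) for s in range(min(plot), max(plot) + 1)}

data = [23, 27, 31, 35, 35, 38, 42, 46, 51, 59, 62]
for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
```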
Thank you for listening!
