Data Management

This document provides an introduction to basic concepts in statistics including data collection, organization, presentation and analysis. It defines key terms, describes different data types and variables. Methods of collecting data are discussed including primary and secondary sources. Common ways to present quantitative data through tables, graphs like histograms and frequency polygons are also outlined. Measures of central tendency, dispersion and relative position are introduced.

Uploaded by

Milk Brother
Copyright
© All Rights Reserved

CHAPTER II

DATA MANAGEMENT
(STATISTICS)
Introduction

In this lesson, you will learn basic concepts in statistics. These
concepts include gathering and organizing data, representing
and interpreting data using graphs and charts, measures of
central tendency, measures of dispersion and measures of
relative position.
Learning Outcomes
At the end of the lesson, the students should have:
1. Differentiated the types of statistical variables;
2. Described the types of data;
3. Differentiated population from sample;
4. Characterized the different sampling techniques;
5. Described the different methods of data collection;
6. Described the different ways of data presentation;
7. Presented the frequency distribution in graphical forms; and
8. Differentiated the different measures of central tendency,
measures of dispersion and measures of relative position.
Statistics defined:
➢ It is derived from the Latin word “status”, which means state.
➢ It is a body of knowledge that deals with the collection,
organization and presentation, analysis and interpretation
of data.
Four-fold function of Statistics
❖ Collection – gathering of information or data.
❖ Organization and presentation – summarizing data or
information in textual, graphical or tabular form.
❖ Analysis – describing data using statistical methods and
procedures.
❖ Interpretation – making conclusions based on the analyzed data.
Definition of Terms:
1. Population – the entire collection of objects, persons,
places or things under consideration or study.
2. Sample – a small portion or part of a population that is
representative of the population.
3. Parameter – any numerical or nominal characteristic of a
population. It is a value or measurement obtained from a
population and is referred to as the true or actual value.
4. Statistic – an estimate of a parameter. It is any value or
measurement obtained from a sample.
5. Data – facts or sets of information or observations under study.
6. Variable – a characteristic or property of a population or sample that
makes its members different from each other.
▪ Dependent – a variable which is affected or influenced by another
variable.
▪ Independent – a variable which affects or influences the dependent variable.
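The difference between a parameter and a statistic can be illustrated with a small sketch. The population values, the sample size of 4 and the seed are all hypothetical, chosen only for illustration.

```python
import random
import statistics

# Hypothetical population: ages of all 10 members of a small club.
population = [12, 15, 11, 18, 20, 14, 16, 13, 17, 19]

# Parameter: a value computed from the whole population.
parameter = statistics.mean(population)

# Sample: a few members drawn at random (seeded so the draw is repeatable).
rng = random.Random(0)
sample = rng.sample(population, 4)

# Statistic: the same measurement computed from the sample only.
statistic = statistics.mean(sample)

print("parameter (population mean):", parameter)
print("statistic (sample mean):", statistic)
```

The statistic will usually differ from the parameter, which is why it is described as an estimate.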
COLLECTION OF DATA
A. Points to Consider when Collecting Data
1. Better results will be achieved if the researcher does the
measuring/counting instead of asking the respondent for the value.
2. The method of data collection used may expedite or delay the
process. Avoid a medium that would produce low response rates.
3. Ensure that the sample size is large enough for the required
purpose.
4. Ensure that the method used to collect the data actually results in
a sample that is representative of the population.
B. Sources of Data
1. Primary. Results of observations, interviews and experiments.
➢ Data taken from government agencies, business
establishments, organizations or individuals who carry
original data or who have first-hand information relevant
to a given problem.
2. Secondary. Books, magazines, encyclopedias, republished materials or studies
conducted by other individuals.
C. Ways of Collecting Data
1. Direct or Interview Method
➢ The researcher has direct contact with the interviewee.
➢ The researcher obtains the needed data by asking the interviewee
questions.
➢ Questions can be repeated, rephrased or modified so that both
parties understand them better, which makes the answers more
accurate.
➢ Costly, time-consuming and can only be extended to a limited
number of individuals.
Characteristics of a good question
➢ Unbiased – worded so as not to influence the
respondent to answer in a certain way. It should be stated
in neutral language with no element of pressure.
➢ Clear and simply stated – easy to understand and more likely
to be answered truthfully.
➢ Precise – indicates clearly how the
answers must be given.
2. Indirect or Questionnaire Method
➢ Uses questions in written form to obtain information.
➢ The material can be distributed either personally or by mail.
➢ More economical, hence can be extended to a greater number of
respondents.
➢ Clarification of responses is not possible.
➢ Responses may be less accurate, since respondents may not be
careful in writing down vital information.
3. Observation Method
This is most appropriate when the data needed pertain to the behavior of
individuals at the time a given situation occurs.
One limitation is that observations can only be made at the
time the relevant events occur.
4. Use of Documents / Registration Method
This is most useful when the data needed are in required registries or are found in
existing records in offices.
It is economical in cost, time and effort.
5. Method of Experimentation
This is used to find out cause-and-effect relationships.
PRESENTATION OF DATA
A. TEXTUAL METHOD
➢ This method uses paragraphs to present data.
➢ This is usually used when the data are purely qualitative or
when very few numbers are involved.
➢ This involves enumerating the important characteristics, giving
emphasis to significant figures and identifying important
features of the data.
➢ This is not desirable when too many figures are involved, as the
reader may fail to grasp the significance of some quantitative
relationships.
B. TABULAR PRESENTATION
➢ This method uses tables to present data.
➢ This is particularly useful when the reader wants to
make comparisons and draw relationships.
➢ This method is more convenient and understandable
than the textual method.
Components of a Statistical Table
1. Table heading – shows the table number and title. The table
number serves as the identity of the table, while the title briefly
explains what is being presented.
2. Body – the main part of the table, which contains the
quantitative information.
3. Stubs or row classifiers – the classifications or categories
presented as values of the variable; they describe what
information is found in the rows.
4. Box heads, column captions or column headers – the
information that appears above the columns, describing what
is found in each column.
5. Footnotes – may be placed immediately below the main part of the
table to explain details whenever necessary.
6. Source note – acknowledges the origin of the data and can be
placed beneath the footnotes.
Guidelines in the construction of a table
➢ The table should be self-explanatory.
➢ The title should be clear and descriptive; it should give information about
what, where, how and when the data were taken.
➢ Each characteristic may be summarized and compared separately using
percentages or any appropriate measure.
➢ If more than one piece of information is available for each subject, several
columns may be constructed in one table. Each column should be
properly labeled.
➢ Footnotes should be placed at the bottom of the table to briefly explain
details of the information whenever necessary.
C. GRAPHICAL PRESENTATION OF DATA
➢ uses graphs to present data
➢ helps facilitate comparison and interpretation without going
through the numerical data
➢ helps visualize certain properties and characteristics of the
data at a glance
1. Histogram – a graph made up of vertical or horizontal
rectangles whose bases are centered on the class marks and whose
heights are the frequencies.
➢ Class marks are placed at the center of the base of each
rectangle (or real limits at its ends).
➢ There are no gaps between bars.
➢ Lengths of the bars represent the magnitude of the quantities being
compared.
2. Frequency polygon – a line graph in which the frequencies
(heights) are plotted against the class marks (bases).
➢ Additional points at both ends of the graph are used to
close the figure.
➢ Shows the relationship between two or more sets of quantities.
➢ Used to compare quantities.
3. Ogive – a line graph representing the percentage cumulative
frequency distribution.
➢ The bases are the real limits and the heights are the pcf< for the
less-than ogive and the pcf> for the greater-than ogive.
➢ The less-than ogive is constructed by plotting the pcf< against
the upper real limits and is used to estimate the number
of cases falling below any given value.
➢ The greater-than ogive is constructed by plotting the pcf>
against the lower real limits and is used to estimate the number
of cases falling above any given value.
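The quantities plotted in these three graphs can be tabulated with a short sketch. The class limits and frequencies are hypothetical, and real limits are assumed to lie 0.5 beyond the stated limits, which suits data recorded as whole numbers.

```python
# Hypothetical grouped data: (lower class limit, upper class limit, frequency).
classes = [(10, 14, 3), (15, 19, 7), (20, 24, 12), (25, 29, 6), (30, 34, 2)]

n = sum(f for _, _, f in classes)

rows = []
cum_less = 0  # running "less than" cumulative frequency
for lower, upper, f in classes:
    cum_less += f
    rows.append({
        "class_mark": (lower + upper) / 2,           # base of histogram / polygon
        "lower_real_limit": lower - 0.5,             # base of the greater-than ogive
        "upper_real_limit": upper + 0.5,             # base of the less-than ogive
        "frequency": f,                              # height of histogram / polygon
        "pcf_less": 100 * cum_less / n,              # pcf<
        "pcf_greater": 100 * (n - cum_less + f) / n, # pcf>
    })

for row in rows:
    print(row)
```

Plotting `frequency` against `class_mark` gives the histogram and frequency polygon; plotting `pcf_less` against `upper_real_limit` and `pcf_greater` against `lower_real_limit` gives the two ogives.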
Types of Data
1. According to nature of collection
➢ Qualitative
➢ Quantitative
▪ Discrete
▪ Continuous
2. According to scale of measurement
✓ Nominal (real or artificial) – data that can only be
labeled, categorized or named (from the root word “name”). The data
cannot be arranged in an ordering scheme.
✓ Ordinal – data that may be arranged in some order, but
differences between data values either cannot be determined
or are meaningless.
✓ Interval – similar to ordinal, but the amounts of difference between
data values can be determined. However, there is no true zero point.
✓ Ratio – a modified interval level that includes a meaningful zero as
a starting point. Differences and ratios are meaningful. Usually,
units of measure accompany the numerical values.
MEASURES OF CENTRAL TENDENCY
➢ averages or measures of position or location
➔ these are intended to describe the center or the middle of an ordered
set of data
➔ when the 3 measures are not equal, typically:
▪ the median is between the mean and the mode
▪ the mode is the smallest if the mean is greater than the median
▪ the mode is the largest when the mean is less than the median
A. THE MEAN (X̄)
➔ most convenient measure
➔ most reliable since it varies less from sample to sample
➔ most stable, with the least probable error
➔ describes the center of gravity of a distribution
➔ the sum of the deviations (distances/moments) of the scores from the
mean is zero
➔ affected by every single score in the data
➔ sensitive to extreme scores
➔ used when the scores are close to each other and when values between
or among scores follow the same pattern
➔ used with interval or ratio data
➔ obtained by getting the sum of all values divided by the number of cases
Formulas:
➔ For ungrouped data: X̄ = ΣX / N
➔ For grouped data: X̄ = ΣfX / N, where X is the class midpoint and f is the class frequency
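A minimal sketch of the mean as "the sum of all values divided by the number of cases", for raw scores and for a frequency distribution. The scores, class marks and frequencies are hypothetical, and the grouped version assumes each class is represented by its midpoint.

```python
# Ungrouped data: sum of all values divided by the number of cases.
scores = [2, 4, 4, 5, 7, 8]
mean_ungrouped = sum(scores) / len(scores)

# Grouped data: each class mark is weighted by its frequency.
class_marks = [12, 17, 22, 27, 32]
freqs = [3, 7, 12, 6, 2]
n = sum(freqs)
mean_grouped = sum(f * x for f, x in zip(freqs, class_marks)) / n

# Property: deviations from the mean sum to zero.
deviations_sum = sum(x - mean_ungrouped for x in scores)

print(mean_ungrouped, mean_grouped, deviations_sum)
```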
B. The Median
➔ a score point which divides a ranked distribution into two equal
parts
➔ it is the value below which 50% of the data lie
➔ appropriate when there are values which are relatively large or
relatively small compared to most of the scores
➔ appropriate when open-ended intervals (for grouped scores) are
involved
➔ associated with ordinal data
➔ best used when a distribution is positively or negatively skewed
Formulas:
➔ For ungrouped data
◆ Arrange the data according to magnitude, then get the
middle score.
◆ The median is the [(n+1)/2]th score; when n is even, this falls halfway between the two middle scores.
➔ For grouped data
◆ Median = ll + [(n/2 – Fup) / f] × i
Where: ll = lower real limit of the step where n/2 lies
n = total number of cases
Fup = cumulative frequency up to the step containing n/2 (Fup ≤ n/2)
f = frequency of the step containing n/2
i = interval
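The two median procedures can be sketched as follows. The function names and the sample distribution are hypothetical; the grouped version interpolates inside the step where n/2 lies, using the symbols defined above.

```python
def median_ungrouped(values):
    """Middle score of the ranked data; mean of the two middle scores if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def median_grouped(lower_real_limits, freqs, width):
    """ll + ((n/2 - Fup) / f) * i, with classes listed from lowest to highest."""
    n = sum(freqs)
    f_up = 0  # cumulative frequency below the current step
    for ll, f in zip(lower_real_limits, freqs):
        if f > 0 and f_up + f >= n / 2:
            return ll + (n / 2 - f_up) / f * width
        f_up += f
    raise ValueError("empty frequency distribution")


print(median_ungrouped([7, 2, 8, 4, 5]))
print(median_grouped([9.5, 14.5, 19.5, 24.5, 29.5], [3, 7, 12, 6, 2], 5))
```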
 
C. The Mode
➔ the score which occurs the most number of times
➔ preferred when we want a quick estimate of the average
➔ a bimodal group means that two scores share the highest frequency
➔ not influenced by extreme values
➔ does not indicate anything about the other values in the data
➔ used for nominal data
Formulas:
➔ Crude mode – the score with the highest frequency
▪ In the case of grouped data, it is the midpoint of the class with
the highest frequency.
➔ True mode = 3(median) – 2(mean)
➔ Modal value = ll + [(f – f1) / (2f – f1 – f2)] × i
where:
➔ ll = lower real limit of the modal class
➔ f = frequency of the modal class
➔ f1 = frequency of the class preceding the modal class
➔ f2 = frequency of the class following the modal class
➔ i = interval
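The three ways of getting a mode can be sketched as below. The function names and data are hypothetical, and the interpolation used for the modal value, ll + ((f – f1) / (2f – f1 – f2)) × i, is the common textbook form that matches the symbols listed here; it is an assumption about the formula intended.

```python
from collections import Counter


def crude_modes(values):
    """All scores that share the highest frequency (two values means bimodal)."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


def true_mode(median, mean):
    """Estimate of the mode from the median and the mean."""
    return 3 * median - 2 * mean


def modal_value(lower_real_limits, freqs, width):
    """Interpolated mode of a frequency distribution.

    Undefined (division by zero) when the modal class and both of its
    neighbours have the same frequency.
    """
    k = max(range(len(freqs)), key=lambda j: freqs[j])  # index of the modal class
    f = freqs[k]
    f1 = freqs[k - 1] if k > 0 else 0               # preceding class
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0  # following class
    return lower_real_limits[k] + (f - f1) / (2 * f - f1 - f2) * width


print(crude_modes([2, 4, 4, 5, 7, 8]))
print(true_mode(4.5, 5))
print(modal_value([9.5, 14.5, 19.5, 24.5, 29.5], [3, 7, 12, 6, 2], 5))
```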
Measures of Position
Fractiles
➔ used to discriminate a group of scores from another group in the
same data
➔ used to divide a distribution into equal parts
The Quartiles
➔ these are score points which divide a distribution into 4 equal parts
➔ Q1 – 25% fall below the first quartile
➔ Q2 – same as the median; 50% fall below Q2
➔ Q3 – 75% fall below the third quartile
Formula
➔ Qk = l.l. + [(kn/4 – Fup) / F] × i
➔ Where:
▪ k = quartile level (1, 2 or 3)
▪ Fup = cumulative frequency up to the step containing kn/4 (Fup ≤ kn/4)
▪ F = frequency of the step containing kn/4
▪ i = interval
▪ l.l. = lower real limit of the class containing kn/4
The Deciles
➔ these are score points which divide a
distribution into 10 equal parts
➔ Dk = l.l. + [(kn/10 – Fup) / F] × i
Where:
▪ k = decile level (1 to 9)
▪ Fup = cumulative frequency up to the step containing kn/10 (Fup ≤ kn/10)
▪ F = frequency of the step containing kn/10
▪ i = interval
▪ l.l. = lower real limit of the class containing kn/10
The Percentiles
➔ these are score points which divide a distribution into 100 equal
parts
➔ They are generally used to characterize values according to the
percentage below them.
➔ Pk = l.l. + [(kn/100 – Fup) / F] × i
Where:
▪ k = percentile level (1 to 99)
▪ Fup = cumulative frequency up to the step containing kn/100 (Fup ≤ kn/100)
▪ F = frequency of the step containing kn/100
▪ i = interval
▪ l.l. = lower real limit of the class containing kn/100
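Quartiles, deciles and percentiles of a frequency distribution share one interpolation, so a single function can serve all three. This is a sketch under that assumption; the function name, the `level` and `parts` parameters and the data are hypothetical.

```python
def grouped_fractile(lower_real_limits, freqs, width, level, parts):
    """l.l. + ((level * n / parts - Fup) / F) * i.

    parts = 4 gives quartiles, 10 gives deciles, 100 gives percentiles.
    Classes must be listed from lowest to highest.
    """
    n = sum(freqs)
    target = level * n / parts
    f_up = 0  # cumulative frequency below the current step
    for ll, f in zip(lower_real_limits, freqs):
        if f > 0 and f_up + f >= target:
            return ll + (target - f_up) / f * width
        f_up += f
    raise ValueError("level is outside the distribution")


limits = [9.5, 14.5, 19.5, 24.5, 29.5]
freqs = [3, 7, 12, 6, 2]

q1 = grouped_fractile(limits, freqs, 5, 1, 4)
q3 = grouped_fractile(limits, freqs, 5, 3, 4)
d5 = grouped_fractile(limits, freqs, 5, 5, 10)     # same point as the median
p90 = grouped_fractile(limits, freqs, 5, 90, 100)

print(q1, q3, d5, p90)
```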
MEASURES OF DISPERSION / VARIABILITY
➔ These are measures that describe how the
scores are spread out along the scale of the
distribution.
➔ They indicate the degree of spread or how far
apart the observations are.
➔ Useful for telling apart sets of data whose central
measures are almost alike
A. RANGE (R)
➔ difference between the highest value and the lowest value in the data
➔ for grouped data, the range is estimated by subtracting the lower real
limit of the lowest class interval from the upper real limit of the
highest class interval
➔ describes how far the highest value is from the lowest value but does
not tell anything about the scores between the two extreme values
➔ easily determined
B. INTERQUARTILE RANGE (IR) AND QUARTILE DEVIATION (QD)
➔ these measures are generally more desirable than the range when the
distribution is truncated or skewed or when the median is the only
measure of central tendency that is available
➔ the IR indicates the distance between the two values which determine
the middle 50% of all the observations within the distribution
Formulas
➔ Interquartile Range: IR = Q3 – Q1
➔ Quartile Deviation: QD = ½ (Q3 – Q1)
Where:
Q3 – 3rd quartile
Q1 – 1st quartile
C. MEAN DEVIATION or AVERAGE DEVIATION
➔ takes into consideration the deviations of the individual scores from
an average
➔ more accurate than the first two measures (range and quartile deviation), since it uses every score
➔ used in determining the extent of the differences or how compact the
group is on a certain measure
➔ obtained by getting the average of the absolute deviations from the
mean
Formulas
➔ For ungrouped data: MD = Σ|X – X̄| / N
➔ For frequency distributions: MD = Σf|X – X̄| / N
where:
➔ X – individual score or the midpoint
➔ f – frequency of the class (frequency distributions only)
➔ X̄ – mean
➔ N – number of cases
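The mean deviation, taken as the average of the absolute deviations from the mean, can be sketched as follows. The function names and data are hypothetical, and the grouped version assumes each class is represented by its midpoint.

```python
def mean_deviation(values):
    """Average absolute deviation of raw scores from their mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


def mean_deviation_grouped(class_marks, freqs):
    """Average absolute deviation for a frequency distribution."""
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, class_marks)) / n
    return sum(f * abs(x - mean) for f, x in zip(freqs, class_marks)) / n


print(mean_deviation([2, 4, 4, 5, 7, 8]))
print(mean_deviation_grouped([12, 17, 22, 27, 32], [3, 7, 12, 6, 2]))
```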
D. VARIANCE AND STANDARD DEVIATIONS
➔ variance is the average of the squared deviations from the mean
➔ sample variance is denoted by s² while population variance is
denoted by σ²
➔ less commonly reported than the standard deviation because it is expressed in squared units
➔ (n – 1) is used as the divisor in order to remove the bias normally
associated with s² whenever it is used as an estimator of σ²; that is, s²
tends to be closer to the population variance when the divisor is (n – 1)
Formulas
➔ Population variance: σ² = Σ(X – μ)² / N
➔ Sample variance: s² = Σ(X – X̄)² / (n – 1)
where:
➔ μ – population mean
➔ N – population size
➔ X̄ – sample mean
➔ n – sample size
➔ the standard deviation is the square root of the variance
➔ the most stable and most reliable measure of variation because it is
affected by the value of each observation
➔ when the computed SD is small, it means that the values are
concentrated near the mean. Otherwise, they tend to scatter widely
about the mean.
➔ a small SD indicates a more homogeneous (more consistent) set of values
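A short sketch of the variance and standard deviation computed from their definitions, with the divisor N for a population and (n – 1) for a sample. The scores are hypothetical, and the last lines only cross-check the arithmetic against Python's `statistics` module.

```python
import math
import statistics

scores = [2, 4, 4, 5, 7, 8]
n = len(scores)
mean = sum(scores) / n

# Sum of the squared deviations from the mean.
squared_deviations = sum((x - mean) ** 2 for x in scores)

population_variance = squared_deviations / n      # divisor N
sample_variance = squared_deviations / (n - 1)    # divisor n - 1

population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

# Cross-check against the standard library.
assert math.isclose(population_variance, statistics.pvariance(scores))
assert math.isclose(sample_variance, statistics.variance(scores))

print(population_variance, sample_variance, population_sd, sample_sd)
```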
Formulas:
➔ Standard deviation using deviations, d = x – x̄: s = √[Σd² / (n – 1)]
➔ Standard deviation for raw scores: s = √{[nΣx² – (Σx)²] / [n(n – 1)]}
➔ 2. Standard deviation from a frequency distribution using the coding method
▪ Set up the table with the needed entries.
▪ Fill in x′, laying off 0 for an assumed step.
▪ Fill in fx′ by multiplying f and x′. Find Σfx′.
▪ Fill in fx′² by multiplying fx′ and x′. Find Σfx′².
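The coding steps can be sketched as follows. The data are hypothetical, and the final formula, s = i × √[Σfx′²/n – (Σfx′/n)²], is one common form that divides by n; it is an assumption, not necessarily the divisor intended here. The last lines check the coded result against a direct computation with the same divisor.

```python
import math

class_marks = [12, 17, 22, 27, 32]
freqs = [3, 7, 12, 6, 2]
width = 5
assumed_mark = 22  # class mark of the assumed step, where x' = 0

n = sum(freqs)

# x': coded deviation of each class from the assumed step, in interval units.
coded = [round((x - assumed_mark) / width) for x in class_marks]

sum_fx = sum(f * c for f, c in zip(freqs, coded))       # sum of f * x'
sum_fx2 = sum(f * c * c for f, c in zip(freqs, coded))  # sum of f * x'^2

sd_coded = width * math.sqrt(sum_fx2 / n - (sum_fx / n) ** 2)

# Direct computation from the class marks, same divisor n.
mean = sum(f * x for f, x in zip(freqs, class_marks)) / n
sd_direct = math.sqrt(
    sum(f * (x - mean) ** 2 for f, x in zip(freqs, class_marks)) / n
)

print(coded, sum_fx, sum_fx2, sd_coded, sd_direct)
```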
MEASURES OF RELATIVE DISPERSION
➔ Useful when the quantities to be compared have different units.
➔ Useful when the purpose is to reflect how large the variation is relative to the
average.
COEFFICIENT OF VARIATION
➔ a smaller CV means that the data are relatively less scattered
about the mean than a distribution with a higher CV
➔ useful when means and standard deviations are different
◆ CV = (standard deviation ÷ mean) × 100%
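A minimal sketch of the CV used to compare two sets of measurements in different units. The data and the function name are hypothetical, and the sample standard deviation (divisor n – 1) is assumed.

```python
import statistics


def coefficient_of_variation(values):
    """CV = (standard deviation / mean) * 100, as a percentage."""
    return statistics.stdev(values) / statistics.mean(values) * 100


heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 60, 70, 80, 90]

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

print(round(cv_heights, 2), round(cv_weights, 2))
```

Both sets have the same standard deviation, but the weights have the larger CV because their mean is smaller, so they are relatively more scattered.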


COEFFICIENT OF QUARTILE DEVIATION
➔ Useful when the distribution has open-ended intervals
➔ Defined in terms of the first and third quartiles
◆ CQD = [(Q3 – Q1) ÷ (Q3 + Q1)] × 100%


STATISTICAL DESCRIPTIONS
MEASURES OF SKEWNESS
➢ A distribution that is not symmetrical is said to be skewed.
➢ Skewness refers to the degree of asymmetry of a distribution.
➢ Negatively skewed (skewed to the left) – the distribution tails off to the
left. Positively skewed (skewed to the right) – the distribution tails off to the right.
➢ The direction of skewness is determined by the relationship between the mean
and the median:
✓ mean > median: positively skewed
✓ mean < median: negatively skewed
➢ The coefficient of skewness indicates both the direction and magnitude of the
skewness of data.
➢ The closer the coefficient of skewness is to zero, the less skewed the
distribution.
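One coefficient that compares the mean with the median is Pearson's second coefficient of skewness, Sk = 3(mean – median) / SD. The sketch below uses it as an assumption; it may not be the formula intended here. The function name and data are hypothetical, and the sample standard deviation is used.

```python
import statistics


def pearson_skewness(values):
    """Pearson's second coefficient of skewness: 3 * (mean - median) / SD."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)
    return 3 * (mean - median) / sd


right_tailed = [2, 4, 4, 5, 7, 8]   # mean 5 > median 4.5
symmetric = [1, 2, 3, 4, 5]         # mean = median
left_tailed = [1, 5, 6, 7, 7, 8]    # mean < median

print(pearson_skewness(right_tailed))
print(pearson_skewness(symmetric))
print(pearson_skewness(left_tailed))
```

The sign gives the direction of skewness and the size gives its magnitude.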
Formulas
MEASURE OF KURTOSIS
Kurtosis refers to the peakedness or flatness of a distribution.
▪ Mesokurtic – normal distribution (Ku = 3)
▪ Leptokurtic – more peaked than the normal (Ku > 3)
▪ Platykurtic – flatter than the normal (Ku < 3)
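The cut-off of 3 matches the moment coefficient of kurtosis, Ku = m4 / m2², where m2 and m4 are the averages of the squared and fourth-power deviations from the mean. The sketch below assumes that definition; the function names, the tolerance and the data are hypothetical.

```python
def moment_kurtosis(values):
    """Moment coefficient of kurtosis, m4 / m2**2 (equals 3 for a normal curve)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2


def classify_kurtosis(ku, tolerance=1e-9):
    """Label a kurtosis value relative to the normal value of 3."""
    if abs(ku - 3) <= tolerance:
        return "mesokurtic"
    return "leptokurtic" if ku > 3 else "platykurtic"


ku = moment_kurtosis([2, 4, 4, 5, 7, 8])
print(ku, classify_kurtosis(ku))
```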
Formulas

Using Quartile Deviation
