Data Analysis
Sukhbir Kaur
Processing of Data
Editing
– Field editing: review of the forms in the field. For example, fields left out,
unclear writing, abbreviations used.
– Central editing: done after all the forms have been
collected. For example, an entry in the wrong place, or an entry
recorded in months when it should be in weeks.
Coding: assigning numerals to answers
Classification
Tabulation
Using percentages
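The coding, tabulation and percentage steps can be sketched in Python. This is a minimal illustration only: the answers and the code book (`CODE_BOOK`) are hypothetical, and `code_answers` and `tabulate_with_percentages` are illustrative helper names, not part of any package.

```python
from collections import Counter

# Hypothetical code book: each possible (edited) answer is assigned a numeral.
CODE_BOOK = {"primary": 1, "middle": 2, "higher secondary": 3, "university": 4}


def code_answers(answers):
    """Coding: replace each answer by its numeral from the code book."""
    return [CODE_BOOK[a.strip().lower()] for a in answers]


def tabulate_with_percentages(codes):
    """Tabulation: count each code and express it as a % of all responses."""
    counts = Counter(codes)
    total = len(codes)
    return {code: (n, 100 * n / total) for code, n in sorted(counts.items())}


# Hypothetical questionnaire answers, already edited.
answers = ["Primary", "university", "middle", "university", "higher secondary",
           "middle", "university", "primary"]
codes = code_answers(answers)
table = tabulate_with_percentages(codes)
for code, (n, pct) in table.items():
    print(code, n, f"{pct:.1f}%")
```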
Presentation of Data
Classification of data
– Geographical, i.e. region-wise. For example, population
by region.
– Chronological, i.e. time-wise. For example, sales by
year.
– Qualitative, i.e. according to attributes. For example, the
attribute education: primary, middle, higher secondary,
university, etc. Population on the basis of employment:
employed and unemployed.
– Quantitative, i.e. according to magnitudes. For example,
classifying employees by salary.
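The four bases of classification can be illustrated by counting the same records under different keys. In this Python sketch the employee records and the salary bands are hypothetical, chosen only to show one grouping per basis.

```python
from collections import Counter

# Hypothetical employee records: (region, year joined, education, monthly salary in Rs)
records = [
    ("North", 2021, "university", 42000),
    ("South", 2022, "higher secondary", 18000),
    ("North", 2022, "university", 55000),
    ("East", 2021, "middle", 12000),
    ("South", 2021, "university", 38000),
]

geographical = Counter(r[0] for r in records)   # region-wise
chronological = Counter(r[1] for r in records)  # year-wise
qualitative = Counter(r[2] for r in records)    # by attribute (education)

# Quantitative: by magnitude, using hypothetical salary bands
# (lower limit included, upper limit excluded).
BANDS = [(0, 20000), (20000, 40000), (40000, 60000)]


def salary_band(salary):
    for lower, upper in BANDS:
        if lower <= salary < upper:
            return f"{lower}-{upper}"
    raise ValueError(f"salary {salary} outside all bands")


quantitative = Counter(salary_band(r[3]) for r in records)
print(geographical, chronological, qualitative, quantitative, sep="\n")
```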
Quantitative data can be further classified as discrete or
continuous.
– Discrete data: the variable can take only a limited set of distinct
values (usually whole numbers). For example, number of employees,
number of machines.
– Continuous data: the variable can take any value within a range. For
example, weight, distance, volume.
Discrete frequency distribution
No. of employees   No. of companies
110   25
120   35
130   70
140   100
150   18
160   12

Continuous frequency distribution
Age (years)   No. of workers
20-25   15
25-30   22
30-35   38
35-40   47
40-45   18
45-50   10
Construction of a discrete frequency distribution
A sample of 50 families was surveyed to find the number of
children per family. Construct a discrete frequency distribution
table for the data below:
3 2 2 1 3 4 2 1 3 4 5 0 2
1 2 3 3 2 1 1 2 3 0 3 2 1
4 3 5 5 4 3 6 5 4 3 1 0 6
5 4 3 1 2 0 1 2 3 4 5
No. of children   No. of families (frequency, from counts/tally marks)
0   4
1   9
2   10
3   12
4   7
5   6
6   2
Total   50
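The tally can be checked in Python with `collections.Counter`, which counts how often each value occurs. The sketch below uses the 50 observations from the exercise; the tally-mark layout (`||||/` for each group of five) is just one illustrative choice.

```python
from collections import Counter

# Number of children in each of the 50 surveyed families (data from the exercise).
children = [
    3, 2, 2, 1, 3, 4, 2, 1, 3, 4, 5, 0, 2,
    1, 2, 3, 3, 2, 1, 1, 2, 3, 0, 3, 2, 1,
    4, 3, 5, 5, 4, 3, 6, 5, 4, 3, 1, 0, 6,
    5, 4, 3, 1, 2, 0, 1, 2, 3, 4, 5,
]

# Counter does the job of the tally marks: one count per distinct value.
freq = Counter(children)

print("No. of children  Tally          Frequency")
for value in range(min(children), max(children) + 1):
    f = freq[value]
    # Tally marks in groups of five, each complete group written as '||||/'.
    tally = "||||/ " * (f // 5) + "|" * (f % 5)
    print(f"{value:>15}  {tally:<14} {f:>9}")
print(f"{'Total':>15}  {'':<14} {len(children):>9}")
```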
Construction of a continuous frequency distribution
Class limits: the lowest and highest values that can be
included in a class. For example, in the class 60-69,
60 is the lower limit and 69 the upper limit.
Class interval: the width of a class = upper limit - lower
limit.
Class frequency: the number of observations falling within
a particular class.
Class mid-point: the value lying midway between the
upper and lower class limits, i.e. (lower limit + upper limit)/2.
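A short Python sketch of these definitions, applied to the age distribution of workers given earlier. The helper names `class_width` and `class_midpoint` are illustrative.

```python
def class_width(lower, upper):
    """Class interval as defined above: upper limit - lower limit."""
    return upper - lower


def class_midpoint(lower, upper):
    """Value midway between the lower and upper class limits."""
    return (lower + upper) / 2


# Continuous frequency distribution of workers' ages: (lower, upper, frequency).
age_classes = [(20, 25, 15), (25, 30, 22), (30, 35, 38),
               (35, 40, 47), (40, 45, 18), (45, 50, 10)]

for lower, upper, f in age_classes:
    print(f"{lower}-{upper}: width={class_width(lower, upper)}, "
          f"mid-point={class_midpoint(lower, upper)}, frequency={f}")

print("Mid-point of 60-69:", class_midpoint(60, 69))
```

By this definition the inclusive class 60-69 has width 9 and mid-point 64.5; many texts instead measure the width between the class boundaries 59.5 and 69.5, which gives 10.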
Types of class intervals
Exclusive: the upper limit of each class is excluded from that class
(it is counted in the next class).
Sales   No. of companies
20-25k   20
25-30k   28
30-35k   35
A firm with sales of exactly 25k is included in the class 25-30k.
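The exclusive rule can be expressed with Python's `bisect` module: a value equal to a class limit goes to the class whose lower limit it is. The helper `exclusive_class` is illustrative; the limits are taken from the sales table.

```python
import bisect

# Lower limits of the exclusive sales classes (in thousands): 20-25k, 25-30k, 30-35k.
LOWER_LIMITS = [20, 25, 30]
UPPER_LIMIT = 35  # upper limit of the last class


def exclusive_class(sales):
    """Return the exclusive class containing `sales` (in thousands).

    Each class includes its lower limit and excludes its upper limit, so a
    value equal to a limit falls in the class that starts there.
    """
    if not LOWER_LIMITS[0] <= sales < UPPER_LIMIT:
        raise ValueError(f"{sales}k lies outside the table")
    i = bisect.bisect_right(LOWER_LIMITS, sales) - 1
    lower = LOWER_LIMITS[i]
    upper = LOWER_LIMITS[i + 1] if i + 1 < len(LOWER_LIMITS) else UPPER_LIMIT
    return f"{lower}-{upper}k"


print(exclusive_class(25))     # 25-30k (the firm with 25k sales)
print(exclusive_class(24.99))  # 20-25k
```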
[Figure: ogive (cumulative frequency curve), with cumulative frequency (0 to 100) on the vertical axis and Q1, the median, Q3, the 8th decile and the 99th percentile marked on the curve]
Coefficient of MAD = MAD / Mean, where MAD is the mean absolute deviation from the mean
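As a worked illustration, assuming MAD here denotes the mean absolute deviation taken about the arithmetic mean, the coefficient can be computed in Python for the children-per-family distribution from the discrete frequency example.

```python
# Frequency distribution of children per family (discrete example).
values = [0, 1, 2, 3, 4, 5, 6]
freqs = [4, 9, 10, 12, 7, 6, 2]

n = sum(freqs)
mean = sum(f * x for x, f in zip(values, freqs)) / n
# Mean absolute deviation: average distance of the observations from the mean.
mad = sum(f * abs(x - mean) for x, f in zip(values, freqs)) / n
coefficient = mad / mean

print(f"mean={mean:.2f}, MAD={mad:.3f}, coefficient of MAD={coefficient:.3f}")
```

For these data the mean is 2.7 children, MAD = 1.324 and the coefficient is about 0.49.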
Measures of Variation and Skewness
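Quartile deviation (Q.D.) = (Q3 - Q1)/2
Coefficient of Q.D. = (Q3 - Q1)/(Q3 + Q1)
Quartile deviation is less affected by extreme values than the range, because it
is based on the middle 50% of the observations and ignores the extremes. Of the
measures of dispersion covered here, it is the only one that can be computed for
an open-ended distribution.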
Exercise: A survey of domestic electricity consumption gave the
following distribution of units consumed. Compute the quartile deviation
and its coefficient.
No. of units   No. of consumers
Below 200   9
200-400   18
400-600   27
600-800   32
800-1000   45
1000-1200   38
1200-1400   20
1400 & above   11
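A Python sketch of the exercise. The slides do not state how quartiles are obtained from grouped data; this assumes the standard interpolation formula Q_k = L + (kN/4 - cf)/f × i, where L is the lower limit of the quartile class, cf the cumulative frequency of the classes before it, f its frequency and i its width. `grouped_quartile` is an illustrative helper name.

```python
# Electricity consumption classes: (lower limit, upper limit, no. of consumers).
# None marks an open end ("Below 200", "1400 & above").
classes = [
    (None, 200, 9), (200, 400, 18), (400, 600, 27), (600, 800, 32),
    (800, 1000, 45), (1000, 1200, 38), (1200, 1400, 20), (1400, None, 11),
]


def grouped_quartile(classes, k):
    """k-th quartile by linear interpolation within the quartile class:
    Q_k = L + (k*N/4 - cf) / f * i
    """
    n = sum(f for _, _, f in classes)
    position = k * n / 4
    cf = 0  # cumulative frequency of the classes before the current one
    for lower, upper, f in classes:
        if cf + f >= position:
            if lower is None or upper is None:
                raise ValueError("quartile falls in an open-ended class")
            return lower + (position - cf) / f * (upper - lower)
        cf += f
    raise ValueError("position beyond total frequency")


q1 = grouped_quartile(classes, 1)
q3 = grouped_quartile(classes, 3)
qd = (q3 - q1) / 2
coeff_qd = (q3 - q1) / (q3 + q1)
print(f"Q1={q1:.2f}  Q3={q3:.2f}  QD={qd:.2f}  coefficient of QD={coeff_qd:.4f}")
```

Here N = 200, so Q1 lies in the 400-600 class and Q3 in the 1000-1200 class, giving Q1 ≈ 570.37, Q3 = 1100, Q.D. ≈ 264.81 and a coefficient of about 0.317. Neither quartile falls in an open-ended class, which is why the open ends do not prevent the calculation.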
Standard deviation
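$\sigma = \sqrt{\dfrac{\sum (x - \bar{x})^2}{N}}$
$\text{Variance} = \sigma^2$
Step-deviation method for grouped data:
$\sigma = \sqrt{\dfrac{\sum f d^2}{N} - \left(\dfrac{\sum f d}{N}\right)^2} \times i$, where $d = (x - A)/i$, $x$ is the class mid-point, $A$ an assumed mean and $i$ the class width.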
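The direct and step-deviation formulas should give the same σ on grouped data. This Python sketch applies both to the age distribution of workers given earlier, using class mid-points as x and frequency weights f in the direct formula (the grouped form of the first formula); the assumed mean A = 37.5 and width i = 5 are an arbitrary but convenient choice.

```python
import math

# Continuous age distribution: (lower, upper, no. of workers).
age_classes = [(20, 25, 15), (25, 30, 22), (30, 35, 38),
               (35, 40, 47), (40, 45, 18), (45, 50, 10)]
mids = [(lo + up) / 2 for lo, up, _ in age_classes]
freqs = [f for _, _, f in age_classes]
N = sum(freqs)


def sd_direct(xs, fs):
    """sigma = sqrt(sum f (x - mean)^2 / N), with x the class mid-points."""
    n = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n
    return math.sqrt(sum(f * (x - mean) ** 2 for x, f in zip(xs, fs)) / n)


def sd_step_deviation(xs, fs, assumed_mean, width):
    """sigma = sqrt(sum f d^2 / N - (sum f d / N)^2) * i, where d = (x - A) / i."""
    n = sum(fs)
    ds = [(x - assumed_mean) / width for x in xs]
    sum_fd = sum(f * d for d, f in zip(ds, fs))
    sum_fd2 = sum(f * d * d for d, f in zip(ds, fs))
    return math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2) * width


sigma = sd_direct(mids, freqs)
sigma_step = sd_step_deviation(mids, freqs, assumed_mean=37.5, width=5)
print(f"sigma (direct) = {sigma:.4f}, sigma (step deviation) = {sigma_step:.4f}")
print(f"variance = {sigma ** 2:.4f}")
```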
[Figure: symmetrical distribution]
1. The distribution of travelling allowance (T.A.) paid to salesmen is given below.
Compute the coefficient of skewness and comment on its value:
T.A. (Rs)   No. of salesmen
1000-1200   14
1200-1400   16
1400-1600   20
1600-1800   18
1800-2000   15
2000-2200   7
2200-2400   6
2400-2600   4
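A Python sketch of the exercise. The slide does not say which coefficient of skewness to use; this assumes Karl Pearson's coefficient, Sk = (Mean - Mode)/σ, with the mode from the usual grouped-data formula Mode = L + (f1 - f0)/(2f1 - f0 - f2) × i. Bowley's quartile-based coefficient is an alternative and would give a different value.

```python
import math

# Travelling allowance classes (Rs): (lower, upper, no. of salesmen).
ta_classes = [
    (1000, 1200, 14), (1200, 1400, 16), (1400, 1600, 20), (1600, 1800, 18),
    (1800, 2000, 15), (2000, 2200, 7), (2200, 2400, 6), (2400, 2600, 4),
]


def mean_and_sd(classes):
    """Arithmetic mean and standard deviation using class mid-points."""
    n = sum(f for _, _, f in classes)
    mids = [((lo + up) / 2, f) for lo, up, f in classes]
    mean = sum(f * x for x, f in mids) / n
    var = sum(f * (x - mean) ** 2 for x, f in mids) / n
    return mean, math.sqrt(var)


def grouped_mode(classes):
    """Mode = L + (f1 - f0) / (2*f1 - f0 - f2) * i for the highest-frequency class."""
    freqs = [f for _, _, f in classes]
    k = freqs.index(max(freqs))
    lower, upper, f1 = classes[k]
    f0 = freqs[k - 1] if k > 0 else 0
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * (upper - lower)


mean, sd = mean_and_sd(ta_classes)
mode = grouped_mode(ta_classes)
sk = (mean - mode) / sd  # Karl Pearson's coefficient of skewness
print(f"mean={mean:.2f}  mode={mode:.2f}  sd={sd:.2f}  Sk={sk:.3f}")
```

This gives a mean of Rs 1638, a mode of about Rs 1533.33, σ ≈ Rs 380.2 and Sk ≈ 0.275. The positive value indicates positive (right) skewness: the mean exceeds the mode and the longer tail lies towards the higher allowances.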