Module 7b
Data Preparation
Data Processing
» Data processing involves a number of
closely related operations performed to
summarize the collected data and organize
it so that it answers the research
questions (objectives).
2. Editing
3. Coding
4. Classification
5. Tabulation
6. Graphical representation
7. Data cleaning
8. Data adjusting
Questionnaire Checking
» The initial step of data
preparation is questionnaire
checking.
» It involves the examination of all
questionnaires for their
completeness and interviewing
quality.
» A questionnaire is not acceptable
if:
a) it is incomplete
b) it was answered with inadequate
knowledge
c) the respondent could not
understand the questions
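
A minimal sketch of the completeness check, criterion (a) only. The question names and sample responses are hypothetical; criteria (b) and (c) need human judgment and are not automated here.

```python
# Hypothetical required questions; a real questionnaire defines its own.
REQUIRED_QUESTIONS = ["age", "job", "income"]

def is_complete(response: dict) -> bool:
    """Return True when every required question has a non-empty answer."""
    return all(response.get(q) not in (None, "") for q in REQUIRED_QUESTIONS)

# Hypothetical returned questionnaires.
responses = [
    {"id": 1, "age": 34, "job": "Technical", "income": 52000},
    {"id": 2, "age": 41, "job": None, "income": 61000},   # unanswered item
    {"id": 3, "age": 29, "job": "Clerical"},              # question left out
]

accepted = [r["id"] for r in responses if is_complete(r)]
rejected = [r["id"] for r in responses if not is_complete(r)]
print("accepted:", accepted)
print("rejected:", rejected)
```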
Editing
» Editing is the process of examining
the collected raw data to detect
errors and omissions and to
correct them when possible.
» Field Editing
» Central Editing
» The editor should be familiar with the instructions given to the
interviewers.
» An original entry, if found incorrect, should not be destroyed or erased.
Instead, it should be crossed out in such a manner that it is still
legible.
» Any modification to the original entry by the editor must be specifically
indicated.
» Every completed questionnaire must bear the signature of the editor and the
date.
» An incorrect answer can be corrected only if the editor is absolutely
sure of the correct answer; otherwise it should be left as it is.
» Inconsistent, incomplete or missing answers should not be used.
» Make sure that all numerical answers are converted to the same units.
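
A small sketch of the last rule, converting numerical answers to the same unit while keeping the original entry. The weight answers and the choice of kilograms as the common unit are assumptions for illustration.

```python
# Conversion factors to the assumed common unit, kilograms.
TO_KG = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}

def to_kilograms(value: float, unit: str) -> float:
    """Convert a weight to kilograms; raise on an unknown unit."""
    if unit not in TO_KG:
        raise ValueError(f"unknown unit: {unit}")
    return value * TO_KG[unit]

# Hypothetical answers as recorded by different interviewers.
raw_answers = [(70, "kg"), (68500, "g"), (150, "lb")]

# Keep the original entry next to the edited value, so nothing is erased.
edited = [
    {"original": f"{v} {u}", "kg": round(to_kilograms(v, u), 2)}
    for v, u in raw_answers
]
for entry in edited:
    print(entry)
```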
Coding
» The process of assigning numerals or
symbols to responses is called
coding.
» It facilitates efficient analysis of the
collected data and helps in reducing
several replies to a small number of
classes.
» For example:
» What is your job classification?
Management, Technical, Administrative,
Clerical
» Code numbers can be assigned as:
Management = 1
Technical = 2
Administrative = 3
Clerical = 4
Missing = 9
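
The code list above can be sketched as a lookup table. The function name and sample answers are hypothetical, and treating blank or unrecognized answers as Missing (9) is an assumption.

```python
JOB_CODES = {"Management": 1, "Technical": 2, "Administrative": 3, "Clerical": 4}
MISSING_CODE = 9

def code_response(answer):
    """Map a job classification answer to its code.

    Assumption: blank, absent, or unrecognized answers are coded as missing.
    """
    if answer is None:
        return MISSING_CODE
    return JOB_CODES.get(answer.strip().title(), MISSING_CODE)

answers = ["Technical", "clerical", None, "Management", ""]
coded = [code_response(a) for a in answers]
print(coded)
```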
Classification
» Classification is the process of
arranging data in groups or classes
on the basis of common characteristics,
depending on the nature of the
phenomenon involved.
Types of Classification
» A. Classification according to external characteristics
» Classification on a geographical basis: data collected from
different places are placed in different classes.
» E.g.
District    Sales (Rs in lakhs)
Mumbai      400
Pune        250
Nagpur      200
etc.
» Classification on a periodical basis (chronological
classification): data belonging to a particular time or period
are put under one class.
» E.g.
Year    Sales (Rs in lakhs)
2019    500
2020    400
2021    300
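
A minimal sketch of both external classifications applied to the same raw records. The records are hypothetical; they are chosen so that the district totals match the geographical example, while the yearly totals are illustrative only.

```python
from collections import defaultdict

# Hypothetical raw records: (district, year, sales in Rs lakhs).
records = [
    ("Mumbai", 2020, 150), ("Pune", 2020, 100), ("Mumbai", 2021, 250),
    ("Nagpur", 2021, 200), ("Pune", 2021, 150),
]

def classify(records, key_index):
    """Total sales per class, where the class is the field at key_index."""
    totals = defaultdict(int)
    for record in records:
        totals[record[key_index]] += record[2]
    return dict(totals)

by_district = classify(records, 0)  # geographical basis
by_year = classify(records, 1)      # chronological basis
print(by_district)
print(by_year)
```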
» B. Classification according to internal
characteristics
» Data may be classified either according to attributes or
according to the magnitude of variables.
» Classification according to attributes: data are classified
on the basis of a common descriptive (qualitative) characteristic.
» E.g. literacy, sex, religion, etc. Numerical characteristics
such as weight, height and income are classified as variables.
» Simple classification: if the classification is based on one
particular attribute only, it is called simple classification.
E.g. classification on the basis of sex.
» Manifold classification: if the classification is based on more than
one attribute, it is called manifold or multiple classification.
Here the data are classified into several groups.
» For example:
Population
    Male
        Literate
        Illiterate
    Female
        Literate
        Illiterate
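
The difference between simple and manifold classification can be sketched by counting on one attribute versus two. The individuals listed are hypothetical.

```python
from collections import Counter

# Hypothetical individuals described by two attributes: (sex, literacy).
people = [
    ("Male", "Literate"), ("Male", "Illiterate"), ("Female", "Literate"),
    ("Female", "Literate"), ("Male", "Literate"), ("Female", "Illiterate"),
]

simple = Counter(sex for sex, _ in people)  # simple: one attribute
manifold = Counter(people)                  # manifold: both attributes together

print(simple)
print(manifold)
```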
Classification according to
variables
» Classification according to variables: here the data are classified
according to characteristics that can be measured, that is, on the
basis of quantitative characteristics such as age, height,
weight, etc.
» Quantitative variables are grouped into:
» a) Discrete variables: a variable that can take only exact values
is called a discrete variable.
E.g. 20, 25, 30, 35, 40, 45, 50
» b) Continuous variables: a variable that can take any
numerical value within a specified range is called a continuous
variable.
E.g. 10-20, 20-30
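
A short sketch of grouping a continuous variable into class intervals such as 10-20 and 20-30. The ages are hypothetical, and treating each lower limit as included and each upper limit as excluded is an assumption, since the slide does not say which class a boundary value such as 20 belongs to.

```python
def class_interval(value, start=10, width=10):
    """Return the (lower, upper) class holding value.

    Assumption: the lower limit is inclusive and the upper limit exclusive.
    """
    lower = start + ((value - start) // width) * width
    return (lower, lower + width)

ages = [12, 19.5, 20, 27, 29.9]  # hypothetical observations
groups = {}
for age in ages:
    groups.setdefault(class_interval(age), []).append(age)

for interval, members in sorted(groups.items()):
    print(interval, members)
```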
Characteristics of an ideal
classification
» Unambiguity: classification should be unambiguous. The
various classes should be defined properly.
» Stability: it should not change from enquiry to enquiry.
» Flexibility: classification should have the capacity to
adjust to new situations and circumstances.
» Homogeneity: each class should contain homogeneous items.
» Suitability: it should suit the objectives of the statistical
enquiry.
» Exhaustiveness: every item should find a class.
Tabulation
» Tabulation is an orderly arrangement of data
in rows and columns. It has been defined as the
“measurement of data in columns and
rows.” It is a stage between classification
of data and final analysis.
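
A minimal sketch of tabulating classified data in rows and columns with totals. The frequencies are hypothetical, and the helper name build_table is illustrative.

```python
# Hypothetical classified counts: (row class, column class) -> frequency.
counts = {
    ("Male", "Literate"): 40, ("Male", "Illiterate"): 10,
    ("Female", "Literate"): 35, ("Female", "Illiterate"): 15,
}
rows = ["Male", "Female"]
cols = ["Literate", "Illiterate"]

def build_table(counts, rows, cols):
    """Return text lines: a header, one line per row class, and a totals line."""
    lines = [f"{'Sex':<8}" + "".join(f"{c:>12}" for c in cols) + f"{'Total':>8}"]
    for r in rows:
        values = [counts.get((r, c), 0) for c in cols]
        lines.append(f"{r:<8}" + "".join(f"{v:>12}" for v in values)
                     + f"{sum(values):>8}")
    col_totals = [sum(counts.get((r, c), 0) for r in rows) for c in cols]
    lines.append(f"{'Total':<8}" + "".join(f"{v:>12}" for v in col_totals)
                 + f"{sum(col_totals):>8}")
    return lines

table = build_table(counts, rows, cols)
print("\n".join(table))
```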
Objectives of Tabulation
» To clarify the purpose of enquiry
» To make the significance of data clear
» To express the data in the least possible
space
» To enable comparative study
» To eliminate unnecessary data
» To help in further analysis of the data
Graphical
Representation
» Graphical representation is a method of
presenting data that simplifies the
complexity of quantitative data and makes
it easily intelligible.
» Types of graphs
1. Bar chart
2. Line chart
3. Pie chart
» E.g.
» Suppose that the sales of a popular drink in
the year 2019-20 in five geographical
regions denoted as A, B, C, D, E are 15245,
23762, 9231, 14980 and
12387, respectively, measured in units of
10,000 USD.
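
A rough sketch of how the regional sales figures could be prepared for a chart, using only the standard library. It computes each region's share (the slices of a pie chart) and prints a text bar chart; the function name text_bar and the bar width are illustrative choices, and a plotting library would normally draw the actual figure.

```python
regions = ["A", "B", "C", "D", "E"]
sales = [15245, 23762, 9231, 14980, 12387]  # figures from the example

total = sum(sales)
# Percentage share of each region, as used for pie-chart slices.
shares = {r: round(100 * s / total, 1) for r, s in zip(regions, sales)}

def text_bar(value, largest, width=40):
    """Scale value to a bar of '#' characters no longer than width."""
    return "#" * round(width * value / largest)

for r, s in zip(regions, sales):
    print(f"{r} {text_bar(s, max(sales)):<40} {s:>6} ({shares[r]}%)")
```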
Data Cleaning
» Data cleaning is the process of
fixing or removing incorrect,
corrupted, incorrectly
formatted, duplicate, or
incomplete data within a
dataset.
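
A minimal sketch of these cleaning steps on a few hypothetical rows. Dropping corrupted and incomplete rows rather than repairing them is an assumption; a real study would decide that case by case.

```python
# Hypothetical raw rows: (respondent id, city, age as entered).
raw = [
    ("101", " mumbai ", "34"),
    ("102", "Pune", "forty"),   # corrupted age
    ("101", "Mumbai", "34"),    # duplicate of respondent 101
    ("103", "NAGPUR", ""),      # incomplete: age missing
    ("104", "pune", "29"),
]

def clean(rows):
    """Fix formatting, then drop corrupted, incomplete, and duplicate rows."""
    seen, cleaned = set(), []
    for rid, city, age in rows:
        city = city.strip().title()      # repair inconsistent formatting
        if not age.strip().isdigit():    # corrupted or missing age
            continue
        if rid in seen:                  # duplicate respondent
            continue
        seen.add(rid)
        cleaned.append((rid, city, int(age)))
    return cleaned

cleaned = clean(raw)
print(cleaned)
```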
Data Adjusting
» Weight assigning
» Variable Respecification
» Scale Transformation
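
The slide only names these three adjustments, so the sketch below shows one common reading of each and should be taken as an assumption: weighting as a weighted mean, respecification as collapsing a rating into two categories, and scale transformation as standardization. The scores, weights, and cut-off are hypothetical.

```python
from statistics import mean, pstdev

scores = [2, 4, 4, 6]            # hypothetical ratings
weights = [1.0, 2.0, 2.0, 1.0]   # hypothetical case weights

# Weight assigning: each case counts in proportion to its weight.
weighted_mean = sum(w * x for w, x in zip(weights, scores)) / sum(weights)

# Variable respecification: collapse the ratings into two categories
# (the cut-off of 3 is an arbitrary illustrative choice).
respecified = ["low" if x <= 3 else "high" for x in scores]

# Scale transformation: standardize to mean 0 and standard deviation 1.
mu, sigma = mean(scores), pstdev(scores)
standardized = [(x - mu) / sigma for x in scores]

print(weighted_mean, respecified, standardized)
```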
Problems in Data
preparation
» Don’t Know responses
» Use of percentages
» Missing values
» Outliers
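
A brief sketch of two of these problems, missing values and outliers. The incomes are hypothetical; mean substitution and the 1.5 x IQR rule are only two of several possible treatments and are assumptions here, not methods prescribed by the slide. Note that the mean used for substitution is itself pulled upward by the outlier.

```python
from statistics import mean, quantiles

# Hypothetical incomes in thousands; None marks a missing value.
incomes = [42, 45, None, 47, 50, 44, 400, None, 46]

observed = [x for x in incomes if x is not None]

# Missing values: one option is to substitute the mean of the observed values.
fill = mean(observed)
imputed = [fill if x is None else x for x in incomes]

# Outliers: flag values more than 1.5 * IQR beyond the quartiles.
q1, _, q3 = quantiles(observed, n=4)
low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
outliers = [x for x in observed if x < low or x > high]

print("imputed:", imputed)
print("outliers:", outliers)
```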