Introduction to Data Processing

(STT05102)

Course Instructor: Eliah Kazumali


Email: [email protected]
Eastern Africa Statistical Training Centre

SESSION 01
Method of Assessment:
 Tests 10%
 Assignment 10%
 Practicals 20%
 Semester Exam 60%
References
 Remenyi, D., Onofrei, G., & English, J. (2009). An
introduction to statistics using Microsoft Excel. Academic
Conferences Limited.
 Rose, S., Spinks, N., & Canhoto, A. I. (2015). An introduction to
using Microsoft Excel for quantitative data analysis.
 CSPro User's Guide, available at
https://www2.census.gov/software/cspro/documentation/cspro75.pdf
Definition of terms
What is Data?
 A collection of text, numbers and symbols with no meaning.
• Data therefore has to be processed, or provided with a context,
to have meaning.
 Examples
 3, 6, 9, 12
 161.2, 175.3, 166.4, 164.7, 169.3
 42, 63, 96, 74, 56, 86
 Yes,Yes, No,Yes, No,Yes, No,Yes
 These are meaningless sets of data. They could be the first
four answers in the 3 times table or the heights of
students, but without a context we cannot tell.
 None of the above data have meaning until they are given a
context and processed into useable form (Information).
Example 2
Data: 42, 63, 96, 74, 56, 86
Context: Scores in a course by six students
Processing
Information: ???
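One possible way to carry out the processing step for Example 2 is sketched below in Python. The choice of summaries (mean, lowest, highest) and the variable names are assumptions made for illustration; the slide itself leaves the resulting information open.

```python
from statistics import mean

# Data from Example 2: scores in a course by six students
scores = [42, 63, 96, 74, 56, 86]

# Processing: summarise the raw scores.
# Which summaries to compute is an assumption, not part of the slide.
average_score = mean(scores)
lowest_score = min(scores)
highest_score = max(scores)

# Information: statements that carry meaning once the context is known
print(f"Average score of the six students: {average_score}")
print(f"Lowest score: {lowest_score}, highest score: {highest_score}")
```

With the context attached, the same six numbers can now support a statement such as "the average score was 69.5".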
Data vs Information Example 1
Data: Yes, Yes, No, Yes, No, Yes, No, Yes, No, Yes, Yes
Context: Responses to a research question, “Would you stay in EASTC
hostel at 20,000 Tsh a year?”
Processing
Information: Many people can stay in EASTC hostel for 20,000 Tsh a year
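The processing step in this example amounts to tallying the responses. A minimal sketch in Python, assuming the eleven responses are stored as a list of strings (the variable names are illustrative):

```python
from collections import Counter

# Data: the eleven responses from the example
responses = ["Yes", "Yes", "No", "Yes", "No", "Yes",
             "No", "Yes", "No", "Yes", "Yes"]

# Processing: count how often each answer occurs
counts = Counter(responses)
share_yes = counts["Yes"] / len(responses)

# Information: a summary that can be interpreted in context
print(f"Yes: {counts['Yes']}, No: {counts['No']}")
print(f"Share answering Yes: {share_yes:.0%}")
```

Seven of the eleven respondents answered Yes, which is what supports the conclusion that many people can stay in the hostel at that price.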
Data Processing
 Data processing is the process of gathering and
manipulating raw data to produce useful
information.
 Processed data is often in the form of tables,
diagrams, and reports.
 Data processing may involve various
processes/operations such as:
• Sorting
• Data Aggregation
• Analysis
• Classification
• Data Validation
• Data capture
• Data coding
• Data cleaning etc.

 Data can be collected from various sources, such
as surveys, experiments and archives,
and entered into a computer, where they can be
processed to produce information (output).
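A few of the operations listed above (validation, sorting, classification and aggregation) can be sketched in Python. The valid range of 0 to 100 and the pass mark of 50 are hypothetical values chosen only for illustration; the scores are the sample values used earlier in this session.

```python
from collections import Counter

scores = [42, 63, 96, 74, 56, 86]

# Data validation: keep only scores inside an assumed valid range (0 to 100)
valid_scores = [s for s in scores if 0 <= s <= 100]

# Sorting: arrange the valid scores in ascending order
sorted_scores = sorted(valid_scores)

# Classification: label each score using a hypothetical pass mark
PASS_MARK = 50
labels = ["pass" if s >= PASS_MARK else "fail" for s in sorted_scores]

# Data aggregation: count how many scores fall in each class
summary = Counter(labels)

print(sorted_scores)
print(dict(summary))
```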
Data Capture
 Data capture is the process of converting data from
questionnaires to an electronic file.
 Data capture methods are very expensive in terms
of both staff and time.
 Recently, computer-assisted personal interviewing
(CAPI), in which the interviewer enters data
directly into a computer terminal rather than onto a
form, has come into use.
Data coding
 Data coding consists of labelling the responses to questions
in a unique and abbreviated way, using numerical codes, in
order to facilitate data entry and manipulation.
 Codes should be simple and easy to use. For
example, if Question 1 has four possible types of responses
then those four responses could be given the codes 1, 2, 3,
and 4.
 The advantage of coding is the simplified storage of data as a
few-digit code, compared to lengthy alphabetical descriptions,
which almost certainly will not be easy to categorize.
Coded and uncoded responses
 What sports do you participate in?______________
Responses can be:……….
 What sports do you participate in?
1. Football
2. Netball
3. Basketball
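Coding the sports question can be sketched in Python as a lookup in a code book. The codes 1 to 3 follow the list above; the sample responses, the function name and the choice to return None for answers outside the code book are assumptions made for illustration.

```python
# Code book taken from the coded version of the question
SPORT_CODES = {"Football": 1, "Netball": 2, "Basketball": 3}

# Hypothetical raw responses as they might be written on questionnaires
raw_responses = ["Football", "netball ", "Basketball", "Football", "Chess"]


def code_response(answer):
    """Return the numeric code for an answer, or None if it is not in the code book."""
    cleaned = answer.strip().title()  # tidy spacing and capitalisation
    return SPORT_CODES.get(cleaned)


coded = [code_response(r) for r in raw_responses]
print(coded)
```

Answers that return None, such as "Chess" here, show why open-ended responses are harder to categorize: either the code book must be extended or an "Other" code added.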
Importance of data processing
 Facilitates correct decision making
 Simplifies report making
 Gives more accurate and reliable results
 Reduces storage requirements, etc.
Data Processing stages/cycle
The data processing cycle is the set of operations used to
transform data into useful information:
 Collection of data
 Preparation of the data into a format suitable for data
entry, as well as error checking
 Entry of the data into the system, which may involve
manual data entry, scanning, machine encoding, and
so forth
 Processing of the data with computer programs
 Transmitting the resulting information to the user,
typically via screen or printed report, so that it can
be acted upon
 Storing the input data and output information for
future use
1) Collection
 Data is collected from various sources.
 This is the first stage of the cycle and is crucial, since the
quality of the data collected heavily affects the output
(garbage in, garbage out). The collection process needs
to ensure that the data gathered are both well defined and
accurate, so that subsequent decisions based on the findings
are valid.
 Data collection techniques include censuses, surveys,
experiments and the use of administrative data.
2) Preparation
 Once the data is collected, it then enters the data preparation
stage. At this stage, raw data is diligently checked for any
errors.
 This involves selecting only the data you need and discarding
data that is incomplete or not relevant.
 The purpose of this step is to eliminate bad data (redundant,
incomplete, or incorrect data) to create high-quality data.
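The preparation step can be sketched in Python as a filter that drops redundant, incomplete and incorrect records. The records, the field names and the valid score range of 0 to 100 are hypothetical and serve only to illustrate the three kinds of bad data.

```python
# Hypothetical raw records collected from students
records = [
    {"id": 1, "score": 42},
    {"id": 2, "score": 63},
    {"id": 2, "score": 63},    # redundant: same student entered twice
    {"id": 3, "score": None},  # incomplete: score is missing
    {"id": 4, "score": 150},   # incorrect: outside the assumed 0-100 range
    {"id": 5, "score": 86},
]


def prepare(raw_records):
    """Return only the records that are unique, complete and within range."""
    seen_ids = set()
    clean = []
    for rec in raw_records:
        if rec["id"] in seen_ids:
            continue  # drop redundant data
        if rec["score"] is None:
            continue  # drop incomplete data
        if not 0 <= rec["score"] <= 100:
            continue  # drop incorrect data
        seen_ids.add(rec["id"])
        clean.append(rec)
    return clean


clean_records = prepare(records)
print([rec["id"] for rec in clean_records])
```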
3) Input
 This is the stage where verified data is coded or converted into
machine-readable form so that it can be processed
by a computer.
 Data entry is done using a keyboard, a scanner, or
an existing data source.
 This time-consuming process requires speed and accuracy.
Because of the costs, many businesses resort to outsourcing
this stage.
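Input from an existing source can be sketched in Python by reading comma-separated text into records. The column names and the three rows are hypothetical; in practice the text would come from a file produced during data capture.

```python
import csv
import io

# Hypothetical contents of a data capture file (comma-separated values)
raw_text = "student,score\nA,42\nB,63\nC,96\n"

# Read each line into a dictionary keyed by the column names
reader = csv.DictReader(io.StringIO(raw_text))

# Convert the score from text to a number so it can be processed later
rows = [{"student": r["student"], "score": int(r["score"])} for r in reader]

print(rows)
```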
4) Processing
 In this stage, raw facts or data are converted to meaningful
information.
 Data is subjected to various means and methods of
manipulation. This is the point where a computer program
(SPSS, Stata, Excel, R, MATLAB, SAS, etc.) is executed; the
running program contains the program code and its current
activity.
 Many software programs are available for processing large
volumes of data within very short periods.
5) Output and interpretation
 The output/interpretation stage is the stage at which data is
finally usable by people who are not data scientists. It is
translated, readable, and often in the form of graphs, videos,
images, printed reports, etc.
 Output needs to be interpreted so that it can provide
meaningful information that will guide future decisions.
6) Storage
 This is the last stage in the data processing cycle, where data,
instructions and information are held for future use.
 The importance of this stage is that it allows quick access and
retrieval of the processed information, allowing it to be
passed on to the next stage directly, when needed.
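Storing both the input data and the output information can be sketched in Python using a JSON file. The file name, the temporary folder and the structure of the stored record are assumptions for illustration only.

```python
import json
import tempfile
from pathlib import Path

# Input data and the information produced from it
input_data = [42, 63, 96, 74, 56, 86]
output_info = {"count": len(input_data), "highest": max(input_data)}

# A temporary folder stands in for permanent storage in this sketch
with tempfile.TemporaryDirectory() as folder:
    store = Path(folder) / "session01.json"  # hypothetical file name

    # Storage: hold the input data and the output information together
    store.write_text(json.dumps({"input": input_data, "output": output_info}))

    # Retrieval: read the stored record back when it is needed again
    restored = json.loads(store.read_text())

print(restored["output"])
```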
Task 01
1. How is data different from statistics?
2. Why are the stages in data processing referred to as the Data
Processing Cycle?
3. Explain why data coding is important.
Practical 01: Data Coding
 Code the datasets given.