Chapter 2 Data Collection and Preparation
Jhon Loyd D. Criste
Methods of Data Collection
■ automated data collection functions built into business applications, websites and mobile apps;
■ sensors that collect operational data from industrial equipment, vehicles and other machinery;
■ collection of data from information service providers and other external data sources;
■ tracking of social media, discussion forums, review sites, blogs and other online channels; and
■ surveys, questionnaires and forms, done online, in person or by phone, email or regular mail.
Ethical Considerations in Data Collection
• Informed consent
• Voluntary participation
• Do no harm
• Confidentiality
• Only collect and assess data relevant to the study's purpose
Data Cleaning and Preprocessing in Excel
Data cleaning is the process of fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset.
Types of Problems
1. Empty Rows
Problem:
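Empty rows break the information into multiple tables instead of one single table, so features that detect the data range automatically (such as sorting and filtering) may pick up only part of the data.
There shouldn't be any empty rows in a table.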
Treatment:
Select the entire data range (or entire columns) and apply a filter
Filter any column for empty cells ((Blanks)), then delete the rows that are blank in every column
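Outside Excel, the same treatment can be sketched in Python. This is only an illustration, assuming the worksheet has already been exported to a list of rows (for example from a CSV file); the function names and sample data are hypothetical. A row counts as empty here only when every cell is blank or whitespace.

```python
# Sketch: drop fully empty rows from a table held as a list of rows.
# Assumption: rows were exported from the worksheet (e.g. via CSV);
# function names and sample data are hypothetical.


def is_empty_row(row):
    """Return True when every cell is None or contains only whitespace."""
    return all(cell is None or str(cell).strip() == "" for cell in row)


def remove_empty_rows(rows):
    """Keep only rows that contain at least one non-blank cell."""
    return [row for row in rows if not is_empty_row(row)]


if __name__ == "__main__":
    table = [
        ["Name", "City", "Sales"],
        ["Ana", "Manila", 120],
        ["", "", ""],          # empty row that would split the table
        ["Ben", "Cebu", 95],
        [None, " ", None],     # blank-looking row (spaces only)
        ["Cara", "", 130],     # one missing value: NOT an empty row
    ]
    for row in remove_empty_rows(table):
        print(row)
```

Checking every cell, rather than a single column, avoids deleting a record that merely has one missing value.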
2. Duplicate Data
Problem:
The entire record (every column of the row) is repeated
Treatment:
Remove Duplicates (Data tab)
Highlight duplicates using Conditional Formatting
Filter for unique records using Advanced Filter
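A minimal Python sketch of two of these treatments, assuming the records are held as a list of rows: remove_duplicates keeps the first occurrence of each record (similar in spirit to Remove Duplicates), and flag_duplicates marks every copy (similar in spirit to highlighting). Both names are hypothetical, and the comparison is an exact, case-sensitive match of the whole record, which may differ from Excel's own comparison rules.

```python
from collections import Counter

# Sketch: treat a row as a duplicate only when the ENTIRE record matches.
# Assumption: cells are hashable values (text or numbers); names are hypothetical.


def remove_duplicates(rows):
    """Keep the first occurrence of each record and drop later exact copies."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # the whole record is the comparison key
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def flag_duplicates(rows):
    """Return one boolean per row: True if that exact record occurs more than once."""
    counts = Counter(tuple(row) for row in rows)
    return [counts[tuple(row)] > 1 for row in rows]


if __name__ == "__main__":
    records = [
        ["Ana", "Manila", 120],
        ["Ben", "Cebu", 95],
        ["Ana", "Manila", 120],  # exact copy of the first record
        ["Ana", "Cebu", 120],    # same name, different city: not a duplicate
    ]
    print(remove_duplicates(records))
    print(flag_duplicates(records))  # [True, False, True, False]
```

Using the whole row as the key matches the problem as stated (the entire record is the same); two rows that share only a name are kept.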
3. Data Types and Data Consistency
Problem:
Data is spelled incorrectly or inconsistently
Some columns contain inconsistent data types (e.g. numbers stored as text)
Treatment:
Find and Replace
Text to Columns
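These treatments can also be sketched in Python. This is a hypothetical illustration, not Excel behavior: the CORRECTIONS mapping, the comma delimiter, the comma-as-thousands-separator rule and the helper names are all assumptions. The lookup is case-sensitive, whereas Excel's Find and Replace ignores case unless Match case is ticked.

```python
# Sketch: fix spelling variants, split delimited text, and convert
# numbers stored as text. All names and values below are hypothetical.

# Assumed correction table: variant or misspelled value -> standard spelling.
CORRECTIONS = {"Manilla": "Manila", "cebu": "Cebu", "Cebu City": "Cebu"}


def find_and_replace(values, corrections):
    """Trim each cell, then swap whole-cell matches for their corrected spelling."""
    cleaned = []
    for value in values:
        text = value.strip()
        cleaned.append(corrections.get(text, text))
    return cleaned


def text_to_columns(values, delimiter=","):
    """Split each cell on a delimiter and trim the parts (a delimited split)."""
    return [[part.strip() for part in value.split(delimiter)] for value in values]


def to_number(value):
    """Convert a number stored as text (e.g. "1,200") to float; None if not numeric.

    Assumes commas are thousands separators and can simply be removed.
    """
    try:
        return float(str(value).replace(",", "").strip())
    except ValueError:
        return None


if __name__ == "__main__":
    cities = ["Manilla", " cebu ", "Davao", "Cebu City"]
    print(find_and_replace(cities, CORRECTIONS))  # ['Manila', 'Cebu', 'Davao', 'Cebu']
    print(text_to_columns(["Santos, Ana", "Reyes, Ben"]))  # [['Santos', 'Ana'], ['Reyes', 'Ben']]
    print([to_number(v) for v in ["1,200", "95", "n/a"]])  # [1200.0, 95.0, None]
```

Returning None for non-numeric text keeps bad values visible for review instead of silently turning them into zero.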