Data Preparation & Cleaning
Welcome to class!
Today's Agenda
• Data Preparation
• Data Cleaning
• Sort and Filter
• Conditional Formatting
• Text to Columns
• Removing Duplicates
• Data Validation
Business Analytics
Objectives inform students of the learning outcomes of the class. What will they know? What will they be able to do? Why is this important to know? Objectives are an effective way to assess their learning progress.
Rules provide the structure necessary for an engaging and productive class. Keep them simple and easy to follow. They can be general, to cover different situations, or very specific to your students.
Data Preparation
Brief Introduction
Why do we prepare data? To improve the quality and usability of the data, so that it is consistent, complete, and ready for analysis.
• Step 1. Data collection.
• Step 2. Data discovery and profiling.
• Step 3. Data cleansing.
• Step 4. Data structuring.
• Step 5. Data transformation and enrichment.
• Step 6. Data validation and publishing.
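To make Step 2 (data discovery and profiling) more concrete, here is a minimal Python sketch, offered as an illustration outside the Excel workflow, that profiles a small table: it counts blank cells and distinct values per column and the number of exact duplicate rows. The sample rows and the `profile` and `count_duplicate_rows` helpers are hypothetical, made up for this example.

```python
# Hypothetical sample rows (made up for illustration), as they might look
# right after Step 1 (data collection): one blank name, one repeated order.
raw_rows = [
    {"Order ID": "A-101", "Customer Name": "HIMANSHI VERMA", "Sales": "250"},
    {"Order ID": "A-102", "Customer Name": "", "Sales": "820"},
    {"Order ID": "A-101", "Customer Name": "HIMANSHI VERMA", "Sales": "250"},
]


def profile(rows):
    """Hypothetical helper: count blank cells and distinct values per column."""
    report = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        report[column] = {
            "blank": sum(1 for value in values if value.strip() == ""),
            "distinct": len(set(values)),
        }
    return report


def count_duplicate_rows(rows):
    """Hypothetical helper: number of rows that exactly repeat an earlier row."""
    unique_rows = {tuple(row.items()) for row in rows}
    return len(rows) - len(unique_rows)


for column, stats in profile(raw_rows).items():
    print(column, stats)
print("Exact duplicate rows:", count_duplicate_rows(raw_rows))
```

Findings like these feed Step 3 (data cleansing): the blank name may need to be filled in or flagged, and the repeated order is a candidate for Remove Duplicates.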
Data Preparation and Cleaning
Text to Columns
This feature splits text into separate columns based on a delimiter or a fixed width.
Steps:
1. Select the column with the data you want to split.
2. Go to Data > Text to Columns.
3. Choose a delimiter (e.g., comma, space, tab) or fixed width.
Let's Solve Together
Example:
Split "Customer Name" into "First Name" and "Last Name" if stored as "HIMANSHI VERMA" (here the delimiter is a space).
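The same split can be sketched outside Excel. This minimal Python analogue is an illustration, not Excel's implementation: `text_to_columns` splits each value on a delimiter (a space, as in the example), and `fixed_width_columns` cuts at chosen character positions. Both helper names and the break position used below are hypothetical.

```python
# A rough Python analogue of Excel's Data > Text to Columns.
# Assumption: a plain split that keeps every piece, with no extra wizard
# options such as "Treat consecutive delimiters as one".


def text_to_columns(values, delimiter=" "):
    """Delimited mode: split each string into a list of column values."""
    return [value.split(delimiter) for value in values]


def fixed_width_columns(value, breaks):
    """Fixed-width mode: cut the string at the given character positions.

    Surrounding spaces are stripped from each piece for readability.
    """
    edges = [0, *breaks, len(value)]
    return [value[start:end].strip() for start, end in zip(edges, edges[1:])]


customer_names = ["HIMANSHI VERMA"]
for first_name, last_name in text_to_columns(customer_names, delimiter=" "):
    print("First Name:", first_name, "| Last Name:", last_name)

print(fixed_width_columns("HIMANSHI VERMA", breaks=[9]))
```

One caveat the sketch makes visible: a name with three space-separated words would produce three columns rather than two (and the two-name unpacking above would fail), so check the data before splitting.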
Sorting and Filtering Using MS Excel
Sorting and filtering help you organize and view specific data subsets efficiently.
• Sorting:
Sort data based on one or more columns, such as "Order Date" or "Sales" (ascending or descending).
Multi-level sorting: For example, sort first by "Region" and then by "Sales."
• Filtering:
Apply filters to display specific records based on criteria.
Example: Show orders where "Sales" exceed $500 or "Category" is "Furniture."
• Interactive Example:
Use Excel’s Sort & Filter options to:
Sort "Sales" in descending order.
Filter "Region" to display only "Central."
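The interactive example can be mirrored in a short Python sketch, purely to illustrate the logic behind Sort & Filter rather than anything Excel itself runs. The order rows and the `order_ids` helper below are hypothetical, made up for this example.

```python
# Hypothetical order rows, made up for illustration only.
orders = [
    {"Order ID": "A-101", "Region": "Central", "Category": "Furniture", "Sales": 250.0},
    {"Order ID": "A-102", "Region": "West", "Category": "Technology", "Sales": 820.0},
    {"Order ID": "A-103", "Region": "Central", "Category": "Office Supplies", "Sales": 640.0},
    {"Order ID": "A-104", "Region": "East", "Category": "Office Supplies", "Sales": 90.0},
]


def order_ids(rows):
    """Hypothetical helper: list the Order IDs of the given rows, in order."""
    return [row["Order ID"] for row in rows]


# Sort "Sales" in descending order.
by_sales_desc = sorted(orders, key=lambda row: row["Sales"], reverse=True)

# Multi-level sort: first by "Region" (A to Z), then by "Sales" (largest first).
by_region_then_sales = sorted(orders, key=lambda row: (row["Region"], -row["Sales"]))

# Filter "Region" to display only "Central".
central_only = [row for row in orders if row["Region"] == "Central"]

# Filter: "Sales" exceed $500 OR "Category" is "Furniture".
big_or_furniture = [
    row for row in orders if row["Sales"] > 500 or row["Category"] == "Furniture"
]

print(order_ids(by_sales_desc))         # ['A-102', 'A-103', 'A-101', 'A-104']
print(order_ids(by_region_then_sales))  # ['A-103', 'A-101', 'A-104', 'A-102']
print(order_ids(central_only))          # ['A-101', 'A-103']
print(order_ids(big_or_furniture))      # ['A-101', 'A-102', 'A-103']
```

Note that Excel's AutoFilter combines filters set on different columns with AND, so an OR across two columns, like the "Sales" or "Category" example, typically needs Advanced Filter or a helper column.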
CONDITIONAL FORMATTING
Removing Duplicates
Data duplication creates unnecessary clutter in data sets and makes it difficult for the user to extract meaningful insights. In addition, duplicate values can disrupt formulas and produce inaccurate results, compromising data integrity.
MS Excel has a range of tools designed to address the problem of duplication.
Steps to remove duplicates in MS Excel:
Step 1. Select the cell range that has duplicate values.
Step 2. Click the Data tab and select Remove Duplicates.
Step 3. Under Columns, check or uncheck the columns that Excel should compare when identifying duplicates.
Step 4. Click OK.
Removing Duplicates
Duplicate data can distort analysis. Removing duplicates helps ensure data integrity.
• Steps:
1. Select the entire dataset or the relevant columns.
2. Go to Data > Remove Duplicates.
3. Choose the columns used to identify duplicates (e.g., "Order ID" or "Customer Name").
• Example:
• Remove duplicate entries in "Order ID" while retaining unique records.
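The logic behind Remove Duplicates can be sketched in Python. This is a minimal illustration, assuming, as Excel does, that the first occurrence of each key is kept and later repeats are dropped; the `remove_duplicates` helper and the sample rows are hypothetical.

```python
# Hypothetical order rows (made up for illustration); "A-101" appears twice.
orders = [
    {"Order ID": "A-101", "Customer Name": "HIMANSHI VERMA", "Sales": 250.0},
    {"Order ID": "A-102", "Customer Name": "HIMANSHI VERMA", "Sales": 820.0},
    {"Order ID": "A-101", "Customer Name": "HIMANSHI VERMA", "Sales": 250.0},
    {"Order ID": "A-103", "Customer Name": "CUSTOMER B", "Sales": 640.0},
]


def remove_duplicates(rows, key_columns):
    """Keep the first row for each combination of values in key_columns."""
    seen = set()
    unique_rows = []
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)
    return unique_rows


by_order_id = remove_duplicates(orders, ["Order ID"])
by_customer = remove_duplicates(orders, ["Customer Name"])

print([row["Order ID"] for row in by_order_id])  # ['A-101', 'A-102', 'A-103']
print([row["Order ID"] for row in by_customer])  # ['A-101', 'A-103']
```

The second call shows why the column choice matters: comparing only "Customer Name" also drops order A-102, which is a genuinely separate order, so check only the columns that truly identify a duplicate.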
Data Validation
Data validation is an important feature of MS Excel.
It enables the user to define the type of data that can be entered in each cell of a worksheet.
It helps the user by restricting the entries allowed in selected cells.
Data validation allows users to set validation rules on a worksheet. For example, a user can set a rule that accepts only values between 0 and 10, or only names with fewer than 15 characters.
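The two example rules can be expressed as simple checks. This Python sketch only imitates what Excel's Data Validation does when a value is typed into a cell; the function names and error messages are hypothetical, and it assumes the 0 to 10 rule accepts both endpoints, as Excel's "between" option does.

```python
def is_valid_score(value):
    """Rule 1: a number between 0 and 10, endpoints included."""
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    return is_number and 0 <= value <= 10


def is_valid_name(value):
    """Rule 2: text with fewer than 15 characters (spaces count too)."""
    return isinstance(value, str) and len(value) < 15


def enter_value(value, rule, message):
    """Accept the value if it passes the rule; otherwise reject it, like an error alert."""
    if not rule(value):
        raise ValueError(message)
    return value


print(enter_value(7, is_valid_score, "Enter a number from 0 to 10."))
print(enter_value("HIMANSHI VERMA", is_valid_name, "Name must be under 15 characters."))
try:
    enter_value(12, is_valid_score, "Enter a number from 0 to 10.")
except ValueError as error:
    print("Rejected:", error)
```

"HIMANSHI VERMA" passes Rule 2 because it is 14 characters long, including the space.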