Data Preparation & Cleaning

The document outlines the key concepts and steps involved in data preparation and cleaning, including sorting, filtering, conditional formatting, removing duplicates, and data validation in Excel. It emphasizes the importance of ensuring data quality for effective analysis and provides detailed instructions for various Excel functions. The agenda also includes class objectives to enhance student learning outcomes in business analytics.

Uploaded by

Tannu
Copyright
© All Rights Reserved

Data Preparation and Cleaning, Sort and Filter, Conditional Formatting, Text to Column, Removing Duplicates, Data Validation


Business Analytics

Welcome to class!

Today's Agenda

•Data Preparation
•Data Cleaning
•Sort and filter
•Conditional formatting
•Text to Column
•Removing Duplicates
•Data Validation

Class Objectives and Rules


Expectations and outcomes

Objectives inform students of the learning outcomes of the class. What will they know? What will they be able to do? Why is this important to know? It's an effective way to assess their learning progress.

Rules provide the structure necessary for an engaging and productive class. Keep it simple and easy to follow. It can be general to cover different situations or very specific to your students.

Data Preparation

Brief Introduction

Raw data collected from the audience often has missing values, errors, or other inaccuracies. The problem arises because data is collected from different sources that may come in different formats, so it needs to be transformed into a common format to run analytics.

Data Preparation
Why do we do data preparation? To make the data good in quality and usable, ensuring that it is consistent, complete, and ready for analysis.

Concept and Definition
 Data preparation is the process of gathering, combining, structuring and organising data using various analytical or business intelligence tools and techniques.
 Data preparation acts as a ...

Data preparation comprises various steps, such as:
1. Data cleaning
2. Data integration
3. Data transfer

Applications of data preparation
1. Enhanced result reliability
2. Identification and resolution of data issues
3. Informed decision making
4. Cost reduction
5. Time and resource ...
Data Preparation Process

• Step 1. Data collection.
• Step 2. Data discovery and profiling.
• Step 3. Data cleansing.
• Step 4. Data structuring.
• Step 5. Data transformation and enrichment.
• Step 6. Data validation and publishing.
Data Preparation and Cleaning

Steps in Data Cleaning:


 Identify Missing Data: Check for blanks or null values and decide how to handle them (e.g., filling,
interpolation, or deletion).
 Remove Irrelevant Data: Identify columns or rows that are not relevant to your analysis and
remove them.
 Handle Outliers: Detect and address outliers to prevent skewed results.
 Standardize Data: Ensure consistent formats for dates, text (e.g., capitalization), and numerical
data.
 Validate Data Accuracy: Cross-check data for accuracy against source documents.
• Example Using the Attached File:
 Check for missing values in key columns like "Order Date," "Sales," or "Customer Name."
 Ensure consistency in fields such as "Category" and "Region."
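The first and fourth checks above can also be sketched outside Excel. The following is a minimal illustration in plain Python, not part of the original slides: the sample rows and the helper names `count_missing` and `standardize` are hypothetical, and only the column names come from the example above.

```python
# Hypothetical sample rows; the values are invented for illustration.
rows = [
    {"Order Date": "2024-01-05", "Sales": "250.0", "Customer Name": "Asha Rao",
     "Category": "furniture ", "Region": "Central"},
    {"Order Date": "", "Sales": "120.5", "Customer Name": "Ravi Kumar",
     "Category": "Furniture", "Region": "central"},
    {"Order Date": "2024-01-09", "Sales": "", "Customer Name": "",
     "Category": "TECHNOLOGY", "Region": "East"},
]

KEY_COLUMNS = ["Order Date", "Sales", "Customer Name"]


def count_missing(rows, columns):
    """Count blank or absent entries per column."""
    return {
        col: sum(1 for r in rows if not (r.get(col) or "").strip())
        for col in columns
    }


def standardize(rows, columns):
    """Return copies of the rows with whitespace trimmed and Title Case applied."""
    cleaned = []
    for r in rows:
        new_row = dict(r)
        for col in columns:
            new_row[col] = (r.get(col) or "").strip().title()
        cleaned.append(new_row)
    return cleaned


missing = count_missing(rows, KEY_COLUMNS)
cleaned = standardize(rows, ["Category", "Region"])

print(missing)                                  # one blank in each key column
print(sorted({r["Category"] for r in cleaned}))  # ['Furniture', 'Technology']
```

Counting blanks only reports the problem; whether to fill, interpolate, or delete the affected rows remains a judgment call for the analyst.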

Text to Columns
This feature splits text into separate columns based on a delimiter.
Steps:
1. Select the column with the data you want to split.
2. Go to Data > Text to Columns.
3. Choose a delimiter (e.g., comma, space, tab) or fixed width.
Let's Solve Together
Example:
Split "Customer Name" into "First Name" and "Last Name" if stored as "HIMANSHI VERMA."
Filter and sorting using Ms Excel

Filtering means hiding the data you do not need so that only the rows matching your requirement are shown.
Step 1. Click anywhere inside the table to which the filter is to be applied.
Step 2. On the Data tab, click the Filter option (a drop-down button appears at the top of each column).
Step 3. Click the drop-down of the column whose data is to be filtered (checkboxes listing the distinct values of that column will appear).
Step 4. Select the value or values for which the filter is required.
SORTING IN MS-EXCEL
How to use the sort function in MS Excel?
 Sorting means arranging the data or values in increasing or decreasing order.
 Sorting organises the data in a way that makes it easier for the user to understand.
Types of sorting
A. Single-level sorting
B. Multi-level sorting
C. Custom sorting
 Sorting can be done alphabetically.
 Sorting can be done numerically.
 Sorting can be done on the basis of date and time.
STEPS IN SORTING

• Step 1. Select the column of the table whose data is to be sorted.
• Step 2. Click on the Data tab.
• Step 3. Under the Data tab, click Sort (a dialogue box will appear).
• Step 4. Fill in the first two fields (Column, Sort On).
• Step 5. For the third field (Order), choose the required order; for a custom sort, select the Custom List option.
• Step 6. Click OK.
Sort and Filter

Sorting and filtering help you organize and view specific data subsets efficiently.
• Sorting:
 Sort data based on one or more columns, such as "Order Date" or "Sales" (ascending or descending).
 Multi-level sorting: For example, sort first by "Region" and then by "Sales."
• Filtering:
 Apply filters to display specific records based on criteria.
 Example: Show orders where "Sales" exceed $500 or "Category" is "Furniture."
• Interactive Example:
 Use Excel’s Sort & Filter options to:
 Sort "Sales" in descending order.
 Filter "Region" to display only "Central."
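For comparison, the sorting and filtering examples above can be sketched in plain Python. The `orders` rows below are invented for illustration, and the choice of descending "Sales" within each "Region" for the multi-level sort is an assumption, since the slide does not state the order.

```python
# Hypothetical orders; only the column names come from the slides.
orders = [
    {"Order ID": "A-1", "Region": "Central", "Category": "Furniture", "Sales": 620.0},
    {"Order ID": "A-2", "Region": "East", "Category": "Technology", "Sales": 150.0},
    {"Order ID": "A-3", "Region": "Central", "Category": "Office Supplies", "Sales": 90.0},
    {"Order ID": "A-4", "Region": "East", "Category": "Furniture", "Sales": 480.0},
]

# Single-level sort: "Sales" in descending order.
by_sales_desc = sorted(orders, key=lambda o: o["Sales"], reverse=True)

# Multi-level sort: "Region" ascending, then "Sales" descending within each region.
by_region_then_sales = sorted(orders, key=lambda o: (o["Region"], -o["Sales"]))

# Filter: keep only rows where "Region" is "Central".
central_only = [o for o in orders if o["Region"] == "Central"]

# Filter with two criteria: "Sales" above 500 or "Category" equal to "Furniture".
big_or_furniture = [
    o for o in orders if o["Sales"] > 500 or o["Category"] == "Furniture"
]

print([o["Order ID"] for o in by_sales_desc])         # ['A-1', 'A-4', 'A-2', 'A-3']
print([o["Order ID"] for o in by_region_then_sales])  # ['A-1', 'A-3', 'A-4', 'A-2']
print([o["Order ID"] for o in central_only])          # ['A-1', 'A-3']
print([o["Order ID"] for o in big_or_furniture])      # ['A-1', 'A-4']
```

As in Excel, sorting reorders the rows while filtering only selects a subset; neither changes the original `orders` list here.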
CONDITIONAL FORMATTING

• Conditional formatting is a special feature of Excel that can be used to find unique and duplicate values by formatting the cells.
• It allows the user to format cells and their data based on conditions specified by the user.
• It is used to highlight cells that contain values meeting a certain condition.
• It offers various features that make the data more informative and readable.
Steps to do conditional formatting

Step 1. Select the column to which you want to apply conditional formatting.
Step 2. Navigate to the Home tab and click on the Conditional Formatting option.
Step 3. Choose the New Rule option from the drop-down menu.
Step 4. Create the desired formatting rule.
Step 5. Click OK to confirm the rule.
Conditional Formatting

Conditional formatting visually highlights data based on specific conditions.


• Common Uses:
 Highlight sales greater than a threshold (e.g., $1,000).
 Use color scales to represent sales performance across regions.
 Identify duplicate entries in "Order ID."
• Interactive Example:
1.Select the "Sales" column.
2.Apply conditional formatting > Highlight Cells Rules > Greater Than.
3.Use color gradients to visualize sales performance.
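The logic behind these rules can be sketched in plain Python. This does not format any cells; it only computes which values a rule would highlight. The sample values and the helper name `color_scale` are hypothetical, and the 0-to-1 scale is a simple min-max assumption rather than Excel's exact colour computation.

```python
from collections import Counter

# Hypothetical column values, invented for illustration.
sales = [1200.0, 300.0, 1000.0, 2500.0]
order_ids = ["A-1", "A-2", "A-1", "A-3"]

THRESHOLD = 1000  # the "Greater Than" threshold from the example above

# "Greater Than" rule: True marks a cell that would be highlighted.
highlight = [value > THRESHOLD for value in sales]

# "Duplicate Values" rule: flag every Order ID that occurs more than once.
id_counts = Counter(order_ids)
duplicate_flags = [id_counts[oid] > 1 for oid in order_ids]


def color_scale(values):
    """Map each value to 0.0 (lowest) .. 1.0 (highest) by min-max scaling."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


scale = color_scale(sales)

print(highlight)        # [True, False, False, True]
print(duplicate_flags)  # [True, False, True, False]
```

Note that a value exactly equal to the threshold (1000.0 here) is not highlighted, because the rule is "greater than", not "greater than or equal to".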

Removing duplicates
 Data duplication creates unnecessary clutter in data sets and makes it difficult for the user to extract meaningful insights.
 In addition, duplicate values can disturb the working of formulas, which can lead to inaccurate results and compromise data integrity.
 MS Excel has a range of tools designed to address the problem of duplication.

Steps to remove duplicates in MS Excel
Step 1. Select the cell range that has duplicate values.
Step 2. Click on the Data tab and select Remove Duplicates.
Step 3. Under Columns, check or uncheck the columns in which duplicates are to be removed.
Step 4. Click OK.
Removing Duplicates
Duplicate data can distort analysis. Removing duplicates ensures data
integrity.
• Steps:
1.Select the entire dataset or relevant columns.
2.Go to Data > Remove Duplicates.
3.Choose columns to identify duplicates (e.g., "Order ID" or "Customer
Name").
• Example:
 Remove duplicate entries in "Order ID" while retaining unique records.
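The idea can be sketched in plain Python as follows. The helper name `remove_duplicates` and the sample rows are hypothetical; the sketch keeps the first occurrence of each key, which is assumed here to mirror what Remove Duplicates does in Excel.

```python
def remove_duplicates(rows, key_columns):
    """Keep the first row for each distinct combination of key column values."""
    seen = set()
    unique_rows = []
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)
    return unique_rows


# Hypothetical rows, invented for illustration.
orders = [
    {"Order ID": "A-1", "Customer Name": "Asha Rao"},
    {"Order ID": "A-2", "Customer Name": "Ravi Kumar"},
    {"Order ID": "A-1", "Customer Name": "Asha Rao"},
]

unique_orders = remove_duplicates(orders, ["Order ID"])
print([o["Order ID"] for o in unique_orders])  # ['A-1', 'A-2']
```

Passing more than one column, for example `["Order ID", "Customer Name"]`, treats rows as duplicates only when all of the listed columns match, which corresponds to checking several columns in the Remove Duplicates dialog.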
Data Validation
Data validation is an important feature of MS Excel.
 It enables the user to define the type of data that can be entered in each cell of a worksheet.
 It helps the user by restricting entries in selected cells.
 Data validation allows users to set a validation rule on the worksheet. For example, a user can set a validation rule that accepts only values between 0 and 10, or only names with fewer than 15 letters.

Steps for data validations are:


Step 1. Select the cell.
Step 2. On the Data tab, go to Data Tools and choose Data Validation.
Step 3. In the Data Validation box, click on the Settings tab, then under Allow, select an option such as Whole number, Decimal, List, Date, Time, Text length, or Custom.
Step 4. Still on the Settings tab, under Data, select a condition.
- If Custom is chosen, enter the formula against which the data is to be validated.
- Click the Input Message tab and write a custom message to guide the user. (The message shown when wrong data is entered is set on the Error Alert tab.)
- Select the "Show input message when cell is selected" checkbox to display the message when the user selects the cell.
Step 5. Click OK. Data validation is now ready.
Data Validation
Data validation ensures data entered into a dataset meets specific criteria.
• Applications:
 Restrict entries to predefined lists (e.g., "Region" must be "North," "South," etc.).
 Set numerical limits (e.g., "Sales" must be between $0 and $10,000).
 Use formulas to create custom rules.
• Steps:
1.Select a cell or range.
2.Go to Data > Data Validation.
3.Define criteria (e.g., allow list, whole number, or custom formula).
• Interactive Example:
 Validate "Category" to allow only "Office Supplies," "Furniture," or "Technology."
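The list rule and the numerical limit above can be sketched in plain Python. The helper name `validate_row` and the error messages are hypothetical; the allowed categories and the 0 to 10,000 range are taken from the examples above.

```python
ALLOWED_CATEGORIES = {"Office Supplies", "Furniture", "Technology"}
SALES_MIN, SALES_MAX = 0, 10_000


def validate_row(row):
    """Return a list of error messages; an empty list means the row is valid."""
    errors = []

    # List rule: "Category" must be one of the predefined values.
    if row.get("Category") not in ALLOWED_CATEGORIES:
        errors.append(f"Category not allowed: {row.get('Category')!r}")

    # Numerical limit: "Sales" must be a number between SALES_MIN and SALES_MAX.
    sales = row.get("Sales")
    is_number = isinstance(sales, (int, float)) and not isinstance(sales, bool)
    if not is_number or not (SALES_MIN <= sales <= SALES_MAX):
        errors.append(f"Sales out of range: {sales!r}")

    return errors


print(validate_row({"Category": "Furniture", "Sales": 250.0}))  # []
print(len(validate_row({"Category": "Toys", "Sales": -5})))     # 2
```

Unlike Excel, which blocks an invalid entry at the moment it is typed, this sketch checks rows after the fact and reports what is wrong with each one.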