Data Cleaning
1. Handling Duplicates:
Identify Duplicates:
o Select the range of cells you want to check.
o Go to Home > Conditional Formatting > Highlight Cells Rules > Duplicate
Values.
o Choose a formatting style to highlight the duplicates. This helps you review them
before removing.
Remove Duplicates:
o Select the range of cells.
o Go to Data > Data Tools > Remove Duplicates.
o In the dialog box, check the columns you want to compare. A row is removed only
when its values match an earlier row in every checked column, so keep all relevant
columns checked if only fully identical rows should count as duplicates.
o Click OK. Excel keeps the first occurrence of each row and tells you how many
duplicate values were removed and how many unique values remain.
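For readers who script the same step outside Excel, here is a minimal Python sketch of the idea behind Remove Duplicates: keep the first occurrence of each row and compare only the chosen columns. The function name, the sample rows, and the exact (case-sensitive) comparison are assumptions for illustration, not a description of how Excel compares values internally.

```python
def remove_duplicates(rows, key_columns):
    """Keep the first occurrence of each row, comparing only key_columns."""
    seen = set()
    unique_rows = []
    for row in rows:
        key = tuple(row[i] for i in key_columns)
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)
    return unique_rows

# Hypothetical sample data: (name, city)
rows = [
    ("Ana", "Lima"),
    ("Ben", "Oslo"),
    ("Ana", "Lima"),   # exact duplicate of the first row
    ("Ana", "Quito"),  # same name, different city
]

all_columns = remove_duplicates(rows, key_columns=[0, 1])
name_only = remove_duplicates(rows, key_columns=[0])

print(len(rows) - len(all_columns), "duplicate(s) removed using both columns")
print(len(rows) - len(name_only), "duplicate(s) removed using the name column only")
```

Checking fewer columns removes more rows, which is why it is worth reviewing the column checkboxes before clicking OK.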
2. Dealing with Extra Spaces and Non-Printing Characters:
Remove Leading, Trailing, and Extra Spaces:
o Use the TRIM() function in a new column. For example, if your text is in cell A1,
enter =TRIM(A1) in another cell.
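o TRIM() removes all spaces from text except for single spaces between words. It
removes only the regular space character, not the non-breaking space (character
160) that often appears in text copied from web pages.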
o Copy the results and paste them as values over the original column if you want to
replace the original data.
Remove Non-Printing Characters:
o Use the CLEAN() function. For instance, =CLEAN(A1) will remove non-printable
characters (ASCII codes 0 through 31, such as line breaks) from the text in A1.
o You can combine TRIM() and CLEAN() like this: =TRIM(CLEAN(A1)) to handle
both issues.
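The combined formula can be mirrored in a short Python sketch. The helper names clean and trim are hypothetical; they only approximate the documented behavior of CLEAN() (dropping character codes 0 through 31) and TRIM() (removing leading, trailing, and repeated regular spaces).

```python
def clean(text):
    """Drop ASCII control characters (codes 0-31), roughly like CLEAN()."""
    return "".join(ch for ch in text if ord(ch) >= 32)

def trim(text):
    """Strip leading/trailing spaces and collapse runs of spaces, like TRIM()."""
    return " ".join(part for part in text.split(" ") if part)

# Hypothetical messy value with a tab, extra spaces, and a line break
raw = "  Acme\t  Corp \n"
tidy = trim(clean(raw))
print(repr(tidy))
```

As in =TRIM(CLEAN(A1)), the control characters are removed first so that any spaces left around them are then collapsed.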
3. Correcting Text Case:
UPPERCASE: Use the UPPER() function (e.g., =UPPER(A1)).
lowercase: Use the LOWER() function (e.g., =LOWER(A1)).
Proper Case (Capitalize First Letter of Each Word): Use the PROPER() function (e.g.,
=PROPER(A1)).
Apply these formulas to a new column and then copy and paste the values over the
original column if needed.
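If the same case corrections are needed in a script, Python's built-in string methods are close counterparts. This is a sketch with a hypothetical sample value; str.title() is similar to PROPER() but is not guaranteed to match it for every input.

```python
# Hypothetical sample value with inconsistent casing
name = "jOHN o'neil"

upper_name = name.upper()   # counterpart of =UPPER(A1)
lower_name = name.lower()   # counterpart of =LOWER(A1)
proper_name = name.title()  # close to =PROPER(A1)

print(upper_name)
print(lower_name)
print(proper_name)
```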
4. Splitting and Combining Text:
Split Text to Columns:
o Select the column you want to split.
o Go to Data > Data Tools > Text to Columns.
o Choose Delimited if your data is separated by characters like commas, spaces, or
tabs, or Fixed width if the data is aligned in columns with consistent spacing.
o Follow the steps in the wizard to specify the delimiters or fixed widths.
Combine Columns:
o Use the CONCATENATE() function or the & operator.
o For example, to combine the text in cells A1 and B1 with a space in between, you
can use =CONCATENATE(A1," ",B1) or =A1&" "&B1.
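The split and combine steps above can be sketched in Python as follows. The sample names and the choice of a single space as the delimiter are assumptions for illustration.

```python
# Hypothetical column of full names separated by a single space
full_names = ["Ada Lovelace", "Alan Turing"]

# Split: one list per row, like Text to Columns with a space delimiter
split_rows = [name.split(" ") for name in full_names]

# Combine: join the parts back with a space, like =A1&" "&B1
combined = [first + " " + last for first, last in split_rows]

print(split_rows)
print(combined)
```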
5. Handling Missing Values (Blanks):
Identify Blank Cells:
o Select the range of data.
o Press Ctrl + G (or Control + G on a Mac) to open the Go To dialog box.
o Click Special, select Blanks, and click OK. This will select all the blank cells in
your selected range.
Fill Blank Cells:
o Once the blank cells are selected, you can type a value (e.g., "0", "N/A") and
press Ctrl + Enter (or Cmd + Enter on a Mac) to fill all selected blank cells with
that value.
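o You can also use a formula to fill blanks based on surrounding data. To fill each
blank with the value above it: with the blanks still selected, type = followed by
the address of the cell directly above the active cell (e.g., =A1 if the active cell
is A2), and then press Ctrl + Enter to enter the formula in all selected blanks.
Copy the column and paste it as values afterward to replace the formulas with
their results.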
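Both fill strategies can be sketched in Python. The function names and sample data are hypothetical, and the sketch assumes a blank is either an empty string or None.

```python
def fill_blanks(values, fill_value):
    """Replace every blank ("" or None) with one fixed value."""
    return [fill_value if v in ("", None) else v for v in values]

def fill_down(values):
    """Replace each blank with the nearest non-blank value above it."""
    filled = []
    last_seen = None  # blanks before the first value stay None
    for v in values:
        if v in ("", None):
            filled.append(last_seen)
        else:
            last_seen = v
            filled.append(v)
    return filled

# Hypothetical column where a region label appears only on its first row
regions = ["East", "", "", "West", ""]

print(fill_blanks(regions, "N/A"))
print(fill_down(regions))
```

Filling down suits columns where a label is written once and implied for the rows beneath it; a fixed value suits columns where a blank really means "no data".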
6. Finding and Replacing Data:
Press Ctrl + H (or Control + H on a Mac) to open the Find and Replace dialog box.
Enter the text or value you want to find in the Find what field.
Enter the text or value you want to replace it with in the Replace with field. Leave it
blank to delete the found value.
Click Find Next, Replace, or Replace All as needed.
7. Using Formulas for Data Transformation:
Excel has a wide range of functions for text manipulation, date and time formatting,
number conversion, and more. LEFT(), RIGHT(), and MID() extract part of a text string,
SUBSTITUTE() replaces text within a string, TEXT() formats a value as text, DATE()
builds a date from a year, month, and day, and VALUE() converts text to a number.
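To make the text functions concrete, here is a small Python sketch that imitates LEFT(), RIGHT(), MID(), and SUBSTITUTE() with string slicing. The helper names and the sample invoice code are hypothetical; note that MID() counts positions from 1, while Python counts from 0.

```python
def left(text, n):
    """First n characters, like LEFT(text, n)."""
    return text[:n]

def right(text, n):
    """Last n characters, like RIGHT(text, n)."""
    return text[-n:] if n > 0 else ""

def mid(text, start, n):
    """n characters beginning at 1-based position start, like MID()."""
    return text[start - 1:start - 1 + n]

def substitute(text, old, new):
    """Replace every occurrence of old with new, like SUBSTITUTE()."""
    return text.replace(old, new)

# Hypothetical invoice code
code = "INV-2024-0042"

print(left(code, 3))
print(mid(code, 5, 4))
print(right(code, 4))
print(substitute(code, "-", "/"))
```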
8. Data Validation:
Use data validation to set rules for what type of data can be entered into cells. This can
help prevent errors and inconsistencies in the first place. Go to Data > Data Tools >
Data Validation.
9. Conditional Formatting:
Beyond highlighting duplicates, you can use conditional formatting to highlight other
inconsistencies or errors in your data based on specific rules or formulas.
10. Convert Text to Numbers:
If numbers are stored as text:
o Select the range. A small warning icon appears next to the selection; click it and
choose Convert to Number.
o Alternatively, use =VALUE(A1) in a new column.
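A scripted counterpart of this conversion might look like the Python sketch below. The function name is hypothetical, and the sketch assumes that commas are thousands separators and that values which cannot be converted should come back as None instead of raising an error.

```python
def to_number(text):
    """Return a float for numeric text, or None if it cannot be converted."""
    try:
        # Assumption: commas are thousands separators, not decimal marks
        return float(text.strip().replace(",", ""))
    except ValueError:
        return None

# Hypothetical values imported as text
raw_values = [" 1,250.50 ", "42", "n/a"]
numbers = [to_number(v) for v in raw_values]
print(numbers)
```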
11. Detect and Correct Errors:
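Use formulas to flag or trap error values:
o =ISERROR(A1) returns TRUE if A1 contains any error value.
o =IFERROR(A1, "Error Found") returns the value of A1, or "Error Found" if A1
contains an error.
Highlight the errors these formulas reveal and fix them systematically.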
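The IFERROR() pattern, returning a fallback when a calculation fails, can be sketched in Python with try/except. The helper name if_error and the sample pairs are hypothetical, and the sketch traps only a few common exception types.

```python
def if_error(compute, fallback):
    """Return compute(), or fallback if it raises a common calculation error."""
    try:
        return compute()
    except (ZeroDivisionError, ValueError, TypeError):
        return fallback

# Hypothetical numerator/denominator pairs; the second divides by zero
pairs = [(10, 2), (5, 0)]
results = [if_error(lambda a=a, b=b: a / b, "Error Found") for a, b in pairs]
print(results)
```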