
Data Cleaning and Visualization

The document provides an overview of logical functions in spreadsheets, highlighting their importance in data analysis for automating decision-making, improving accuracy, and enhancing insights. It details common logical functions like IF, AND, OR, NOT, IFS, IFERROR, and SWITCH, along with their syntax and examples. Additionally, it covers lookup functions such as VLOOKUP and HLOOKUP, financial functions like NPV and IRR, and array formulas including SUMPRODUCT and SUMIF.


MODULE-2

DATA CLEANING AND VISUALIZATION

Introduction to Logical Functions


Logical functions in spreadsheets (such as Microsoft Excel, Google Sheets, or similar tools)
are used to perform decision-making operations by evaluating conditions or comparisons.
These functions help automate tasks, apply conditional logic, and analyse data more efficiently.
They are essential for filtering, categorizing, or calculating data based on specified rules.

Why Are Logical Functions Important in Data Analysis?


Logical functions enable analysts to:
1. Automate Decision-Making: Replace manual checks by evaluating conditions and
returning results based on those conditions.
2. Simplify Complex Analysis: Perform conditional calculations without programming
knowledge.
3. Improve Accuracy: Reduce human error by relying on pre-defined logic.
4. Enhance Insights: Apply advanced filtering, categorization, and conditional
formatting to uncover patterns in data.

Common Applications of Logical Functions


1. Data Categorization:
o Example: Classifying students as "Pass" or "Fail" based on their scores.
2. Conditional Calculations:
o Example: Calculating bonuses only for employees who meet performance
criteria.
3. Error Handling:
o Example: Returning "N/A" or "Error" for invalid operations (e.g., division by
zero).
4. Advanced Filtering:
o Example: Identifying data that meet multiple criteria (e.g., customers who
purchased a specific product within a time frame).
5. Dynamic Reporting:
o Example: Creating dashboards that change based on specific inputs or
selections.

Ms. Aleena Rose, Sacred Heart College (Autonomous), Chalakudy


Key Logical Functions
Here are the most widely used logical functions in data analysis:

Function   Purpose
IF         Performs a conditional check and returns one value if TRUE and another if FALSE.
AND        Checks if all conditions are TRUE.
OR         Checks if any condition is TRUE.
NOT        Reverses the logical result of a condition.
IFS        Evaluates multiple conditions in order and returns the value for the first TRUE condition (available in recent versions of Excel and in Google Sheets).
IFERROR    Returns a specified value if a formula results in an error.
SWITCH     Evaluates an expression against a list of values and returns the corresponding result for the first match. If no match is found, returns a default value.

1. IF Function
Returns one value if a condition is TRUE and another value if it is FALSE.
• Syntax: =IF(logical_test, value_if_true, value_if_false)
• Example:
Suppose we have a score in cell A1, and we want to determine Pass or Fail based on
whether the score is greater than or equal to 50.
Formula:
=IF(A1>=50, "Pass", "Fail")
Output:
o If A1 = 70 → Result is "Pass".
o If A1 = 40 → Result is "Fail".
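
For readers who also script their analysis, the same pass/fail rule can be sketched in Python. This is only an illustrative equivalent of the spreadsheet formula; the function name `pass_fail` and its `threshold` parameter are assumptions made for this example.

```python
def pass_fail(score, threshold=50):
    """Mirror =IF(A1>=50, "Pass", "Fail") for a single score."""
    # A conditional expression picks one of two values, like IF's two branches.
    return "Pass" if score >= threshold else "Fail"


print(pass_fail(70))  # Pass
print(pass_fail(40))  # Fail
```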

2. AND Function
Returns TRUE if all conditions are TRUE.
• Syntax: =AND(condition1, condition2, ...)
• Example:
Check if a student passed both Math (A1) and Science (B1) with scores ≥50.
Formula:
=AND(A1>=50, B1>=50)

Output:
o If A1 = 60 and B1 = 70 → Result is TRUE.
o If A1 = 60 and B1 = 40 → Result is FALSE.

3. OR Function
Returns TRUE if at least one condition is TRUE.
• Syntax: =OR(condition1, condition2, ...)
• Example:
Check if a student passed either Math (A1) or Science (B1) with scores ≥50.
Formula:
=OR(A1>=50, B1>=50)
Output:
o If A1 = 60 and B1 = 40 → Result is TRUE.
o If A1 = 40 and B1 = 30 → Result is FALSE.

4. NOT Function
Reverses the result of a condition.
• Syntax: =NOT(logical_test)
• Example:
Check if an item is not "Out of Stock" (A1).
Formula:
=NOT(A1="Out of Stock")
Output:
o If A1 = "In Stock" → Result is TRUE.
o If A1 = "Out of Stock" → Result is FALSE.
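
The AND, OR, and NOT examples above map closely onto Python's `and`, `or`, and `not` operators. The sketch below is illustrative only; the function names are hypothetical, and unlike a spreadsheet text comparison, the Python string comparison here is case-sensitive.

```python
def passed_both(math, science, threshold=50):
    """Mirror =AND(A1>=50, B1>=50)."""
    return math >= threshold and science >= threshold


def passed_either(math, science, threshold=50):
    """Mirror =OR(A1>=50, B1>=50)."""
    return math >= threshold or science >= threshold


def is_available(status):
    """Mirror =NOT(A1="Out of Stock"); the comparison is case-sensitive."""
    return not (status == "Out of Stock")


print(passed_both(60, 40))       # False
print(passed_either(60, 40))     # True
print(is_available("In Stock"))  # True
```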

5. IFS Function (recent versions of Excel and Google Sheets)
Returns the value corresponding to the first TRUE condition.
• Syntax: =IFS(condition1, value1, condition2, value2, ...)
• Example:
Classify scores (A1) into categories:
o ≥80: "High"
o ≥50: "Medium"
o <50: "Low"

Formula:
=IFS(A1>=80, "High", A1>=50, "Medium", A1<50, "Low")
Output:
o If A1 = 90 → Result is "High".
o If A1 = 65 → Result is "Medium".
o If A1 = 40 → Result is "Low".
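
The "first TRUE condition wins" behaviour of IFS can be sketched in Python as an ordered list of rules. The function name `classify` is hypothetical, and returning `None` when nothing matches is an assumption of this sketch (a spreadsheet would show an error instead).

```python
def classify(score):
    """Mirror =IFS(A1>=80, "High", A1>=50, "Medium", A1<50, "Low")."""
    # Rules are checked top to bottom; order matters because 90 also
    # satisfies the second rule, yet must be labelled "High".
    rules = [
        (lambda s: s >= 80, "High"),
        (lambda s: s >= 50, "Medium"),
        (lambda s: s < 50, "Low"),
    ]
    for condition, label in rules:
        if condition(score):
            return label
    return None  # assumption: no rule matched


print(classify(90))  # High
print(classify(65))  # Medium
print(classify(40))  # Low
```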

6. IFERROR Function
Returns a custom value if a formula results in an error; otherwise, returns the formula’s
result.
• Syntax: =IFERROR(formula, value_if_error)
• Example:
Handle errors when dividing numbers (A1/B1).
Formula:
=IFERROR(A1/B1, "Error")
Output:
o If A1 = 10, B1 = 2 → Result is 5.
o If A1 = 10, B1 = 0 → Result is "Error".
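
In a scripting language, the closest analogue to IFERROR is exception handling. The sketch below is a minimal illustration, assuming only division by zero needs to be caught (IFERROR itself catches any spreadsheet error); `safe_divide` and `fallback` are hypothetical names.

```python
def safe_divide(numerator, denominator, fallback="Error"):
    """Mirror =IFERROR(A1/B1, "Error") for the division-by-zero case."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        # Return the custom value instead of letting the error propagate.
        return fallback


print(safe_divide(10, 2))  # 5.0
print(safe_divide(10, 0))  # Error
```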

7. SWITCH Function
Evaluates an expression against a list of values and returns the corresponding result
for the first match. If no match is found, returns a default value.
• Syntax: =SWITCH(expression, value1, result1, [value2, result2], ..., [default])
• Example:
o Formula: =SWITCH(A1, "Red", 1, "Blue", 2, "Green", 3, 0)
o Description: Returns 1 if A1 is "Red", 2 if "Blue", 3 if "Green", and 0
otherwise.
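
The same value-to-result mapping can be sketched in Python with a dictionary lookup that falls back to a default. The names `COLOR_CODES` and `color_code` are hypothetical and exist only for this illustration.

```python
# Each key plays the role of a SWITCH value; each dictionary value is its result.
COLOR_CODES = {"Red": 1, "Blue": 2, "Green": 3}


def color_code(color, default=0):
    """Mirror =SWITCH(A1, "Red", 1, "Blue", 2, "Green", 3, 0)."""
    return COLOR_CODES.get(color, default)


print(color_code("Blue"))    # 2
print(color_code("Purple"))  # 0
```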



Lookup and reference functions
Lookup and reference functions are essential tools in spreadsheets, used for searching and
retrieving data from a specific location in a table or range. They allow users to retrieve
specific data from large datasets, establish relationships, and enhance the functionality of
their analysis.

1. VLOOKUP Function
Searches for a value in the first column of a table and returns a value in the same row from a
specified column.
• Syntax:
=VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup])
Parameters:
1. lookup_value: The value you want to look up.
2. table_array: The range of cells that contains the data.
3. col_index_num: The column number in the table array from which to retrieve the
value.
4. [range_lookup]: Optional. TRUE for an approximate match or FALSE for an exact
match.

Example: Look up the Department of the employee with ID 103


ID Name Department
101 Alice HR
102 Bob IT
103 Charlie Finance
104 Diana Marketing

=VLOOKUP(103, A1:C5, 3, FALSE)


Explanation:
• 103: This is the value you’re looking for (the ID).
• A1:C5: This is the range of data.
• 3: The result will come from the 3rd column in the range (Department).
• FALSE: You want an exact match.
Result: Finance
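
To make the exact-match search concrete, here is a minimal Python sketch of the same lookup over the employee table. It is not how a spreadsheet implements VLOOKUP; the `vlookup` helper is hypothetical, handles only exact matches, and returns `None` where a spreadsheet would show #N/A.

```python
employees = [
    (101, "Alice", "HR"),
    (102, "Bob", "IT"),
    (103, "Charlie", "Finance"),
    (104, "Diana", "Marketing"),
]


def vlookup(lookup_value, rows, col_index_num):
    """Scan the first column top to bottom; return the 1-based column on a match."""
    for row in rows:
        if row[0] == lookup_value:
            return row[col_index_num - 1]
    return None  # assumption: stands in for #N/A


print(vlookup(103, employees, 3))  # Finance
```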

2. HLOOKUP Function
Searches for a value in the first row of a table and returns a value in the same column from a
specified row.
• Syntax:
=HLOOKUP(lookup_value, table_array, row_index_num, [range_lookup])
Parameters:
1. lookup_value: The value you want to search for in the first row.
2. table_array: The range of cells containing the data, including the row where you
want to search and retrieve the value.
3. row_index_num: The row number (relative to the table_array) from which to return
the result.
4. [range_lookup] (optional):
o TRUE: Approximate match (default if omitted).
o FALSE: Exact match.

Example: Find the Department (in the 3rd row) of the employee with ID 102.
ID 101 102 103
Name Alice Bob Charlie
Dept. HR IT Finance

=HLOOKUP(102, A1:D3, 3, FALSE)


Result: IT

3. INDEX Function
Returns the value of a cell within a specified range based on row and column numbers.
• Syntax:
=INDEX(array, row_num, [column_num])
Parameters:
o array: The range of data.
o row_num: The row number in the range.
o column_num: The column number in the range.
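Example: Retrieve the Department from the 3rd data row (ID 103).
ID Name Department
101 Alice HR
102 Bob IT
103 Charlie Finance
104 Diana Marketing

=INDEX(A2:C5, 3, 3)
Result: Finance (the range A2:C5 excludes the header row, so row 3 of the range is the record for ID 103)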

4. MATCH Function
Purpose: Returns the position of a value in a range.
• Syntax:
=MATCH(lookup_value, lookup_array, [match_type])
o lookup_value: The value you want to find.
o lookup_array: The range to search.
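o match_type: 1 finds the largest value less than or equal to lookup_value (lookup_array must be sorted in ascending order; this is the default), 0 finds an exact match, -1 finds the smallest value greater than or equal to lookup_value (lookup_array must be sorted in descending order).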
Example: Find the position of "Charlie" in the list.
Name
Alice
Bob
Charlie
=MATCH("Charlie", A2:A4, 0)
Result: 3 (Charlie is the 3rd item in the range A2:A4, which excludes the header in A1)
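
INDEX and MATCH are often paired: MATCH finds a position and INDEX returns the value at that position. The Python sketch below illustrates that pairing under stated assumptions; `match_position` and `index_value` are hypothetical helpers that use 1-based positions to mirror spreadsheet numbering, and only exact matching is shown.

```python
table = [
    (101, "Alice", "HR"),
    (102, "Bob", "IT"),
    (103, "Charlie", "Finance"),
    (104, "Diana", "Marketing"),
]


def match_position(value, values):
    """Exact-match MATCH: 1-based position (raises ValueError if absent)."""
    return values.index(value) + 1


def index_value(rows, row_num, col_num):
    """INDEX: value at a 1-based row and column of the data rows."""
    return rows[row_num - 1][col_num - 1]


names = [row[1] for row in table]
position = match_position("Charlie", names)
print(position)                         # 3
print(index_value(table, position, 3))  # Finance
```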



Comparison of Functions

Function   Strengths                                Limitations
VLOOKUP    Easy to use for vertical lookups.        Requires lookup column to be the first column.
HLOOKUP    Easy for horizontal lookups.             Limited to horizontally aligned data.
INDEX      Flexible for dynamic lookups.            Requires specific row/column numbers.
MATCH      Finds positions for dynamic references.  Cannot return actual values.

Financial functions
Financial functions like NPV, IRR, and PMT are essential for analyzing financial data and
making decisions in spreadsheet tools such as Microsoft Excel, Google Sheets, or other
similar applications.

1. NPV (Net Present Value)


What is NPV?
• NPV is used to calculate the present value of an investment’s future cash flows,
minus the initial investment. It tells you whether an investment is worth it or not
based on a specified discount rate (required rate of return).
• In simple terms, NPV answers: "How much is this investment worth today?"
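• Syntax: =NPV(rate, value1, value2, ...) + initial_investment
rate: The discount rate or required rate of return (e.g., 10%).
value1, value2, ...: Future cash flows (positive or negative), assumed to occur at the end of each period.
initial_investment: The initial cost of the investment, entered as a negative number outside the NPV function. Because it is paid today, it is not discounted; adding the negative amount subtracts the cost from the present value of the future cash flows.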
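• Example:
Imagine you want to invest in a project that requires $5,000 today (initial investment), and you expect the following cash flows:
• Today: -$5,000 (initial investment)
• Year 1: $2,000
• Year 2: $2,500
• Year 3: $3,000
The required return (discount rate) is 10%.
Formula:
=NPV(0.10, 2000, 2500, 3000) + (-5000)
Result: approximately $1,138.24. Because the NPV is positive, the project earns more than the 10% required return.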
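
The discounting behind this example can be checked by hand. The Python sketch below is a minimal illustration, assuming each future cash flow arrives at the end of its year and the initial investment is paid today; the helper name `npv` is hypothetical.

```python
def npv(rate, cash_flows):
    """Present value of cash flows received at the end of periods 1, 2, 3, ..."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows, start=1))


future_flows = [2000, 2500, 3000]
initial_investment = 5000

# The initial investment is paid today, so it is subtracted without discounting.
net = npv(0.10, future_flows) - initial_investment
print(round(net, 2))  # 1138.24
```

A positive value means the discounted inflows exceed the initial cost at the chosen rate.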



2. IRR (Internal Rate of Return)
What is IRR?
IRR is the rate at which the NPV of an investment becomes zero. It’s the rate of return that
makes the future cash flows break even with the initial investment.
In simple words, IRR answers: "What is the annual return you can expect from this
investment?"
Syntax : =IRR(values, [guess])
• values: Cash flows (starting with the initial investment followed by future inflows).
• [guess]: Optional. An estimated rate to help Excel calculate the IRR. If you don’t
provide a guess, Excel uses 10% by default.
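Example:
Using the same cash flows as before, and assuming they are entered in cells A1:A4:
• Initial investment: -$5,000 (outflow)
• Year 1: $2,000
• Year 2: $2,500
• Year 3: $3,000
Formula:
=IRR(A1:A4)
Result: approximately 21.6%, which is above the 10% required return used in the NPV example.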
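
Because IRR is defined as the rate at which NPV equals zero, it can be approximated by searching for that rate. The sketch below uses simple bisection, which is a swapped-in technique chosen for clarity and not necessarily what a spreadsheet does internally; `npv_at` and `irr` are hypothetical helpers, and the search assumes NPV is positive at the low bound and negative at the high bound.

```python
def npv_at(rate, flows):
    """NPV where flows[0] occurs today and later flows at the end of each year."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(flows))


def irr(flows, low=0.0, high=1.0, tol=1e-9):
    """Bisection search for the rate where NPV crosses zero."""
    while high - low > tol:
        mid = (low + high) / 2
        if npv_at(mid, flows) > 0:
            low = mid   # NPV still positive: the rate must be higher
        else:
            high = mid  # NPV negative: the rate must be lower
    return (low + high) / 2


flows = [-5000, 2000, 2500, 3000]
rate = irr(flows)
print(f"{rate:.1%}")
```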

3. PMT (Payment)
What is PMT?
PMT calculates the regular payment for a loan or investment based on constant payments
and a fixed interest rate. It's typically used for loan payments, mortgage payments, or
regular investments.
In simple words, PMT answers: "How much do I need to pay each period to repay a loan
or investment?"
Syntax: =PMT(rate, nper, pv, [fv], [type])
• rate: The interest rate per period (e.g., 5% annually, but if payments are monthly,
divide by 12).
• nper: The total number of periods (e.g., 12 months, 5 years).
• pv: The present value, or the loan amount (or investment amount).
• [fv]: The future value (optional, usually 0 if paying off the loan completely).
• [type]: When the payments are due. Use 0 for the end of the period (default) and 1 for
the beginning.

Example:
Imagine you borrow $10,000 for 5 years with an annual interest rate of 6%. You want to
calculate the monthly payment.
In Excel:
1. rate = 6% per year (monthly rate: 6%/12 = 0.5% per month)
2. nper = 5 years * 12 months = 60 months
3. pv = $10,000 (loan amount)
To calculate the monthly payment:
=PMT(0.06/12, 60, -10000)
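
The payment can be reproduced from the standard annuity formula. The Python sketch below is an illustration under stated assumptions: payments fall at the end of each period, the future value is 0, and the loan amount is passed as a positive number so the payment also comes out positive. The helper name `pmt` is hypothetical.

```python
def pmt(rate, nper, pv):
    """Level payment per period that fully repays a loan of pv over nper periods."""
    return rate * pv / (1 - (1 + rate) ** -nper)


monthly_rate = 0.06 / 12  # 6% per year, paid monthly
months = 5 * 12           # 5 years
monthly_payment = pmt(monthly_rate, months, 10000)
print(round(monthly_payment, 2))  # 193.33
```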

Array formulas

1. SUMPRODUCT
Syntax: =SUMPRODUCT(array1, [array2], [array3], ...)
Explanation: The SUMPRODUCT function multiplies the corresponding elements in the
provided arrays (or ranges) and then adds them up. It’s typically used for weighted averages
or conditional calculations.
Example:

A          B
Quantity   Price
2          5
3          6
4          7
5          8

Formula: =SUMPRODUCT(A2:A5, B2:B5)


This multiplies each pair of values in columns A and B, then sums the result:
• (2 * 5) + (3 * 6) + (4 * 7) + (5 * 8)
• 10 + 18 + 28 + 40 = 96
Result:
The total value is 96
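
The multiply-then-sum step can be sketched in a few lines of Python. The variable names are hypothetical and the data simply repeats the Quantity and Price columns from the example.

```python
quantities = [2, 3, 4, 5]
prices = [5, 6, 7, 8]

# Pair up corresponding elements, multiply each pair, then add the products.
total = sum(q * p for q, p in zip(quantities, prices))
print(total)  # 96
```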



2. SUMIF
Syntax: =SUMIF(range, criteria, [sum_range])
Explanation: The SUMIF function adds up the values in a specified range that meet a given
condition (criteria).
• range: The range of cells you want to apply the criteria to.
• criteria: The condition that must be met.
• sum_range: The actual range to sum if different from the range.
Example:

A         B
Product   Sales
Apple     5
Banana    3
Apple     4
Orange    6

Formula to sum sales for "Apple": =SUMIF(A2:A5, "Apple", B2:B5)


This sums the sales in column B where the product in column A is "Apple": 5 + 4 = 9
Result:
The sum of sales for "Apple" is 9.
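
A conditional sum of this kind can be sketched in Python as a filtered total. The helper name `sum_if` is hypothetical and supports only an exact text match, which is narrower than the criteria SUMIF accepts.

```python
sales = [("Apple", 5), ("Banana", 3), ("Apple", 4), ("Orange", 6)]


def sum_if(rows, product):
    """Add the sales figure of every row whose product equals `product`."""
    return sum(amount for name, amount in rows if name == product)


print(sum_if(sales, "Apple"))  # 9
```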
3. AVERAGE
Syntax: =AVERAGE(number1, [number2], ...)
Explanation: The AVERAGE function calculates the arithmetic mean (average) of a group
of numbers or cells.
Example:

A         B
Product   Sales
Apple     5
Banana    3
Apple     4
Orange    6

Formula to calculate the average sales: =AVERAGE(B2:B5)
This calculates the average of the values in column B:
• (5 + 3 + 4 + 6) / 4 = 18 / 4 = 4.5
Result:
The average sales value is 4.5.
4. TRANSPOSE
Syntax: =TRANSPOSE(array)
Explanation: The TRANSPOSE function switches the rows and columns of a given array or
range. If the data is in a row, it will turn it into a column, and vice versa.
Example:
Suppose cells A1:A4 contain the values 1, 2, 3, and 4 in a single column.
Formula: =TRANSPOSE(A1:A4)
This transposes the column values into a row:
• 1, 2, 3, 4 in a single row.
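
Swapping rows and columns can be sketched in Python with `zip`. The layout below, a list of one-element rows standing in for a single column, is an assumption made for the illustration.

```python
# A 4-row, 1-column range: each inner list is one row.
column = [[1], [2], [3], [4]]

# zip(*rows) groups the first item of every row, then the second, and so on,
# which turns columns into rows.
transposed = [list(row) for row in zip(*column)]
print(transposed)  # [[1, 2, 3, 4]]
```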
5. ARRAY MULTIPLICATION
Syntax: =array1 * array2
Explanation: You can multiply two arrays (ranges) element-wise in Excel or Google Sheets.
This operation is done for each corresponding pair of values.
Example:

A          B
Quantity   Price
2          5
3          6
4          7
5          8

Formula to multiply arrays element-wise: =A2:A5 * B2:B5
This multiplies corresponding values from columns A and B:
• (2 * 5), (3 * 6), (4 * 7), (5 * 8)
Result:
The resulting array will be {10, 18, 28, 40}.
6. FILTER
Syntax: =FILTER(array, include, [if_empty])
Explanation: The FILTER function allows you to extract data from a given array based on a
condition or criteria. If no matches are found, you can define what should appear as the
output.
• array: The range of values to filter.
• include: The condition to filter by (this can be a logical expression).
• if_empty: Optional. The value to return if no matches are found.
Example:

A         B
Product   Price
Apple     4
Banana    6
Orange    8
Grape     3

Formula to filter prices greater than 5: =FILTER(A2:B5, B2:B5 > 5)


This filters the rows where the price is greater than 5: Banana, 6 and Orange, 8 are the results.
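
Row filtering with a fallback for "no matches" can be sketched in Python as follows. The helper `filter_rows` and its `if_empty` parameter are hypothetical names that loosely mirror the FILTER arguments.

```python
products = [("Apple", 4), ("Banana", 6), ("Orange", 8), ("Grape", 3)]


def filter_rows(rows, include, if_empty=None):
    """Keep rows for which include(row) is true; return if_empty when none match."""
    matches = [row for row in rows if include(row)]
    return matches if matches else if_empty


expensive = filter_rows(products, lambda row: row[1] > 5)
print(expensive)  # [('Banana', 6), ('Orange', 8)]
print(filter_rows(products, lambda row: row[1] > 100, if_empty="No matches"))
```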
7. IMPORTRANGE
Syntax: =IMPORTRANGE(spreadsheet_url, range_string)
Explanation: The IMPORTRANGE function imports data from another Google Sheets
document. You provide the URL of the source spreadsheet and the range of cells to import.
• spreadsheet_url: The URL of the source Google Sheets document.
• range_string: The specific range to import (e.g., Sheet1!A1:B5).
Example:
Assuming you want to import data from another Google Sheets file:

Formula to import data:
=IMPORTRANGE("https://docs.google.com/spreadsheets/d/abc12345", "Sheet1!A1:B5")
This imports the data from range A1:B5 in the external sheet and places it in the current
sheet.

SUMPRODUCT: Multiplies corresponding elements in arrays and sums the results.
SUMIF: Adds values that meet a specified condition.
AVERAGE: Calculates the arithmetic mean of a range of numbers.
TRANSPOSE: Switches the rows and columns of an array.
ARRAY MULTIPLICATION: Multiplies corresponding values in two arrays.
FILTER: Filters data based on conditions.
IMPORTRANGE: Imports data from another Google Sheets document.

Handling Missing Values


Missing values can occur when data is incomplete or incorrectly entered. Excel provides
several ways to handle them.
Example 1: Removing rows with missing values
• You have a dataset of customers with some missing email addresses.
• Solution:
1. Select the range of your data.
2. Go to the Data tab and click on Filter.
3. Use the filter dropdown in the column with missing values. Uncheck (Blanks) to hide the incomplete rows, or select only (Blanks) to display them.
4. With only the blank rows displayed, you can delete them or analyze them further.
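
The same separation of complete and incomplete rows can be sketched in Python. The customer records below are made-up sample data for illustration only, and treating both an empty string and `None` as "missing" is an assumption of this sketch.

```python
# Hypothetical sample data: two of the three customers have no email address.
customers = [
    {"name": "Asha", "email": "asha@example.com"},
    {"name": "Ben", "email": ""},
    {"name": "Chitra", "email": None},
]

# Empty strings and None are both falsy, so one test covers both cases.
complete = [c for c in customers if c["email"]]
missing = [c for c in customers if not c["email"]]

print(len(complete))  # 1
print(len(missing))   # 2
```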
Example 2: Filling missing values
• A product list has missing prices.
• Solution:
1. Select the column with missing prices.
2. Click on the Find & Select dropdown in the Home tab and choose Go To
Special.
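3. Choose Blanks and click OK, which selects all blank cells.
4. Type the replacement value and press Ctrl+Enter to enter it in every selected blank cell at once.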



Handling Duplicates
Duplicate values can distort analysis and skew results. Here's how to find and remove them:
Example 1: Removing duplicate rows
• You have a list of customer orders and some orders appear multiple times.
• Solution:
1. Select your data range.
2. Go to the Data tab and click Remove Duplicates.
3. In the dialog box, select the columns you want to check for duplicates (e.g.,
Customer ID and Order ID).
4. Click OK, and Excel will remove the duplicate rows.
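
The idea of checking chosen columns for duplicates can be sketched in Python. The order records are made-up sample data, and `remove_duplicates` is a hypothetical helper that keeps the first occurrence of each key combination.

```python
# Hypothetical sample data: the third row repeats the first.
orders = [
    {"customer_id": 1, "order_id": "A100", "amount": 250},
    {"customer_id": 2, "order_id": "A101", "amount": 120},
    {"customer_id": 1, "order_id": "A100", "amount": 250},
]


def remove_duplicates(rows, key_columns):
    """Keep the first row for each distinct combination of key_columns."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


deduped = remove_duplicates(orders, ["customer_id", "order_id"])
print(len(deduped))  # 2
```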
Example 2: Highlighting duplicates
• You want to highlight duplicate values without removing them.
• Solution:
1. Select the data range.
2. Go to the Home tab and click Conditional Formatting.
3. Choose Highlight Cells Rules > Duplicate Values.
4. Choose a format (e.g., a color) and click OK.

Text to Columns
Splitting full names into first and last names
• Scenario: You have a list of customer full names, and you want to split them into first
and last names.
1. Select the "Full Name" column.
2. Go to the Data tab and click Text to Columns.
3. In the wizard, choose Delimited and click Next.
4. Choose Space as the delimiter and click Finish.
5. The names will be split into separate columns.
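
Splitting on a space delimiter can be sketched in Python with `str.split`. The sample names are made up, and this sketch treats consecutive spaces as a single separator, which may differ from the wizard depending on its settings.

```python
def split_full_name(full_name):
    """Split a name on whitespace, one list item per resulting column."""
    return full_name.split()


print(split_full_name("Alice Johnson"))  # ['Alice', 'Johnson']
print(split_full_name("Bob  Lee"))       # ['Bob', 'Lee']
```

A name with a middle name produces three parts, just as the wizard would fill three columns.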



Merging Cells
Merging cells for a title
• You want to create a header for a sales report.
1. Select the range of cells where you want to create a title (e.g., A1 to D1).
2. Click Merge & Center in the Home tab.
3. Now, you can type your header title, such as "Sales Report."

Advanced Text Functions


Text functions in Excel help manipulate and clean up text data. Here are some of the most
useful ones:
• TRIM: Removes extra spaces from text, leaving only single spaces between words.
This is useful when data has inconsistent spacing.
Syntax: =TRIM(A1)
• LEFT, RIGHT, and MID: These functions extract a specific number of characters
from a string.
Syntax:
o =LEFT(A1, 5) extracts the first 5 characters.
o =RIGHT(A1, 3) extracts the last 3 characters.
o =MID(A1, 2, 4) extracts 4 characters starting from the 2nd character.
• UPPER, LOWER, and PROPER: These functions help standardize the case of text.
o =UPPER(A1) converts text to uppercase.
o =LOWER(A1) converts text to lowercase.
o =PROPER(A1) capitalizes the first letter of each word.
• SUBSTITUTE: Replaces occurrences of a specific substring with another.
=SUBSTITUTE(A1, "oldText", "newText")
These functions can be used together to clean data, such as removing unwanted spaces or
correcting capitalization.
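
As a rough illustration of combining these steps, the Python sketch below chains approximate equivalents of TRIM and PROPER and shows slicing in place of LEFT, RIGHT, and MID. The function name `clean_name` and the sample strings are hypothetical, and the Python methods only approximate the spreadsheet functions in edge cases.

```python
def clean_name(raw):
    """Approximate =PROPER(TRIM(A1)): collapse spaces, then capitalise words."""
    trimmed = " ".join(raw.split())  # drops leading, trailing, and repeated spaces
    return trimmed.title()


text = "Spreadsheet"
print(clean_name("  aLEENA   rose "))       # Aleena Rose
print(text[:5])                             # like =LEFT(A1, 5)   -> Sprea
print(text[-3:])                            # like =RIGHT(A1, 3)  -> eet
print(text[1:5])                            # like =MID(A1, 2, 4) -> prea
print("old price".replace("old", "new"))    # like SUBSTITUTE     -> new price
```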



Data Validation Rules
Data validation in Excel helps ensure that the data entered into a cell meets specific criteria.
Some key examples of data validation include:
• Restricting Data Entry: For example, you can set rules to ensure only numbers are
entered in a cell, or only dates in a specific range.
o To do this, select a range of cells, go to the Data tab > Data Validation.
o Under the Settings tab, you can choose from:
▪ Whole Number: Only allows integer values.
▪ Decimal: Allows decimal values within a range.
▪ Date: Allows only valid dates within a specified range.
▪ List: Restricts input to a list of values.

Error Checking
Excel has built-in tools for detecting and managing errors in your data:
• Error Indicators: Excel highlights cells with errors (like #VALUE!, #DIV/0!, etc.).
You can click on the small warning icon next to a cell to get more information about
the error and potential fixes.
• IFERROR: This function helps manage errors by allowing you to specify a custom
result if a formula returns an error.
• ISERROR, ISNA, ISBLANK: These functions test whether a cell contains an error or is empty:
o =ISERROR(A1) returns TRUE if A1 contains any error.
o =ISNA(A1) returns TRUE if A1 contains an #N/A error.
o =ISBLANK(A1) returns TRUE if A1 is empty.

Creating and customizing charts



Bar Chart
1. Select the data you want to chart (including labels).
2. Go to the Insert tab on the Ribbon.
3. In the Charts group, click on the Bar Chart icon.
4. Choose the type of bar chart you prefer (Clustered Bar, Stacked Bar, etc.).
5. You can then customize the chart (e.g., change colors, labels, or the chart title) using
the Chart Tools that appear when you select the chart.
Line Chart
1. Select the data for the line chart.
2. Go to the Insert tab.
3. Click on Line Chart in the Charts group.
4. Choose your desired style (Line, Stacked Line, 100% Stacked Line, etc.).
5. Format the chart by clicking on the chart and using the Chart Tools options.
Pie Chart
1. Highlight the data (ensure you have a label for each segment).
2. Go to the Insert tab.
3. Click Pie Chart and select the style you like (e.g., 2D Pie, 3D Pie).
4. Customize the chart by right-clicking on individual sections of the pie to change
colors or labels.
Scatter Plot
1. Select your data, which should consist of two sets of values (X and Y axes).
2. Go to the Insert tab.
3. Click on Scatter Chart and choose the type of scatter chart.
4. You can add trendlines, adjust the axis, and apply colors through the Chart Tools.
Histogram
1. Select the data range for the histogram.
2. Go to the Insert tab and click on the Insert Statistic Chart icon.
3. Choose Histogram.
4. Excel will automatically group the data into bins. You can adjust the bin width and
style from the chart options.



Customizing the Chart
• Chart Style: Click on the chart, then use the Chart Design tab to choose different
chart styles and layouts.
• Color: Right-click elements (bars, lines, etc.) to change their color.
• Data Labels: Right-click on the data series and select Add Data Labels to display
values on the chart.
• Axis Titles: Click on the chart and then the Chart Elements button (the plus sign on
the top right of the chart) to add or remove axis titles, chart title, gridlines, and more.

Applying Conditional Formatting


Conditional formatting allows you to highlight data based on specific criteria, helping you
visualize trends or outliers.
Steps to Apply Conditional Formatting
1. Select the data range that you want to format.
2. Go to the Home tab and click Conditional Formatting.
3. Choose one of the formatting styles:
o Highlight Cells Rules: Allows you to highlight cells based on values (greater
than, less than, between, etc.).
o Top/Bottom Rules: Highlight the top or bottom percentages or values.
o Data Bars: Adds bars to cells to represent the value proportionally.
o Color Scales: Applies a gradient color scale to the selected data based on
value.
o Icon Sets: Adds icons (like arrows or flags) to indicate different ranges of
values.
4. Choose the specific condition you want to apply (e.g., "Greater Than" or "Top 10").
5. Adjust the formatting settings if necessary (e.g., change the color or icon style).
