
EXP-12_IAIML

The document outlines an experiment focused on handling missing values in data, detailing various types of missing values and methods for identifying and addressing them using Python's Pandas library. It discusses techniques such as mean, median, mode imputation, forward and backward fill, and interpolation methods to manage missing data effectively. The experiment concludes that understanding the pattern of missingness is crucial for selecting appropriate handling techniques.

Uploaded by

samyak.18240


Experiment No.: 02
Date of Performance: 19/3/25
Date of Submission: 22/3/25

Program Execution/Formation/Correction/Ethical Practices (06) | Timely Submission (01) | Viva Answer to Sample Questions (03) | Experiment Total (10) | Sign with Date

Experiment No. 12

AIM: Program to handle missing values in data.

LABORATORY OUTCOME:
CO3: Apply the most suitable search strategy to design problem solving agents.
CO4: Identify the pattern in data using scientific programming language.

PROBLEM STATEMENT: Program to handle missing values in data.

RELATED THEORY:

Introduction

Missing values are a common issue in machine learning. They occur when a variable has no recorded value for some observations, which leaves the information incomplete and can reduce the accuracy and reliability of the models trained on it.

What is a Missing Value?

Missing values are data points that are absent for a specific variable in a dataset. They can be
represented in various ways, such as blank cells, null values, or special symbols like “NA” or
“unknown.” These missing data points pose a significant challenge in data analysis and can lead to
inaccurate or biased results.

Types of Missing Values

There are three main types of missing values:

1. Missing Completely at Random (MCAR): MCAR is a type of missing data in which the probability of a data point being missing is entirely random and independent of any other variable in the dataset. In simpler terms, whether a value is missing has nothing to do with the values of other variables or with the characteristics of the data point itself.
2. Missing at Random (MAR): MAR is a type of missing data where the probability of a data point being missing depends on the values of other, observed variables in the dataset, but not on the missing value itself. The missingness mechanism is therefore not entirely random, but it can be predicted from the available information.
3. Missing Not at Random (MNAR): MNAR is the most challenging type of missing data to deal with. It occurs when the probability of a data point being missing is related to the missing value itself. This means that the reason for the missing data is informative and directly associated with the variable that is missing.
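The difference between the three mechanisms shows up clearly in a small simulation. The sketch below is illustrative only: the study-hours/marks data, the 20%, 50% and 5% missingness rates, and the 60-mark threshold are all hypothetical choices, not part of the experiment. It hides marks under each mechanism and compares the mean of the values that remain with the true mean.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
n = 10_000

# Hypothetical data: more study hours tend to produce higher marks
hours = rng.uniform(0, 10, n)
marks = 40 + 5 * hours + rng.normal(0, 5, n)
df = pd.DataFrame({"hours": hours, "marks": marks})

# MCAR: every mark has the same 20% chance of being missing
mcar = df["marks"].mask(rng.random(n) < 0.2)

# MAR: marks go missing more often for students who studied little
# (depends on the observed 'hours' column, not on the mark itself)
mar = df["marks"].mask(rng.random(n) < np.where(df["hours"] < 3, 0.5, 0.05))

# MNAR: low marks are themselves more likely to be missing
mnar = df["marks"].mask(rng.random(n) < np.where(df["marks"] < 60, 0.5, 0.05))

print("true mean:", round(df["marks"].mean(), 2))
for name, s in [("MCAR", mcar), ("MAR", mar), ("MNAR", mnar)]:
    print(f"{name:4} mean of observed values:", round(s.mean(), 2))
```

Under MCAR the observed mean stays close to the true mean, while under MAR and MNAR it is pushed upward because low marks are hidden more often. The practical difference is that the MAR bias can in principle be corrected using the observed 'hours' column, whereas under MNAR the information needed for the correction is exactly what is missing.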

Methods for Identifying Missing Data

Locating and understanding patterns of missingness in the dataset is an important step in addressing its impact on analysis. Pandas provides several useful functions for detecting, removing, and replacing null values in a DataFrame, along with a few general data-cleaning helpers:

Function | Description
.isnull() | Identifies missing values in a Series or DataFrame. It returns a boolean Series or DataFrame in which True marks a missing value.
.notnull() | Inverse of isnull(): returns a boolean Series or DataFrame where True indicates non-missing values and False indicates missing values.
.info() | Displays a summary of the DataFrame, including column data types, the non-null count of each column (from which missing values can be inferred), and memory usage.
.isna() | Alias of isnull(): returns True for missing values and False for non-missing values.
.dropna() | Drops rows or columns containing missing values, based on criteria such as how, thresh, or subset.
.fillna() | Fills missing values with a specific value, such as a constant or a column's mean or median.
.replace() | Replaces specific values with other values, which helps with data correction and standardization.
.drop_duplicates() | Removes duplicate rows, optionally considering only the specified columns.
.unique() | Returns the unique values of a Series.
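As a quick illustration of the detection functions in the table, the sketch below uses a small hypothetical three-row frame (not the experiment's sample data). It also shows that None in a text column is treated as missing, just like np.nan in a numeric column.

```python
import numpy as np
import pandas as pd

# Small hypothetical frame: one gap in a numeric column, one in a text column
df = pd.DataFrame({
    "Marks": [85, np.nan, 78],
    "City": ["Miami", "Houston", None],
})

missing = df.isnull()      # True where a value is missing
present = df.notnull()     # exact inverse of isnull()
print(missing)
print(missing.sum())       # number of missing values per column
print(df.isna().equals(missing))   # isna() is an alias of isnull(), so True

df.info()                  # the Non-Null Count column shows 2 of 3 values per column
```

Chaining isnull() with sum() is the usual one-line way to count the gaps in every column before deciding how to handle them.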

Common Representations
1. Blank cells: Empty cells in spreadsheets or databases often signify missing data.
2. Specific values: Special values like “NULL”, “NA”, or “-999” are used to represent
missing data explicitly.
3. Codes or flags: Non-numeric codes or flags can be used to indicate different types of
missing values.
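Pandas only recognises some of these representations automatically, so placeholder codes usually have to be declared. The sketch below uses a hypothetical CSV (held in memory with io.StringIO) that mixes a blank cell, "NA", "unknown", and a -999 sentinel. It shows two options: declaring extra markers with read_csv's na_values parameter while loading, or converting them afterwards with replace().

```python
import io
import numpy as np
import pandas as pd

# Hypothetical CSV mixing a blank cell, "NA", "unknown" and the -999 sentinel
raw = io.StringIO(
    "id,marks,city\n"
    "1,85,Miami\n"
    "2,,Houston\n"
    "3,-999,unknown\n"
    "4,NA,New York\n"
)

# Blank cells and "NA" are treated as missing by default;
# the extra markers are declared with na_values
df = pd.read_csv(raw, na_values=["unknown", -999])
print(df.isna().sum())

# For data that is already loaded, replace() converts the markers afterwards
loaded = pd.DataFrame({"marks": [85, -999, 90], "city": ["Miami", "unknown", "Houston"]})
cleaned = loaded.replace({-999: np.nan, "unknown": np.nan})
print(cleaned)
```

Converting every placeholder to NaN first means that isnull(), dropna(), fillna() and the imputation methods below all see a single, consistent representation of missing data.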

Creating a Sample DataFrame

import pandas as pd
import numpy as np

# Creating a sample DataFrame with missing values
data = {
    'School ID': [101, 102, 103, np.nan, 105, 106, 107, 108],
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva', 'Frank', 'Grace', 'Henry'],
    'Address': ['123 Main St', '456 Oak Ave', '789 Pine Ln', '101 Elm St', np.nan, '222 Maple Rd',
                '444 Cedar Blvd', '555 Birch Dr'],
    'City': ['Los Angeles', 'New York', 'Houston', 'Los Angeles', 'Miami', np.nan, 'Houston', 'New York'],
    'Subject': ['Math', 'English', 'Science', 'Math', 'History', 'Math', 'Science', 'English'],
    'Marks': [85, 92, 78, 89, np.nan, 95, 80, 88],
    'Rank': [2, 1, 4, 3, 8, 1, 5, 3],
    'Grade': ['B', 'A', 'C', 'B', 'D', 'A', 'C', 'B']
}

df = pd.DataFrame(data)

print("Sample DataFrame:")
print(df)

Output:

Removing Rows with Missing Values

● Simple and efficient: Removes data points with missing values altogether.
● Reduces sample size: Can lead to biased results if the data are not missing completely at random.
● Not recommended when many rows contain missing values: Can discard a large share of valuable information.

This example removes rows with missing values from the original DataFrame (df) using the dropna() method and then displays the cleaned DataFrame (df_cleaned). In the sample data, three of the eight rows contain at least one missing value, so df_cleaned keeps five rows.

# Removing rows with missing values

df_cleaned = df.dropna()

# Displaying the DataFrame after removing missing values

print("\nDataFrame after removing rows with missing values:")

print(df_cleaned)

Output:
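Dropping every incomplete row is the bluntest option; dropna() also accepts the subset, thresh and axis parameters to control what gets removed. The sketch below uses a reduced copy of the sample data (five of its columns) to show which rows a plain dropna() would remove and how the variants differ.

```python
import numpy as np
import pandas as pd

# Reduced copy of the sample DataFrame (five of its columns)
df = pd.DataFrame({
    "School ID": [101, 102, 103, np.nan, 105, 106, 107, 108],
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva", "Frank", "Grace", "Henry"],
    "Address": ["123 Main St", "456 Oak Ave", "789 Pine Ln", "101 Elm St", np.nan,
                "222 Maple Rd", "444 Cedar Blvd", "555 Birch Dr"],
    "City": ["Los Angeles", "New York", "Houston", "Los Angeles",
             "Miami", np.nan, "Houston", "New York"],
    "Marks": [85, 92, 78, 89, np.nan, 95, 80, 88],
})

# Rows that a plain dropna() would remove: any missing value in the row
print(df[df.isnull().any(axis=1)])

# Drop a row only when its 'Marks' value is missing
marks_known = df.dropna(subset=["Marks"])

# Keep rows with at least 4 non-missing values (Eva's row has only 3)
mostly_complete = df.dropna(thresh=4)

# Drop columns, instead of rows, that contain any missing value
complete_columns = df.dropna(axis=1)
print(complete_columns.columns.tolist())
```

Here subset=["Marks"] keeps seven rows instead of five, which matters when only one column feeds the analysis; axis=1 leaves just the 'Name' column, a reminder that dropping columns can discard far more than dropping rows.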

Imputation Methods
● Replacing missing values with estimated values.
● Preserves sample size: Doesn’t reduce data points.
● Can introduce bias: Estimated values might not be accurate.

Here are some common imputation methods:

1. Mean, Median, and Mode Imputation:

● Replace missing values with the mean, median, or mode of the relevant variable.
● Simple and efficient: Easy to implement.
● Can be inaccurate: Doesn’t consider the relationships between variables.

This example applies three imputation techniques to the 'Marks' column of the DataFrame (df): it fills the missing values with the mean, median, and mode of the existing values in that column, and then prints each result for comparison.

1. Mean Imputation: df['Marks'].mean() calculates the mean of the 'Marks' column, skipping missing values.
● df['Marks'].fillna(...): Fills missing values in the 'Marks' column with the mean value.
● mean_imputation: The result is stored in the variable mean_imputation.
2. Median Imputation: df['Marks'].median() calculates the median of the 'Marks' column.
● df['Marks'].fillna(...): Fills missing values in the 'Marks' column with the median value.
● median_imputation: The result is stored in the variable median_imputation.
3. Mode Imputation: df['Marks'].mode() calculates the mode of the 'Marks' column. The result is a Series, because a column can have more than one mode; the modes are returned in sorted order.
● .iloc[0]: Accesses the first element of that Series, i.e. the smallest mode.
● df['Marks'].fillna(...): Fills missing values in the 'Marks' column with the mode value.
● mode_imputation: The result is stored in the variable mode_imputation.

# Mean, Median, and Mode Imputation

mean_imputation = df['Marks'].fillna(df['Marks'].mean())

median_imputation = df['Marks'].fillna(df['Marks'].median())

mode_imputation = df['Marks'].fillna(df['Marks'].mode().iloc[0])

print("\nImputation using Mean:")

print(mean_imputation)

print("\nImputation using Median:")

print(median_imputation)

print("\nImputation using Mode:")

print(mode_imputation)

Output:
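To make the imputed values concrete, the sketch below recomputes the three statistics on the 'Marks' values from the sample data, then fills a small two-column frame using the median for the numeric column and the mode for the text column. That per-type rule is an illustrative assumption, not a pandas default. Note the tie behaviour: every mark occurs exactly once, so all seven values are modes and .iloc[0] simply picks the smallest.

```python
import numpy as np
import pandas as pd

marks = pd.Series([85, 92, 78, 89, np.nan, 95, 80, 88], name="Marks")

mean_value = marks.mean()       # NaN is skipped: 607 / 7 ≈ 86.71
median_value = marks.median()   # middle of the 7 sorted values: 88.0
mode_values = marks.mode()      # every value occurs once, so all 7 are modes
print(mean_value, median_value, mode_values.iloc[0])   # iloc[0] is the smallest mode

# Whole-frame fill: median for numeric columns, mode for text columns
df = pd.DataFrame({
    "Marks": marks,
    "City": ["Los Angeles", "New York", "Houston", "Los Angeles",
             "Miami", np.nan, "Houston", "New York"],
})
fill_values = {
    col: (df[col].median() if pd.api.types.is_numeric_dtype(df[col])
          else df[col].mode().iloc[0])
    for col in df.columns
}
filled = df.fillna(value=fill_values)
print(filled)
```

The 'City' column shows the same pitfall: Houston, Los Angeles and New York each appear twice, and Houston wins only because it sorts first alphabetically. When .iloc[0] is choosing between tied modes, it is worth checking whether a single "most common" value really exists.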

2. Forward and Backward Fill

● Replace missing values with the previous or next non-missing value in the same variable.
● Simple and intuitive: Respects the row order, so it suits ordered data such as time series.
● Can be inaccurate: Assumes a missing value is close to its neighboring observed value.
● Forward Fill (forward_fill)
○ df['Marks'].ffill(): This method fills missing values in the 'Marks' column of the DataFrame (df) using a forward fill strategy. It replaces each missing value with the last observed non-missing value above it. Older code writes this as fillna(method='ffill'), which has been deprecated since pandas 2.1.
○ forward_fill: The result is stored in the variable forward_fill.
● Backward Fill (backward_fill)
○ df['Marks'].bfill(): This method fills missing values in the 'Marks' column using a backward fill strategy. It replaces each missing value with the next observed non-missing value below it. Older code writes this as fillna(method='bfill').
○ backward_fill: The result is stored in the variable backward_fill.

# Forward and Backward Fill

forward_fill = df['Marks'].ffill()

backward_fill = df['Marks'].bfill()

print("\nForward Fill:")

print(forward_fill)

print("\nBackward Fill:")

print(backward_fill)

Output:
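Forward and backward fill behave differently at the edges of a column and on long gaps. The sketch below uses a hypothetical series with a leading gap and a run of two gaps to show this, and then applies both fills to the sample 'Marks' values, where the missing mark sits between 89 and 95.

```python
import numpy as np
import pandas as pd

# Hypothetical ordered readings with a leading gap and a run of two gaps
s = pd.Series([np.nan, 1.0, np.nan, np.nan, 4.0])

forward = s.ffill()          # leading NaN stays: there is no earlier value to copy
backward = s.bfill()         # every gap takes the next observed value
limited = s.ffill(limit=1)   # fill at most one consecutive missing value
both = s.ffill().bfill()     # bfill then covers the leading gap

# On the sample 'Marks' values
marks = pd.Series([85, 92, 78, 89, np.nan, 95, 80, 88])
print(marks.ffill()[4], marks.bfill()[4])   # 89.0 95.0
```

The limit parameter is a simple safeguard against carrying one observation across a long stretch of missing data, and chaining ffill() with bfill() is a common way to make sure no NaN survives at the start of a column.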

3. Interpolation Techniques
● Estimate missing values from the surrounding data points using techniques such as linear, quadratic, or spline interpolation.
● More sophisticated than mean/median imputation: Uses the order of neighboring observations instead of a single summary statistic, so it suits ordered data such as time series. It still looks only at the column being filled, not at relationships with other variables.
● Linear interpolation is built into pandas; quadratic and spline interpolation require the SciPy library and more computation.
● Linear Interpolation
○ df['Marks'].interpolate(method='linear'): This method performs linear interpolation on the 'Marks' column of the DataFrame (df). Linear interpolation estimates each missing value from a straight line between the adjacent non-missing values before and after it, treating the rows as equally spaced.
○ linear_interpolation: The result is stored in the variable linear_interpolation.
● Quadratic Interpolation
○ df['Marks'].interpolate(method='quadratic'): This method performs quadratic interpolation on the 'Marks' column. pandas passes it on to SciPy, which fits a second-order (quadratic) spline through the non-missing values and reads the missing values off that curve.
○ quadratic_interpolation: The result is stored in the variable quadratic_interpolation.

# Interpolation Techniques

linear_interpolation = df['Marks'].interpolate(method='linear')
# method='quadratic' is passed on to SciPy, so SciPy must be installed
quadratic_interpolation = df['Marks'].interpolate(method='quadratic')

print("\nLinear Interpolation:")
print(linear_interpolation)

print("\nQuadratic Interpolation:")
print(quadratic_interpolation)
Output:
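Linear interpolation is easy to check by hand. For a gap at position i between known positions a and b, the estimate is x_i = x_a + (x_b - x_a) * (i - a) / (b - a). In the sample 'Marks' column the gap at position 4 lies between 89 (position 3) and 95 (position 5), so the estimate is 89 + 6 * 0.5 = 92. The sketch below confirms that pandas, the formula, and NumPy's np.interp (used here as an independent cross-check, not as part of the experiment) all agree.

```python
import numpy as np
import pandas as pd

marks = pd.Series([85, 92, 78, 89, np.nan, 95, 80, 88], dtype=float)

# pandas' linear interpolation (rows treated as equally spaced)
linear = marks.interpolate(method="linear")

# The same estimate by hand: x_i = x_a + (x_b - x_a) * (i - a) / (b - a)
a, b, i = 3, 5, 4
by_hand = marks[a] + (marks[b] - marks[a]) * (i - a) / (b - a)   # 89 + 6 * 0.5

# np.interp applies the same formula to every missing position at once
known = marks.notna().to_numpy()
positions = np.arange(len(marks))
via_numpy = np.interp(positions[~known], positions[known], marks.to_numpy()[known])

print(linear[4], by_hand, via_numpy)
```

All three give 92.0, and the observed values are left untouched, which is a quick sanity check worth running before trusting any interpolated column.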

RESULT/OUTPUT:
CONCLUSION:

In this experiment, we implemented a program to handle missing values in a dataset, a critical step in data preprocessing for any data analysis or machine learning task. We first detected missing values with pandas functions such as isnull(), then handled them by dropping incomplete rows and by imputation: mean, median, and mode imputation, forward and backward fill, and linear and quadratic interpolation. Imputation preserves the sample size, but each method makes its own assumption about the missing values and can therefore still introduce bias. We also identified that the pattern of missingness (missing completely at random, missing at random, or missing not at random) plays a crucial role in determining the most appropriate handling technique.
QUESTIONS:

1. What is the purpose of exploring data?

a. To gain a better understanding of your data.
b. To gather your data into one repository.
c. To digitize your data.
d. To generate labels for your data.

2. What are the two main categories of techniques for exploring data? Choose two.
a.Histogram
b.Outliers
c.Visualization
d.Trends
e.Correlations
f.Summary statistics

3. Which method is used to fill missing values with the mean of a column in Pandas?

A dropna()

B fillna()

C mean()

D interpolate()

4. What will be the output of the following code?

import pandas as pd

import numpy as np

data = {'A': [1, 2, np.nan, 4], 'B': [np.nan, 2, 3, 4]}

df = pd.DataFrame(data)

result = df.isnull().sum()

print(result)
A) A 1
B 1
dtype: int64

B) A 2
B 2
dtype: int64

C) A 1
B 2
dtype: int64

D) A 2
B 1
dtype: int64

5. Which of the following methods is suitable for forward-filling missing data in a DataFrame?

A `fillna(method='ffill')`

B `fillna(method='bfill')`

C `interpolate()`

D `dropna()`
