Pandas_Notes

Pandas is an open-source Python library crucial for data manipulation and preprocessing in machine learning, offering features like Series and DataFrame for data handling, and tools for data cleaning and transformation. It supports various data operations including reading/writing files, indexing, merging, and grouping, while also integrating seamlessly with other libraries like NumPy and Matplotlib. The library is essential for tasks such as data preprocessing, feature engineering, and exploratory data analysis.

Uploaded by

Kuwar Raghvendra Singh

Detailed Notes on Pandas for Machine Learning Interviews

Introduction to Pandas
Pandas is an open-source Python library providing high-performance, easy-to-use data
structures and data analysis tools. It is a fundamental library for data manipulation and
preprocessing in machine learning.

Key Features:
• Data Structures: Offers Series and DataFrame for handling labeled and tabular
data.

• Data Manipulation: Provides tools for reshaping, merging, sorting, and filtering
data.

• Data Cleaning: Supports handling missing values, duplicates, and applying
transformations.

• Integration: Works seamlessly with NumPy, Matplotlib, and other ML libraries.

Core Data Structures

1. Series
A one-dimensional labeled array capable of holding any data type.

import pandas as pd
s = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

Key Attributes and Methods:

• s.index: Returns the index of the Series.

• s.values: Returns the values of the Series.

• s.head(n): Returns the first n elements.

• s.tail(n): Returns the last n elements.
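
As a small sketch of the attributes listed above, the snippet below rebuilds the same Series and reads each one. The variable names (labels, first_two, and so on) are illustrative only and do not come from the notes.

```python
import pandas as pd

# Same Series as in the notes: four integers with string labels.
s = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

labels = list(s.index)        # the index labels
values = s.values.tolist()    # the underlying values as a plain list
first_two = s.head(2)         # elements labeled 'a' and 'b'
last_two = s.tail(2)          # elements labeled 'c' and 'd'

# A label lookup and a positional lookup can reach the same element.
by_label = s['b']
by_position = s.iloc[1]
```

Here s['b'] and s.iloc[1] both return 2, because 'b' is the label of the second element.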

2. DataFrame
A two-dimensional labeled data structure, similar to a spreadsheet or SQL table.

data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}

df = pd.DataFrame(data)

Key Attributes and Methods:

• df.shape: Returns the dimensions of the DataFrame.

• df.columns: Lists column labels.

• df.dtypes: Displays data types of each column.

• df.info(): Provides a summary of the DataFrame.

• df.describe(): Generates descriptive statistics for numerical columns.
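
The following sketch applies these attributes to the two-row DataFrame from the notes. The helper variable names are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

shape = df.shape                  # (rows, columns)
columns = list(df.columns)        # column labels in order
age_is_numeric = pd.api.types.is_numeric_dtype(df['Age'])

# describe() summarizes only numeric columns by default,
# so 'Name' does not appear in the result.
stats = df.describe()
mean_age = stats.loc['mean', 'Age']
```

With this data, shape is (2, 2) and mean_age is 27.5.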

Essential Pandas Operations

1. Reading and Writing Data
• CSV Files:

df = pd.read_csv('file.csv')
df.to_csv('output.csv', index=False)

• Excel Files:

df = pd.read_excel('file.xlsx')
df.to_excel('output.xlsx', index=False)

• JSON Files:

df = pd.read_json('file.json')
df.to_json('output.json')
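
To see why index=False is commonly passed when writing, here is a minimal round-trip sketch. It uses an in-memory io.StringIO buffer in place of a real file, which is an assumption made so the example runs without touching disk.

```python
import io

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Write without the index, then read the text back.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
restored = pd.read_csv(io.StringIO(buffer.getvalue()))

# Write with the default index=True: the index comes back
# as an extra, unnamed column when the text is read again.
buffer_with_index = io.StringIO()
df.to_csv(buffer_with_index)
with_index = pd.read_csv(io.StringIO(buffer_with_index.getvalue()))
```

The restored frame has the original two columns, while with_index gains a leading 'Unnamed: 0' column.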

2. Indexing and Selecting Data

• Accessing Columns:

df['column_name']
df[['col1', 'col2']]

• Accessing Rows:

df.loc[0] # By label
df.iloc[0] # By position

• Slicing:

df.loc[1:3, ['col1', 'col2']]

df.iloc[1:3, 0:2]
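
The two slices above look alike but select different rows: loc slices by label and includes the end label, while iloc slices by position and excludes the end position. A small sketch with made-up data (col3 is an assumed extra column):

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [10, 20, 30, 40, 50],
    'col2': [1, 2, 3, 4, 5],
    'col3': list('abcde'),
})

by_label = df.loc[1:3, ['col1', 'col2']]   # labels 1, 2, 3 (end included)
by_position = df.iloc[1:3, 0:2]            # positions 1, 2 (end excluded)

# With a non-default index the difference is easier to see:
# loc takes the label 'b', iloc takes the integer position 1.
df_by_letter = df.set_index('col3')
row_by_label = df_by_letter.loc['b']
row_by_position = df_by_letter.iloc[1]
```

Here by_label has three rows and by_position has two, while row_by_label and row_by_position refer to the same row.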

3. Data Cleaning
• Handling Missing Values:

df.isnull().sum() # Count missing values

df.fillna(value) # Fill missing values
df.dropna() # Remove rows with missing values

• Renaming Columns:

df.rename(columns={'old_name': 'new_name'}, inplace=True)

• Removing Duplicates:

df.drop_duplicates(inplace=True)
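
The cleaning calls above can be tried on a small frame. This sketch uses invented data and the non-inplace forms, so each step returns a new DataFrame and the original stays unchanged.

```python
import pandas as pd

df = pd.DataFrame({
    'old_name': [1.0, None, 3.0, 3.0],
    'city': ['Pune', 'Delhi', None, None],
})

missing_per_column = df.isnull().sum()      # count of missing values per column

# A dict fills each column with its own replacement value.
filled = df.fillna({'old_name': 0.0, 'city': 'unknown'})

dropped = df.dropna()                       # keeps only fully populated rows
renamed = df.rename(columns={'old_name': 'new_name'})
deduped = df.drop_duplicates()              # rows 2 and 3 are identical
```

Only row 0 survives dropna(), and drop_duplicates() keeps the first of the two identical rows.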

4. Data Transformation
• Apply Functions:

df['col'] = df['col'].apply(lambda x: x * 2)

• Mapping Values:

df['col'] = df['col'].map({'A': 1, 'B': 2})

• Replacing Values:

df.replace({'old_val': 'new_val'}, inplace=True)
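
A point that often comes up in interviews is how map and replace treat values that are not in the dictionary. The sketch below, on invented data, shows the difference and adds the vectorized equivalent of the apply call.

```python
import pandas as pd

df = pd.DataFrame({'num': [1, 2, 3], 'grade': ['A', 'B', 'C']})

# apply runs the function element by element;
# the vectorized expression gives the same result.
df['doubled'] = df['num'].apply(lambda x: x * 2)
df['doubled_vectorized'] = df['num'] * 2

# map: values missing from the dict ('C') become NaN.
df['mapped'] = df['grade'].map({'A': 1, 'B': 2})

# replace: values missing from the dict are left as they are.
df['replaced'] = df['grade'].replace({'C': 'A'})
```

For simple arithmetic the vectorized form is usually preferred over apply, since it avoids a Python-level call per element.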

5. Merging and Joining Data
• Concatenation:

pd.concat([df1, df2], axis=0)

• Merging:

pd.merge(df1, df2, on='key', how='inner')

• Joining:

df1.join(df2, how='left')
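
The three operations differ in what they align on: concat stacks frames, merge matches on column values, and join matches on the index. A minimal sketch with two invented frames:

```python
import pandas as pd

df1 = pd.DataFrame({'key': [1, 2, 3], 'left_val': ['a', 'b', 'c']})
df2 = pd.DataFrame({'key': [2, 3, 4], 'right_val': ['x', 'y', 'z']})

# Stack rows; columns missing from one frame are filled with NaN.
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)

# Keep only keys present in both frames.
inner = pd.merge(df1, df2, on='key', how='inner')

# join aligns on the index, so 'key' is moved into the index first.
joined = df1.set_index('key').join(df2.set_index('key'), how='left')
```

The inner merge keeps keys 2 and 3, while the left join keeps keys 1, 2, and 3 and leaves right_val missing for key 1.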

6. Grouping and Aggregation

• Group By:

grouped = df.groupby('column_name')
grouped['col'].mean()

• Aggregations:

df.agg({'col1': 'mean', 'col2': 'sum'})
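
The same aggregation dictionary can be used on a whole DataFrame or per group. The sketch below uses an invented 'team' column as the grouping key.

```python
import pandas as pd

df = pd.DataFrame({
    'team': ['a', 'a', 'b', 'b'],
    'col1': [1.0, 3.0, 5.0, 7.0],
    'col2': [10, 20, 30, 40],
})

# Mean of one column within each group.
mean_by_team = df.groupby('team')['col1'].mean()

# Different aggregation per column, computed per group.
summary = df.groupby('team').agg({'col1': 'mean', 'col2': 'sum'})

# Without groupby, agg reduces over the whole frame.
overall = df[['col1', 'col2']].agg({'col1': 'mean', 'col2': 'sum'})
```

Team 'a' has a col1 mean of 2.0 and team 'b' has a col2 sum of 70; the overall col1 mean is 4.0.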

Advanced Topics in Pandas

1. Working with Time Series
• Converting to Datetime:

df['date'] = pd.to_datetime(df['date'])

• Setting Index:

df.set_index('date', inplace=True)

• Resampling:

df.resample('M').mean() # Monthly average; pandas 2.2+ deprecates 'M' in favor of 'ME'
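
The three time-series steps chain together as sketched below, on invented dates. The sketch uses the 'MS' (month start) alias rather than 'M', an assumption made so that the same code runs on both older and newer pandas versions; the only difference is that each month is labeled by its first day instead of its last.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2024-01-10', '2024-01-20', '2024-02-05', '2024-02-25'],
    'value': [10.0, 20.0, 30.0, 50.0],
})

df['date'] = pd.to_datetime(df['date'])   # strings -> datetime64
df = df.set_index('date')                 # resample needs a datetime index

# One row per calendar month, holding the mean of that month's values.
monthly = df.resample('MS').mean()
```

January averages to 15.0 and February to 40.0.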

2. Handling Categorical Data
• Converting to Categorical:

df['category'] = df['category'].astype('category')

• Creating Dummies:

pd.get_dummies(df['category'])
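
A short sketch of what these two calls produce, using invented color labels. It also shows drop_first=True, which removes one indicator column to avoid redundancy.

```python
import pandas as pd

df = pd.DataFrame({'category': ['red', 'green', 'red', 'blue']})
df['category'] = df['category'].astype('category')

# Categories are stored once (sorted), and each row holds an integer code.
categories = list(df['category'].cat.categories)
codes = df['category'].cat.codes.tolist()

# One indicator column per category (one-hot encoding).
dummies = pd.get_dummies(df['category'])

# Dropping the first category leaves k - 1 columns for k categories.
reduced = pd.get_dummies(df['category'], drop_first=True)
```

The categories come out as blue, green, red, so the codes for the four rows are 2, 1, 2, 0.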

3. Pivot Tables
• Creating Pivot Tables:

df.pivot_table(values='value_col', index='row_col', columns='col_col', aggfun
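
The call above is cut off in the source, so the aggregation it used is not known. As an illustration only, the sketch below assumes aggfunc='mean' (which is also the pandas default) and uses invented data with the same column names.

```python
import pandas as pd

df = pd.DataFrame({
    'row_col': ['r1', 'r1', 'r1', 'r2'],
    'col_col': ['c1', 'c1', 'c2', 'c1'],
    'value_col': [1.0, 3.0, 5.0, 7.0],
})

# Assumed aggregation: the mean of value_col for each (row, column) pair.
table = df.pivot_table(
    values='value_col',
    index='row_col',
    columns='col_col',
    aggfunc='mean',
)
```

The cell for (r1, c1) averages 1.0 and 3.0 to 2.0, and (r2, c2) is NaN because no rows fall in that combination.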

Applications in Machine Learning

1. Data Preprocessing
• Handling missing values, normalization, and encoding.

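df.fillna(df.mean(numeric_only=True), inplace=True) # Fill numeric columns with their means
# get_dummies returns one column per category, so encode the column in place of itself
df = pd.get_dummies(df, columns=['category'], drop_first=True)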
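
The bullet above mentions normalization, which the snippet does not show. The sketch below covers all three steps on invented data; min-max scaling is an assumed choice of normalization, one of several in common use.

```python
import pandas as pd

df = pd.DataFrame({
    'age': [20.0, None, 40.0],
    'category': ['x', 'y', 'z'],
})

# 1. Missing values: fill with the column mean.
df['age'] = df['age'].fillna(df['age'].mean())

# 2. Normalization (min-max, assumed): rescale to the range [0, 1].
age_range = df['age'].max() - df['age'].min()
df['age_scaled'] = (df['age'] - df['age'].min()) / age_range

# 3. Encoding: replace 'category' with indicator columns.
encoded = pd.get_dummies(df, columns=['category'], drop_first=True)
```

After filling, the ages are 20, 30, 40, which scale to 0.0, 0.5, 1.0.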

2. Feature Engineering
• Creating new features using existing columns.

df['new_feature'] = df['col1'] / df['col2']
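
Ratio features like the one above need care when the denominator can be zero. A minimal sketch on invented data, where 'safe_feature' is a hypothetical column name:

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [10.0, 20.0, 30.0],
    'col2': [2.0, 0.0, 5.0],
})

# Plain division: a zero denominator yields inf rather than an error.
df['new_feature'] = df['col1'] / df['col2']

# Turning zeros into NaN first gives a missing value instead of inf,
# which can then be handled with fillna or dropna.
df['safe_feature'] = df['col1'] / df['col2'].replace(0, float('nan'))
```

Infinite values tend to break scalers and many models, so converting them to missing values first is one possible design choice.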

3. Exploratory Data Analysis (EDA)

• Summarizing data using descriptive statistics.

df.describe()
df.corr(numeric_only=True) # Skip non-numeric columns (required in pandas 2.0+ for mixed frames)
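
As a sketch of how the correlation matrix reads, the invented frame below has one perfectly positive and one perfectly negative relationship, plus a text column that is left out. The numeric_only argument assumes pandas 1.5 or later.

```python
import pandas as pd

df = pd.DataFrame({
    'x': [1, 2, 3, 4],
    'y': [2, 4, 6, 8],      # rises with x
    'z': [4, 3, 2, 1],      # falls as x rises
    'label': list('abcd'),  # non-numeric, excluded below
})

summary = df.describe()              # count, mean, std, min, quartiles, max
corr = df.corr(numeric_only=True)    # pairwise Pearson correlation
```

The x/y entry is 1 and the x/z entry is -1, up to floating-point rounding.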

4. Integration with Other Libraries
• Scikit-learn: Used for feature extraction and model training.

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
df_scaled = scaler.fit_transform(df) # df must be all-numeric; returns a NumPy array

• Matplotlib and Seaborn: Used for visualization.

import matplotlib.pyplot as plt

import seaborn as sns
sns.heatmap(df.corr(numeric_only=True), annot=True)
plt.show()

Practice Questions for Interviews

1. How would you handle missing data in Pandas?

2. Explain the difference between loc and iloc.

3. How do you perform one-hot encoding in Pandas?

4. What is the use of pivot tables in Pandas?

5. Demonstrate how to merge two DataFrames with different keys.

6. How do you group data and calculate the mean in Pandas?

7. Explain how you would preprocess categorical data for machine learning.

8. Write code to calculate the correlation between numerical columns in a DataFrame.

9. How do you filter rows based on a condition in Pandas?

10. Describe how Pandas can be used for feature engineering.
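
Questions 5 and 9 are not covered by code elsewhere in these notes. One possible answer to each is sketched below, with invented frames and column names.

```python
import pandas as pd

people = pd.DataFrame({'person_id': [1, 2, 3], 'age': [25, 30, 35]})
orders = pd.DataFrame({'customer': [1, 1, 3], 'amount': [50, 70, 20]})

# Question 9: filter rows with a boolean mask.
over_28 = people[people['age'] > 28]

# Combine conditions with & or |, wrapping each one in parentheses.
combined = people[(people['age'] > 28) & (people['person_id'] != 3)]

# Question 5: merge when the key columns have different names.
merged = pd.merge(
    people, orders,
    left_on='person_id', right_on='customer',
    how='inner',
)
```

The merge keeps person 1 twice (two orders) and person 3 once; person 2 has no orders and is dropped by the inner join.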

Summary
Pandas is a powerful library essential for data manipulation and preprocessing in
machine learning workflows. Its wide range of functionalities, from data cleaning to feature
engineering, makes it an indispensable tool in any data scientist’s toolkit.
