Day 4 Data Manipulation With Pandas

Uploaded by

Deep gaichor
Copyright: © All Rights Reserved

Day 4: Data Manipulation with Pandas

Introduction to Pandas: Pandas is a powerful Python library for data manipulation and
analysis. It provides data structures like Series and DataFrame, which are ideal for handling
structured data.

# Example of importing Pandas
import pandas as pd

# Creating a DataFrame from a dictionary
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']}
df = pd.DataFrame(data)
print(df)
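
The introduction names Series alongside DataFrame but only shows a DataFrame. A minimal sketch of a Series follows; the variable names `ages` and `age_column` are illustrative and not part of the original notes.

```python
import pandas as pd

# A Series is a one-dimensional labeled array
ages = pd.Series([25, 30, 35, 40],
                 index=['Alice', 'Bob', 'Charlie', 'David'],
                 name='Age')

# Each column of a DataFrame is itself a Series
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
age_column = df['Age']

print(ages['Bob'])        # look up a value by its label
print(type(age_column))   # shows that a selected column is a Series
```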

DataFrames: Creation, Indexing, and Selection: DataFrames are two-dimensional labeled
data structures with columns of potentially different types. Indexing and selection operations
allow you to access specific rows and columns of a DataFrame.

# Example of indexing and selection in a Pandas DataFrame
print(df['Name'])  # Selecting a single column
print(df[['Name', 'Age']])  # Selecting multiple columns
print(df.iloc[0])  # Selecting a single row by integer position
print(df.loc[df['City'] == 'New York'])  # Selecting rows based on a condition
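
With the default index, position and label happen to coincide, which can hide the difference between `iloc` and `loc`. The sketch below uses hypothetical index labels (10, 20, 30, 40), chosen only for illustration, to make the difference visible.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
     'Age': [25, 30, 35, 40],
     'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']},
    index=[10, 20, 30, 40],  # hypothetical labels, not from the notes
)

first_by_position = df.iloc[0]  # first row, whatever its label is
row_by_label = df.loc[20]       # row whose index label is 20

# loc also accepts a boolean mask plus a column label
ny_names = df.loc[df['City'] == 'New York', 'Name']

print(first_by_position['Name'], row_by_label['Name'], ny_names.tolist())
```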

Data Cleaning: Handling Missing Data, Data Transformation: Pandas provides methods
for handling missing data, such as dropping or filling missing values. It also supports various
data transformation operations like merging, reshaping, and aggregating data.

# Example of handling missing data and data transformation
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, None, 35, 40],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']}
df = pd.DataFrame(data)
print(df.dropna())  # Drop rows with missing values
print(df.fillna(0))  # Fill missing values with a specified value

Output (the missing value makes the Age column floating-point, so ages print with a decimal):

      Name   Age      City
0    Alice  25.0  New York
2  Charlie  35.0   Chicago
3    David  40.0   Houston

      Name   Age         City
0    Alice  25.0     New York
1      Bob   0.0  Los Angeles
2  Charlie  35.0      Chicago
3    David  40.0      Houston
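
The paragraph above mentions merging, reshaping, and aggregating without showing them. The following is a small sketch of each; the `states` lookup table and its State column are made up for illustration and do not come from the notes.

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                       'City': ['New York', 'Chicago', 'Chicago', 'Houston'],
                       'Age': [25, 30, 35, 40]})

# Hypothetical lookup table used only to demonstrate a merge
states = pd.DataFrame({'City': ['New York', 'Chicago', 'Houston'],
                       'State': ['NY', 'IL', 'TX']})

# Merging: attach the State column by matching on City
merged = people.merge(states, on='City', how='left')

# Aggregating: mean age per state
mean_age = merged.groupby('State')['Age'].mean()

# Reshaping: wide to long, one row per (Name, Field) pair
long_form = merged.melt(id_vars='Name', value_vars=['City', 'State'],
                        var_name='Field', value_name='Value')

print(merged)
print(mean_age)
print(long_form)
```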

Pandas is an essential tool for data manipulation and analysis in Python, and mastering its
usage is crucial for working with structured datasets effectively.
Day 4: Data Manipulation with Pandas
Introduction to Pandas
Pandas:

• Powerful library for data manipulation and analysis
• Built on top of NumPy
• Install with: pip install pandas

Importing Pandas:
import pandas as pd

DataFrames: Creation, Indexing, and Selection


Creating DataFrames:

• From a dictionary:

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)

• From a CSV file:

df = pd.read_csv('data.csv')
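
Since `data.csv` is not included with these notes, the sketch below reads the same kind of content from an in-memory string instead. The CSV text is invented for illustration; `pd.read_csv` accepts a file-like object as well as a path.

```python
import io
import pandas as pd

# Stand-in for the contents of a CSV file (made-up sample data)
csv_text = """Name,Age,City
Alice,25,New York
Bob,,Los Angeles
Charlie,35,Chicago
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df.dtypes)  # the empty Age field is read as NaN
```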

Indexing:

• The default index is a range of integers starting at 0

• Setting a custom index:

df.set_index('Name', inplace=True)
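
As an alternative to `inplace=True`, `set_index` can return a new DataFrame and leave the original untouched, and `reset_index` reverses it. A short sketch, with `indexed` and `restored` as illustrative names:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35]})

# Without inplace=True, set_index returns a new DataFrame
indexed = df.set_index('Name')

# reset_index moves the index back into an ordinary column
restored = indexed.reset_index()

print(indexed.loc['Alice', 'Age'])
print(restored)
```

Keeping the original `df` unchanged means column selections such as `df['Name']` keep working after the index is set on the copy.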

Selection:

• Selecting columns (assumes 'Name' is still a column, i.e. before set_index):

df['Age']
df[['Name', 'City']]

• Selecting rows:

df.iloc[0]  # By integer position
df.loc['Alice']  # By index label (assumes 'Name' was set as the index)

• Conditional selection:

df[df['Age'] > 30]
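
Conditions can be combined with `&` (and) and `|` (or), with each condition wrapped in parentheses. A brief sketch on sample data matching the earlier examples; the result variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 30, 35, 40],
                   'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']})

# Both conditions must hold
older_in_chicago = df[(df['Age'] > 30) & (df['City'] == 'Chicago')]

# Either condition may hold
young_or_houston = df[(df['Age'] < 30) | (df['City'] == 'Houston')]

# Membership test against a list of values
in_cities = df[df['City'].isin(['Chicago', 'Houston'])]

print(older_in_chicago)
print(young_or_houston)
print(in_cities)
```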

Data Cleaning: Handling Missing Data, Data Transformation
Handling Missing Data:

• Identifying missing data:

df.isnull().sum()

• Dropping missing data:

df.dropna(inplace=True)

• Filling missing data:

df.fillna(value=0, inplace=True)
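
Dropping or zero-filling every column at once is often too blunt. The sketch below, on made-up data with gaps in two columns, shows more targeted variants: `dropna` limited to one column via `subset`, and `fillna` with a different fill value per column. The fill choices (column mean, 'Unknown') are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, None, 35, 40],
                   'City': ['New York', None, 'Chicago', 'Houston']})

# Count of missing values in each column
missing_per_column = df.isnull().sum()

# Drop rows only when Age is missing
age_known = df.dropna(subset=['Age'])

# Fill each column with its own replacement value
filled = df.fillna({'Age': df['Age'].mean(), 'City': 'Unknown'})

print(missing_per_column)
print(age_known)
print(filled)
```

Neither call uses `inplace=True` here, so `df` itself still contains the missing values.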

Data Transformation:

• Adding new columns:

df['Age_in_10_years'] = df['Age'] + 10

• Applying functions:

df['Age_squared'] = df['Age'].apply(lambda x: x**2)
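
For simple arithmetic, operating on the whole column is usually shorter and faster than `apply`, while `apply` remains handy for logic that does not vectorize easily. A sketch of both; `age_group` is a hypothetical helper, not part of the notes.

```python
import pandas as pd

df = pd.DataFrame({'Age': [25, 30, 35]})

# Vectorized arithmetic on the whole column, no apply needed
df['Age_squared'] = df['Age'] ** 2


def age_group(age):
    """Hypothetical helper: label an age with a coarse group."""
    return 'under 30' if age < 30 else '30 and over'


# apply calls the function once per value
df['Age_group'] = df['Age'].apply(age_group)

print(df)
```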

Example:
import pandas as pd

# Creating a DataFrame
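data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, None],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)

# Handling missing data: fill the missing age with the column mean.
# Assigning the result back avoids calling fillna(inplace=True) on a
# column selection, which newer pandas versions warn about.
df['Age'] = df['Age'].fillna(df['Age'].mean())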

# Data transformation
df['Age_in_10_years'] = df['Age'] + 10

print(df)

This concludes the note for Day 4: Data Manipulation with Pandas.
