EDA With Pandas CheatSheet

Uploaded by

kollilokesh24
Exploratory Data Analysis (EDA) with Pandas [CheatSheet]

### Importing Pandas

```python

import pandas as pd

```

### Loading Data

```python

df = pd.read_csv('file.csv') # Load a CSV file

df = pd.read_excel('file.xlsx') # Load an Excel file (.xlsx needs the openpyxl package)

df = pd.read_json('file.json') # Load a JSON file

```
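
To try the loaders without a file on disk, `read_csv` also accepts a file-like object. The sketch below is only an illustration: the CSV text and column names are made up and stand in for `file.csv`.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for 'file.csv'
csv_text = "name,age,city\nAna,28,Lima\nBen,35,Oslo\nCara,,Pune\n"

# read_csv accepts a path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)         # number of rows and columns
print(df.isna().sum())  # the empty 'age' field is read as a missing value
```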

### Basic Data Inspection

```python

df.head() # Display first 5 rows

df.tail() # Display last 5 rows

df.shape # (number of rows, number of columns) as a tuple

df.info() # Display concise summary of the DataFrame

df.describe() # Descriptive statistics (numeric columns only by default)

df.columns # List all column names

df.dtypes # Display data type of each column

```
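
A first pass usually combines these calls with a missing-value count and a frequency table. The following is a minimal sketch on a made-up DataFrame; the column names `city` and `age` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Small illustrative dataset (hypothetical values)
df = pd.DataFrame({
    "city": ["Lima", "Oslo", "Lima", "Pune"],
    "age": [28, 35, np.nan, 41],
})

missing_per_column = df.isna().sum()     # count of missing values per column
city_counts = df["city"].value_counts()  # frequency of each category
numeric_summary = df.describe()          # count, mean, std, quartiles for 'age'

print(missing_per_column)
print(city_counts)
print(numeric_summary)
```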

### Data Selection


```python

df['column_name'] # Select a single column

df[['col1', 'col2']] # Select multiple columns

df.iloc[0] # Select the first row by position (returned as a Series)

df.iloc[0:5] # Select the first 5 rows by position

df.loc[df['column'] > value] # Select rows where the condition is True (value is a placeholder)

```
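
Conditions can be combined with `&` and `|`, as long as each one is wrapped in parentheses. A short sketch, assuming a made-up DataFrame and a hypothetical `threshold` variable:

```python
import pandas as pd

# Illustrative data (hypothetical values)
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cara", "Dan"],
    "age": [28, 35, 41, 19],
    "city": ["Lima", "Oslo", "Lima", "Pune"],
})

threshold = 30  # placeholder for 'value' in the snippet above

# Rows where a single condition holds
older = df.loc[df["age"] > threshold]

# Two conditions combined with &, plus a column subset
older_in_lima = df.loc[(df["age"] > threshold) & (df["city"] == "Lima"), ["name", "age"]]

# Membership test against a list of values
in_cities = df[df["city"].isin(["Oslo", "Pune"])]

print(older)
print(older_in_lima)
print(in_cities)
```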

### Data Cleaning

```python

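# None of these calls modify df in place; assign the result to keep it (e.g. df = df.dropna())

df.dropna() # Drop rows that contain any missing value

df.fillna(value) # Fill missing values with a given value (value is a placeholder)

df.drop(columns=['col1', 'col2']) # Drop specific columns

df.rename(columns={'old_name': 'new_name'}) # Rename columns

df.duplicated() # Boolean Series marking rows that repeat an earlier row

df.drop_duplicates() # Drop duplicate rows, keeping the first occurrence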

```
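
Because these methods return new objects rather than changing `df`, they chain naturally. The sketch below uses made-up data, and filling with `0` is only an assumption chosen for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative raw data with a missing value and a duplicated row
raw = pd.DataFrame({
    "old_name": [1.0, np.nan, 3.0, 3.0],
    "col1": ["a", "b", "c", "c"],
})

clean = (
    raw.rename(columns={"old_name": "new_name"})  # rename first
       .drop_duplicates()                         # drop the repeated last row
       .fillna({"new_name": 0})                   # fill gaps in one column only
)

print(clean)
print(raw)  # the original DataFrame is unchanged
```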

### Data Transformation

```python

df['new_column'] = df['col1'] + df['col2'] # Create a new column

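df.apply(lambda x: x + 1) # Apply a function to each column (x is a Series); here it adds 1 to every value

df.groupby('column') # Group by a column (returns a GroupBy object; follow with an aggregation such as .mean())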

df.sort_values(by='column', ascending=False) # Sort by a column

df.pivot_table(index='col1', columns='col2', values='col3') # Pivot table (aggregates with the mean by default)

```
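
A grouped object only produces numbers once an aggregation is applied, and `apply` hands each column to the function as a Series. A minimal sketch, assuming a made-up `team` column alongside `col1` and `col2`:

```python
import pandas as pd

# Illustrative data (hypothetical values)
df = pd.DataFrame({
    "team": ["a", "b", "a", "b"],
    "col1": [1, 2, 3, 4],
    "col2": [10, 20, 30, 40],
})

df["new_column"] = df["col1"] + df["col2"]

# groupby followed by an aggregation yields one value per group
team_means = df.groupby("team")["new_column"].mean()

# apply passes each column as a Series, so column-level functions work
col_ranges = df[["col1", "col2"]].apply(lambda s: s.max() - s.min())

# Largest values first
top = df.sort_values(by="new_column", ascending=False).head(2)

print(team_means)
print(col_ranges)
print(top)
```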

### Visualization (with Matplotlib)

```python

import matplotlib.pyplot as plt

df['column'].hist() # Histogram of a column

df.plot(kind='bar') # Bar plot of every numeric column, one group of bars per row

df.plot(kind='line') # Line plot

df.plot(kind='scatter', x='col1', y='col2') # Scatter plot

plt.show() # Display the plot

```
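
Passing an explicit Matplotlib axes to the pandas plotting methods makes it easier to add labels and a title. This is one possible pattern rather than the only one; the data and label text are made up.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative data (hypothetical values)
df = pd.DataFrame({"col1": [1, 2, 2, 3, 3, 3, 4, 4, 5, 6]})

fig, ax = plt.subplots()
df["col1"].hist(ax=ax, bins=5)  # draw the histogram on our own axes

ax.set_xlabel("col1")
ax.set_ylabel("count")
ax.set_title("Distribution of col1")
fig.tight_layout()
```

In a script, follow this with `plt.show()` to open the window, or `fig.savefig(...)` with a path of your choice to write the figure to disk.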

### Saving Data

```python

df.to_csv('file.csv', index=False) # Save DataFrame to a CSV file

df.to_excel('file.xlsx', index=False) # Save DataFrame to an Excel file (.xlsx needs the openpyxl package)

df.to_json('file.json') # Save DataFrame to a JSON file

```
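
The effect of `index=False` is easiest to see by writing to memory instead of a file. The sketch below assumes a small made-up DataFrame and uses an in-memory buffer in place of `file.csv`.

```python
import io

import pandas as pd

# Illustrative data (hypothetical values)
df = pd.DataFrame({"col1": [1, 2], "col2": [0.5, 1.5]})

# With no path, to_csv returns the CSV text as a string
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index.splitlines()[0])     # header starts with an empty index column
print(without_index.splitlines()[0])  # header contains only the data columns

# Round trip through an in-memory buffer
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)
```

Without `index=False`, the row index is written as an extra unnamed column, which then shows up as `Unnamed: 0` when the file is read back.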
