Basic Analysis Using Python

This document provides a guide for performing basic data analysis using Python with libraries such as Pandas, NumPy, Matplotlib, and Seaborn. It covers installation of libraries, data loading, exploration, cleaning, basic analysis, visualization, and saving cleaned data. The guide includes code examples for each step to help users get started with their data analysis tasks.

Uploaded by

shreyassurve161
Copyright
© All Rights Reserved

To perform basic analysis using Python, you'll primarily use libraries like Pandas, NumPy, and Matplotlib or Seaborn for data handling, manipulation, and visualization.

Here's a simple guide to get you started.

1. Install Required Libraries

If you don't already have the libraries installed, you can install them using
pip:

code

pip install pandas numpy matplotlib seaborn

2. Loading Data

First, import the necessary libraries and load the data. You can load data
from various formats like CSV, Excel, etc.

Example for loading a CSV file:

python

import pandas as pd

# Load dataset

df = pd.read_csv('your_data.csv')
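
To try the loading step without a file on disk, the sketch below reads CSV text from memory. The column names and values are made up for illustration and stand in for 'your_data.csv'.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for 'your_data.csv'.
csv_text = """name,age,category
Alice,34,A
Bob,,B
Carol,29,A
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # the empty 'age' cell is read as NaN, so 'age' becomes a float column
```

Excel files can be loaded in much the same way with pd.read_excel('your_data.xlsx'), which typically needs an extra engine package such as openpyxl to be installed.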

3. Explore the Data

You can perform some basic exploration to understand the data.

- Check the first few rows of the dataset:

python

df.head()

- Get basic info about data types and missing values:

python

df.info()

- Get summary statistics:

python

df.describe()
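
Before cleaning, it often helps to count missing values and duplicate rows explicitly. The following is a minimal sketch on a small made-up DataFrame; the column names are assumptions, not part of any real dataset.

```python
import numpy as np
import pandas as pd

# Small illustrative dataset (hypothetical values).
df = pd.DataFrame({
    "age": [34, np.nan, 29, 41],
    "category": ["A", "B", None, "A"],
})

# Number of missing values in each column.
missing_per_column = df.isna().sum()

# Number of rows that exactly repeat an earlier row.
n_duplicates = df.duplicated().sum()

print(missing_per_column)
print("Duplicate rows:", n_duplicates)
```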

4. Data Cleaning

This step often involves handling missing data, duplicates, or fixing data
types.

- Handle missing data by filling or dropping:

python

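# Choose one approach; after fillna, dropna has nothing left to drop.

df.fillna(0, inplace=True) # Option 1: fill missing values with 0

# df.dropna(inplace=True) # Option 2: drop rows with missing values instead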

- Drop duplicates:

python

df.drop_duplicates(inplace=True)
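
Fixing data types is mentioned above but not shown. Here is one possible sketch, assuming a hypothetical 'price' column stored as text and a 'date' column stored as strings. It also fills the resulting gap with the column median, which is sometimes a safer choice than 0 for numerical data.

```python
import pandas as pd

# Illustrative data: numbers and dates stored as text (hypothetical values).
df = pd.DataFrame({
    "price": ["10.5", "7", "n/a"],
    "date": ["2024-01-05", "2024-02-10", "2024-03-15"],
})

# Convert text to numbers; values that cannot be parsed become NaN.
df["price"] = pd.to_numeric(df["price"], errors="coerce")

# Convert text to datetime values.
df["date"] = pd.to_datetime(df["date"])

# Fill the missing price with the median of the known prices.
df["price"] = df["price"].fillna(df["price"].median())

print(df.dtypes)
print(df)
```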

5. Basic Analysis

You can begin with basic descriptive statistics and visualizations.

a. Descriptive Statistics

- Mean, median, mode:

python

mean_value = df['column_name'].mean()

median_value = df['column_name'].median()

mode_value = df['column_name'].mode()[0]

- Value counts (for categorical variables):

python

df['category_column'].value_counts()
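
As a small worked example of these statistics (the values are made up), note that mode() returns a Series because several values can tie for most frequent; indexing with [0] simply picks the first of them.

```python
import pandas as pd

values = pd.Series([1, 2, 2, 3, 3, 10])

mean_value = values.mean()      # sum 21 divided by 6 values
median_value = values.median()  # average of the two middle values, 2 and 3
modes = values.mode()           # 2 and 3 both appear twice

print(mean_value)      # 3.5
print(median_value)    # 2.5
print(modes.tolist())  # [2, 3]

# For categorical data, normalize=True gives proportions instead of counts.
categories = pd.Series(["A", "B", "A", "A"])
shares = categories.value_counts(normalize=True)
print(shares)
```

The mean (3.5) is pulled upward by the single large value 10, while the median (2.5) is not, which is one reason to look at both.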

b. Group By Analysis

You can group data by a particular column and perform aggregate operations.

python

grouped_data = df.groupby('category_column')['numerical_column'].sum()
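
Several aggregates can be computed in one pass with agg(). The sketch below reuses the placeholder column names from above on a small made-up dataset.

```python
import pandas as pd

# Illustrative data using the placeholder column names (hypothetical values).
df = pd.DataFrame({
    "category_column": ["A", "B", "A", "B", "A"],
    "numerical_column": [10, 20, 30, 40, 50],
})

# One row per category, one column per aggregate.
summary = (
    df.groupby("category_column")["numerical_column"]
      .agg(["sum", "mean", "count"])
)

print(summary)
```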

c. Correlation

Check the correlation between numerical features.

python

correlation_matrix = df.corr(numeric_only=True) # ignore non-numeric columns

print(correlation_matrix)
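
If the DataFrame contains text columns, recent pandas versions raise an error when corr() is applied to them, so non-numeric columns need to be excluded first. One way to do that, shown here on made-up data, is to select the numeric columns explicitly.

```python
import pandas as pd

# Illustrative data with one text column (hypothetical values).
df = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y": [2, 4, 6, 8],  # rises exactly with x
    "z": [4, 3, 2, 1],  # falls exactly as x rises
    "label": ["a", "b", "c", "d"],
})

# Keep only numeric columns, then compute pairwise Pearson correlations.
numeric_df = df.select_dtypes(include="number")
correlation_matrix = numeric_df.corr()

print(correlation_matrix)
```

Values close to 1 or -1 indicate a strong linear relationship; values near 0 indicate little or no linear relationship.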

6. Basic Data Visualization

Visualization is key to data analysis.

a. Histograms

To visualize the distribution of a column:

python

import matplotlib.pyplot as plt

df['column_name'].hist()

plt.show()

b. Scatter Plot

To check the relationship between two variables:

python

df.plot(kind='scatter', x='column1', y='column2')

plt.show()

c. Box Plot

To identify outliers:

python

df.boxplot(column='numerical_column')

plt.show()
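
To list the outliers as values rather than only see them in a plot, one common approach is the 1.5 × IQR rule, which is also the default whisker length in Matplotlib box plots. A minimal sketch on made-up values:

```python
import pandas as pd

# Illustrative values with one obvious outlier (hypothetical data).
values = pd.Series([10, 12, 11, 13, 12, 11, 95], name="numerical_column")

q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1  # interquartile range

lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

# Keep only the values that fall outside the bounds.
outliers = values[(values < lower_bound) | (values > upper_bound)]

print(outliers.tolist())  # [95]
```

Whether such points should be removed, capped, or kept depends on the data; the rule only flags them for inspection.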

d. Correlation Heatmap (using Seaborn)

For a more visual representation of correlation:

python

import seaborn as sns

sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')

plt.show()

7. Saving Cleaned Data

After cleaning and analysis, you might want to save the processed data.

python

df.to_csv('cleaned_data.csv', index=False)

Example Workflow

python

import pandas as pd

import matplotlib.pyplot as plt

import seaborn as sns

# Load data

df = pd.read_csv('your_data.csv')

# Basic exploration

print(df.head())

df.info() # info() prints its summary itself and returns None

print(df.describe())

# Handle missing values

df.fillna(0, inplace=True)

# Descriptive statistics

print(df['age'].mean()) # Example for 'age' column

print(df['category'].value_counts()) # For categorical data

# Visualize data

df['age'].hist()

plt.show()

sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')

plt.show()

This workflow should get you started on basic data analysis using Python!
You can further enhance this by using more advanced libraries like SciPy for
statistical analysis or StatsModels for regression and other statistical
models.
