Basic Analysis Using Python
If you don't already have the libraries installed, you can install them using
pip. The examples in this guide use pandas and matplotlib:
pip install pandas matplotlib
2. Loading Data
First, import the necessary libraries and load the data. You can load data
from various formats like CSV, Excel, etc.
python
import pandas as pd
# Load dataset
df = pd.read_csv('your_data.csv')
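read_csv also accepts any file-like object, which makes it easy to try out without a file on disk. The following is a minimal sketch, assuming a small made-up CSV with hypothetical columns name, age, and city. For Excel files, pd.read_excel plays the same role, though it needs an engine such as openpyxl installed.

```python
import io
import pandas as pd

# Hypothetical CSV content, standing in for 'your_data.csv'
csv_text = """name,age,city
Ana,34,Lisbon
Ben,,Oslo
Cy,29,Lisbon
"""

# read_csv accepts a path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (rows, columns)
print(df.dtypes)  # age is read as float because one value is missing
```

An empty field, such as Ben's age here, is read as NaN, which is why the cleaning step later on matters.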
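3. Exploring the Data
python
# Preview the first five rows
df.head()
python
# Column names, data types, and non-null counts
df.info()
python
# Summary statistics for the numeric columns
df.describe()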
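Before cleaning, it can also help to count missing and distinct values per column. This is a small sketch on a made-up DataFrame; the column names are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [34, np.nan, 29, 29],
    "city": ["Lisbon", "Oslo", None, "Lisbon"],
})

missing_per_column = df.isna().sum()   # number of missing values per column
distinct_per_column = df.nunique()     # distinct non-missing values per column

print(missing_per_column)
print(distinct_per_column)
```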
4. Data Cleaning
This step often involves handling missing data, duplicates, or fixing data
types.
- Drop duplicates:
python
df.drop_duplicates(inplace=True)
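Handling missing data and fixing data types might look like the following. This is a sketch on made-up data: the column names and the choice to fill gaps with the median are assumptions, and the right strategy depends on your dataset.

```python
import pandas as pd

df = pd.DataFrame({
    "age": ["34", "29", None, "29"],           # numbers stored as text
    "city": ["Lisbon", "Oslo", "Paris", "Oslo"],
})

# Fix the data type: convert text to numbers; unparseable values become NaN
df["age"] = pd.to_numeric(df["age"], errors="coerce")

# Handle missing data: fill numeric gaps with the column median
df["age"] = df["age"].fillna(df["age"].median())

# Remove exact duplicate rows (the two identical Oslo rows collapse into one)
df = df.drop_duplicates().reset_index(drop=True)

print(df)
```

Filling with 0, as the example workflow at the end does, is simpler but can distort means and histograms. Dropping incomplete rows with dropna() is another common option.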
5. Basic Analysis
a. Descriptive Statistics
- Mean, median, mode:
python
mean_value = df['column_name'].mean()
median_value = df['column_name'].median()
mode_value = df['column_name'].mode()[0]
- Value counts:
python
df['category_column'].value_counts()
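To see what these calls return, here is a worked example on a tiny made-up dataset (the column names score and group are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({
    "score": [2, 4, 4, 5, 10],
    "group": ["a", "b", "b", "a", "b"],
})

mean_value = df["score"].mean()       # (2 + 4 + 4 + 5 + 10) / 5 = 5.0
median_value = df["score"].median()   # middle of the sorted values = 4.0
# mode() returns a Series because ties are possible; take the first entry
mode_value = df["score"].mode().iloc[0]   # most frequent value = 4

counts = df["group"].value_counts()   # b appears 3 times, a appears 2 times
print(mean_value, median_value, mode_value)
print(counts)
```

The single large value (10) pulls the mean above the median, which is a quick hint that a column is skewed.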
b. Group By Analysis
python
grouped_data = df.groupby('category_column')['numerical_column'].sum()
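A group-by can also compute several statistics at once with agg. This is a sketch on made-up data, with hypothetical region and sales columns:

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["north", "south", "north", "south", "north"],
    "sales": [10, 20, 30, 40, 50],
})

# Several statistics per group in a single call
summary = df.groupby("region")["sales"].agg(["sum", "mean", "count"])
print(summary)
```

The result has one row per region and one column per statistic.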
c. Correlation
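python
# numeric_only=True skips text columns, which otherwise raise an error in pandas 2.0+
correlation_matrix = df.corr(numeric_only=True)
print(correlation_matrix)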
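A small made-up example shows how to read the matrix. Values near 1 mean two columns rise together, and values near -1 mean one falls as the other rises. The column names are hypothetical, and the numeric_only argument assumes pandas 1.5 or later.

```python
import pandas as pd

df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5],
    "score": [2, 4, 6, 8, 10],    # rises exactly in step with hours
    "errors": [10, 8, 6, 4, 2],   # falls exactly as hours rises
    "label": list("abcde"),       # non-numeric, left out of the matrix
})

corr = df.corr(numeric_only=True)  # Pearson correlation by default
print(corr.round(2))
```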
6. Data Visualization
a. Histograms
To visualize the distribution of a column:
python
import matplotlib.pyplot as plt
df['column_name'].hist()
plt.show()
c. Box Plot
To identify outliers:
python
df.boxplot(column='numerical_column')
plt.show()
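By default, a box plot marks points lying more than 1.5 times the interquartile range (IQR) beyond the quartiles as outliers. The same rule can be applied numerically to list those rows. This is a sketch on made-up values; the 1.5 multiplier is the conventional default and can be adjusted.

```python
import pandas as pd

df = pd.DataFrame({"numerical_column": [10, 12, 11, 13, 12, 11, 95]})

q1 = df["numerical_column"].quantile(0.25)
q3 = df["numerical_column"].quantile(0.75)
iqr = q3 - q1

# Fences beyond which a value counts as an outlier
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
is_outlier = (df["numerical_column"] < lower) | (df["numerical_column"] > upper)
outliers = df[is_outlier]
print(outliers)
```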
7. Saving the Data
After cleaning and analysis, you might want to save the processed data.
python
df.to_csv('cleaned_data.csv', index=False)
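Passing index=False keeps the row index out of the file, so reading it back does not produce an extra unnamed column. Here is a quick round-trip sketch using a temporary directory and made-up data:

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({"name": ["Ana", "Ben"], "age": [34, 29]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "cleaned_data.csv")
    df.to_csv(path, index=False)   # write without the index column
    reloaded = pd.read_csv(path)   # read it back to check the result

print(list(reloaded.columns))
print(reloaded)
```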
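Example Workflow
python
import pandas as pd
import matplotlib.pyplot as plt
# Load data
df = pd.read_csv('your_data.csv')
# Basic exploration
print(df.head())
df.info()  # info() prints its report itself and returns None
print(df.describe())
# Data cleaning: replace missing values with 0
df.fillna(0, inplace=True)
# Descriptive statistics ('age' is the example column used below)
print(df['age'].mean())
# Visualize data
df['age'].hist()
plt.show()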
This workflow should get you started on basic data analysis using Python!
You can build on it with more advanced libraries, such as SciPy for
statistical tests or statsmodels for regression and other statistical
models.