Data Exploration and Visualization Laboratory - AD3301 - Lab Manual
SCIENCE
II Year/III Semester
Lab Manual
Ex.No:1
Date:
Install the Data Analysis and Visualization Tool: Python
Aim:
To install the data analysis and visualization tool.
Anaconda is an open-source distribution that bundles Jupyter, Spyder, and other tools used for
large-scale data processing, data analytics, and heavy scientific computing. Anaconda supports the R
and Python programming languages. Spyder (an application included with Anaconda) is an IDE for
Python, and OpenCV for Python works in Spyder. Package versions are managed by the package
management system called conda.
To install Jupyter using Anaconda, follow these steps:
1. Launch Anaconda Navigator.
2. Click the Install button for Jupyter Notebook.
3. Wait for the installation to finish.
4. Launch Jupyter.
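Once Jupyter is running, one quick way to confirm that the libraries used in the later exercises are available is to try importing them from a notebook cell. The sketch below is illustrative and not part of the installation procedure itself; the helper name `check_packages` and the package list are assumptions.

```python
import importlib


def check_packages(names):
    """Return {package name: version string, or None if the import fails}."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            # Not every module exposes __version__, so fall back to "unknown".
            found[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            found[name] = None
    return found


# Assumed list: the libraries imported in the later exercises.
report = check_packages(["numpy", "pandas", "matplotlib", "seaborn"])
for name, version in report.items():
    print(f"{name}: {version if version else 'NOT INSTALLED'}")
```

Any package reported as not installed can be added from Anaconda Navigator or with conda before moving on.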
Result:
Thus the data analysis and visualization tool was installed successfully.
Ex.No.2
Date:
Perform Exploratory Data Analysis (EDA) on Email Data Set
Aim:
To perform exploratory data analysis (EDA) on email data sets using Python.
Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
df = pd.read_csv("C:/Users/Administrator/Desktop/EDA-AIDS/EDA-AIDS-Lab Manual/jemima_email.csv")
df.head()
df.describe(include='all')
df.info()
plt.show()
sns.countplot(x='Labels', data=df)
plt.xticks(rotation=90)
plt.show()
df['From'].value_counts().plot(kind='bar', title='From', figsize=(16,9))
plt.xticks(rotation=90)
plt.show()
df['Date'] = pd.to_datetime(df['Date'])
df['Date'].value_counts().plot(kind='bar', title='Datewise email', figsize=(16,9))
plt.xticks(rotation=90)
plt.show()
df['Labels'].value_counts().plot(kind='bar', title='Labels distribution', figsize=(16,9))
plt.xticks(rotation=90)
plt.show()
plt.plot(df['From'],df['Date'])
plt.xticks(rotation=90)
plt.show()
df['From'].value_counts().plot(kind='pie', autopct='%1.1f%%')
plt.axis('equal')
plt.show()
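After `pd.to_datetime`, the `Date` column holds full timestamps, so `value_counts()` on it counts each distinct timestamp separately. Grouping by calendar day usually gives a more readable date-wise summary. A minimal sketch on a small made-up frame follows; the rows are illustrative and not taken from the lab data set, and only the column names `From`, `Date`, and `Labels` follow the program above.

```python
import pandas as pd

# Illustrative rows only; the real data comes from the CSV file.
emails = pd.DataFrame({
    "From": ["a@example.com", "b@example.com", "a@example.com", "c@example.com"],
    "Date": ["2023-01-01 09:15", "2023-01-01 17:40",
             "2023-01-02 08:05", "2023-01-02 21:30"],
    "Labels": ["Inbox", "Spam", "Inbox", "Inbox"],
})
emails["Date"] = pd.to_datetime(emails["Date"])

# Count e-mails per calendar day instead of per exact timestamp.
per_day = emails.groupby(emails["Date"].dt.date).size()

# Cross-tabulate senders against labels to see who sends what.
sender_by_label = pd.crosstab(emails["From"], emails["Labels"])

print(per_day)
print(sender_by_label)
```

Calling `per_day.plot(kind='bar')` then gives one bar per day rather than one bar per timestamp.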
Result:
Thus exploratory data analysis (EDA) on email data sets was performed using Python.
Ex.No.3
Date:
Working with NumPy Arrays, Pandas Data Frames, Basic Plots Using Matplotlib
Aim:
To write a Python program to work with NumPy arrays, Pandas data frames, and basic
plots using Matplotlib.
Program:
NUMPY
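A minimal sketch of typical NumPy array operations (creation, inspection, reshaping, aggregation, and boolean filtering) is given here as an illustration. The array values are arbitrary and chosen only to keep the results easy to follow.

```python
import numpy as np

# Create a 2 x 3 array from nested lists.
a = np.array([[1, 2, 3], [4, 5, 6]])

print(a.shape)   # dimensions of the array
print(a.dtype)   # element type

b = a.reshape(3, 2)        # same six elements, laid out as 3 rows x 2 columns
col_sums = a.sum(axis=0)   # sum down each column
scaled = a * 10            # element-wise arithmetic
evens = a[a % 2 == 0]      # boolean mask keeps only the even elements
second_row = a[1, :]       # slicing: every column of row index 1

print(b, col_sums, scaled, evens, second_row, sep="\n")
```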
PANDAS
import pandas as pd
df = pd.read_csv("weather_by_cities.csv")
g = df.groupby("city")
g
import pandas as pd
df = pd.read_excel("survey.xls")
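The `groupby` call above returns a grouped object that does nothing until it is iterated or aggregated. The sketch below shows both uses on a small in-memory frame so that it runs without the CSV file; the column names `temperature` and `windspeed` and all values are assumptions made for illustration.

```python
import pandas as pd

# Made-up rows standing in for the contents of a weather file.
weather = pd.DataFrame({
    "city": ["mumbai", "mumbai", "chennai", "chennai"],
    "temperature": [32, 30, 35, 33],
    "windspeed": [6, 8, 5, 7],
})

g = weather.groupby("city")

# Iterating yields (group key, sub-frame) pairs.
for city, group in g:
    print(city, len(group))

max_temp = g["temperature"].max()   # one value per city
mean_all = g.mean()                 # mean of every numeric column per city

print(max_temp)
print(mean_all)
```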
MATPLOTLIB
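A minimal sketch of three basic Matplotlib plots (line, bar, and scatter) drawn side by side is given here as an illustration. The data values and the output file name `basic_plots.png` are arbitrary, and the `Agg` backend line is only needed when running outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; remove this line inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)

fig, axes = plt.subplots(1, 3, figsize=(12, 3))

# Line plot
axes[0].plot(x, np.sin(x), label="sin(x)")
axes[0].set_title("Line plot")
axes[0].legend(loc="best")

# Bar plot
axes[1].bar(["A", "B", "C"], [3, 7, 5])
axes[1].set_title("Bar plot")

# Scatter plot
axes[2].scatter(x, np.cos(x), s=10)
axes[2].set_title("Scatter plot")

fig.tight_layout()
fig.savefig("basic_plots.png")
```

In a notebook with `%matplotlib inline`, the figure is displayed below the cell and the `savefig` call is optional.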
Result:
Thus a Python program to work with Numpy array, Pandas data frames, Basic plots
using Matplotlib was written and executed successfully.
Ex.No.4
Date:
Explore Various Variable and Row Filters in R for Cleaning Data
Aim:
To explore various variable and row filters in R for cleaning data.
Program:
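The ideas behind variable (column) filters and row filters can be sketched in pandas, the library used in the other exercises. This is a Python analogue rather than R code: in R the same steps are typically written with `subset()` or with `select()` and `filter()` from dplyr. The frame contents below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up records with some missing values to clean.
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meena", "Kumar"],
    "age": [21, np.nan, 35, 19],
    "city": ["Chennai", "Madurai", None, "Chennai"],
    "score": [82.0, 67.5, 91.0, 45.0],
})

# Variable (column) filters
selected = df[["name", "score"]]                     # keep named columns
numeric_only = df.select_dtypes(include="number")    # keep columns by type

# Row filters
passed = df[df["score"] >= 50]                                  # boolean condition
chennai_adults = df.query("city == 'Chennai' and age >= 20")    # combined condition
complete = df.dropna()                                          # drop rows with any missing value

print(selected, numeric_only, passed, chennai_adults, complete, sep="\n\n")
```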
Result:
Thus various variable and row filters in R for cleaning data were explored successfully.
Ex.No.5
Date:
Perform Time Series Analysis and Apply the Various Visualization Techniques
Aim:
To write a Python program to perform time series analysis and apply the various
visualization techniques.
Program:
!pip install pandas numpy matplotlib gitpython statsmodels seaborn
!git clone https://github.com/Neelu-Tiwari/Dataset.git datasets/timestamp
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import statsmodels.api as sm
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.graphics.tsaplots import plot_acf
url = 'datasets/timestamp/stock_data.csv'
df = pd.read_csv(url, parse_dates=True)
df.drop(columns=['Unnamed: 0','Name'], inplace=True)
df.head()
df.plot(subplots=True, figsize=(5,5))
plt.show()
# decomposition
close = seasonal_decompose(df['Close'], model='multiplicative', period = 500)
trend = close.trend
seasonal = close.seasonal
residual = close.resid
# trend analysis
plt.figure(figsize=(8,2))
plt.plot(trend, label='Trend')
plt.legend(loc='best')
plt.show()
# seasonality analysis
plt.figure(figsize=(8,2))
plt.plot(seasonal,label='Seasonality')
plt.legend(loc='best')
plt.show()
# residuals
plt.figure(figsize=(8,2))
plt.plot(residual, label='Residuals')
plt.legend(loc='best')
plt.tight_layout()
plt.show()
# autocorrelation
plot_acf(df['Close'])
plt.show()
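To see what a decomposition into trend, seasonality, and residuals computes, the classical approach can be sketched with pandas alone. This is a simplified additive variant (series = trend + seasonal + residual) for an odd period, not the multiplicative `seasonal_decompose` call used above, and it runs on a synthetic series rather than the stock data. The function name `additive_decompose` is hypothetical.

```python
import numpy as np
import pandas as pd


def additive_decompose(series, period):
    """Classical additive decomposition for an odd period (simplified sketch)."""
    # Trend: centred moving average spanning one full period.
    trend = series.rolling(window=period, center=True).mean()
    detrended = series - trend
    # Seasonal: mean detrended value at each position within the period,
    # shifted so the seasonal component averages to zero.
    position = np.arange(len(series)) % period
    seasonal_means = detrended.groupby(position).mean()
    seasonal_means = seasonal_means - seasonal_means.mean()
    seasonal = pd.Series(seasonal_means.loc[position].to_numpy(), index=series.index)
    # Residual: whatever the trend and seasonal parts do not explain.
    residual = series - trend - seasonal
    return trend, seasonal, residual


# Synthetic daily series: linear trend plus a weekly cycle.
t = np.arange(70)
index = pd.date_range("2023-01-01", periods=70, freq="D")
series = pd.Series(0.5 * t + 10 * np.sin(2 * np.pi * t / 7), index=index)

trend, seasonal, residual = additive_decompose(series, period=7)
lag7 = series.autocorr(lag=7)   # autocorrelation at one full period

print(trend.dropna().head())
print(seasonal.head(7))
print(lag7)
```

The moving average leaves the first and last few trend values undefined, which is also why the trend and residual plots above start and end later than the raw series.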
Result:
Thus a Python program to perform time series analysis and apply the various
visualization techniques was written and executed successfully.
Ex.No.6
Date:
Perform Data Analysis and Representation on Map Using Various Map Data Sets with Mouse
Rollover Effect and User Interaction
Aim:
To perform data analysis and representation on Map using various map data sets with Mouse
Rollover Effect and user interaction.
Program:
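One way to obtain a map with mouse rollover information is Plotly Express, whose `scatter_geo` figures show a tooltip when the pointer hovers over a marker and support zooming and panning. The sketch below is an assumption about tooling, not the manual's prescribed program. The city coordinates are approximate, the `value` column is made up, and the function name `build_hover_map` is hypothetical. `plotly` is imported inside the function so that the data-analysis part runs even where it is not installed.

```python
import pandas as pd

# Illustrative data: approximate coordinates and a made-up measurement.
cities = pd.DataFrame({
    "city": ["Chennai", "Mumbai", "Delhi", "Kolkata"],
    "lat": [13.08, 19.08, 28.61, 22.57],
    "lon": [80.27, 72.88, 77.21, 88.36],
    "value": [10, 25, 40, 15],
})

# Simple analysis step: which city has the largest value?
busiest = cities.sort_values("value", ascending=False).iloc[0]["city"]
print("Largest value:", busiest)


def build_hover_map(df):
    """Return an interactive map; hovering over a marker shows city and value."""
    import plotly.express as px  # imported lazily; requires the plotly package

    return px.scatter_geo(
        df,
        lat="lat",
        lon="lon",
        size="value",
        hover_name="city",
        hover_data={"value": True, "lat": False, "lon": False},
        projection="natural earth",
    )
```

In a notebook, `build_hover_map(cities).show()` renders the map; moving the mouse over a marker displays the city name and its value.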
Result:
Thus data analysis and representation on Map using various map data sets with Mouse Rollover
Effect and user interaction was performed successfully.
Ex.No.7
Aim:
To build cartographic visualization for multiple datasets.
Program:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
plt.figure(figsize=(5, 5))
# Orthographic view centred on India
m = Basemap(projection='ortho', lat_0=20, lon_0=78)
# Alternative view centred on North America:
#m = Basemap(projection='ortho', lat_0=50, lon_0=-100)
m.drawcoastlines()
m.bluemarble(scale=0.5)
plt.show()
fig = plt.figure(figsize=(12,9))
m = Basemap(projection='mill',
llcrnrlat = 0,
urcrnrlat = 90,
llcrnrlon = 0,
urcrnrlon = 180,
resolution = 'c')
m.drawcoastlines()
fig = plt.figure(figsize=(12,9))
m = Basemap(projection='mill',
llcrnrlat = -90,
urcrnrlat = 90,
llcrnrlon = -180,
urcrnrlon = 180,
resolution = 'c')
m.drawcoastlines()
m.drawcountries(color='red')
m.drawstates(color='blue')
fig = plt.figure(figsize=(12,9))
m = Basemap(projection='mill',
llcrnrlat = 0,
urcrnrlat = 90,
llcrnrlon = 0,
urcrnrlon = 180,
resolution = 'c')
m.drawrivers(color='blue')
fig = plt.figure(figsize=(12,9))
m = Basemap(projection='mill',
llcrnrlat = -90,
urcrnrlat = 90,
llcrnrlon = -180,
urcrnrlon = 180,
resolution = 'c')
m.drawmapboundary(color='pink', linewidth=10, fill_color='aqua')
m.fillcontinents(color='lightgreen', lake_color='aqua')
fig = plt.figure(figsize=(12,9))
m = Basemap(projection='mill',
llcrnrlat = -90,
urcrnrlat = 90,
llcrnrlon = -180,
urcrnrlon = 180,
resolution = 'c')
m.drawlsmask(land_color='red', ocean_color='aqua', lakes=True)
fig = plt.figure(figsize=(12,9))
m.etopo()
fig = plt.figure(figsize=(12,9))
m = Basemap(projection='mill',
llcrnrlat = -90,
urcrnrlat = 90,
llcrnrlon = -180,
urcrnrlon = 180,
resolution = 'c')
m.drawcoastlines()
m.drawparallels(np.arange(-90,90,10),labels=[True,False,False,False])
m.drawmeridians(np.arange(-180,180,30),labels=[0,0,0,1])
#np.arange(start,stop,step)
#labels=[left,right,top,bottom]
plt.show()
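The `np.arange(start, stop, step)` calls passed to `drawparallels` and `drawmeridians` decide which grid lines are drawn, and `stop` itself is excluded. The short sketch below prints the resulting values; the variable names are chosen for illustration.

```python
import numpy as np

# Same arguments as in the map code above.
parallels = np.arange(-90, 90, 10)     # latitudes from -90 up to 80
meridians = np.arange(-180, 180, 30)   # longitudes from -180 up to 150

# Raising stop past the end value includes the final line as well.
parallels_full = np.arange(-90, 91, 10)

print(parallels)
print(meridians)
print(parallels_full)
```

With the arguments used above, the 90 degree parallel and the 180 degree meridian are therefore not in the lists.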
import pandas as pd
d = pd.read_csv("states.csv")
print(d)
fig = plt.figure(figsize=(12,9))
m = Basemap(projection='mill',
llcrnrlat = -90,
urcrnrlat = 90,
llcrnrlon = -180,
urcrnrlon = 180,
resolution = 'c')
m.drawcoastlines()
m.drawparallels(np.arange(-90,90,10),labels=[True,False,False,False])
m.drawmeridians(np.arange(-180,180,30),labels=[0,0,0,1])
sites_lat_y = d['latitude'].tolist()
sites_lon_x = d['longitude'].tolist()
m.scatter(sites_lon_x, sites_lat_y, latlon=True, s=5, c='blue', marker='o')
plt.title('Basemap tutorial', fontsize=20)
plt.show()
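Before passing coordinates to `m.scatter`, it can help to drop rows whose latitude or longitude is missing or out of range, since such rows cannot be placed on the map. A minimal sketch on made-up rows follows; the `name` column and all values are assumptions, and only the `latitude` and `longitude` column names follow the program above.

```python
import pandas as pd

# Made-up rows: one valid site and three with bad coordinates.
sites = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "latitude": [13.08, 95.0, None, 28.61],
    "longitude": [80.27, 77.0, 72.88, 200.0],
})

# Drop rows with missing coordinates.
valid = sites.dropna(subset=["latitude", "longitude"])

# Keep only coordinates inside the valid geographic ranges.
in_range = valid["latitude"].between(-90, 90) & valid["longitude"].between(-180, 180)
valid = valid[in_range]

sites_lat_y = valid["latitude"].tolist()
sites_lon_x = valid["longitude"].tolist()
print(valid)
```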
Result:
Thus cartographic visualization for multiple datasets was built successfully.
Ex.No.8
Aim:
To perform EDA on wine quality dataset.
Program:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings as wr
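wr.filterwarnings('ignore')
# Load the wine quality data set (file name is an assumption; adjust to your copy)
df = pd.read_csv("winequality-red.csv")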
sns.countplot(x='quality',data=df)
sns.swarmplot(x="quality",y="alcohol",data=df)
sns.violinplot(x="quality",y="density",data=df)
sns.violinplot(x="quality",y="alcohol",data=df)
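# Check whether the data has outliers: draw one box plot per numeric column
fig, ax = plt.subplots(ncols=6, nrows=2, figsize=(20,10))
ax = ax.flatten()
for index, col in enumerate(df.select_dtypes('number').columns[:len(ax)]):
    sns.boxplot(y=col, data=df, ax=ax[index])
plt.tight_layout()
plt.show()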
Method 1
sns.pairplot(df)
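Box plots flag points that lie beyond 1.5 times the interquartile range (IQR) from the quartiles by default. The same rule can be applied numerically to list the outlying values. A minimal sketch follows; the helper name `iqr_outliers` is hypothetical and the sample values are made up, not taken from the wine quality data.

```python
import pandas as pd


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return values[(values < lower) | (values > upper)]


# Made-up sample with one unusually high value.
alcohol = pd.Series([9.4, 9.8, 9.8, 10.0, 10.2, 10.5, 11.0, 14.9])
outliers = iqr_outliers(alcohol)
print(outliers)
```

On the real data, calling `iqr_outliers(df['alcohol'])` would list the rows that appear as isolated points in the corresponding box plot.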
Result:
Thus EDA on wine quality dataset was performed successfully.
Ex.No.9
Date:
Use a case study on a data set and apply the various EDA and visualization
techniques and present analysis report.
Aim:
To use a case study on a data set and apply the various EDA and visualization techniques and present
analysis report.
Program:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.read_csv('employees.csv')
df.head()
df.shape
df.describe()
df.info()
df = pd.read_csv("Iris.csv")
sns.pairplot(df.drop(['petalwidth'], axis = 1), hue='petallength', height=2)
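An analysis report usually begins with a per-column overview of types, missing values, and distinct values, which complements `df.info()` and `df.describe()`. A minimal sketch follows; the helper name `summarize` is hypothetical and the frame contents are made up rather than taken from `employees.csv`.

```python
import numpy as np
import pandas as pd


def summarize(df):
    """Build a per-column overview table for an analysis report."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "missing_pct": (df.isna().mean() * 100).round(1),
        "unique": df.nunique(),
    })


# Made-up rows with a few missing entries.
employees = pd.DataFrame({
    "name": ["Asha", "Ravi", None, "Kumar"],
    "team": ["Sales", "Sales", "HR", None],
    "salary": [52000.0, np.nan, 61000.0, 48000.0],
})

report = summarize(employees)
print(report)
```

Columns with a high missing percentage or a single unique value are natural candidates for cleaning before plotting.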
Result:
Thus a case study on a data set was carried out by applying the various EDA and visualization
techniques, and the analysis report was presented successfully.