Labdev

dev lab manual

Uploaded by

pushparajp

DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

B.Tech. - Artificial Intelligence and Data Science

Anna University Regulation: 2021

AD3301 - Data Exploration and Visualization

II Year / III Semester

Lab Manual
Ex.No.1 Install the Data Analysis and Visualization Tool: Python
Aim:
To install the data analysis and visualization tool.

Install Python:

Step 1: Download Python


1. Open a web browser and go to the Python download page for Windows.
2. You will see the latest version of Python available for download. Click the download button to
save the installer (.exe file) to your computer.
Step 2: Follow the installation instructions for your operating system (Windows, macOS, or Linux).
1. Locate the downloaded installer file (usually in your Downloads folder) and double-click it to
start the installation process.
2. Important: Before you proceed, make sure to check the box that says "Add Python to PATH".
This option is essential because it allows you to run Python commands from any location in the
command prompt.
3. Select "Install Now" to proceed with the default installation, which includes Python and a few
basic tools like pip (Python's package manager) and the Python documentation.
o Alternatively, you can choose "Customize installation" to select additional options or change
the installation location, but the default installation is typically sufficient for most users.
The installation takes a few moments. Once it completes, a confirmation screen
appears, and you can then close the installer.
Step 3: Verify the Installation
1. Open the Command Prompt by pressing Win + R, typing cmd, and hitting Enter.

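2. In the Command Prompt, type the following command and press Enter. If the installation succeeded, it
prints the installed Python version:

python --version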

Step 4: Install Data Analysis Libraries (NumPy, pandas, and Matplotlib)


1. With Python installed, you can now use pip, Python's package installer, to install essential data analysis
libraries.
2. In the Command Prompt, type the following command to install the libraries:

pip install numpy pandas matplotlib


3. This command will download and install NumPy, pandas, and Matplotlib — libraries commonly used for data
analysis and visualization. You will see messages indicating the progress of the installation.
4. After installation, verify that the libraries are installed by running a simple import test in Python:
o Open the Python interpreter by typing python in the Command Prompt and pressing Enter.
o Then, type the following lines one by one:

import numpy
import pandas
import matplotlib.pyplot as plt
If no errors are displayed, the libraries are correctly installed.
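
The import test above can also be run as a small script. The following is a minimal sketch, not part of the original manual: the helper name check_libraries is hypothetical, and it assumes each library exposes a __version__ attribute (it reports "unknown" otherwise).

```python
import importlib
import importlib.util


def check_libraries(names):
    """Return {module name: version string, or None if the module is not installed}."""
    report = {}
    for name in names:
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(name) is None:
            report[name] = None
            continue
        module = importlib.import_module(name)
        # Most scientific packages expose __version__; fall back to "unknown".
        report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    libraries = ["numpy", "pandas", "matplotlib"]
    for name, version in check_libraries(libraries).items():
        print(f"{name}: {version if version else 'NOT INSTALLED'}")
```

Save the script under any file name and run it with python from the Command Prompt. A library reported as NOT INSTALLED can be installed with the pip command from Step 4.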

Install a Python IDE (Integrated Development Environment):

Installing Jupyter Notebook using Anaconda:

Anaconda is an open-source distribution that bundles Jupyter, Spyder, and other tools used for
large-scale data processing, data analytics, and scientific computing.
Anaconda supports the R and Python programming languages.
Package versions are managed by its package management system, conda.
To install Jupyter using Anaconda, follow the steps below:
Steps for Anaconda installation:

LAUNCH ANACONDA NAVIGATOR:


Click on the Install Jupyter Notebook Button:

Beginning the Installation:


Loading Packages:

Finished Installation:
Launching Jupyter:
Result:
Thus the data analysis and visualization tool was installed successfully.
Ex.No.2 Perform exploratory data analysis (EDA) on Email data set
Aim:
To perform exploratory data analysis (EDA) on email data sets using python.
Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline
import seaborn as sns
df = pd.read_csv("C:/Users/Administrator/Desktop/EDA-AIDS/EDA-AIDS-Lab Manual/manickam_email.csv")

df.head()
df.describe(include='all')

df.info()

sns.countplot(x='Labels', data=df)
plt.xticks(rotation=90)
plt.show()
df['From'].value_counts().plot(kind='bar', title='From', figsize=(16,9))
plt.xticks(rotation=90)
plt.show()

df['Date'] = pd.to_datetime(df['Date'])
df['Date'].value_counts().plot(kind='bar', title='Datewise email', figsize=(16,9))
plt.xticks(rotation=90)
plt.show()
df['Labels'].value_counts().plot(kind='bar', title='Labels distribution', figsize=(16,9))
plt.xticks(rotation=90)
plt.show()
plt.plot(df['From'],df['Date'])
plt.xticks(rotation=90)
plt.show()
df['From'].value_counts().plot(kind='pie', autopct='%1.1f%%')
plt.axis('equal')
plt.show()

Result:
Thus Exploratory Data Analysis (EDA) on email data sets was performed using python.
Ex.No.3 Working with NumPy arrays, Pandas data frames, Basic plots using
Matplotlib
Aim:
To write a Python program to work with NumPy arrays, Pandas data
frames, and basic plots using Matplotlib.

Program:
NUMPY
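
The NumPy listing did not survive in this copy of the manual. As a stand-in, here is a minimal sketch of common array operations; the values and variable names are illustrative assumptions, not the original program.

```python
import numpy as np

# Create a 1-D array and reshape it into a 2-D array (illustrative values).
a = np.array([1, 2, 3, 4, 5, 6])
m = a.reshape(2, 3)            # 2 rows, 3 columns

print("shape:", m.shape)
print("dtype:", m.dtype)

# Element-wise arithmetic and aggregation.
doubled = a * 2
col_sums = m.sum(axis=0)       # sum down each column
row_means = m.mean(axis=1)     # mean across each row

# Indexing, slicing, and boolean filtering.
first_row = m[0]
evens = a[a % 2 == 0]

print(doubled, col_sums, row_means, first_row, evens)
```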
PANDAS
import pandas as pd
df = pd.read_csv("weather_by_cities.csv")
g = df.groupby("city")
g

# Print each group key followed by the rows that belong to it
for city, data in g:
    print("city:", city)
    print("\n")
    print("data:", data)

g.size()

import pandas as pd
df = pd.read_excel("survey.xls")
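
The files weather_by_cities.csv and survey.xls are not included with this manual, so the grouping step can be tried on an in-memory DataFrame instead. This is a sketch under assumptions: the rows are made up, and the temperature column is a hypothetical name (only the city column appears in the listing).

```python
import pandas as pd

# Made-up sample rows standing in for weather_by_cities.csv.
df = pd.DataFrame({
    "city": ["mumbai", "mumbai", "paris", "paris", "paris"],
    "temperature": [32, 34, 20, 22, 21],   # hypothetical column
})

g = df.groupby("city")

# Iterating a GroupBy object yields (key, sub-frame) pairs.
for city, data in g:
    print("city:", city)
    print(data)

sizes = g.size()                        # number of rows per city
mean_temp = g["temperature"].mean()     # average temperature per city
print(sizes)
print(mean_temp)
```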
MATPLOTLIB
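
The Matplotlib listing is also missing from this copy. A minimal sketch of three basic plot types follows; the data and the output file name basic_plots.png are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)

# One figure with three side-by-side axes.
fig, axes = plt.subplots(1, 3, figsize=(12, 3))

axes[0].plot(x, np.sin(x), label="sin(x)")
axes[0].set_title("Line plot")
axes[0].legend(loc="best")

axes[1].bar(["A", "B", "C"], [3, 7, 5])
axes[1].set_title("Bar chart")

axes[2].scatter(x, np.cos(x), s=10)
axes[2].set_title("Scatter plot")

fig.tight_layout()
fig.savefig("basic_plots.png")  # hypothetical output file name
```

In a Jupyter notebook, drop the matplotlib.use("Agg") line and call plt.show() instead of saving the figure.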
Result:
Thus a Python program to work with Numpy array, Pandas data frames, Basic plots
using Matplotlib was written and executed successfully.
Ex.No.4 Explore various variable and row filters in R for cleaning data

Aim:
To explore various variable and row filters in R for cleaning data.

Program:
Result:
Thus exploring various variable and row filters in R for cleaning data was done successfully.
Ex.No.5 Perform Time Series Analysis and apply the various visualization
Techniques

Aim:
To write a python program to perform time series analysis and apply the various
visualization techniques

Program:
!pip install pandas numpy matplotlib gitpython statsmodels seaborn
!git clone https://github.com/Neelu-Tiwari/Dataset.git datasets/timestamp

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

import statsmodels.api as sm
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.graphics.tsaplots import plot_acf

url = 'datasets/timestamp/stock_data.csv'
df = pd.read_csv(url, parse_dates=True)
df.drop(columns=['Unnamed: 0','Name'], inplace=True)
df.head()

df.plot(subplots=True, figsize=(5,5))
plt.show()
# decomposition

close = seasonal_decompose(df['Close'], model='multiplicative', period=500)
trend = close.trend
seasonal = close.seasonal
residual = close.resid

# trend analysis
plt.figure(figsize=(8,2))
plt.plot(trend, label='Trend')
plt.legend(loc='best')
plt.show()
# seasonality analysis
plt.figure(figsize=(8,2))
plt.plot(seasonal,label='Seasonality')
plt.legend(loc='best')
plt.show()

# residuals
plt.figure(figsize=(8,2))
plt.plot(residual, label='Residuals')
plt.legend(loc='best')
plt.tight_layout()
plt.show()

# autocorrelation
plot_acf(df['Close'])
plt.show()
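
The program above depends on stock_data.csv and statsmodels. To see what trend, seasonality, and residual mean without either, the sketch below decomposes a synthetic series by hand with pandas. It is a simplified additive decomposition (the program above uses a multiplicative model), and every value and name in it is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic daily series: linear trend + weekly cycle + noise (illustrative data only).
rng = np.random.default_rng(0)
n, period = 140, 7
t = np.arange(n)
index = pd.date_range("2023-01-01", periods=n, freq="D")
values = 10 + 0.1 * t + 2 * np.sin(2 * np.pi * t / period) + rng.normal(0, 0.2, n)
series = pd.Series(values, index=index)

# Trend: centred moving average over one full period.
trend = series.rolling(window=period, center=True).mean()

# Seasonality: average detrended value at each position in the weekly cycle.
detrended = series - trend
position = pd.Series(t % period, index=index)
seasonal = detrended.groupby(position).transform("mean")

# Residual: what remains after removing trend and seasonality.
residual = series - trend - seasonal

print(trend.dropna().head())
print("largest absolute residual:", residual.dropna().abs().max())
```

The first and last three trend values are NaN because the centred 7-day window is incomplete at the edges of the series.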

Result:
Thus a Python program to perform time series analysis and apply the various
visualization techniques was written and executed successfully.
Ex.No.6 Perform Data Analysis and representation on Map Using various map data
sets with Mouse Rollover Effect and user interaction.

Aim:
To perform data analysis and representation on Map using various map data sets with Mouse
Rollover Effect and user interaction.

Program:
Result:
Thus data analysis and representation on Map using various map data sets with Mouse Rollover
Effect and user interaction was performed successfully.
Ex.No.7 To build cartographic visualization for multiple datasets

Aim:
To build cartographic visualization for multiple datasets.

Program:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

plt.figure(figsize=(5, 5))
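# Orthographic (globe) view centred on 20°N, 78°E (India)
m = Basemap(projection='ortho', lat_0=20, lon_0=78)
# Alternative view centred on 50°N, 100°W (North America):
#m = Basemap(projection='ortho', lat_0=50, lon_0=-100)
m.drawcoastlines()
m.bluemarble(scale=0.5)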
plt.show()

fig = plt.figure(figsize=(12,9))

m = Basemap(projection='mill',
llcrnrlat = 0,
urcrnrlat = 90,
llcrnrlon = 0,
urcrnrlon = 180,
resolution = 'c')

m.drawcoastlines()
fig = plt.figure(figsize=(12,9))

m = Basemap(projection='mill',
llcrnrlat = -90,
urcrnrlat = 90,
llcrnrlon = -180,
urcrnrlon = 180,
resolution = 'c')

m.drawcoastlines()
m.drawcountries(color='red')
m.drawstates(color='blue')
fig = plt.figure(figsize=(12,9))

m = Basemap(projection='mill',
llcrnrlat = 0,
urcrnrlat = 90,
llcrnrlon = 0,
urcrnrlon = 180,
resolution = 'c')
m.drawrivers(color='blue')

fig = plt.figure(figsize=(12,9))

m = Basemap(projection='mill',
llcrnrlat = -90,
urcrnrlat = 90,
llcrnrlon = -180,
urcrnrlon = 180,
resolution = 'c')
m.drawmapboundary(color='pink', linewidth=10, fill_color='aqua')
m.fillcontinents(color='lightgreen', lake_color='aqua')
fig = plt.figure(figsize=(12,9))

m = Basemap(projection='mill',
llcrnrlat = -90,
urcrnrlat = 90,
llcrnrlon = -180,
urcrnrlon = 180,
resolution = 'c')
m.drawlsmask(land_color='red', ocean_color='aqua', lakes=True)
fig = plt.figure(figsize=(12,9))
m.etopo()

fig = plt.figure(figsize=(12,9))

m = Basemap(projection='mill',
llcrnrlat = -90,
urcrnrlat = 90,
llcrnrlon = -180,
urcrnrlon = 180,
resolution = 'c')
m.drawcoastlines()
m.drawparallels(np.arange(-90,90,10),labels=[True,False,False,False])
m.drawmeridians(np.arange(-180,180,30),labels=[0,0,0,1])

#np.arange(start,stop,step)
#labels=[left,right,top,bottom]

plt.title('Basemap tutorial', fontsize=20)

plt.show()
fig = plt.figure(figsize=(12,9))

m = Basemap(projection='mill',
llcrnrlat = -90,
urcrnrlat = 90,
llcrnrlon = -180,
urcrnrlon = 180,
resolution = 'c')

m.drawcoastlines()
m.drawparallels(np.arange(-90,90,10),labels=[True,False,False,False])
m.drawmeridians(np.arange(-180,180,30),labels=[0,0,0,1])

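# Red circle at 85°E, 12°N, drawn above the blue triangle (higher zorder)
m.scatter(85, 12, latlon=True, s=500, c='red', marker='o', alpha=1, edgecolor='k',
linewidth=1, zorder=2)
# Blue triangle at 135°W, 60°N
m.scatter(-135, 60, latlon=True, s=5000, c='blue', marker='^', alpha=1, edgecolor='k',
linewidth=1, zorder=1)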

#np.arange(start,stop,step)
#labels=[left,right,top,bottom]

plt.title('Basemap tutorial', fontsize=20)

plt.show()
import pandas as pd

d = pd.read_csv("states.csv")
print(d)
fig = plt.figure(figsize=(12,9))

m = Basemap(projection='mill',
llcrnrlat = -90,
urcrnrlat = 90,
llcrnrlon = -180,
urcrnrlon = 180,
resolution = 'c')

m.drawcoastlines()

m.drawparallels(np.arange(-90,90,10),labels=[True,False,False,False])
m.drawmeridians(np.arange(-180,180,30),labels=[0,0,0,1])

sites_lat_y = d['latitude'].tolist()
sites_lon_x = d['longitude'].tolist()
m.scatter(sites_lon_x, sites_lat_y, latlon=True, s=5, c='blue', marker='o')
plt.title('Basemap tutorial', fontsize=20)
plt.show()

Result:
Thus cartographic visualization for multiple datasets was built successfully.
Ex.No.8 Perform EDA on Wine Quality

Aim:
To perform EDA on wine quality dataset.

Program:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings as wr
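wr.filterwarnings('ignore')

# df must hold the wine quality data set before the plots below are run;
# the loading step and the file name are not given in this listing.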
sns.countplot(x='quality',data=df)

sns.swarmplot(x="quality",y="alcohol",data=df)
sns.violinplot(x="quality",y="density",data=df)

sns.violinplot(x="quality",y="alcohol",data=df)
# Check whether the data has outliers: create one box plot per column
fig, ax = plt.subplots(ncols=6, nrows=2, figsize=(20,10))
index = 0
ax = ax.flatten()

for col, value in df.items():
    sns.boxplot(y=col, data=df, color='b', ax=ax[index])
    index += 1
plt.tight_layout(pad=0.5, w_pad=0.7, h_pad=5.0)
#Method 2
plt.figure(figsize=(15,10))
sns.heatmap(df.corr(), annot=True, fmt='.2f', linewidths=2)

#Method 1
sns.pairplot(df)
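
The box plots in this program mark points beyond 1.5 times the interquartile range (the default whisker length) as outliers. The same rule can be applied numerically. The sketch below is an illustration only: the helper name iqr_outlier_counts is hypothetical, and the sample values are made up rather than taken from the wine quality data set.

```python
import pandas as pd


def iqr_outlier_counts(frame, k=1.5):
    """Count, per numeric column, the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    numeric = frame.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    # Compare every column against its own lower and upper fence.
    below = numeric.lt(q1 - k * iqr, axis=1)
    above = numeric.gt(q3 + k * iqr, axis=1)
    return (below | above).sum()


# Made-up values; the column names are borrowed from the plots above.
sample = pd.DataFrame({
    "alcohol": [9.4, 9.8, 10.0, 10.2, 10.5, 11.0, 14.9],
    "quality": [5, 5, 6, 6, 6, 7, 6],
})
counts = iqr_outlier_counts(sample)
print(counts)
```

Calling iqr_outlier_counts(df) on the loaded data set gives a per-column count that can be compared with the points drawn beyond the whiskers.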
Result:
Thus EDA on wine quality dataset was performed successfully.
Ex.No.9 Use a case study on a data set and apply the various EDA and
visualization techniques and present analysis report.

Aim:
To use a case study on a data set and apply the various EDA and visualization techniques and present
analysis report.

Program:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv('employees.csv')
df.head()

df.shape

df.describe()
df.info()

df['Start Date'] = pd.to_datetime(df['Start Date'])


df.nunique()

# Replace missing Gender values; assigning back avoids in-place fillna on a column selection
df["Gender"] = df["Gender"].fillna("No Gender")


df.isnull().sum()
sns.histplot(x='Salary', data=df)
plt.show()

sns.boxplot(x="Salary", y='Team', data=df)


plt.show()
sns.scatterplot( x="Salary", y='Team', data=df, hue='Gender', size='Bonus %')
plt.legend(bbox_to_anchor=(1, 1), loc=2)
plt.show()

df = pd.read_csv("Iris.csv")
sns.pairplot(df.drop(['petalwidth'], axis=1), hue='petallength', height=2)

Result:
Thus various EDA and data visualization techniques were applied to a case study data set and an
analysis report was presented successfully.
