lab record dev

Uploaded by

sdssumi
Copyright
© All Rights Reserved

EXP.NO.: 1
DATE:
Installing the Data Analysis and Visualization Tools

AIM:
To install the data analysis and visualization tools: R / Python / Tableau Public / Power BI.
PROGRAM 1:
# importing the pandas package
import pandas as pd
# creating rows
hafeez = ['Hafeez', 19]
aslan = ['Aslan', 21]
kareem = ['Kareem', 18]
# pass those rows to the DataFrame,
# passing column names as well
data_frame = pd.DataFrame([hafeez, aslan, kareem], columns=['Name', 'Age'])
# displaying the DataFrame
print(data_frame)

OUTPUT
If you run the above program, you will get the following results.
     Name  Age
0  Hafeez   19
1   Aslan   21
2  Kareem   18

PROGRAM 2:
# importing pandas to read the CSV file
import pandas as pd
# importing the pyplot module to create graphs
import matplotlib.pyplot as plot
# importing the data using pd.read_csv() method
data = pd.read_csv('CountryData.IND.csv')

# creating a histogram of Time period
data['Time period'].hist(bins=10)

OUTPUT
If you run the above program, you will get the following results.
<matplotlib.axes._subplots.AxesSubplot at 0x25e363ea8d0>
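
Since the aim of this experiment is installation, it can help to confirm that the Python packages are actually importable. The following is a minimal sketch, not part of the original record: the helper name check_packages and the list of package names are assumptions chosen for illustration.

```python
import importlib.util
from importlib import metadata


def check_packages(names):
    """Map each package name to its installed version.

    None means the package cannot be imported; 'unknown' means it is
    importable but has no distribution metadata (e.g. a stdlib module).
    """
    report = {}
    for name in names:
        if importlib.util.find_spec(name) is None:
            report[name] = None
            continue
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = 'unknown'
    return report


# Package list is an assumption; adjust it to the tools being installed
report = check_packages(['pandas', 'numpy', 'matplotlib'])
for name, version in report.items():
    print(name, version if version else 'NOT INSTALLED')
```

A package reported as NOT INSTALLED can usually be added with pip or conda before re-running the programs above.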

RESULT:
The installation of the data analysis and visualization tools (R / Python / Tableau Public / Power BI) was successfully completed.
EXP.NO.: 2
DATE:
Exploratory Data Analysis (EDA) On Datasets Like Email Data Set

AIM:
To perform Exploratory Data Analysis (EDA) on a dataset such as an email data set: export all
your emails as a dataset, import them into a pandas DataFrame, visualize them, and derive
insights from the data.

PROGRAM:
Create a CSV file with only the required attributes:
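import csv

# mbox is assumed to be a mailbox.mbox object opened from the exported
# mail file (the step that opens it is not shown in this record)
with open('mailbox.csv', 'w', newline='') as outputfile:
    writer = csv.writer(outputfile)
    writer.writerow(['subject', 'from', 'date', 'to', 'label', 'thread'])
    for message in mbox:
        writer.writerow([message['subject'],
                         message['from'],
                         message['date'],
                         message['to'],
                         message['X-Gmail-Labels'],
                         message['X-GM-THRID']])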
Loading mailbox.csv into a pandas DataFrame and inspecting its column data types gives the following:
subject     object
from        object
date        object
to          object
label       object
thread     float64
dtype: object
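
The plotting function that follows reads 'timeofday' and 'year' columns, which the record does not show being created. A minimal sketch of one way to derive them is given here, under stated assumptions: the three sample rows are made up and stand in for mailbox.csv, the helper parse_date is hypothetical, and 'year' is kept as a plain integer year for simplicity.

```python
import io
from email.utils import parsedate_to_datetime

import pandas as pd

# Hypothetical sample rows standing in for mailbox.csv (not real data)
sample_csv = io.StringIO(
    'subject,from,date,to,label,thread\n'
    'Hello,a@example.com,"Mon, 05 Aug 2019 09:30:00 +0000",b@example.com,Inbox,1\n'
    'Re: Hello,b@example.com,"Tue, 06 Aug 2019 18:15:00 +0000",a@example.com,Sent,1\n'
    'No date,c@example.com,,a@example.com,Inbox,2\n'
)
df = pd.read_csv(sample_csv)


def parse_date(value):
    """Parse an RFC 2822 date string; return None if missing or malformed."""
    if not isinstance(value, str):
        return None
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None


# Convert the text dates to timezone-aware timestamps and drop rows without one
df['date'] = pd.to_datetime(df['date'].map(parse_date), utc=True)
df = df[df['date'].notna()].copy()

# Columns used by the plotting function: calendar year and hour of day (0-24)
df['year'] = df['date'].dt.year
df['timeofday'] = df['date'].dt.hour + df['date'].dt.minute / 60
print(df[['subject', 'year', 'timeofday']])
```

Rows whose date cannot be parsed are dropped rather than guessed, so the hour-of-day histogram only counts messages with a usable timestamp.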
import datetime
import numpy as np
from scipy import ndimage
from scipy.interpolate import interp1d
from matplotlib.ticker import MaxNLocator

# Plot the average number of emails per hour of the day, per year
def plot_number_perdhour_per_year(df, ax, label=None, dt=1,
                                  smooth=False,
                                  weight_fun=None, **plot_kwargs):

    tod = df[df['timeofday'].notna()]['timeofday'].values
    year = df[df['year'].notna()]['year'].values
    Ty = year.max() - year.min()
    T = tod.max() - tod.min()
    bins = int(T / dt)

    if weight_fun is None:
        weights = 1 / (np.ones_like(tod) * Ty * 365.25 / dt)
    else:
        weights = weight_fun(df)
    if smooth:
        # Smooth the histogram and interpolate it onto a fine grid
        hst, xedges = np.histogram(tod, bins=bins, weights=weights)
        x = np.delete(xedges, -1) + 0.5*(xedges[1] - xedges[0])
        hst = ndimage.gaussian_filter(hst, sigma=0.75)
        f = interp1d(x, hst, kind='cubic')
        x = np.linspace(x.min(), x.max(), 10000)
        hst = f(x)
        ax.plot(x, hst, label=label, **plot_kwargs)
    else:
        ax.hist(tod, bins=bins, weights=weights, label=label, **plot_kwargs)
    ax.grid(ls=':', color='k')
    orientation = plot_kwargs.get('orientation')
    # Label the time axis with 12-hour clock times
    if orientation is None or orientation == 'vertical':
        ax.set_xlim(0, 24)
        ax.xaxis.set_major_locator(MaxNLocator(8))
        ax.set_xticklabels([datetime.datetime.strptime(str(int(np.mod(ts, 24))),
                                                       "%H").strftime("%I %p")
                            for ts in ax.get_xticks()])
    elif orientation == 'horizontal':
        ax.set_ylim(0, 24)
        ax.yaxis.set_major_locator(MaxNLocator(8))
        ax.set_yticklabels([datetime.datetime.strptime(str(int(np.mod(ts, 24))),
                                                       "%H").strftime("%I %p")
                            for ts in ax.get_yticks()])

OUTPUT

RESULT:
Thus the above program was executed successfully.
EXP.NO.: 3
DATE:
Working with Numpy arrays, Pandas data frames, Basic plots using Matplotlib.

AIM:
To work with NumPy arrays, Pandas data frames, and basic plots using Matplotlib.
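
The programs in this experiment focus on plotting. As a complement, here is a minimal sketch of basic NumPy array and pandas DataFrame operations on the same x and y values used in Program 1. The names table and large are assumptions introduced only for this illustration.

```python
import numpy as np
import pandas as pd

# NumPy: build an array and apply a vectorised expression to every element
x = np.arange(1, 11)   # integers 1 through 10
y = 2 * x + 5          # element-wise arithmetic, no explicit loop

print(x.shape, y.min(), y.max(), y.mean())

# pandas: wrap the arrays in a DataFrame and filter rows with a boolean mask
table = pd.DataFrame({'x': x, 'y': y})
large = table[table['y'] > 15]

print(large)
print(table.describe())
```

The boolean mask table['y'] > 15 is evaluated element-wise, which is the same vectorised style that NumPy uses for 2 * x + 5.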
PROGRAM 1:
import numpy as np
from matplotlib import pyplot as plt
x = np.arange(1, 11)
y = 2 * x + 5
plt.title("Matplotlib demo")
plt.xlabel("x axis caption")
plt.ylabel("y axis caption")
plt.plot(x,y)
plt.show()

OUTPUT
The above code should produce the following output −

PROGRAM 2:
import pandas as pd
import matplotlib.pyplot as plt

# creating a DataFrame with 2 columns
dataFrame = pd.DataFrame(
{
"Car": ['BMW', 'Lexus', 'Audi', 'Mustang', 'Bentley', 'Jaguar'],
"Reg_Price": [2000, 2500, 2800, 3000, 3200, 3500],
"Units": [100, 120, 150, 170, 180, 200]
}
)

# plot a line graph


plt.plot(dataFrame["Reg_Price"], dataFrame["Units"])
plt.show()

OUTPUT
This will produce the following output −

RESULT:
Thus the above program was executed successfully.
EXP.NO.: 4
DATE:
Explore Various Variable And Row Filters In R For Cleaning Data.
Apply Various Plot Features In R On Sample Data Sets And Visualize.

AIM:
To explore various variable and row filters in R for cleaning data. Apply various
plot features in R on sample data sets and visualize.

PROCEDURE:
install.packages("data.table") # Install data.table package
library("data.table") # Load data.table
We also create some example data.
dt_all <- data.table(x = rep(month.name[1:3], each = 3),
                     y = rep(c(1, 2, 3), times = 3),
                     z = rep(c(TRUE, FALSE, TRUE), each = 3)) # Create data.table
head(dt_all)

Table 1

   x         y  z
1  January   1  TRUE
2  January   2  TRUE
3  January   3  TRUE
4  February  1  FALSE
5  February  2  FALSE
6  February  3  FALSE

Filter Rows by Column Values


In this example, I’ll demonstrate how to select all those rows of the example data for
which column x is equal to February. With the use of %in%, we can choose a set of values
of x. In this example, the set only contains one value.
dt_all[x %in% month.name[c(2)], ] # Rows where x is February

Table 2

   x         y  z
1  February  1  FALSE
2  February  2  FALSE
3  February  3  FALSE

Filter Rows by Multiple Column Values


In the previous example, we addressed those rows of the example data for which one
column was equal to some value. In this example, we condition on the values of multiple
columns.
dt_all[x %in% month.name[c(2)] & y == 1, ] # Rows, where x is February and y is 1

Table 3

   x         y  z
1  February  1  FALSE
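
Because the other experiments in this record use Python, the two row filters above can also be sketched in pandas for comparison. This is an illustrative analogue, not the R procedure itself: the names df_all, feb and feb_y1 are assumptions, and isin plays the role of %in%.

```python
import pandas as pd

# Rebuild the example data: each month repeated 3 times, y cycling 1..3
months = ['January', 'February', 'March']
df_all = pd.DataFrame({
    'x': [m for m in months for _ in range(3)],
    'y': [1, 2, 3] * 3,
    'z': [flag for flag in (True, False, True) for _ in range(3)],
})

# Filter rows by one column: x is in the set {'February'}
feb = df_all[df_all['x'].isin(['February'])]

# Filter rows by multiple columns: x is February and y equals 1
feb_y1 = df_all[df_all['x'].isin(['February']) & (df_all['y'] == 1)]

print(feb)
print(feb_y1)
```

Note that pandas keeps the original row labels (3, 4, 5) after filtering, whereas the R output above renumbers the rows from 1.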

RESULT:
Thus the above program was executed successfully.
EXP.NO.: 5
DATE:
Performing Time Series Analysis And Apply The Various Visualization Techniques.

AIM:
To perform time series analysis and apply various visualization techniques.

PROGRAM:
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
plt.rcParams.update({'figure.figsize': (10, 7), 'figure.dpi': 120})
# Import as Dataframe
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv',
                 parse_dates=['date'])
df.head()

        date     value
0 1991-07-01  3.526591
1 1991-08-01  3.180891
2 1991-09-01  3.252221
3 1991-10-01  3.611003
4 1991-11-01  3.565869
# Time series data source: fpp package in R.
import matplotlib.pyplot as plt
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv',
                 parse_dates=['date'], index_col='date')
# Draw Plot
def plot_df(df, x, y, title="", xlabel='Date', ylabel='Value', dpi=100):
    plt.figure(figsize=(16,5), dpi=dpi)
    plt.plot(x, y, color='tab:red')
    plt.gca().set(title=title, xlabel=xlabel, ylabel=ylabel)
    plt.show()

plot_df(df, x=df.index, y=df.value,
        title='Monthly anti-diabetic drug sales in Australia from 1992 to 2008.')

OUTPUT

RESULT:
Thus the above program was executed successfully.
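
As a supplementary sketch for this experiment, two further ways of preparing a monthly series for visualization are a rolling mean (to show the trend) and a month-by-year table (to show seasonality). The series below is synthetic and generated with a fixed seed; it is not the a10.csv data, and all variable names are assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (NOT the a10.csv data): trend + seasonality + noise
dates = pd.date_range('1991-07-01', periods=48, freq='MS')
rng = np.random.default_rng(0)
trend = np.linspace(3.0, 6.0, len(dates))
season = 0.5 * np.sin(2 * np.pi * dates.month.to_numpy() / 12)
noise = rng.normal(0, 0.1, len(dates))
series = pd.Series(trend + season + noise, index=dates, name='value')

# 12-month rolling mean: averages out the seasonal cycle to expose the trend
rolling = series.rolling(window=12).mean()

# Month-by-year table: one column per year, ready for a seasonal line plot
seasonal = (
    series.to_frame()
    .assign(year=series.index.year, month=series.index.month)
    .pivot(index='month', columns='year', values='value')
)

print(rolling.dropna().head())
print(seasonal.round(2))
```

Passing rolling to plt.plot alongside the raw series, or calling seasonal.plot(), would give the trend overlay and the seasonal plot respectively.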

EXP.NO.: 6
DATE:
Performing Data Analysis and representation on a Map using
various Map data sets with Mouse Rollover effect, user interaction.

AIM:
To perform Data Analysis and representation on a Map using various Map data sets with
Mouse Rollover effect, user interaction.
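
The program below uses arrays named lat, lon, population and area without showing where they come from. A minimal sketch of preparing such arrays from a table of cities is given here. The city names, column names and all numbers are made up for illustration and do not come from any real data set.

```python
import numpy as np
import pandas as pd

# Hypothetical city table; every value here is invented for illustration
cities = pd.DataFrame({
    'city': ['City A', 'City B', 'City C'],
    'lat': [34.0, 37.8, 38.6],
    'lon': [-118.2, -122.4, -121.5],
    'population': [1_000_000, 100_000, 10_000],
    'area_km2': [500.0, 300.0, 100.0],
})

# Extract plain NumPy arrays in the form the scatter call expects
lat = cities['lat'].to_numpy()
lon = cities['lon'].to_numpy()
population = cities['population'].to_numpy()
area = cities['area_km2'].to_numpy()

# Colour is driven by log10(population), so each unit step is a factor of ten
color_values = np.log10(population)
print(color_values)
```

Using the logarithm keeps a few very large cities from dominating the colour scale, which is why the program limits the colour range to between 3 and 7.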

PROGRAM:

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

# lat, lon, population and area are assumed to be arrays already loaded
# from a city data set (the loading step is not shown in this record)

# 1. Draw the map background
fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='lcc', resolution='h',
            lat_0=37.5, lon_0=-119,
            width=1E6, height=1.2E6)
m.shadedrelief()
m.drawcoastlines(color='gray')
m.drawcountries(color='gray')
m.drawstates(color='gray')
# 2. scatter city data, with color reflecting population
# and size reflecting area
m.scatter(lon, lat, latlon=True,
          c=np.log10(population), s=area,
          cmap='Reds', alpha=0.5)
# 3. create colorbar and legend
plt.colorbar(label=r'$\log_{10}({\rm population})$')
plt.clim(3, 7)
# make legend with dummy points
for a in [100, 300, 500]:
    plt.scatter([], [], c='k', alpha=0.5, s=a,
                label=str(a) + ' km$^2$')
plt.legend(scatterpoints=1, frameon=False,
           labelspacing=1, loc='lower left')

OUTPUT

RESULT:
Thus the above program was executed successfully.

EXP.NO.: 7
DATE:
Building Cartographic Visualization For Multiple Datasets Involving
Various Countries Of The World

AIM:
To build cartographic visualization for multiple datasets involving various countries of
the world.

PROGRAM:
import altair as alt

# zipcodes is assumed to be a data URL or DataFrame with zip_code,
# latitude and longitude fields (its definition is not shown in this record)
alt.Chart(zipcodes).transform_filter(
    '-150 < datum.longitude && 22 < datum.latitude && datum.latitude < 55'
).transform_calculate(
    digit='datum.zip_code[0]'
).mark_line(
    strokeWidth=0.5
).encode(
    longitude='longitude:Q',
    latitude='latitude:Q',
    color='digit:N',
    order='zip_code:O'
).project(
    type='albersUsa'
).properties(
    width=900,
    height=500
).configure_view(
    stroke=None
)

OUTPUT

# usa and airports are assumed to be data sources defined earlier
# (their definitions are not shown in this record)
alt.layer(
    alt.Chart(alt.topo_feature(usa, 'states')).mark_geoshape(
        fill='#ddd', stroke='#fff', strokeWidth=1
    ),
    alt.Chart(airports).mark_circle(size=9).encode(
        latitude='latitude:Q',
        longitude='longitude:Q',
        tooltip='iata:N'
    )
).project(
    type='albersUsa'
).properties(
    width=900,
    height=500
).configure_view(
    stroke=None
)

OUTPUT

RESULT:
Thus the above program was executed successfully.
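
To make the first chart's transform_filter and transform_calculate steps easier to inspect, the same logic can be mirrored in pandas. This is a sketch under stated assumptions: the rows below are made-up placeholder values, not real zip codes or coordinates, and the names zip_table and kept are introduced only for this example.

```python
import pandas as pd

# Placeholder rows; zip codes and coordinates are invented for illustration
zip_table = pd.DataFrame({
    'zip_code': ['10000', '90000', '99000', '00600', '96000'],
    'latitude': [40.0, 34.0, 61.0, 18.0, 30.0],
    'longitude': [-75.0, -118.0, -149.0, -66.0, -155.0],
})

# Same condition as the filter expression:
# -150 < longitude && 22 < latitude && latitude < 55
mask = (
    (zip_table['longitude'] > -150)
    & (zip_table['latitude'] > 22)
    & (zip_table['latitude'] < 55)
)
kept = zip_table[mask].copy()

# Same idea as the calculate expression: first character of the zip code
kept['digit'] = kept['zip_code'].str[0]

print(kept)
```

Keeping zip_code as a string matters here, since a numeric column would lose leading zeros and the first-character lookup would give the wrong digit.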

EXP.NO.: 8
DATE:
Performing EDA on Wine Quality Data Set

AIM:
To perform EDA on Wine Quality Data Set.

PROGRAM:
# importing libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# df is assumed to hold the wine quality data loaded into a pandas
# DataFrame (the loading cell is not shown in this record)

In [4]: # features in data
        df.columns
Out[4]: Index(['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
               'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
               'pH', 'sulphates', 'alcohol', 'quality'],
              dtype='object')
In [5]: # few datapoints
        df.head()
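
The record jumps from cell 5 to cell 13, so the intermediate steps are not shown. As a hedged sketch of the kind of summary steps that commonly come at this point in an EDA, the block below checks for missing values, prints summary statistics, counts each quality score, and ranks features by correlation with quality. It runs on a small synthetic table named wine with random values, which is an assumption for illustration and not the actual wine quality data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in table with random values (NOT the real data set)
rng = np.random.default_rng(42)
n = 200
wine = pd.DataFrame({
    'alcohol': rng.uniform(8.0, 14.0, n),
    'pH': rng.uniform(2.8, 3.8, n),
    'quality': rng.integers(3, 9, n),   # integer scores from 3 to 8
})

# 1. Missing values per column
print(wine.isna().sum())

# 2. Summary statistics for every numeric column
print(wine.describe())

# 3. Number of samples for each quality score (what a count plot displays)
counts = wine['quality'].value_counts().sort_index()
print(counts)

# 4. Correlation of each feature with quality, strongest first
corr = wine.corr()['quality'].sort_values(ascending=False)
print(corr)
```

The counts series holds the same numbers that a count plot of quality draws as bars.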

In [13]: sns.catplot(x='quality', data=df, kind='count')

Out[13]: <seaborn.axisgrid.FacetGrid at 0x22b7de0dba8>

OUTPUT

RESULT:
Thus the above program was executed successfully.

EXP.NO.: 9
DATE:
Using A Case Study On A Data Set And Apply The Various EDA And
Visualization Techniques And Present An Analysis Report

AIM:
To use a case study on a data set and apply the various EDA and visualization techniques
and present an analysis report.

PROGRAM:
import datetime
import math
import pandas as pd
import random
import radar
from faker import Faker
fake = Faker()

# Generate n random (date, price) samples and average the price per date
def generateData(n):
    listdata = []
    start = datetime.datetime(2019, 8, 1)
    end = datetime.datetime(2019, 8, 30)
    delta = end - start
    for _ in range(n):
        date = radar.random_datetime(start='2019-08-1', stop='2019-08-30').strftime("%Y-%m-%d")
        price = round(random.uniform(900, 1000), 4)
        listdata.append([date, price])
    df = pd.DataFrame(listdata, columns = ['Date', 'Price'])
    df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')
    df = df.groupby(by='Date').mean()
    return df

# The call that builds df is not shown in this record; the sample
# size of 50 used here is an assumption
df = generateData(50)
df.head(10)

Date        Price
2019-08-01  999.598900
2019-08-02  957.870150
2019-08-04  978.674200
2019-08-05  963.380375
2019-08-06  978.092900
2019-08-07  987.847700
2019-08-08  952.669900
2019-08-10  973.929400
2019-08-13  971.485600
2019-08-14  977.036200

import matplotlib.pyplot as plt

plt.rcParams['figure.figsize'] = (14, 10)
plt.plot(df)

OUTPUT

And the plotted graph looks something like this:

RESULT:
Thus the above program was executed successfully.
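
The program above depends on the third-party radar and faker packages. As a supplementary sketch, the same idea can be expressed with only the standard library and pandas. The function name generate_data, the seed parameter and the sample size of 50 are assumptions made for this illustration; the date range and price range follow the program above.

```python
import datetime
import random

import pandas as pd


def generate_data(n, seed=0):
    """Return the mean of n random prices per date in August 2019."""
    rng = random.Random(seed)            # seeded for reproducible output
    start = datetime.date(2019, 8, 1)
    span_days = (datetime.date(2019, 8, 30) - start).days
    rows = []
    for _ in range(n):
        # Pick a random day between 1 and 30 August 2019 (inclusive)
        day = start + datetime.timedelta(days=rng.randint(0, span_days))
        price = round(rng.uniform(900, 1000), 4)
        rows.append([day, price])
    df = pd.DataFrame(rows, columns=['Date', 'Price'])
    df['Date'] = pd.to_datetime(df['Date'])
    # Several samples can share a date, so average them per date
    return df.groupby(by='Date').mean()


df = generate_data(50)
print(df.head(10))
```

Because the dates are drawn at random, some days may have no sample at all, which is why the table in the program above skips dates such as 2019-08-03.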
