lab record dev
lab record dev
: 1
DATE: Installing the data Analysis and Visualization Tools
AIM:
To install the data Analysis and Visualization tool: R/ Python /Tableau Public/ Power
BI.
PROGRAM 1:
# importing the pands package
import pandas as pd
# creating rows
hafeez = ['Hafeez', 19] aslan =
['Aslan', 21] kareem =
['Kareem', 18]
# pass those Series to the DataFrame #
passing columns as well
data_frame = pd.DataFrame([hafeez, aslan, kareem], columns = ['Name', 'Age']) #
displaying the DataFrame
print(data_frame)
OUTPUT
If you run the above program, you will get the following results. Name
Age
0 Hafeez 19
1 Aslan 21
2 Kareem 18
PROGRAM 2:
# importing the pyplot module to create graphs
import matplotlib.pyplot as plot
# importing the data using pd.read_csv() method data =
pd.read_csv('CountryData.IND.csv')
1
OUTPUT
If you run the above program, you will get the following results.
<matplotlib.axes._subplots.AxesSubplot at 0x25e363ea8d0>
RESULT:
The installation of the data Analysis and Visualization tool: R/ Python /Tableau
Public/ Power BI are succesfully completed.
2
EXP.NO.: 2
DATE: Exploratory Data Analysis (EDA) On Datasets Like Email Data Set
AIM:
To perform Exploratory Data Analysis (EDA) on datasets like email data set. Export all
your emails as a dataset, import them inside a pandas data frame, visualize them and get
different insights from the data.
PROGRAM:
Create a CSV file with only the required attributes:
with open('mailbox.csv', 'w') as outputfile:
writer =csv.writer(outputfile)
writer.writerow(['subject','from','date','to','label','thread'])
for message in mbox:
writer.writerow ([message['subject'], message['from'],
message['date'],
message['to'],
message['X-Gmail-Labels'],
message['X-GM-THRID']
The output of the preceding code is as follows:
subject object
from object date object
to object label object
thread float64
dtype: object
def plot_number_perdhour_per_year(df, ax, label=None, dt=1,
smooth=False,
weight_fun=None, **plot_kwargs):
3
if weight_fun is None:
weights = 1 / (np.ones_like(tod) * Ty * 365.25 / dt) else:
weights = weight_fun(df) if
smooth:
hst, xedges = np.histogram(tod, bins=bins, weights=weights); x =
np.delete(xedges, -1) + 0.5*(xedges[1] - xedges[0])
hst = ndimage.gaussian_filter(hst, sigma=0.75) f =
interp1d(x, hst, kind='cubic')
x = np.linspace(x.min(), x.max(), 10000) hst = f(x)
ax.set_yticklabels([datetime.datetime.strptime(str(int(np.mod(ts, 24))),
"%H").strftime("%I %p")
for ts in ax.get_yticks()]);
4
OUTPUT
RESULT:
Thus the above program was executed succesfully.
5
EXP.NO.: 3
Working with Numpy arrays, Pandas data frames, Basic plots using
DATE: Matplotlib.
AIM:
To Work with Numpy arrays, Pandas data frames, Basic plots using Matplotlib.
PROGRAM 1:
import numpy as np
from matplotlib import pyplot as plt
x = np.arange(1,11) y = 2
*x+5
plt.title("Matplotlib demo")
plt.xlabel("x axis caption")
plt.ylabel("y axis caption")
plt.plot(x,y)
plt.show()
OUTPUT
The above code should produce the following output −
PROGRAM 2:
import pandas as pd
import matplotlib.pyplot as plt
6
# creating a DataFrame with 2 columns
dataFrame = pd.DataFrame(
{
"Car": ['BMW', 'Lexus', 'Audi', 'Mustang', 'Bentley', 'Jaguar'],
"Reg_Price": [2000, 2500, 2800, 3000, 3200, 3500],
"Units": [100, 120, 150, 170, 180, 200]
}
)
OUTPUT
This will produce the following output −
RESULT:
Thus the above program was executed succesfully.
7
EXP.NO.: 4
Explore Various Variable And Row Filters In R For Cleaning Data.
DATE:
Apply Various Plot Features In R On Sample Data Sets And
Visualize.
AIM:
To explore various variable and row filters in R for cleaning data. Apply various
plot features in R on sample data sets and visualize.
PROCEDURE:
install.packages("data.table") # Install data.table package
library("data.table") # Load data.table
We also create some example data.
dt_all <- data.table(x = rep(month.name[1:3], each = 3), y =
rep(c(1, 2, 3), times = 3),
z = rep(c(TRUE, FALSE, TRUE), each = 3)) # Create data.table
head(dt_all)
Table 1
x y z
1 January 1 TRUE
2 January 2 TRUE
3 January 3 TRUE
4 February 1 FALSE
5 February 2 FALSE
6 February 3 FALSE
8
Table 2
x y z
1 February 1 FALSE
2 February 2 FALSE
3 February 3 FALSE
Filter Rows by Column Values
In this example, I’ll demonstrate how to select all those rows of the example data for
which column x is equal to February. With the use of %in%, we can choose a set of values
of x. In this example, the set only contains one value.
dt_all[x %in% month.name[c(2)], ] # Rows where x is February
Table 2
x y z
1 February 1 FALSE
2 February 2 FALSE
3 February 3 FALSE
Table 3
x y z
1 February 1 FALSE
RESULT:
Thus the above program was executed succesfully.
9
EXP.NO.: 5
DATE: Performing Time Series Analysis And Apply The Various Visualization
Techniques.
AIM:
To perform Time Series Analysis and apply the various visualization Techniques.
PROGRAM:
import matplotlib as mpl import
matplotlib.pyplot as plt import
seaborn as sns
import numpy as np
import pandas as pd
plt.rcParams.update({'figure.figsize': (10, 7), 'figure.dpi': 120}) #
Import as Dataframe
df=pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv',
parse_dates=['date'])
df.head()
Date Value
0 1991-07-01 3.526591
1 1991-08-01 3.180891
2 1991-09-01 3.252221
3 1991-10-01 3.611003
4 1991-11-01 3.565869
# Time series data source: fpp pacakge in R.
import matplotlib.pyplot as plt
df=pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv',
parse_dates=['date'], index_col='date')
# Draw Plot
def plot_df(df, x, y, title="", xlabel='Date', ylabel='Value', dpi=100):
plt.figure(figsize=(16,5), dpi=dpi)
plt.plot(x, y, color='tab:red')
plt.gca().set(title=title, xlabel=xlabel, ylabel=ylabel)
plt.show()
10
plot_df(df, x=df.index, y=df.value, title='Monthly anti-diabetic drug sales in Australia
from 1992 to 2008.')
OUTPUT
RESULT:
Thus the above program was executed succesfully.
11
EXP.NO.: 6
DATE: Performing Data Analysis and representation on a Map using
various Map data sets with Mouse Rollover effect, user interaction.
AIM:
To perform Data Analysis and representation on a Map using various Map data sets with
Mouse Rollover effect, user interaction.
PROGRAM:
12
OUTPUT
RESULT:
Thus the above program was executed succesfully.
13
EXP.NO.: 7
DATE: Building Cartographic Visualization For Multiple Datasets Involving
Various Countries Of The World
AIM:
To build cartographic visualization for multiple datasets involving various countries of
the world.
PROGRAM:
alt.Chart(zipcodes).transform_filter (
'-150 < datum.longitude && 22 < datum.latitude && datum.latitude < 55'
). transform_calculate(
digit='datum.zip_code[0]'
).mark_line( strokeWidth
=0.5
).encode( longitude='longitude:Q',
latitude='latitude:Q',
color='digit:N',
order='zip_code:O'
).project( type='albersUs
a'
).properties( width=
900, height=500
).configure_view( stroke
=None
)
OUTPUT
14
alt.layer(
alt.Chart(alt.topo_feature(usa, 'states')).mark_geoshape(
fill='#ddd', stroke='#fff', strokeWidth=1
),
alt.Chart(airports).mark_circle(size=9).encode( latitud
e='latitude:Q', longitude='longitude:Q',
tooltip='iata:N'
)
).project( type='albersUs
a'
).properties(
width=900,
height=500
).configure_view( stroke
=None
)
OUTPUT
RESULT:
Thus the above program was executed succesfully.
15
EXP.NO.: 8
DATE: Performing EDA on Wine Quality Data Set
AIM:
To perform EDA on Wine Quality Data Set.
PROGRAM:
#importing libraries
import numpy as np
import pandas as pd
importmatplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
In [4]: 1 #features in data
df.columns
Out [4]: Index([‘fixed acidity’, volatile acidity’, ‘citric acid’, ‘residual su gar’,
;chlorides’, ‘free sulfur dioxide’, total sulfur dioxide’, ‘den sity’,
‘pH’, ‘sulphates’, ‘alcohol’, ‘quality’],
dtype=’object’)
In [5]: #few datapoints
df.head( )
In [13]: sns.catplot(x=‘quality’,data=df,kind=‘count’)
16
OUTPUT
RESULT:
Thus the above program was executed succesfully.
17
EXP.NO.: 9
DATE: Using A Case Study On A Data Set And Apply The Various EDA And
Visualization Techniques And Present An Analysis Report
AIM:
To use a case study on a data set and apply the various EDA and visualization techniques
and present an analysis report.
PROGRAM:
import datetime
import math
import pandas as pd import
random import radar
from faker import Faker fake =
Faker()
def generateData(n): listdata = []
start = datetime.datetime(2019, 8, 1)
end = datetime.datetime(2019, 8, 30) delta = end
- start
for _ in range(n):
18
Date Price
2019-08-01 999.598900
2019-08-02 957.870150
2019-08-04 978.674200
2019-08-05 963.380375
2019-08-06 978.092900
2019-08-07 987.847700
2019-08-08 952.669900
2019-08-10 973.929400
2019-08-13 971.485600
2019-08-14 977.036200
listdata.append([date, price])
df = pd.DataFrame(listdata, columns = ['Date', 'Price']) df['Date']
= pd.to_datetime(df['Date'], format='%Y-%m-%d') df =
df.groupby(by='Date').mean()
import matplotlib.pyplot as plt
19
OUTPUT
RESULT:
Thus the above program was executed succesfully.
20