SYLLABUS

AD3411 DATA SCIENCE AND ANALYTICS LABORATORY                L T P C
                                                            0 0 4 2
COURSE OBJECTIVES
•	To develop data analytic code in Python
•	To be able to use Python libraries for handling data
•	To develop analytical applications using Python
•	To perform data visualization using plots

Tools: Python, NumPy, SciPy, Matplotlib, Pandas, statsmodels, seaborn, plotly, bokeh
Suggested Exercises:

1. Working with Pandas data frames


2. Basic plots using Matplotlib
3. Frequency distributions, Averages, Variability
4. Normal curves, Correlation and scatter plots, Correlation coefficient
5. Regression
6. Z-test
7. T-test
8. ANOVA
9. Building and validating linear models
10. Building and validating logistic models
11. Time Series Analysis

TOTAL : 60 PERIODS
HARDWARE:
•	Standalone desktops with Windows OS

SOFTWARE:
•	Python with statistical packages


EX NO: 1A WORKING WITH NUMPY ARRAYS

NUMPY:

NumPy is a Python library used for working with arrays. It also has functions for working in the domains of linear algebra, Fourier transforms, and matrices. NumPy was created in 2005 by Travis Oliphant. It is an open-source project and can be used freely. NumPy stands for Numerical Python.

It is a general-purpose array-processing package. It provides a high-performance multidimensional array object, and tools for working with these arrays.

It is the fundamental package for scientific computing with Python. It contains various
features including these important ones:

•	A powerful N-dimensional array object
•	Sophisticated (broadcasting) functions (see the sketch below)
•	Tools for integrating C/C++ and Fortran code
•	Useful linear algebra, Fourier transform, and random number capabilities
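
A minimal sketch (not part of the original exercise list) illustrating the broadcasting and linear-algebra features mentioned above; the array values are illustrative only:

import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])        # shape (2, 3)
b = np.array([10, 20, 30])       # shape (3,)

# Broadcasting: b is stretched across each row of a
print(a + b)                     # [[11 22 33] [14 25 36]]

# Linear algebra: matrix product of a with its transpose (a 2x2 result)
print(a @ a.T)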

AIM
Write a Python program to demonstrate basic array characteristics.

ALGORITHM
Step1: Start
Step2: Import numpy module

Step3: Print the basic characteristics of the array
Step4: Stop

PROGRAM
import numpy as np

# Creating array object
arr = np.array([[1, 2, 3],
                [4, 2, 5]])


# Printing type of arr object
print("Array is of type: ", type(arr))

# Printing array dimensions (axes)
print("No. of dimensions: ", arr.ndim)

# Printing shape of array
print("Shape of array: ", arr.shape)

# Printing size (total number of elements) of array
print("Size of array: ", arr.size)

# Printing type of elements in array
print("Array stores elements of type: ", arr.dtype)

OUTPUT
Array is of type: <class 'numpy.ndarray'>
No. of dimensions: 2
Shape of array: (2, 3)
Size of array: 6
Array stores elements of type: int32

RESULT
Thus the Python program for working with NumPy arrays has been implemented and executed successfully.


EX.NO :1B PROGRAM TO PERFORM ARRAY SLICING

SLICING:
Similar to Python lists, numpy arrays can be sliced. Since arrays may be multidimensional, we must specify a slice for each dimension of the array.

AIM
Write a Python Program to Perform Array Slicing.

ALGORITHM
Step1: Start

Step2: import numpy module


Step3: Create an array and apply the slicing operator
Step4: Print the output
Step5: Stop

PROGRAM
import numpy as np

a = np.array([[1,2,3],[3,4,5],[4,5,6]])
print(a)

print("After slicing")
print(a[1:])

OUTPUT
[[1 2 3]
[3 4 5]
[4 5 6]]

After slicing
[[3 4 5]
 [4 5 6]]

RESULT
Thus the python program to perform array slicing has been implemented and
executed successfully.


EX NO : 1C PROGRAM TO PERFORM ARRAY SLICING

AIM
Write a Python Program to Perform Array Slicing.

ALGORITHM
Step1: Start
Step2: import numpy module
Step3: Create an array and apply the slicing operator
Step4: Print the output

Step5: Stop

PROGRAM
# array to begin with
import numpy as np

a = np.array([[1,2,3],[3,4,5],[4,5,6]])

print('Our array is:')
print(a)

# this returns an array of items in the second column
print('The items in the second column are:')
print(a[...,1])
print('\n')

# Now we will slice all items from the second row
print('The items in the second row are:')
print(a[1,...])
print('\n')

# Now we will slice all items from column 1 onwards
print('The items column 1 onwards are:')
print(a[...,1:])


OUTPUT:
Our array is:
[[1 2 3]
 [3 4 5]
 [4 5 6]]

The items in the second column are:
[2 4 5]

The items in the second row are:
[3 4 5]

The items column 1 onwards are:
[[2 3]
 [4 5]
 [5 6]]

RESULT
Thus the python program to perform array slicing has been implemented and executed
successfully.


EX NO :2A WORKING WITH PANDAS DATA FRAME


PANDAS:
Pandas is a Python library used to analyze data. A Pandas DataFrame is a two-dimensional data structure, like a two-dimensional array or a table with rows and columns. A Pandas DataFrame can be created from lists, dictionaries, a list of dictionaries, etc.
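
For instance, a minimal sketch (not one of the printed exercises; the sample records are illustrative only) of creating a DataFrame from a list of dictionaries:

import pandas as pd

# Each dictionary becomes one row; keys become column names
records = [{'Name': 'Asha', 'Age': 21},
           {'Name': 'Ravi', 'Age': 23}]
df = pd.DataFrame(records)
print(df)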

CREATE A DATAFRAME USING A LIST OF ELEMENTS


AIM
Write a program to create a dataframe using a list of elements.
ALGORITHM
Step1: Start

Step2: import numpy and pandas module

Step3: Create a dataframe using a list of elements
Step4: Print the output


Step5: Stop
PROGRAM
import pandas as pd

# list of strings
lst = ['A', 'B', 'C', 'D', 'E', 'F', 'G']

# Calling DataFrame constructor on the list
df = pd.DataFrame(lst)
print(df)

OUTPUT
0
0 A
1 B
2 C
3 D
4 E
5 F
6 G
RESULT
Thus the Python program for creating a DataFrame using a list of elements has been implemented and executed successfully.

EX NO: 2B CREATE A DATAFRAME USING THE DICTIONARY
DATAFRAME:
To create a DataFrame from a dict of ndarrays/lists, all the ndarrays must be of the same length. If an index is passed, then the length of the index should be equal to the length of the arrays. If no index is passed, then by default the index will be range(n), where n is the array length.
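
A minimal sketch (the row labels 'r1'..'r3' are illustrative assumptions) of passing an explicit index whose length matches the arrays:

import pandas as pd

data = {'Name': ['Tom', 'nick', 'krish'], 'Age': [20, 21, 19]}
# index length (3) matches the length of each list in data
df = pd.DataFrame(data, index=['r1', 'r2', 'r3'])
print(df)    # rows are labelled r1, r2, r3 instead of 0, 1, 2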

AIM
Write a program to create a dataframe using dictionary of elements.

ALGORITHM
Step1: Start

Step2: import numpy and pandas module


Step3: Create a dataframe using the dictionary
Step4: Print the output

Step5: Stop

PROGRAM
import pandas as pd

# initialise data of lists
data = {'Name':['Tom', 'nick', 'krish', 'jack'], 'Age':[20, 21, 19, 18]}

# Create DataFrame
df = pd.DataFrame(data)

# Print the output
print(df)

OUTPUT:
    Name  Age
0    Tom   20
1   nick   21
2  krish   19
3   jack   18
RESULT
Thus the Python program to create a DataFrame using a dictionary has been implemented and executed successfully.
EX NO: 2C COLUMN SELECTION

Column Selection
A Data frame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and
columns. We can perform basic operations on rows/columns like selecting, deleting, adding, and
renaming.

Column Selection: In order to select a column in a Pandas DataFrame, we can access the columns by calling them by their column names.
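
As a side note (not from the original manual), selecting with single brackets returns a Series, while double brackets return a DataFrame; the data below is illustrative:

import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi'], 'Age': [27, 24]})
print(type(df['Name']))      # <class 'pandas.core.series.Series'>
print(type(df[['Name']]))    # <class 'pandas.core.frame.DataFrame'>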

AIM
Write a program to select a column from dataframe.

ALGORITHM
Step1: Start
Step2: import pandas module
Step3: Create a dataframe using the dictionary

Step4: Select the specific columns and print the output
Step5: Stop

PROGRAM
import pandas as pd
# Define a dictionary containing employee data

data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'], 'Age':[27, 24, 22, 32], 'Address':['Delhi',
'Kanpur', 'Allahabad', 'Kannauj'], 'Qualification':['Msc', 'MA', 'MCA', 'Phd']}
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
print(df)

# select two columns
print(df[['Name', 'Qualification']])
OUTPUT:

RESULT
Thus the Python program for column selection has been implemented and executed successfully.
EX NO: 2D CHECKING FOR MISSING VALUES USING ISNULL() AND NOTNULL()

In order to check for missing values in a Pandas DataFrame, we use the functions isnull() and notnull(). Both functions help in checking whether a value is NaN or not. These functions can also be used on a Pandas Series in order to find null values in a series.
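
A minimal sketch (illustrative values) of notnull() applied to a Series, which the exercise below does not cover:

import pandas as pd
import numpy as np

s = pd.Series([10, np.nan, 30])
print(s.notnull())    # True, False, True -- the inverse of s.isnull()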

AIM
Write a program to check the missing values from the dataframe.

ALGORITHM
Step1: Start
Step2: import pandas module
Step3: Create a dataframe using the dictionary
Step4: Check the missing values using the isnull() function
Step5: Print the output
Step6: Stop

PROGRAM
# importing pandas as pd
import pandas as pd

# importing numpy as np
import numpy as np

# dictionary of lists
dict = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}

# creating a dataframe from the dictionary of lists
df = pd.DataFrame(dict)

# using isnull() function
df.isnull()
OUTPUT:

RESULT
Thus the Python program for checking for missing values using isnull() and notnull() has been implemented and executed successfully.
EX NO: 2E DROPPING MISSING VALUES USING DROPNA()

In order to drop null values from a dataframe, we use the dropna() function. This function drops rows/columns of datasets with null values in different ways.
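
A minimal sketch (illustrative data) of some of those "different ways" of dropping null values:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [np.nan, 5, np.nan],
                   'C': [7, 8, 9]})
print(df.dropna())             # keep only rows with no NaN at all (row 1)
print(df.dropna(how='all'))    # drop a row only if every value in it is NaN (none here)
print(df.dropna(axis=1))       # drop columns containing any NaN (only C survives)
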
AIM
Write a program to drop rows with at least one Nan value (Null value)

ALGORITHM
Step1: Start
Step2: import pandas module
Step3: Create a dataframe using the dictionary
Step4: Drop the null values using the dropna() function
Step5: print the output
Step6: Stop
PROGRAM
# Drop rows with at least one NaN value (null value)

# importing pandas as pd
import pandas as pd

# importing numpy as np
import numpy as np

# dictionary of lists
dict = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, np.nan, 45, 56],
        'Third Score': [52, 40, 80, 98],
        'Fourth Score': [np.nan, np.nan, np.nan, 65]}

# creating a dataframe from dictionary
df = pd.DataFrame(dict)

# using dropna() function
df.dropna()
OUTPUT:

RESULT
Thus the python program for Drop missing values has been implemented and executed successfully.
EX NO: 3A BASIC PLOTS USING MATPLOTLIB

MATPLOTLIB:
Matplotlib is a Python library that helps in visualizing and analyzing data, giving a better understanding of the data through graphical, pictorial visualizations. Matplotlib is a comprehensive library for creating static, animated and interactive visualizations.
AIM
Write a python program to create a simple plot using plot() function.
ALGORITHM
Step1: Define the x-axis and corresponding y-axis values as lists.
Step2: Plot them on canvas using the .plot() function.
Step3: Give a name to the x-axis and y-axis using the .xlabel() and .ylabel() functions.
Step4: Give a title to your plot using the .title() function.
Step5: Finally, to view your plot, use the .show() function.
Step6: Stop

PROGRAM
import matplotlib.pyplot as plt

# x axis values
x = [1, 2, 3]

# corresponding y axis values
y = [2, 4, 1]

# plotting the points
plt.plot(x, y)

# naming the x axis
plt.xlabel('x - axis')

# naming the y axis
plt.ylabel('y - axis')

# giving a title to my graph
plt.title('My first graph!')

# function to show the plot
plt.show()
OUTPUT:

RESULT
Thus the python program for basic Matplotlib has been implemented and executed successfully.
EX NO: 3B COMPUTE THE X AND Y COORDINATES AND CREATE A PLOT

AIM
Write a python program to create a plot by computing the x and y coordinates.

ALGORITHM
Step1: Compute the x and y coordinates for points on a sine curve
Step2: Plot the points using matplotlib
Step3: Display the output
Step4: Stop

PROGRAM
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 3 * np.pi, 0.1)
y = np.sin(x)

plt.plot(x, y)
plt.show()

OUTPUT

RESULT
Thus the Python program to compute X and Y coordinates has been implemented and executed successfully.
EX NO: 3C DRAWING MULTIPLE LINES USING PLOT FUNCTION

AIM
Write a python program to draw multiple lines using plot() function.

ALGORITHM
Step1: Compute the x and y coordinates for points on sine and cosine curves
Step2: Plot the points using matplotlib
Step3: Display the output
Step4: Stop
PROGRAM
import numpy as np
import matplotlib.pyplot as plt

# Compute the x and y coordinates for points on sine and cosine curves
x = np.arange(0, 3 * np.pi, 0.1)
y_sin = np.sin(x)
y_cos = np.cos(x)

# Plot the points using matplotlib
plt.plot(x, y_sin)
plt.plot(x, y_cos)
plt.xlabel('x axis label')
plt.ylabel('y axis label')
plt.title('Sine and Cosine')
plt.legend(['Sine', 'Cosine'])
plt.show()
OUTPUT

RESULT
Thus the Python program to draw multiple lines using the plot() function has been implemented and executed successfully.
Ex No: 3D BASIC PLOT USING MATPLOTLIB
AIM
Write a python program for basic plot using matplotlib

ALGORITHM
Step1: import the library
Step2: Plot the points using matplotlib
Step3: Display the output
Step4: Stop

PROGRAM
Line plot :

from matplotlib import pyplot as plt

x = [5, 2, 9, 4, 7]
y = [10, 5, 8, 4, 2]

plt.plot(x, y)
plt.show()

Bar plot :

from matplotlib import pyplot as plt

x = [5, 2, 9, 4, 7]
y = [10, 5, 8, 4, 2]

plt.bar(x, y)
plt.show()
Histogram :

from matplotlib import pyplot as plt

y = [10, 5, 8, 4, 2]

plt.hist(y)
plt.show()

Scatter Plot :

from matplotlib import pyplot as plt

x = [5, 2, 9, 4, 7]
y = [10, 5, 8, 4, 2]

plt.scatter(x, y)
plt.show()

RESULT
Thus the Python program for basic plots using Matplotlib has been implemented and executed successfully.
EX NO:4A CONDITIONAL FREQUENCY DISTRIBUTION

Conditional Frequency:
In the previous topic, you studied frequency distributions: the FreqDist function computes the frequency of each item in a list. While computing a frequency distribution, you observe the occurrence count of each event.
A conditional frequency distribution is a collection of frequency distributions, computed based on a condition. For computing a conditional frequency, you have to attach a condition to every occurrence of an event. Let's consider the following list for computing a conditional frequency distribution.

AIM
To write a python program to show the conditional Frequency distribution

ALGORITHM
Step 1: Start
Step 2: Import Pandas, Numpy And Nltk
Step 3: List The Items As ‘F’ For Fruits And ’V’ For Vegetables
Step 4: Display The Frequency Of Each Item In The List
Step 5: Stop

PROGRAM:
import numpy as np      # linear algebra
import pandas as pd     # data processing, CSV file I/O (e.g. pd.read_csv)
import nltk

items = ['apple', 'apple', 'kiwi', 'cabbage', 'cabbage', 'potato']
nltk.FreqDist(items)

c_items = [('F','apple'), ('F','apple'), ('F','kiwi'), ('V','cabbage'), ('V','cabbage'), ('V','potato')]
cfd = nltk.ConditionalFreqDist(c_items)
cfd.conditions()
cfd.plot()
cfd['V']
OUTPUT

FreqDist({'cabbage': 2, 'potato': 1})

RESULT
Thus the Python program for conditional frequency distribution has been implemented and executed successfully.
EX NO: 4B FREQUENCY OF WORDS, OF A PARTICULAR GENRE, IN
BROWN CORPUS.
AIM
To write a Python program to determine the frequency of words of a particular genre in the Brown corpus.
ALGORITHM
Step 1: Start
Step 2: Import All Necessary Libraries
Step 3: Display The Frequency Of Each Item In The List
Step 4: Set The Cumulative Argument Value To True
Step 5: Stop

PROGRAM

import nltk
from nltk.corpus import brown

cfd = nltk.ConditionalFreqDist([(genre, word)
                                for genre in brown.categories()
                                for word in brown.words(categories=genre)])
cfd
cfd.conditions()
cfd.tabulate(conditions=['government', 'humor', 'reviews'],
             samples=['leadership', 'worship', 'hardship'])
cfd.plot(conditions=['government', 'humor', 'reviews'],
         samples=['leadership', 'worship', 'hardship'])
cfd.tabulate(conditions=['government', 'humor', 'reviews'],
             samples=['leadership', 'worship', 'hardship'], cumulative=True)
news_fd = cfd['news']
news_fd.most_common(3)
news_fd['the']
OUTPUT
            leadership  worship  hardship
government          12        3         2
humor                1        0         0
reviews             14        1         2

            leadership  worship  hardship
government          12       15        17
humor                1        1         1
reviews             14       15        17

RESULT
Thus the python program frequency of words, of a particular genre, in brown corpus has
been implemented and executed successfully.
EX NO: 4C FREQUENCY OF LAST CHARACTER APPEARING IN ALL
NAMES ASSOCIATED WITH MALES AND FEMALES RESPECTIVELY
AND COMPARES THEM

AIM
To write a Python program that finds the frequency of the last character appearing in all names associated with males and females respectively and compares them.
ALGORITHM
Step 1: Start
Step 2: Import All Necessary Libraries
Step 3: Display The Frequency Of Each Item In The List
Step 4: Plot
Step 5: Stop

PROGRAM
import nltk
from nltk.corpus import names

nt = [(fid.split('.')[0], name[-1]) for fid in names.fileids() for name in names.words(fid)]
cfd2 = nltk.ConditionalFreqDist(nt)
cfd2['female']['a']
cfd2['male']['a']
cfd2['female'] > cfd2['male']
cfd2.tabulate(samples=['a', 'e'])
cfd2.plot()
OUTPUT
            a     e
female   1773  1432
male       29   468

RESULT
Thus the Python program for the frequency of the last character appearing in all names associated with males and females respectively, and for comparing them, has been implemented and executed successfully.
EX NO: 4D AVERAGE OF LIST USING LOOP

AIM
To write a python program for finding a average of list using loop.

ALGORITHM
Step 1: Start
Step 2: Define A Function cal_average
Step 3: sum_num = sum_num + t
Step 4: avg = sum_num / len(num)
Step 5: Stop

PROGRAM:
def cal_average(num):
    sum_num = 0
    for t in num:
        sum_num = sum_num + t
    avg = sum_num / len(num)
    return avg

print("The average is", cal_average([18,25,3,41,5]))

OUTPUT:
The average is 18.4

RESULT
Thus the Python program for finding the average of a list using a loop has been implemented and executed successfully.
EX NO: 4E AVERAGE OF LIST USING BUILT IN FUNCTIONS

AIM
To write a python program to find the average of list using built in functions.

ALGORITHM
STEP 1: Start
STEP 2: Define a list
STEP 3: avg = sum(number_list)/len(number_list)
STEP 4: Print avg
STEP 5: Stop

PROGRAM
number_list = [45, 34, 10, 36, 12, 6, 80]
avg = sum(number_list)/len(number_list)
print("The average is ", round(avg,2))

OUTPUT:
The average is 31.86

RESULT
Thus the Python program for finding the average of a list using built-in functions has been implemented and executed successfully.
Ex No: 4F AVERAGE OF LIST USING MEAN FUNCTION

AIM
To write a python program to find the average of list using mean function.

ALGORITHM
Step 1: Start
Step 2: Define A List
Step 3: Import mean From statistics
Step 4: avg = mean(number_list)
Step 5: Print avg
Step 6: Stop

PROGRAM
from statistics import mean
number_list = [45, 34, 10, 36, 12, 6, 80]
avg = mean(number_list)
print("The average is ", round(avg,2))
OUTPUT:
The average is 31.86

RESULT
Thus the Python program for the average of a list using the mean function has been implemented and executed successfully.
EX NO: 4G AVERAGE OF LIST USING NUMPY LIBRARY

AIM
To write a python program to find the average of list using numpy library.

ALGORITHM
Step 1: Start
Step 2: Import mean From numpy
Step 3: Define A List
Step 4: avg = mean(number_list)
Step 5: Print avg
Step 6: Stop

PROGRAM
from numpy import mean
number_list = [45, 34, 10, 36, 12, 6, 80]
avg = mean(number_list)
print ("The average is ", round(avg,2))
OUTPUT:
The average is 31.86

RESULT
Thus the Python program for the average of a list using the numpy library has been implemented and executed successfully.
EX NO: 4H VARIANCE OF SAMPLE SET

AIM
To write a python program to show variance of sample set.

ALGORITHM
Step 1: Start
Step 2: Import statistics
Step 3: Define A List
Step 4: Print statistics.variance(sample)
Step 5: Stop

PROGRAM
import statistics
sample = [2.74, 1.23, 2.63, 2.22, 3, 1.98]
print("Variance of sample set is %s" % statistics.variance(sample))

OUTPUT :
Variance of sample set is 0.40924

RESULT
Thus the python program to show variance of sample set has been implemented and executed
successfully.
EX NO: 4I VARIANCE ON A RANGE OF DATA-TYPES

AIM
To write a python program to show variance on a range of data-types.
ALGORITHM
Step 1: Start
Step 2: Import All Necessary Libraries
Step 3: Define Samples
Step 4: Print Variance Of Sample
Step 5: Stop

PROGRAM
from statistics import variance
from fractions import Fraction as fr

sample1 = (1, 2, 5, 4, 8, 9, 12)
sample2 = (-2, -4, -3, -1, -5, -6)
sample3 = (-9, -1, -0, 2, 1, 3, 4, 19)
sample4 = (fr(1, 2), fr(2, 3), fr(3, 4),fr(5, 6), fr(7, 8))
sample5 = (1.23, 1.45, 2.1, 2.2, 1.9)
print("Variance of Sample1 is ",variance(sample1))
print("Variance of Sample2 is ",variance(sample2))
print("Variance of Sample3 is ",variance(sample3))
print("Variance of Sample4 is ", variance(sample4))
print("Variance of Sample5 is ",variance(sample5))

OUTPUT
Variance of Sample1 is 15.80952380952381
Variance of Sample2 is 3.5
Variance of Sample3 is 61.125
Variance of Sample4 is 1/45
Variance of Sample5 is 0.17613000000000006

RESULT
Thus the python program to show variance on a range of data-types has been implemented
and executed successfully.
EX NO: 4J STATISTICS

AIM
To write a python program to show statistics.

ALGORITHM
Step 1: Start
Step 2: Import statistics
Step 3: Define A List
Step 4: m = statistics.mean(sample)
Step 5: Stop

PROGRAM
import statistics
sample = (1, 1.3, 1.2, 1.9, 2.5, 2.2)
m = statistics.mean (sample)
print("Variance of Sample set is ",statistics.variance(sample, xbar = m))

OUTPUT
Variance of Sample set is 0.3656666666666667

RESULT
Thus the python program to show statistics has been implemented and executed successfully.
EX NO: 5A CREATE NORMAL CURVE
AIM

To write a python program to create a normal curve.

ALGORITHM

STEP 1: Start
STEP 2: Import all necessary packages
STEP 3: Create the distribution
STEP 4: Visualize the distribution
STEP 5: Stop
PROGRAM

from scipy.stats import norm
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb

data = np.arange(1, 10, 0.01)
pdf = norm.pdf(data, loc=5.3, scale=1)

sb.set_style('whitegrid')
sb.lineplot(x=data, y=pdf, color='black')
plt.xlabel('Heights')
plt.ylabel('Probability Density')
OUTPUT

Text(0, 0.5, 'Probability Density')

RESULT
Thus the python program to create a normal curve has been implemented and executed
successfully.
EX NO: 5B CORRELATION AND SCATTER PLOTS
CORRELATION:
Correlation means an association. It is a measure of the extent to which two variables are related.
AIM:
To write a python program correlation and scatter plots.
ALGORITHM:
Step 1: Importing the libraries.
Step 2: Finding the Correlation between two variables.
Step 3: Plotting the graph. Here we are using scatter plots. A scatter plot is a diagram where each
value in the data set is represented by a dot. Also, it shows a relationship between two variables.

PROGRAM:
import sklearn
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

y = pd.Series([1, 2, 3, 4, 3, 5, 4])
x = pd.Series([1, 2, 3, 4, 5, 6, 7])

correlation = y.corr(x)
print(correlation)

plt.scatter(x, y)

# This will fit the best line into the graph
plt.plot(np.unique(x), np.poly1d(np.polyfit(x, y, 1))(np.unique(x)), color='red')


OUTPUT:

RESULT:
Thus the Python program for correlation and scatter plots has been implemented and executed successfully.
SCATTER PLOT:
A scatter plot is a graph of two sets of data along the two axes. It is used to visualize the relationship between the two variables.

In Python's matplotlib, a scatter plot can be created using pyplot.plot() or pyplot.scatter(). Using these functions, you can add more features to your scatter plot, like changing the size, color or shape of the points.
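
A minimal sketch (illustrative values, not one of the printed exercises) of those styling options of scatter():

import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [10, 20, 25, 30]
sizes = [20, 80, 200, 400]                           # marker sizes in points^2

plt.scatter(x, y, s=sizes, c='green', marker='^')    # size, color and marker shape
plt.show()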

i) SIMPLE SCATTER PLOT

AIM:

To write a python program simple scatter plots.


ALGORITHM:
Step 1: Importing the libraries.
Step 2: Preparing the x and y data.
Step 3: Plotting the graph. Here we are using scatter plots. A scatter plot is a diagram where each
value in the data set is represented by a dot. Also, it shows a relationship between two variables.

PROGRAM:

import numpy as np
import matplotlib.pyplot as plt

x = range(50)
y = range(50) + np.random.randint(0, 30, 50)

plt.rcParams.update({'figure.figsize': (10, 8), 'figure.dpi': 100})
plt.scatter(x, y)
plt.title('Simple Scatter plot')
plt.xlabel('X - value')
plt.ylabel('Y - value')
plt.show()
OUTPUT:

RESULT
Thus the python program for simple scatter Plot has been implemented and executed
successfully.
ii) SIMPLE SCATTER PLOT WITH COLORED POINTS

AIM:
To write a python program Simple Scatterplot with colored points.
ALGORITHM:
Step 1: Importing the libraries.
Step 2: Preparing the x and y data.
Step 3: Plotting the graph. Here we are using scatter plots. A scatter plot is a diagram where each
value in the data set is represented by a dot. Also, it shows a relationship between two variables.

PROGRAM:
import numpy as np
import matplotlib.pyplot as plt

x = range(50)
y = range(50) + np.random.randint(0, 30, 50)

plt.rcParams.update({'figure.figsize': (10, 8), 'figure.dpi': 100})
plt.scatter(x, y, c=y, cmap='Spectral')
plt.colorbar()
plt.title('Simple Scatter plot')
plt.xlabel('X - value')
plt.ylabel('Y - value')
plt.show()

OUTPUT:

RESULT:
Thus the Python program for a simple scatter plot with colored points has been implemented and executed successfully.
EX NO: 5C CORRELATION COEFFICIENT
Variables within a dataset can be related for lots of reasons.

For example:
One variable could cause or depend on the values of another variable.
One variable could be lightly associated with another variable.
Two variables could depend on a third unknown variable.

It can be useful in data analysis and modelling to better understand the relationships between variables. The statistical relationship between two variables is referred to as their correlation.

A correlation could be positive, meaning both variables move in the same direction, or negative, meaning that when one variable's value increases, the other variables' values decrease. Correlation can also be neutral or zero, meaning that the variables are unrelated.

Positive Correlation: both variables change in the same direction.
Neutral Correlation: no relationship in the change of the variables.
Negative Correlation: variables change in opposite directions.

NUMPY CORRELATION CALCULATION

AIM:
To write a program to calculate the correlation coefficient.

ALGORITHM:
STEP 1: Import the numpy packages.
STEP 2: Define two NumPy arrays. Call them x and y
STEP3: Call np.corrcoef() with both arrays as arguments
STEP 4: corrcoef() returns the correlation matrix, which is a two-dimensional array with
the correlation coefficients.

PROGRAM:
import numpy as np

x = np.arange(10, 20)
y = np.array([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])
r = np.corrcoef(x, y)
print(r)
OUTPUT:

RESULT:
Thus the Python program to calculate the correlation coefficient has been implemented and executed successfully.
PEARSON’S CORRELATION

The Pearson correlation coefficient can be used to summarize the strength of the linear relationship between two data samples. The Pearson's correlation coefficient is calculated as the covariance of the two variables divided by the product of the standard deviations of each data sample. It is the normalization of the covariance between the two variables to give an interpretable score.

Pearson's correlation coefficient = covariance(X, Y) / (stdv(X) * stdv(Y))
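
A minimal sketch (illustrative data, not part of the printed exercise) verifying the formula above against scipy's pearsonr:

import numpy as np
from scipy.stats import pearsonr

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

# covariance(X, Y) / (stdv(X) * stdv(Y)), using sample (ddof=1) estimators throughout
r_manual = np.cov(x, y)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))
r_scipy, _ = pearsonr(x, y)
print(r_manual, r_scipy)    # the two values agree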

AIM:
To write a program to calculate the Pearson correlation coefficient between two variables.

ALGORITHM:
Step 1: Import The Needed Packages.
Step 2: Provide The Data.
Step 3: The pearsonr() SciPy Function Can Be Used To Calculate The Pearson's Correlation Coefficient Between Two Data Samples With The Same Length.
Step 4: Display The Correlation Coefficient.

PROGRAM:
from numpy.random import randn
from numpy.random import seed
from scipy.stats import pearsonr

seed(1)
data1 = 20 * randn(1000) + 100
data2 = data1 + (10 * randn(1000) + 50)
corr, _ = pearsonr(data1, data2)
print('Pearsons correlation:', corr)

OUTPUT:
Pearsons correlation: 0.887611908579531

RESULT:
Thus the Python program to calculate the Pearson correlation coefficient between two variables has been implemented and executed successfully.
6. REGRESSION

EX NO: 6A SIMPLE LINEAR REGRESSION WITH SCIKIT-LEARN

AIM:
To write a program simple linear regression with scikit-learn.

ALGORITHM:
Step 1: Import The Packages And Classes.
Step 2: Provide Data To Work With And Eventually Do Appropriate Transformations.
Step 3: Create A Regression Model And Fit It With Existing Data.
Step 4: Check The Results Of Model Fitting To Know Whether The Model Is Satisfactory.
Step 5: Apply The Model For Predictions.

PROGRAM:
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])

model = LinearRegression().fit(x, y)
r_sq = model.score(x, y)
print('coefficient of determination:', r_sq)

y_pred = model.predict(x)
print('predicted response:', y_pred)

OUTPUT:
coefficient of determination: 0.715875613747954
predicted response: [ 8.33333333 13.73333333 19.13333333 24.53333333 29.93333333
35.33333333]

RESULT:
Thus the Python program for simple linear regression with scikit-learn has been implemented and executed successfully.
EX NO: 6B MULTIPLE LINEAR REGRESSION WITH SCIKIT-LEARN

AIM
To write a program for multiple linear regression with scikit-learn.
ALGORITHM:
Step 1: Import Packages And Classes
Step 2: Provide Data
Step 3: Create A Model And Fit It
Step 4: Get Results
Step 5: Predict Response

PROGRAM:
import numpy as np
from sklearn.linear_model import LinearRegression

x = [[0, 1], [5, 1], [15, 2], [25, 5], [35, 11], [45, 15], [55, 34], [60, 35]]
y = [4, 5, 20, 14, 32, 22, 38, 43]
x, y = np.array(x), np.array(y)

model = LinearRegression().fit(x, y)

r_sq = model.score(x, y)
print('coefficient of determination:', r_sq)
print('intercept:', model.intercept_)
print('slope:', model.coef_)

y_pred = model.predict(x)
print('predicted response:', y_pred)

OUTPUT:
coefficient of determination: 0.8615939258756775
intercept: 5.52257927519819
slope: [0.44706965 0.25502548]
predicted response: [ 5.77760476  8.012953   12.73867497 17.9744479  23.97529728 29.4660957
 38.78227633 41.27265006]
RESULT:
Thus the Python program for multiple linear regression with scikit-learn has been implemented and executed successfully.

EX NO: 7 Z-TEST CASE STUDIES

AIM

To Perform Z-test

ALGORITHM

Step1: Start

Step2: Import math, numpy, statsmodels & ztest

Step3: create a list & Print the ztest list

Step4: Stop

PROGRAM

# imports

import math

import numpy as np

from numpy.random import randn

from statsmodels.stats.weightstats import ztest

# Generate a random array of 50 numbers having mean 110 and sd 15

# similar to the IQ scores data we assume above

mean_iq = 110

sd_iq = 15/math.sqrt(50)

alpha =0.05

null_mean =100

data = sd_iq*randn(50)+mean_iq

# print mean and sd


print('mean=%.2f stdv=%.2f' % (np.mean(data), np.std(data)))

# now we perform the test. In this function, we passed data, in the value parameter

# we passed mean value in the null hypothesis, in alternative hypothesis we check whether the

# mean is larger

ztest_Score, p_value= ztest(data,value = null_mean, alternative='larger')

# the function outputs a p_value and z-score corresponding to that value, we compare the

# p-value with alpha; if it is greater than alpha then we do not reject the null hypothesis

# else we reject it.

if(p_value< alpha):

print("Reject Null Hypothesis")

else:

print("Fail to Reject Null Hypothesis")

OUTPUT

mean=110.17 stdv=2.34

Reject Null Hypothesis

RESULT

Thus the program for Z-Test case studies has been executed and verified successfully.


EX NO: 8 T-TEST CASE STUDIES

AIM

To Perform T-test for sampling distribution.

ALGORITHM

Step1: Start

Step2: Import random &numpy

Step3: Calculate the standard deviation

Step4: Stop

PROGRAM

# Importing the required libraries and packages
import numpy as np
from scipy import stats

# Defining two random distributions
# Sample Size
N = 10

# Gaussian distributed data with mean = 2 and var = 1
x = np.random.randn(N) + 2

# Gaussian distributed data with mean = 0 and var = 1
y = np.random.randn(N)

# Calculating the Standard Deviation
# Calculating the variance to get the standard deviation
var_x = x.var(ddof = 1)
var_y = y.var(ddof = 1)

# Standard Deviation
SD = np.sqrt((var_x + var_y) / 2)


print("Standard Deviation =", SD)


# Calculating the T-Statistics
tval = (x.mean() - y.mean()) / (SD * np.sqrt(2 / N))

# Comparing with the critical T-Value
# Degrees of freedom
dof = 2 * N - 2
# p-value after comparison with the T-Statistics
pval = 1 - stats.t.cdf( tval, df = dof)
print("t = " + str(tval))
print("p = " + str(2 * pval))
## Cross Checking using the internal function from SciPy Package
tval2, pval2 = stats.ttest_ind(x, y)
print("t = " + str(tval2))
print("p = " + str(pval2))

OUTPUT:

Standard Deviation = 0.7642398582227466

t = 4.87688162540348

p = 0.0001212767169695983

t = 4.876881625403479

p = 0.00012127671696957205

RESULT

Thus the program for T-test case studies has been executed and verified successfully.


EX NO: 9 ANOVA CASE STUDIES

AIM

To Perform ANOVA test.

ALGORITHM

Step1: Start
Step2: Import scipy
Step3: import statsmodels
Step4: calculate ANOVA F and p value
Step 5: Stop

PROGRAM

import scipy.stats as stats

# df is assumed to already hold one column of measurements per treatment group (A, B, C, D);
# a sketch that builds df and df_melt follows after this program.
# stats.f_oneway takes the groups as input and returns the ANOVA F and p value
fvalue, pvalue = stats.f_oneway(df['A'], df['B'], df['C'], df['D'])
print(fvalue, pvalue)
# 17.492810457516338 2.639241146210922e-05

# get ANOVA table as R like output
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Ordinary Least Squares (OLS) model
model = ols('value ~ C(treatments)', data=df_melt).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
anova_table
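
The listing above assumes that df and df_melt already exist. A minimal, hypothetical sketch of how they could be prepared (the numbers are illustrative only, not necessarily the data that produced the output below):

import pandas as pd

# Wide format: one column of measurements per treatment group (illustrative values)
df = pd.DataFrame({'A': [25, 30, 28, 36, 29],
                   'B': [45, 55, 29, 56, 40],
                   'C': [30, 29, 33, 37, 27],
                   'D': [54, 60, 51, 62, 73]})

# Long format expected by the ols() formula: one row per observation,
# with a 'treatments' label column and a 'value' column
df_melt = pd.melt(df.reset_index(), id_vars=['index'],
                  value_vars=['A', 'B', 'C', 'D'],
                  var_name='treatments', value_name='value')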


OUTPUT

(ANOVA F and p value)

                 sum_sq    df         F    PR(>F)
C(treatments)   3010.95   3.0  17.49281  0.000026
Residual         918.00  16.0       NaN       NaN

# ANOVA table using bioinfokit v1.0.3 or later (it uses a wrapper script for anova_lm)
from bioinfokit.analys import stat

res = stat()

RESULT

Thus the program for ANOVA case studies has been executed and verified successfully.


EX NO: 10 CROSS-VALIDATION WITH LINEAR REGRESSION

AIM :

To Perform Linear Regression

ALGORITHM

Step1: Start

Step2: Import numpy, pandas, seaborn, matplotlib & sklearn

Step3: calculate linear regression using the appropriate functions

Step4: display the result

Step 5: Stop

PROGRAM

# import all libraries


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import re

import sklearn
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import PolynomialFeatures
from sklearn.preprocessing import scale
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline

import warnings # suppress warnings


warnings.filterwarnings('ignore')
# import Housing.csv
housing = pd.read_csv('../input/cross-val/Housing.csv')


housing.head()

# number of observations
len(housing.index)
# filter only area and price
df = housing.loc[:, ['area', 'price']]
df.head()
# rescaling the variables (both)
df_columns = df.columns
scaler = MinMaxScaler()
df = scaler.fit_transform(df)

# rename columns (since now its an np array)


df = pd.DataFrame(df)
df.columns = df_columns

df.head()
# visualise area-price relationship
sns.regplot(x="area", y="price", data=df, fit_reg=False)
# split into train and test
df_train, df_test = train_test_split(df,
train_size = 0.7,
test_size = 0.3,
random_state = 10)
print(len(df_train))
print(len(df_test))
381
164
# split into X and y for both train and test sets
# reshaping is required since sklearn requires the data to be in shape
# (n, 1), not as a series of shape (n, )
X_train = df_train['area']
X_train = X_train.values.reshape(-1, 1)
y_train = df_train['price']

X_test = df_test['area']
X_test = X_test.values.reshape(-1, 1)
y_test = df_test['price']
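
The listing stops after the train/test split. A minimal sketch (continuing with the X_train and y_train defined above, and with the estimators already imported at the top of the program) of the cross-validation step that the exercise title refers to:

# 5-fold cross-validation of a LinearRegression model on the training data
lm = LinearRegression()
folds = KFold(n_splits=5, shuffle=True, random_state=10)
scores = cross_val_score(lm, X_train, y_train, scoring='r2', cv=folds)
print(scores)           # one R^2 score per fold
print(scores.mean())    # average R^2 across the folds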


OUTPUT:

[housing.head() output: the first five rows of Housing.csv, with columns price, area, bedrooms, bathrooms,
 stories, mainroad, guestroom, basement, hotwaterheating, airconditioning, parking, prefarea and
 furnishingstatus; e.g. row 0: price 13300000, area 7420, 4 bedrooms, 2 bathrooms, 3 stories, furnished]

545

area price
0 0.396564 1.000000
1 0.502405 0.909091
2 0.571134 0.909091
3 0.402062 0.906061
4 0.396564 0.836364

<matplotlib.axes._subplots.AxesSubplot at 0x7fe94d630160>


381
164

RESULT

Thus the program for Linear Regression has been executed and verified successfully.


EX NO: 11 LOGISTIC REGRESSION

AIM :

To Perform Logistic Regression

ALGORITHM

Step1: Start

Step2: Import numpy, pandas, seaborn, matplotlib & sklearn

Step3: calculate logistic regression using the appropriate functions

Step4: display the result

Step 5: Stop

PROGRAM
import numpy as np
import pandas as pd
import seaborn as sb
import matplotlib.pyplot as plt
import sklearn

from pandas import Series, DataFrame
from pylab import rcParams
from sklearn import preprocessing
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.metrics import classification_report

'exec(%matplotlib inline)'
rcParams['figure.figsize'] = 10, 8
sb.set_style('whitegrid')

url = 'https://raw.githubusercontent.com/BigDataGal/Python-for-Data-Science/master/titanic-train.csv'


titanic = pd.read_csv(url)
titanic.columns = ['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
                   'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked']
titanic.head()
sb.countplot(x='Survived', data=titanic, palette='hls')
titanic.isnull().sum()
titanic.info()
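
The listing stops at data exploration. A minimal, hypothetical sketch (using the titanic DataFrame loaded above and only a few numeric columns, an illustrative simplification) of the actual model-fitting step:

# Keep a few numeric predictors and drop rows with missing values (illustrative choice)
data = titanic[['Survived', 'Pclass', 'Age', 'SibSp', 'Parch', 'Fare']].dropna()
X = data.drop('Survived', axis=1)
y = data['Survived']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
log_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(classification_report(y_test, log_model.predict(X_test)))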


OUTPUT

[titanic.head() output: the first five rows of the Titanic training data, with columns PassengerId,
 Survived, Pclass, Name, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin, Embarked; e.g. row 0:
 PassengerId 1, Survived 0, Pclass 3, Name "Braund, Mr. Owen Harris", male, Age 22.0,
 Ticket A/5 21171, Fare 7.2500, Cabin NaN, Embarked S]


PassengerId 0

Survived 0
Pclass 0
Name 0
Sex 0
Age 177
SibSp 0
Parch 0
Ticket 0
Fare 0
Cabin 687
Embarked 2
dtype: int64

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
PassengerId 891 non-null int64
Survived 891 non-null int64
Pclass 891 non-null int64
Name 891 non-null object
Sex 891 non-null object
Age 714 non-null float64
SibSp 891 non-null int64
Parch 891 non-null int64
Ticket 891 non-null object
Fare 891 non-null float64
Cabin 204 non-null object
Embarked 889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.6+ KB


RESULT

Thus the program for Logistic Regression has been executed and verified successfully.


EX NO: 12 TIME SERIES ANALYSIS

AIM :

To Perform Time series analysis

ALGORITHM

Step1: Start

Step2: Import numpy, pandas, matplotlib & seaborn

Step3: draw the plot

Step4: display the plot

Step 5: Stop

PROGRAM

from dateutil.parser import parse
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd

plt.rcParams.update({'figure.figsize': (10, 7), 'figure.dpi': 120})

# Import as Dataframe
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv', parse_dates=['date'])
df.head()

# dataset source: https://github.com/rouseguy
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/MarketArrivals.csv')


df = df.loc[df.market=='MUMBAI', :]
df.head()

# Time series data source: fpp package in R.
import matplotlib.pyplot as plt

df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv',
                 parse_dates=['date'], index_col='date')

# Draw Plot
def plot_df(df, x, y, title="", xlabel='Date', ylabel='Value', dpi=100):
    plt.figure(figsize=(16,5), dpi=dpi)
    plt.plot(x, y, color='tab:red')
    plt.gca().set(title=title, xlabel=xlabel, ylabel=ylabel)
    plt.show()

plot_df(df, x=df.index, y=df.value, title='Monthly anti-diabetic drug sales in Australia from 1992 to 2008.')


OUTPUT

ser = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv',
                  parse_dates=['date'], index_col='date')
ser.head()

# dataset source: https://github.com/rouseguy
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/MarketArrivals.csv')
df = df.loc[df.market=='MUMBAI', :]
df.head()


# Time series data source: fpp package in R.
import matplotlib.pyplot as plt

df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv',
                 parse_dates=['date'], index_col='date')

# Draw Plot
def plot_df(df, x, y, title="", xlabel='Date', ylabel='Value', dpi=100):
    plt.figure(figsize=(16,5), dpi=dpi)
    plt.plot(x, y, color='tab:red')
    plt.gca().set(title=title, xlabel=xlabel, ylabel=ylabel)
    plt.show()

plot_df(df, x=df.index, y=df.value, title='Monthly anti-diabetic drug sales in Australia from 1992 to 2008.')

# Import data
df = pd.read_csv('datasets/AirPassengers.csv', parse_dates=['date'])


x = df['date'].values
y1 = df['value'].values

# Plot
fig, ax = plt.subplots(1, 1, figsize=(16,5), dpi= 120)
plt.fill_between(x, y1=y1, y2=-y1, alpha=0.5, linewidth=2, color='seagreen')
plt.ylim(-800, 800)
plt.title('Air Passengers (Two Side View)', fontsize=16)
plt.hlines(y=0, xmin=np.min(df.date), xmax=np.max(df.date), linewidth=.5)
plt.show()

# Import Data
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv',
                 parse_dates=['date'], index_col='date')
df.reset_index(inplace=True)

# Prepare data
df['year'] = [d.year for d in df.date]
df['month'] = [d.strftime('%b') for d in df.date]
years = df['year'].unique()


# Prep Colors
np.random.seed(100)
mycolors = np.random.choice(list(mpl.colors.XKCD_COLORS.keys()), len(years), replace=False)

# Draw Plot
plt.figure(figsize=(16,12), dpi= 80)
for i, y in enumerate(years):
    if i > 0:
        plt.plot('month', 'value', data=df.loc[df.year==y, :], color=mycolors[i], label=y)
        plt.text(df.loc[df.year==y, :].shape[0]-.9, df.loc[df.year==y, 'value'][-1:].values[0], y,
                 fontsize=12, color=mycolors[i])

# Decoration
plt.gca().set(xlim=(-0.3, 11), ylim=(2, 30), ylabel='$Drug Sales$', xlabel='$Month$')
plt.yticks(fontsize=12, alpha=.7)
plt.title("Seasonal Plot of Drug Sales Time Series", fontsize=20)
plt.show()

# Import Data


df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv',
                 parse_dates=['date'], index_col='date')
df.reset_index(inplace=True)

# Prepare data
df['year'] = [d.year for d in df.date]
df['month'] = [d.strftime('%b') for d in df.date]
years = df['year'].unique()

# Draw Plot
fig, axes = plt.subplots(1, 2, figsize=(20,7), dpi= 80)
sns.boxplot(x='year', y='value', data=df, ax=axes[0])
sns.boxplot(x='month', y='value', data=df.loc[~df.year.isin([1991, 2008]), :], ax=axes[1])

# Set Title
axes[0].set_title('Year-wise Box Plot\n(The Trend)', fontsize=18)
axes[1].set_title('Month-wise Box Plot\n(The Seasonality)', fontsize=18)
plt.show()


RESULT

Thus the program for Time series analysis has been executed and verified successfully.
