Final FDS Manual

The document outlines the installation and exploration of Python packages including NumPy, SciPy, Jupyter, Statsmodels and Pandas. It provides detailed descriptions of each package's features, installation instructions, and examples of basic operations for NumPy and Pandas. The document concludes with successful execution results for various operations performed using these packages.

EX.NO.1 DOWNLOAD, INSTALL AND EXPLORE THE FEATURES OF NUMPY, SCIPY, JUPYTER, STATSMODELS AND PANDAS PACKAGES

AIM:

To download, install and explore the features of the NumPy, SciPy, Jupyter, Statsmodels and
Pandas packages.

FEATURES OF PYTHON PACKAGES:

1. NUMPY
One of the most fundamental packages in Python, NumPy is a general-purpose array-
processing package. It provides high-performance multidimensional array objects and tools
to work with the arrays. NumPy is an efficient container of generic multi-dimensional data.
NumPy's main object is the homogeneous multidimensional array: a table of elements
of the same datatype, indexed by a tuple of non-negative integers. In NumPy,
dimensions are called axes and the number of axes is called rank. NumPy's array class is
called ndarray, also known as array.
• Basic array operations: add, multiply, slice, flatten, reshape, index arrays
• Advanced array operations: stack arrays, split into sections, broadcast arrays
• Work with DateTime or linear algebra routines
• Basic slicing and advanced indexing in NumPy (a short sketch follows the list)
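
A minimal sketch of a few of these operations (illustrative values, separate from the exercise programs below):

import numpy as np

a = np.arange(6).reshape(2, 3)  # reshape a 1-D range into a 2x3 array
print(a.ndim, a.shape)          # number of axes (rank) and shape
print(a[:, 1])                  # slicing: the second column -> [1 4]
print(a + 10)                   # broadcasting a scalar across the array
print(a.flatten())              # flatten back to 1-D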

2. SCIPY
The SciPy library is one of the core packages that make up the SciPy stack. Note that there is a
difference between the SciPy stack and SciPy, the library. SciPy builds on the NumPy array
object and is part of the stack, which includes tools like Matplotlib, Pandas and SymPy.
The SciPy library contains modules for efficient mathematical routines such as linear
algebra, interpolation, optimization, integration and statistics. Various issues
related to scientific computation arise while working with data science.
• SciPy provides a variety of sub-packages to solve these issues efficiently.
• The SciPy library is computationally fast and easy to use.
• It operates on NumPy arrays and also provides optimized versions of functions used
in NumPy.
• After the GNU Scientific Library, SciPy is one of the most used scientific libraries. (A short sketch follows.)
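
A minimal sketch using one SciPy sub-package, scipy.integrate (illustrative only):

import numpy as np
from scipy import integrate

# integrate sin(x) from 0 to pi; the exact answer is 2
value, abs_err = integrate.quad(np.sin, 0, np.pi)
print(value, abs_err)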

3. PANDAS
Pandas is an open-source Python package that provides high-performance, easy-to-use
data structures and data analysis tools for labeled data in the Python programming
language. Pandas stands for Python Data Analysis Library. Pandas is a perfect tool for data
wrangling or munging. It is designed for quick and easy data manipulation, reading,
aggregation and visualization. Pandas takes data in a CSV or TSV file or from a SQL database
and creates a Python object with rows and columns called a data frame. The data frame is
very similar to a table in a spreadsheet or statistical software, say Excel or SPSS.

• Indexing, manipulating, renaming, sorting and merging data frames
• Update, add and delete columns of a data frame
• Impute missing values, handle missing data or NaNs
• Plot data with a histogram or box plot (a short sketch follows)
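
A minimal data frame sketch illustrating a few of these operations (hypothetical data):

import pandas as pd

df = pd.DataFrame({'name': ['a', 'b', 'c'], 'score': [90.0, 85.0, None]})
print(df.dtypes)                 # column data types
print(df['score'].fillna(0))     # impute a missing value (NaN)
print(df.sort_values('score'))   # sort rows by a column
df['passed'] = df['score'] > 60  # add a column
print(df.rename(columns={'name': 'student'}))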

4. STATSMODELS
Statsmodels is built for hardcore statistics. The core of the Statsmodels library is
"production ready". Traditional models like robust linear models, generalized linear models
(GLM), etc. have all been around for a long time and have been validated against R and
Stata. It also contains a time series analysis section, which includes vector
autoregression (VAR), AR and ARMA.
• Linear/multiple regression – Linear regression is a statistical method for modeling
the linear relationship between a dependent variable and one or more explanatory
variables.
• Logistic regression – The logistic model is used in statistics to model the
probability of a specific event/class occurring, such as win/lose, pass/fail, etc.
• Time series analysis – Refers to the analysis of time series data to retrieve
meaningful statistics and many other data characteristics.
• Statistical tests – Refers to the many statistical tests that can be done using the
Statsmodels library. (A short OLS sketch follows.)
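
A minimal OLS sketch with statsmodels (synthetic data, for illustration only):

import numpy as np
import statsmodels.api as sm

x = np.arange(50, dtype=float)
y = 2.0 * x + 1.0 + np.random.normal(size=50)
X = sm.add_constant(x)   # adds the intercept column
model = sm.OLS(y, X).fit()
print(model.params)      # estimates close to [1.0, 2.0]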
5. JUPYTER
Project Jupyter is a suite of software products used in interactive computing. Packages
under the Jupyter project include:
Jupyter Notebook − a web-based interface to programming environments of Python,
Julia, R and many others
QtConsole − a Qt-based terminal for Jupyter kernels, similar to IPython
nbviewer − a facility to share Jupyter notebooks
JupyterLab − a modern web-based integrated interface for all products.

The IPython kernel behind these tools:
• Offers a powerful interactive Python shell.
• Acts as the main kernel for Jupyter Notebook and other front-end tools of Project
Jupyter.
• Possesses object introspection ability. Introspection is the ability to check
properties of an object during runtime.
• Syntax highlighting.
• Stores the history of interactions.
• Tab completion of keywords, variables and function names.
• Magic command system useful for controlling the Python environment and
performing OS tasks.
PYTHON INSTALLATION
• Open the official Python web site (https://www.python.org/).
• Downloads ==> Windows ==> Select a recent release. (Requires Windows 10 or above.)
• Install "python-3.10.6-amd64.exe".

PACKAGE INSTALLATION
Open a command prompt and enter the following command to check whether Python was installed
properly: python --version. If the installation is proper, it returns the version of Python.

Enter the following command to check whether the Python package manager was installed properly:
pip --version.

If the installation is proper, it returns the version of the Python package manager.

• Enter the following command to install the NumPy library: pip install numpy
• Enter the following command to install the SciPy library: pip install scipy
• Enter the following command to install the Statsmodels library: pip install statsmodels
• Enter the following command to install the Pandas library: pip install pandas
• Enter the following command to install Jupyter: pip install jupyter
(A quick import check follows the list.)
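
One way to confirm the installations is to import each package and print its version. This quick check is an addition to the manual's procedure, not part of it:

import numpy, scipy, pandas, statsmodels
print("NumPy      :", numpy.__version__)
print("SciPy      :", scipy.__version__)
print("Pandas     :", pandas.__version__)
print("Statsmodels:", statsmodels.__version__)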
OUTPUT:
RESULT:

Thus the features of the NumPy, SciPy, Jupyter, Statsmodels and Pandas packages were
downloaded, installed and explored successfully.
EX.NO.2A WORKING WITH NUMPY

AIM:

To perform basic NumPy operations in Python for

• Creating different types of NumPy arrays and displaying basic information such as the
data type, shape, size and strides
• Creating an array using built-in NumPy functions
• Performing file operations with NumPy arrays.

ALGORITHM:

Step 1: Start the program

Step 2: Import the NumPy library.

Step 3: Define a 1-dimensional array, 2-dimensional array, and 3-dimensional array.

Step 4: Print the memory address, the shape, the data type, and the stride of the array.

Step 5: Then create an array using built-in NumPy functions.

Step 6: Perform the file operations with NumPy arrays.

Step 7: Display the output.


PROGRAM:

1. Creating Arrays:

• 0-D Arrays
Each value in an array is a 0-D array.

import numpy as np
arr = np.array(42)
print(arr)
• 1-D Arrays
An array that has 0-D arrays as its elements is called a 1-D array.

import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
• 2-D Arrays
An array that has 1-D arrays as its elements is called a 2-D array.

import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)
• 3-D Arrays
An array that has 2-D arrays (matrices) as its elements is called a 3-D array.

import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(arr)
2. Array Dimensions:
import numpy as np
a = np.array(42)
b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(a.ndim)
print(b.ndim)
print(c.ndim)
print(d.ndim)
3. Access 2-D Arrays:
To access elements from 2-D arrays we can use comma separated integers
representing the dimension and the index of the element.

import numpy as np
arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])
print('2nd element on 1st row: ', arr[0, 1])

4. Access 3-D Arrays:


To access elements from 3-D arrays we can use comma separated integers
representing the dimensions and the index of the element.

import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print(arr[0, 1, 2])

5. Array Slicing:
Slicing in Python means taking elements from one given index to another given index.
We pass slice instead of index like this: [start:end]. We can also define the step, like
this: [start:end:step].

import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[1:5:2])

6. Data Types:
NumPy has some extra data types, and refers to data types with one character, like i for
integers, u for unsigned integers, etc.

import numpy as np
arr = np.array([1, 2, 3, 4], dtype='S')
print(arr)
print(arr.dtype)
7. Copy & View:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
x = arr.copy()
arr[0] = 42
print(arr)
print(x)

8. Make a view:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
x = arr.view()
arr[0] = 42
print(arr)
print(x)

9. Array Shape & Reshaping:


Array Shape: NumPy arrays have an attribute called shape that returns a tuple in which
each index holds the number of elements along the corresponding dimension.
import numpy as np
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(arr.shape)

10. Array Reshaping:


Reshaping means changing the shape of an array. By reshaping we can add or remove
dimensions or change number of elements in each dimension.
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(2, 3, 2)
print(newarr)

11. Array Iterating:


Iterating means going through elements one by one. As we deal with multi-dimensional
arrays in NumPy, we can do this using the basic for loop of Python.
import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
for x in arr:
print(x)
12. Joining Array:
Joining means putting contents of two or more arrays in a single array.
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.concatenate((arr1, arr2))
print(arr)

13. Splitting Array:


Splitting is reverse operation of Joining. Joining merges multiple arrays into one and
Splitting breaks one array into multiple.
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
newarr = np.array_split(arr, 3)
print(newarr)

14. Searching Arrays:


We can search an array for a certain value, and return the indexes that get a match. To
search an array, use the where() method.
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 4, 4])
x = np.where(arr == 4)
print(x)

15. Sorting:
Sorting means putting elements in an ordered sequence. Ordered sequence is any
sequence that has an order corresponding to elements, like numeric or alphabetical,
ascending or descending. The NumPy ndarray object has a function called sort() that
will sort a specified array.
import numpy as np
arr = np.array([3, 2, 0, 1])
print(np.sort(arr))
16. Filtering Arrays:
Getting some elements out of an existing array and creating a new array out of them is
called filtering. In NumPy, you filter an array using a boolean index list.
import numpy as np
arr = np.array([41, 42, 43, 44])
x = [True, False, True, False]
newarr = arr[x]
print(newarr)
OUTPUT:

RESULT:

Thus the Python program to execute the basic operations of NumPy arrays has
been executed successfully.
EX.NO.2B ARITHMETIC OPERATION USING NUMPY ARRAYS

AIM:

To perform different Arithmetic operations using NumPy arrays in Python.

ALGORITHM:

Step 1: Start the program

Step 2: Import NumPy as np

Step 3: Assign the matrix 1 elements to variable 'a'

Step 4: Assign the matrix 2 elements to variable 'b'

Step 5: Calculate the arithmetic operations using NumPy functions.

Step 6: Display the results of addition, subtraction, multiplication and division

Step 7: Stop the program


PROGRAM:

import numpy as np

a = np.array([[1,2,3], [4,5,6], [7,8,9]])

print("The first matrix value is ::>",a)

b = np.array([[2,3,4],[5,6,7], [8,9,10]])

print("The second matrix value is ::>",b)

mul= np.multiply(a,b)

add= np.add(a,b)

sub=np.subtract(a,b)

div=np.divide(a,b)

print("Addition Matrix Resultant is ::>",add)

print("Subtraction Matrix Resultant is ::>",sub)

print("Division Matrix Resultant is ::>",div)

print("Multiplication Matrix Resultant is ::>",mul)


OUTPUT:

RESULT:

Thus the Python program to perform arithmetic operations using NumPy arrays has
been executed and verified successfully.
EX.NO.3 WORKING WITH PANDAS DATA FRAMES

AIM:

To perform merge datasets and check uniqueness using Pandas Dataframes in python.

ALGORITHM:

Step 1: Start the program

Step 2: Import pandas as pd

Step 3: Create a dataframe df with name, date of birth and age

Step 4: Create new data frames by dropping some rows.

Step 5: Check if the merge keys are unique in both datasets and print the merged values.

Step 6: Check if the merge keys are unique in the left dataset and print the merged values.

Step 7: Check if the merge keys are unique in the right dataset and print the merged values.

Step 8: Stop the program


PROGRAM:

import pandas as pd

df = pd.DataFrame({
    'Name': ['Alberto Franco', 'Gino Mcneill', 'Ryan Parkes', 'Eesha Hinton', 'Syed Wharton'],
    'Date_Of_Birth': ['17/05/2002', '16/02/1999', '25/09/1998', '11/05/2002', '15/09/1997'],
    'Age': [18.5, 21.2, 22.5, 22, 23]})

print("Original DataFrame:")

print(df)

df1 = df.copy(deep = True)

df = df.drop([0, 1])

df1 = df1.drop([2])

print("\nNew DataFrames:")

print(df)
print(df1)

print('\n"one_to_one": check if merge keys are unique in both left and right datasets:')

df_one_to_one = pd.merge(df, df1, validate = "one_to_one")

print(df_one_to_one)

print('\n"one_to_many" or "1:m": check if merge keys are unique in left dataset:')

df_one_to_many = pd.merge(df, df1, validate = "one_to_many")

print(df_one_to_many)

print('\n"many_to_one" or "m:1": check if merge keys are unique in right dataset:')

df_many_to_one = pd.merge(df, df1, validate = "many_to_one")

print(df_many_to_one)
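
If a validation rule is violated, pandas raises pandas.errors.MergeError instead of returning a result. A small illustration with deliberately duplicated keys (hypothetical frames, separate from df and df1 above):

left = pd.DataFrame({'Name': ['A', 'A'], 'Age': [1, 2]})
right = pd.DataFrame({'Name': ['A'], 'City': ['X']})
try:
    pd.merge(left, right, on='Name', validate='one_to_one')
except pd.errors.MergeError as err:
    print('Validation failed:', err)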
RESULT:

Thus the Python program to perform merge datasets and check uniqueness using
Pandas Dataframes was executed and verified successfully.
EX.NO.4 DESCRIPTIVE ANALYTICS

AIM:

To read data from text files, Excel files and the web, and to explore various commands
for doing descriptive analytics on the Iris data set.

ALGORITHM:

Step 1: Start the program

Step 2: Importing relevant libraries

Step 3: The data is stored in a file named 'iris'; load the text, CSV and Excel versions
of the file to read the data.

Step 4: Gaining information from data for exploratory data analysis.

Step 5: Univariate and bivariate analysis compare the various species based on sepal
length and width.

Step 6: Box plots to know about distribution of data.

Step 7: Plot the histogram and probability density function (pdf) with each feature as
a variable on the x-axis and its histogram and corresponding kernel density
plot on the y-axis.

Step 8: For pre-processing the data, load the standard scaler.

Step 9: For transforming the labels, the label encoder is loaded from sklearn.

Step 10: Model selection is analysed on the support vector machine.

Step 11: Predict the data using the k-neighbors classification algorithm and find
its accuracy.

Step 12: Stop the program


PROGRAM:

#DATA COLLECT

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

import seaborn as sns

dataset=pd.read_csv("iris.txt")

dataset.head()

dataset=pd.read_excel("iris.xlsx")

dataset.head()

dataset=pd.read_csv("iris.csv")

dataset.head()

dataset.info()

dataset.Species.unique()

#EDA

dataset.describe()

dataset.corr(numeric_only=True)  # numeric_only: the Species column is text (needed on newer pandas)

dataset.Species.value_counts()

# note: older seaborn used size= here; newer versions call the argument height=
sns.FacetGrid(dataset, hue="Species", height=6).map(plt.scatter, "Sepal.Length", "Sepal.Width").add_legend()

sns.FacetGrid(dataset, hue="Species", height=6).map(plt.scatter, "Petal.Length", "Petal.Width").add_legend()

sns.pairplot(dataset, hue="Species")

plt.hist(dataset["Sepal.Length"], bins=25)

# sns.distplot was used here historically; histplot is its current replacement
sns.FacetGrid(dataset, hue="Species", height=6).map(sns.histplot, "Sepal.Width").add_legend()

sns.boxplot(x='Species', y='Petal.Length', data=dataset)
#PREPROCESSING

from sklearn.preprocessing import StandardScaler

ss = StandardScaler()

x = dataset.drop(['Species'], axis=1)
y = dataset['Species']

scaler = ss.fit(x)

x_stdscaler = scaler.transform(x)
x_stdscaler

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()

y = le.fit_transform(y)

#SPLITTING

from sklearn.model_selection import train_test_split

x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.3,random_state=42)

x_train.value_counts  # note: without parentheses this displays the bound method along with the data

#MODEL SELECTION

from sklearn.svm import SVC

svc=SVC(kernel="linear")

svc.fit(x_train,y_train)

y_pred=svc.predict(x_test)

y_pred

from sklearn.metrics import accuracy_score

accuracy_score(y_pred,y_test)

#PREDICTION

from sklearn.neighbors import KNeighborsClassifier

knn=KNeighborsClassifier(n_neighbors=3)

knn.fit(x_train, y_train)

y_pred = knn.predict(x_test)

accuracy_score(y_pred,y_test)
OUTPUT:

DATASET HEAD:

   Unnamed: 0  Sepal.Length  Sepal.Width  Petal.Length  Petal.Width  Species
0           1           5.1          3.5           1.4          0.2   setosa
1           2           4.9          3.0           1.4          0.2   setosa
2           3           4.7          3.2           1.3          0.2   setosa
3           4           4.6          3.1           1.5          0.2   setosa
4           5           5.0          3.6           1.4          0.2   setosa

DATASET INFORMATION:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 6 columns):
 #   Column        Non-Null Count  Dtype
 0   Unnamed: 0    150 non-null    int64
 1   Sepal.Length  150 non-null    float64
 2   Sepal.Width   150 non-null    float64
 3   Petal.Length  150 non-null    float64
 4   Petal.Width   150 non-null    float64
 5   Species       150 non-null    object
dtypes: float64(4), int64(1), object(1)
memory usage: 7.2+ KB
DATASET UNIQUE:

array(['setosa', 'versicolor', 'virginica'], dtype=object)

DATASET SPECIES VALUE COUNTS:

setosa 50

versicolor 50

virginica 50

Name: Species, dtype: int64

DATASET DESCRIPTION:

        Unnamed: 0  Sepal.Length  Sepal.Width  Petal.Length  Petal.Width
count   150.000000    150.000000   150.000000    150.000000   150.000000
mean     75.500000      5.843333     3.057333      3.758000     1.199333
std      43.445368      0.828066     0.435866      1.765298     0.762238
min       1.000000      4.300000     2.000000      1.000000     0.100000
25%      38.250000      5.100000     2.800000      1.600000     0.300000
50%      75.500000      5.800000     3.000000      4.350000     1.300000
75%     112.750000      6.400000     3.300000      5.100000     1.800000
max     150.000000      7.900000     4.400000      6.900000     2.500000

DATASET CORRELATION:
              Unnamed: 0  Sepal.Length  Sepal.Width  Petal.Length  Petal.Width
Unnamed: 0      1.000000      0.716676    -0.402301      0.882637     0.900027
Sepal.Length    0.716676      1.000000    -0.117570      0.871754     0.817941
Sepal.Width    -0.402301     -0.117570     1.000000     -0.428440    -0.366126
Petal.Length    0.882637      0.871754    -0.428440      1.000000     0.962865
Petal.Width     0.900027      0.817941    -0.366126      0.962865     1.000000

SCATTER PLOT:
PAIRPLOT:

HISTOGRAM:
BOXPLOT:

PREPROCESSING:

array([[-1.72054204e+00, -9.00681170e-01,  1.01900435e+00, -1.34022653e+00, -1.31544430e+00],
       [-1.69744751e+00, -1.14301691e+00, -1.31979479e-01, -1.34022653e+00, -1.31544430e+00],
       [-1.67435299e+00, -1.38535265e+00,  3.28414053e-01, -1.39706395e+00, -1.31544430e+00],
       [-1.65125846e+00, -1.50652052e+00,  9.82172869e-02, -1.28338910e+00, -1.31544430e+00],
       [-1.58197489e+00, -1.50652052e+00,  7.88807586e-01, ...],
       ...,
       [-2.42492502e-01, -2.94841818e-01, -3.62176246e-01,  7.62758269e-01,  7.90670654e-01]])
SPLITTING:

<bound method DataFrame.value_counts of
     Unnamed: 0  Sepal.Length  Sepal.Width  Petal.Length  Petal.Width

81 82 5.5 2.4 3.7 1.0

133 134 6.3 2.8 5.1 1.5

137 138 6.4 3.1 5.5 1.8

75 76 6.6 3.0 4.4 1.4

109 110 7.2 3.6 6.1 2.5

.. ... ... ... ... ...

71 72 6.1 2.8 4.0 1.3

106 107 4.9 2.5 4.5 1.7

14 15 5.8 4.0 1.2 0.2

92 93 5.8 2.6 4.0 1.2

102 103 7.1 3.0 5.9 2.1

[105 rows x 5 columns]>

MODEL SELECTION:

1.0

PREDICTION:

1.0
RESULT:

Thus the Python program to perform descriptive analytics on the Iris data set has been
executed successfully.
EX.NO.5A UNIVARIATE ANALYSIS FOR INDIANS DIABETES DATASET

AIM:

To perform the Univariate analysis for Indians Diabetes data set using Python.

ALGORITHM:

Step 1: Start

Step 2: Importing the python library packages.

Step 3: Download and read the Indians Diabetes data set in CSV format

Step 4: Find the frequency of the values in the skin column.

Step 5: Find the mean values for rows 0:5 and the value for the skin column.

Step 6: Find the median values for rows 0:5 and the value for the skin column.

Step 7: Find the standard deviation values for rows 0:5 and the value for the skin column.

Step 8: Find the mode, variance, skewness and kurtosis by using the corresponding commands.

Step 9: Plot the mean and median values on a density graph.

Step 10: End


PROGRAM:

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

import seaborn as sns

df=pd.read_csv("diabetes_csv.csv")

df.head()

df.skin.value_counts()

df.mean(axis=0, numeric_only=True)  # numeric_only: the class column is text (needed on newer pandas)

print(df.loc[:, 'skin'].mean())

df.mean(axis=1, numeric_only=True)[0:5]

df.median(numeric_only=True)

print(df.loc[:, 'skin'].median())

df.median(axis=1, numeric_only=True)[0:5]

df.mode()

df.std(numeric_only=True)

print(df.loc[:, 'skin'].std())

df.std(axis=1, numeric_only=True)[0:5]

df.var(numeric_only=True)

print(df.skew(numeric_only=True))

df.describe()

df.describe(include='all')

print(df.kurtosis(numeric_only=True))

norm_data = pd.DataFrame(np.random.normal(size=100000))
norm_data.plot(kind="density", figsize=(10, 10))

# Plot black line at mean
plt.vlines(norm_data.mean(), ymin=0, ymax=0.4, linewidth=5.0)

# Plot red line at median
plt.vlines(norm_data.median(), ymin=0, ymax=0.4, linewidth=2.0, color="red")


OUTPUT:

HEAD DATA:

preg plas pres skin insu mass pedi age class

0 6 148 72 35 0 33.6 0.627 50 tested_positive

1 1 85 66 29 0 26.6 0.351 31 tested_negative

2 8 183 64 0 0 23.3 0.672 32 tested_positive

3 1 89 66 23 94 28.1 0.167 21 tested_negative

4 0 137 40 35 168 43.1 2.288 33 tested_positive

FREQUENCY:

0 227
32 31
30 27
27 23
23 22
33 20
28 20
18 20
31 19
19 18
39 18
29 17
40 16
25 16

MEAN:

20.536458333333332

0 43.153375

1 29.868875
2 38.871500

3 40.283375

4 57.298500

dtype: float64

MODE:
preg plas pres skin insu mass pedi age class

0 1.0 99 70.0 0.0 0.0 32.0 0.254 22.0 tested_negative

1 NaN 100 NaN NaN NaN NaN 0.258 NaN NaN

MEDIAN:

23.0

0 34.30

1 27.80

2 15.65

3 25.55

4 37.50

dtype: float64

STANDARD DEVIATION:

15.952217567727677

0 49.397286

1 31.519803

2 62.253392

3 37.591100

4 61.533847
VARIANCE:

preg 11.354056

plas 1022.248314

pres 374.647271

skin 254.473245

insu 13281.180078

mass 62.159984

pedi 0.109779

age 138.303046

dtype: float64

SKEWNESS:

preg 0.901674

plas 0.173754

pres -1.843608

skin 0.109372

insu 2.272251

dtype: float64

KURTOSIS:

preg 0.159220

plas 0.640780

pres 5.180157

skin -0.520072

insu 7.214260

mass 3.290443

pedi 5.594954

age 0.643159

dtype: float64
GRAPH:

RESULT:

Thus the univariate analysis is performed on the Indians Diabetes data set.
EX.NO.5B UNIVARIATE ANALYSIS FOR PIMA INDIANS DIABETES DATA SET

AIM:

To find the Frequency, Mean, Median, Mode, Variance, Standard Deviation, Skewness
and Kurtosis for the Pima Indians Diabetes data set.

ALGORITHM:

Step 1: Start

Step 2: Importing the python library packages.

Step 3: Download and read the Pima Indians Diabetes data set in CSV format

Step 4: Find the mean values for rows 0:5 and the value for column 35.

Step 5: Find the median values for rows 0:5 and the value for column 33.6.

Step 6: Find the standard deviation values for rows 0:5 and the value for column 35.

Step 7: Find the mode, variance, skewness and kurtosis by using the corresponding commands.

Step 8: Plot the mean and median values on a density graph.

Step 9: End


PROGRAM:

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

import seaborn as sns

df=pd.read_csv("pima-indians-diabetes.csv")

df.head()

df.mean(axis=0)

print(df.loc[:, '35'].mean())

df.mean(axis=1)[0:5]

df.median()

print(df.loc[:, '33.6'].median())

df.median(axis=1)[0:5]

df.mode()

df.std()

print(df.loc[:, '35'].std())

df.std(axis=1)[0:5]

df.var()

print(df.skew())

print(df.kurtosis())

norm_data = pd.DataFrame(np.random.normal(size=100000))
norm_data.plot(kind="density", figsize=(10, 10))

# Plot black line at mean
plt.vlines(norm_data.mean(), ymin=0, ymax=0.4, linewidth=5.0)

# Plot red line at median
plt.vlines(norm_data.median(), ymin=0, ymax=0.4, linewidth=2.0, color="red")


OUTPUT:

HEAD DATA:

6 148 72 35 0 33.6 0.627 50 1

0 1 85 66 29 0 26.6 0.351 31 0

1 8 183 64 0 0 23.3 0.672 32 1

2 1 89 66 23 94 28.1 0.167 21 0

3 0 137 40 35 168 43.1 2.288 33 1

4 5 116 74 0 0 25.6 0.201 30 0


MEAN:

20.517601043024772

0 26.550111

1 34.663556

2 35.807444

3 51.043111

4 27.866778

dtype: float64

MODE:

6 148 72 35 0 33.6 0.627 50 1

0 1.0 99 70.0 0.0 0.0 32.0 0.254 22.0 0.0

1 NaN 100 NaN NaN NaN NaN 0.258 NaN NaN


MEDIAN:

32.0

0 26.6

1 8.0

2 23.0

3 35.0

4 5.0

dtype: float64

STANDARD DEVIATION:

15.954059060433842

0 31.119744

1 59.585320

2 37.639873

3 60.541569

4 41.114755

dtype: float64

VARIANCE:

6 11.362809

148 1022.622445

72 375.125415

35 254.532001

0 13290.194335

33.6 62.237755

0.627 0.109890

50 138.116452

1 0.227226

dtype: float64
SKEWNESS:

6 0.903976

148 0.176412

72 -1.841911

35 0.112058

0 2.270630

33.6 -0.427950

0.627 1.921190

50 1.135165

1 0.638949

dtype: float64

KURTOSIS:

6 0.161293

148 0.642992

72 5.168578

35 -0.518325

0 7.205266

33.6 3.282498

0.627 5.593374

50 0.660872

1 -1.595913

dtype: float64
GRAPH:

RESULT:

Thus the univariate analysis is performed on the Pima Indians Diabetes data set.
EX.NO.5C BIVARIATE ANALYSIS: LINEAR AND LOGISTIC REGRESSION MODELING

AIM:

To perform bivariate analysis (linear and logistic regression modeling) using the
diabetes data set from UCI.

ALGORITHM:

Step 1: Start the program.

Step 2: Import the Seaborn, Matplotlib, NumPy and Pandas library packages.

Step 3: Get the diabetes dataset and read the file in CSV format.

Step 4: Display the data of the dataset by using the head command.

Step 5: The description of the dataset is displayed using the DESCR command.

Step 6: Split the data into training and testing data by importing train_test_split from
sklearn.model_selection

Step 7: Linear Regression are imported from sklearn.linear_model for calculating the
coefficient

Step 8: Mean square error and r2 score values are calculated

Step 9: Scatter plots are plotted for predicted and actual value.

Step 10: The linear regression line is plotted by importing matplotlib.

Step 11: For logistic regression the modelling score is calculated for test dataset.

Step 12: Stop the program.


PROGRAM:

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

from sklearn import datasets

%matplotlib inline

diabetes=pd.read_csv("C:\\Users\\KSK\\Documents\\diabetes.csv")

diabetes.head()

diabetes = datasets.load_diabetes()

print(diabetes.DESCR)

diabetes.feature_names

# Now we will split the data into the independent and independent variable

X = diabetes.data[:,np.newaxis,3]

Y = diabetes.target

# We will split the data into training and testing data
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.3)

# Linear Regression

from sklearn.linear_model import LinearRegression

reg=LinearRegression()

reg.fit(x_train,y_train)

y_pred = reg.predict(x_test)

Coef=reg.coef_

print(Coef)

from sklearn.metrics import mean_squared_error, r2_score

MSE=mean_squared_error(y_test,y_pred)

R2 = r2_score(y_test, y_pred)
print(R2, MSE)

import matplotlib.pyplot as plt


plt.scatter(y_pred, y_test)

plt.title('Predicted data vs Real Data')

plt.xlabel('y_pred')
plt.ylabel('y_test')

plt.show()

plt.scatter(x_test, y_test)

plt.plot(x_test,y_pred,linewidth=2)

plt.title('Linear Regression')

plt.xlabel('y_pred')

plt.ylabel('y_test')

plt.show()

from sklearn.linear_model import LogisticRegression
from sklearn import metrics

model = LogisticRegression()

model.fit(x_train,y_train)

y_predict=model.predict(x_test)

model_score = model.score(x_test,y_test)

print(model_score)

print(metrics.confusion_matrix(y_test, y_predict))
OUTPUT:

DIABETES DESCRIPTION:

Diabetes dataset

Ten baseline variables, age, sex, body mass index, average blood pressure, and six
blood serum measurements were obtained for each of n = 442 diabetes patients, as
well as the response of interest, a quantitative measure of disease progression one
year after baseline.

**Data Set Characteristics:**

:Number of Instances: 442

:Number of Attributes: First 10 columns are numeric predictive values

:Target: Column 11 is a quantitative measure of disease progression one year after baseline

:Attribute Information:

- Age age in years

- Sex

- bmi body mass index

- bp average blood pressure

- s1 tc, total serum cholesterol

- s2 ldl, low-density lipoproteins

- s3 hdl, high-density lipoproteins

- s4 tch, total cholesterol / HDL

- s5 ltg, possibly log of serum triglycerides level

- s6 glu, blood sugar level

COEFFICIENT VALUE:

[731.87600042]
R2 VALUE AND MEAN SQUARE ERROR:

0.16465773342986756 & 4765.090270861111

PREDICTED DATA VS REAL DATA:

LINEAR REGRESSION:
MODEL SCORE FOR LOGISTIC REGRESSION:

0.007518796992481203

CONFUSION MATRIX FOR LOGISTIC REGRESSION:

[[130 17]

[ 38 46]]

RESULT:

Thus bivariate analysis with linear and logistic regression modeling using the diabetes
data set from UCI is performed successfully.
EX.NO.5D MULTIPLE REGRESSION ANALYSIS

AIM:

To perform Multiple regression analysis using the diabetes data set from UCI.

ALGORITHM:

Step 1: Start the program

Step 2: Import the Statsmodels, Matplotlib, NumPy and Pandas library packages.

Step 3: Get the diabetes dataset and read the file in CSV format.

Step 4: Display the data of the dataset by using the head command.

Step 5: Import the ANOVA linear model from statsmodels

Step 6: Multiple linear regression using Age and BMI as predictors

Step 7: add an intercept (beta_0) to our model

Step 8: Make the predictions with the model and print out the statistics

Step 9: Stop the Program.


PROGRAM:

import numpy as np

import matplotlib.pyplot as plt

import pandas as pd

from sklearn import datasets

%matplotlib inline

diabetes=pd.read_csv("C:\\Users\\KSK\\Documents\\FDS LAb\\diabetes.csv")

diabetes.head()

import statsmodels.api as sm

from statsmodels.stats.anova import anova_lm

X = diabetes[["Age", "BMI"]]  ## the input variables

y = diabetes["Glucose"] ## the output variables, the one you want to predict

X = sm.add_constant(X) ## let's add an intercept (beta_0) to our model

# Note the difference in argument order
model2 = sm.OLS(y, X).fit()

predictions = model2.predict(X)  # make the predictions by the model

# Print out the statistics

model2.summary()
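
Individual statistics can also be read directly from the fitted results object, for example (attribute names per the statsmodels API):

print(model2.params)    # coefficients: const, Age, BMI
print(model2.rsquared)  # R-squared
print(model2.pvalues)   # p-values for each coefficient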
OUTPUT:

HEAD DATA’S:

Blood Skin DiabetesPedigree


Pregnancies Glucose Insulin BMI Age Outcome
Pressure Thickness
Function

0 6 148 72 35 0 33.6 0.627 50 1

1 1 85 66 29 0 26.6 0.351 31 0

2 8 183 64 0 0 23.3 0.672 32 1

3 1 89 66 23 94 28.1 0.167 21 0

4 0 137 40 35 168 43.1 2.288 33 1


OLS Regression Results

Dep. Variable: Glucose R-squared: 0.114

Model: OLS Adj. R-squared: 0.112

Method: Least Squares F-statistic: 49.33

Date: Tue, 08 Nov 2022 Prob (F-statistic): 7.05e-21

Time: 22:28:35 Log-Likelihood: -3703.7

No. Observations: 768 AIC: 7413.

Df Residuals: 765 BIC: 7427.

Df Model: 2

Covariance Type: nonrobust

coef std err t P>|t| [0.025 0.975]

const 70.2952 5.402 13.013 0.000 59.691 80.899

Age 0.6955 0.093 7.514 0.000 0.514 0.877

BMI 0.8589 0.138 6.220 0.000 0.588 1.130

Omnibus: 18.855 Durbin-Watson: 1.836

Prob(Omnibus): 0.000 Jarque-Bera (JB): 38.868

Skew: -0.007 Prob(JB): 3.63e-09

Kurtosis: 4.102 Cond. No. 235.


RESULT:

Thus multiple regression analysis using the Pima Indian diabetes data set from UCI is
performed successfully.
EX.NO.6A APPLY AND EXPLORE NORMAL CURVES PLOTTING FUNCTIONS ON UCI DATA SETS

AIM:

To Apply and explore Normal curves plotting functions on Titanic data sets.

ALGORITHM:

Step 1: Start the program

Step 2: Import the Seaborn, Matplotlib, NumPy and Pandas library packages.

Step 3: Get the Titanic dataset and read the file in CSV format.

Step 4: Display the data of the dataset by using the head command.

Step 5: Mean and standard deviation values are calculated for plotting the normal
curve.

Step 6: Stop the program.


PROGRAM:

import matplotlib.pyplot as plt

import numpy as np

import pandas as pd

import seaborn as sns

from scipy.stats import norm  # norm.pdf is used below

%matplotlib inline

df = pd.read_csv("C:\\Users\\KSK\\Documents\\train.csv")

df.head()

mean = df.loc[:, 'Fare'].mean()

sd = df.loc[:, 'Fare'].std()

# x_axis must be defined before plotting; here a range of +/-4 standard deviations around the mean
x_axis = np.linspace(mean - 4 * sd, mean + 4 * sd, 200)

plt.plot(x_axis, norm.pdf(x_axis, mean, sd))

plt.show()
OUTPUT:

NORMAL CURVE:

RESULT:

Thus the Normal curves plotting functions are explored on Titanic data sets
successfully.
EX.NO.6B APPLY AND EXPLORE DENSITY AND CONTOUR PLOTTING FUNCTIONS ON UCI DATA SETS

AIM:

To Apply and explore Density and Contour plotting functions on Titanic data sets.

ALGORITHM:

Step 1: Start the program

Step 2: Import the Seaborn, Matplotlib, NumPy and Pandas library packages.

Step 3: Get the Titanic dataset and read the file in CSV format.

Step 4: Display the data of the dataset by using the head command.

Step 5: The density plots are plotted for the Fare data in the Titanic dataset.

Step 6: Contour plots are plotted for the Fare and Parch data in the Titanic dataset.

Step 7: Stop the program.


PROGRAM:

import matplotlib.pyplot as plt

import numpy as np

import pandas as pd

import seaborn as sns

%matplotlib inline

df = pd.read_csv("C:\\Users\\KSK\\Documents\\train.csv")

df.head()

# distplot is deprecated in newer seaborn; histplot/kdeplot are its replacements
sns.distplot(df["Fare"])
sns.distplot(df["Age"])

plt.contour(df[["Fare", "Parch"]])
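
plt.contour expects values on a 2-D grid, so passing two raw columns gives a hard-to-read plot. One alternative sketch, binning the two columns with np.histogram2d first (an added illustration, not the manual's prescribed method):

counts, xedges, yedges = np.histogram2d(df["Parch"], df["Fare"], bins=20)
xcenters = 0.5 * (xedges[:-1] + xedges[1:])  # Parch bin centers
ycenters = 0.5 * (yedges[:-1] + yedges[1:])  # Fare bin centers
plt.contour(ycenters, xcenters, counts)      # rows of counts follow Parch
plt.xlabel("Fare")
plt.ylabel("Parch")
plt.show()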
OUTPUT:

DENSITY PLOT:

CONTOUR PLOT:
RESULT:

Thus the Density and Contour plotting functions are explored on Train data sets
successfully.
EX.NO.6C APPLY AND EXPLORE CORRELATION AND SCATTER PLOTTING FUNCTIONS ON UCI DATA SETS

AIM:

To Apply and explore correlation and scatter plotting functions on Titanic data sets.

ALGORITHM:

Step 1: Start the program

Step 2: Import the Seaborn, Matplotlib, NumPy and Pandas library packages.

Step 3: Get the Titanic dataset and read the file in CSV format.

Step 4: Display the data of the dataset by using the head command.

Step 5: The scatter points are plotted for the Fare data in the Titanic dataset.

Step 6: Correlation values are calculated and plotted by using a heat map.

Step 7: Stop the program.


PROGRAM:

import numpy as np

import pandas as pd

import seaborn as sns

%matplotlib inline

import matplotlib.pyplot as plt

df = pd.read_csv("C:\\Users\\KSK\\Documents\\train.csv")
df.head()

plt.figure(figsize=(8, 8))

sns.scatterplot(x="Age", y="Fare", hue="Sex", data=df)
plt.show()

df.corr(numeric_only=True)  # numeric_only: the dataset has text columns (needed on newer pandas)

# plotting correlation heatmap
dataplot = sns.heatmap(df.corr(numeric_only=True), cmap="YlGnBu", annot=True)

# displaying heatmap
plt.show()
OUTPUT:

SCATTER PLOT:

HEAT MAP:
RESULT:

Thus the correlation and scatter plotting functions are explored on Train data sets
successfully.
EX.NO.6D APPLY AND EXPLORE HISTOGRAM PLOTTING FUNCTIONS ON UCI DATA SETS

AIM:

To Apply and explore histogram plotting functions on Titanic data sets.

ALGORITHM:

Step 1: Start the program

Step 2: Import the Seaborn, Matplotlib, NumPy and Pandas library packages.

Step 3: Get the Titanic dataset and read the file in CSV format.

Step 4: Display the data of the dataset by using the head command.

Step 5: The histograms are plotted for the Fare data in the Titanic dataset by using the
hist command.

Step 6: Stop the program.


PROGRAM:

import numpy as np

import pandas as pd

import seaborn as sns

%matplotlib inline

import matplotlib.pyplot as plt

df = pd.read_csv("C:\\Users\\KSK\\Documents\\train.csv")

df.head()

plt.hist(df["Fare"])
OUTPUT:

HISTOGRAM:

(array([732., 106.,  31.,   2.,  11.,   6.,   0.,   0.,   0.,   3.]),
 array([  0.     ,  51.23292, 102.46584, 153.69876, 204.93168, 256.1646 ,
        307.39752, 358.63044, 409.86336, 461.09628, 512.3292 ]),
 <BarContainer object of 10 artists>)


RESULT:

Thus the histogram plotting functions are explored on Train data sets successfully.
EX.NO.6E APPLY AND EXPLORE THREE-DIMENSIONAL PLOTTING FUNCTIONS ON UCI DATA SETS

AIM:

To Apply and explore Three dimensional plotting functions on Titanic data sets.

ALGORITHM:

Step 1: Start the program

Step 2: Import the Seaborn, Matplotlib, NumPy and Pandas library packages.

Step 3: Get the Titanic dataset and read the file in CSV format.

Step 4: Display the data of the dataset by using the head command.

Step 5: A three-dimensional diagram is plotted for Age, Fare and Parch by importing the
3D plotting toolkit.

Step 6: Stop the program.


PROGRAM:

import numpy as np

import pandas as pd

import seaborn as sns

%matplotlib inline

import matplotlib.pyplot as plt

from mpl_toolkits import mplot3d

df = pd.read_csv("C:\\Users\\KSK\\Documents\\train.csv")
df.head()

fig = plt.figure(figsize=(8, 8))
ax = plt.axes(projection='3d')

# three-dimensional line
zline = np.linspace(0, 15, 1000)
xline = np.sin(zline)
yline = np.cos(zline)
ax.plot3D(xline, yline, zline, 'gray')

# three-dimensional scatter of the Titanic columns
zdata = df["Fare"]
xdata = df["Age"]
ydata = df["Parch"]
ax.scatter3D(xdata, ydata, zdata, c=zdata, cmap='Greens')


OUTPUT:

THREE DIMENSIONAL LINES:

THREE DIMENSIONAL SCATTERPLOT:


RESULT:

Thus the Three Dimensional plotting functions are explored on Train data sets
successfully.
EX.NO.7 VISUALIZING GEOGRAPHIC DATA WITH BASEMAP

AIM:

To visualizing the Geographic Data with Basemap by using python library packages.

ALGORITHM:

Step 1: Start the program

Step 2: Install and import the Basemap toolkit package.

Step 3: An ortho projection is shown at 8x8 figure size.

Step 4: An etopo image, which shows topographical features both on land and under the
ocean, is used as the map background.

Step 5: Cylindrical projections, in which lines of constant latitude and longitude are
mapped to horizontal and vertical lines, are shown.

Step 6: The Mollweide ('moll') projection is constructed so as to preserve area across
the map, to show pseudo-cylindrical projections.

Step 7: Perspective projection ortho shows one side of the globe as seen from a viewer
at a very long distance.

Step 8: A conic projection projects the map onto a single cone, which is then unrolled
(here the Lambert conformal conic, 'lcc').

Step 9: Stop the program.


PROGRAM:

%matplotlib inline
import numpy as np

import matplotlib.pyplot as plt

from mpl_toolkits.basemap import Basemap

plt.figure(figsize=(8, 8))

m = Basemap(projection='ortho', resolution=None, lat_0=50, lon_0=-100)

m.bluemarble(scale=0.5)

fig = plt.figure(figsize=(8, 8))

m = Basemap(projection='lcc', resolution=None, width=8E6, height=8E6,
            lat_0=45, lon_0=-100)
m.etopo(scale=0.5, alpha=0.5)

x, y = m(-122.3, 47.6)
plt.plot(x, y, 'ok', markersize=5)
plt.text(x, y, ' Seattle', fontsize=12)

fig = plt.figure(figsize=(8, 6), edgecolor='w')

m = Basemap(projection='cyl', resolution=None, llcrnrlat=-90, urcrnrlat=90,
            llcrnrlon=-180, urcrnrlon=180)
draw_map(m)   # draw_map is a helper; a sketch of it follows the program

fig = plt.figure(figsize=(8, 6), edgecolor='w')

m = Basemap(projection='moll', resolution=None, lat_0=0, lon_0=0)

draw_map(m)

fig = plt.figure(figsize=(8, 8))

m = Basemap(projection='ortho', resolution=None, lat_0=50, lon_0=0)

draw_map(m)

fig = plt.figure(figsize=(8, 8))

m = Basemap(projection='lcc', resolution=None, lon_0=0, lat_0=50, lat_1=45,
            lat_2=55, width=1.6E7, height=1.2E7)

draw_map(m)
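
The draw_map() calls above rely on a helper that is not defined in this manual; it overlays latitude/longitude lines on a shaded-relief background. A sketch of such a helper, adapted from the widely used Basemap tutorial example (define it after the imports and before the projection cells):

from itertools import chain

def draw_map(m, scale=0.2):
    # draw a shaded-relief image as the background
    m.shadedrelief(scale=scale)
    # parallels and meridians are returned as dictionaries of line artists
    lats = m.drawparallels(np.linspace(-90, 90, 13))
    lons = m.drawmeridians(np.linspace(-180, 180, 13))
    # the dictionary values contain the plt.Line2D instances
    lat_lines = chain(*(tup[1][0] for tup in lats.values()))
    lon_lines = chain(*(tup[1][0] for tup in lons.values()))
    all_lines = chain(lat_lines, lon_lines)
    # cycle through these lines and set the desired style
    for line in all_lines:
        line.set(linestyle='-', alpha=0.3, color='w')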
OUTPUT:

ORTHO PROJECTION:

MAPPING LONGITUDE AND LATITUDE:


CYLINDRICAL PROJECTIONS:

PSEUDO-CYLINDRICAL PROJECTIONS:
PERSPECTIVE PROJECTION:

CONIC PROJECTION:
RESULT:

Thus geographic data has been visualized with Basemap by using Python library
packages.
