
Apollo Institute of Engineering and

Technology

Lab Manual
D.I Semester: - V

Subject: - Introduction to Machine Learning


(4350702)
CERTIFICATE

This is to certify that Mr./Ms.


Enrollment No. Semester
Branch has satisfactorily completed his/her
term work in Course of “Introduction to Machine learning” with
subject code 4350702 for the academic term 2023-2024.

Staff In-charge Date of Submission Head of Department


INDEX

Sr. No. / Practical Aim / Page No. / Date / Sign

1. Explore any one machine learning tool like Jupyter Notebook.

2. Write a NumPy program to implement the following operation.
● to convert a list of numeric values into a one-dimensional NumPy array
● to create a 3x3 matrix with values ranging from 2 to 10

3. Write a NumPy program to implement the following operation.
● to split an array of 14 elements into 3 arrays, each with 2, 4, and 8 elements in the original order
● to stack arrays horizontally (column wise)

4. Write a NumPy program to implement the following operation.
● to add, subtract, multiply, divide arguments element-wise
● to round elements of the array to the nearest integer

5. Write a NumPy program to implement the following operation.
● to find the maximum and minimum value of a given flattened array
● to compute the mean, standard deviation, and variance of a given array along the second axis

6. Write a Pandas program to implement the following operation.
● to convert a NumPy array to a Pandas series
● to convert the first column of a DataFrame as a Series

7. Write a Pandas program to implement the following operation.
● to create a dataframe from a dictionary and display it.
● to sort the DataFrame first by 'name' in ascending order.

8. Write a Pandas program to create a line plot of the opening, closing stock prices of a given
company between two specific dates.

9. Write a Pandas program to create a plot of Open, High, Low, Close, Adjusted Closing prices and
Volume of given company between two specific dates.

10. Write a Pandas program to implement the following operation.
● to find and drop the missing values from the given dataset.
● to remove the duplicates from the given dataset.

Enrollment No.224550307069 Introduction to Machine Learning (4350702)

PRACTICAL-1

AIM: Explore any one machine learning tool such as Jupyter Notebook.
DATE:

The Jupyter Notebook is an open-source web application that allows developers to create and share
documents that combine narrative text, live code, visualizations, and equations. It is commonly
used for data cleaning and transformation, data visualization, machine learning (ML), numerical
simulation, and statistical modeling.

The Most Important Features of Jupyter Notebook

1. Exploratory Data Analysis (EDA):

Jupyter Notebook lets developers run one cell at a time and see its result in-line, directly
below the cell, without re-running the rest of the code. Any cell can be executed again at any
time to inspect its result, which is very useful during exploratory data analysis (EDA). Plain
script editing in standard IDEs such as VS Code or PyCharm does not show results in-line in this
way unless their notebook integrations are used.

2. Jupyter Notebook is Language-Independent:

Jupyter Notebook is language-independent and platform-independent because a notebook is stored
as a JSON document. It supports multiple programming languages through kernels, and a notebook
can be converted to various file formats such as PDF, Markdown,
HTML, and more.
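
As a rough illustration of the JSON representation mentioned above, the sketch below builds a
minimal notebook document by hand and serializes it with Python's json module. The field names
follow the version 4 notebook format; the cell contents are a made-up placeholder.

```python
import json

# A minimal notebook document (notebook format version 4) with one code cell.
# The cell source is an illustrative placeholder.
notebook = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print('Hello, Jupyter')"],
        }
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

# A .ipynb file is this structure written out as JSON text.
text = json.dumps(notebook, indent=1)
restored = json.loads(text)
print(restored["cells"][0]["cell_type"])
```

Because the file is plain JSON, any tool that can read JSON can inspect a notebook, regardless of
the programming language used inside its cells.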

3. Live Interactions with Code:

Jupyter Notebooks can use the “ipywidgets” package, which offers standard user interface (UI)
controls for exploring code and data interactively. Users can edit code and re-run it, so the
code in the environment is not static. Widgets also let developers control the inputs to their
code and see the feedback in the browser.

AIET(455) Page 1

4. Easy Caching in Built-in Cell:

Keeping track of the execution state of every cell by hand would be challenging. Jupyter
simplifies this by automatically caching the output of each cell that has been run, whether the
cell trains an ML model or downloads large volumes of training data from a remote server, so the
results remain available without re-running the cell.

5. Data Visualization:

Jupyter Notebook supports data visualization, rendering charts and graphics directly in the
shared notebook. These visualizations are generated from code through modules such as Bokeh,
Matplotlib, or Plotly. Moreover, Jupyter Notebook allows machine learning developers to narrate
visualizations alongside the shared code and data sets.

6. Jupyter Notebook Helps Document Code Samples:

Jupyter Notebook makes it easy for developers to explain their code by adding notes alongside
it. Notes can be added interactively, in text cells, even when the code is already fully
functional.


PRACTICAL-2

AIM: Write a NumPy program to implement the following operation.


DATE:

NumPy is a general-purpose array-processing package. It provides a high-performance
multidimensional array object and tools for working with these arrays. It is the fundamental
package for scientific computing with Python.

Creating a NumPy Array

Arrays in NumPy can be created in multiple ways, with any number of dimensions (the rank) and
with a shape that defines the size of the array. Arrays can also be created from Python
sequences such as lists and tuples. The type of the resultant array is deduced from the type of
the elements in the sequences.
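
A small sketch of how the element type is deduced, using made-up values. The variable names are
chosen only for this illustration.

```python
import numpy as np

ints = np.array([1, 2, 3])         # all integers -> integer dtype
mixed = np.array([1, 2.5, 3])      # one float promotes every element to float
from_tuple = np.array((4, 5, 6))   # a tuple works the same way as a list

print(ints.dtype, mixed.dtype, from_tuple.dtype)
print(mixed)
```

The exact integer dtype (for example int64 or int32) depends on the platform and NumPy version.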

1. to convert a list of numeric values into a one-dimensional NumPy array

In Python, the simplest way to convert a list to a NumPy array is by using the numpy.array()
function. It takes the list as an argument, creates a new copy of the data in memory, and
returns it as a new array.

# importing library

import numpy

# initializing list

lst = [1, 7, 0, 6, 2, 5, 6]

# converting list to array


arr = numpy.array(lst)

# displaying list

print ("List: ", lst)

# displaying array

print ("Array: ", arr)

Output:

List: [1, 7, 0, 6, 2, 5, 6]

Array: [1 7 0 6 2 5 6]

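2. to create a 3x3 matrix with values ranging from 2 to 10

A matrix is a rectangular arrangement of numbers in rows and columns. Here, np.arange(2, 11)
generates the nine integers from 2 to 10 (the stop value 11 is excluded), and reshape(3, 3)
arranges them into three rows and three columns.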

import numpy as np

# nine integers from 2 to 10, arranged as 3 rows x 3 columns
x = np.arange(2, 11).reshape(3, 3)
print(x)

Output:

[[ 2  3  4]
 [ 5  6  7]
 [ 8  9 10]]


PRACTICAL-3

AIM: Write a NumPy program to implement the following operation.


DATE:

1. to split an array of 14 elements into 3 arrays, each with 2, 4, and 8 elements in the
original order

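Splitting is the reverse operation of joining. Joining merges multiple arrays into one, and
splitting breaks one array into multiple arrays. To split at chosen positions we use np.split(),
passing it the array we want to split and a list of the indices at which to cut. np.array_split()
is a related function that can also split an array into a given number of parts that need not be
equal in size.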

import numpy as np

x = np.arange(1, 15)
print("Original array:", x)
print("After splitting:")
# cut before index 2 and before index 6 -> pieces of 2, 4, and 8 elements
print(np.split(x, [2, 6]))

Explanation:
x = np.arange(1, 15): This line creates a 1D array x containing the integers 1 to 14.

print(np.split(x, [2, 6])): The np.split() function is used to split the array x into multiple
subarrays. The split indices are provided as a list [2, 6]. This means that the array x will be split
into three subarrays: from the beginning to index 2 (exclusive), from index 2 to index 6
(exclusive), and from index 6 to the end.


OUTPUT:

Original array: [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14]

After splitting:

[array([1, 2]), array([3, 4, 5, 6]), array([ 7, 8, 9, 10, 11, 12, 13, 14])]
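
The difference between splitting at given indices and splitting into a given number of parts can
be sketched as follows, on the same 14-element array. The variable names by_index and by_count
are chosen only for this illustration.

```python
import numpy as np

x = np.arange(1, 15)

# np.split with a list of indices cuts the array at those positions.
by_index = np.split(x, [2, 6])

# np.array_split with an integer asks for that many parts; the parts
# may differ in size when the length is not evenly divisible.
by_count = np.array_split(x, 3)

print([len(part) for part in by_index])
print([len(part) for part in by_count])
```

Only the index form gives the required sizes of 2, 4, and 8 elements; asking for 3 parts gives
parts of nearly equal size instead.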

2. to stack arrays horizontally (column wise)

stack() is used for joining multiple NumPy arrays. Unlike concatenate(), it joins arrays along a
new axis. It returns a NumPy array.

To join 2 arrays, they must have the same shape (e.g. both (2, 3) -> 2 rows, 3 columns).

stack() creates a new array which has 1 more dimension than the input arrays. If we stack two 1-D
arrays, the resultant array will have 2 dimensions.

For 1-D inputs there are 2 possible axis options: 0 and 1. axis=0 means the 1-D input arrays
will be stacked row-wise. axis=1 means the 1-D input arrays will be stacked column-wise.

import numpy as np

# input arrays
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Stacking two 1-D arrays column-wise: each input becomes one column
c = np.stack((a, b), axis=1)
print(c)

Output:

[[1 4]
 [2 5]
 [3 6]]
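
NumPy also provides dedicated helpers for horizontal stacking. The sketch below reuses the
arrays a and b and adds two small 2-D arrays, m and n, made up for this illustration.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# hstack joins 1-D arrays end to end into one longer 1-D array.
joined = np.hstack((a, b))

# column_stack places each 1-D array in its own column of a 2-D array.
columns = np.column_stack((a, b))

# For 2-D inputs, hstack appends the columns of the second array
# to the right of the first.
m = np.array([[1, 2], [3, 4]])
n = np.array([[5, 6], [7, 8]])
side_by_side = np.hstack((m, n))

print(joined)
print(columns)
print(side_by_side)
```

Which helper is appropriate depends on whether the inputs should become columns of a 2-D array
(column_stack) or simply be placed side by side (hstack).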


PRACTICAL-4

AIM: Write a NumPy program to implement the following operation.


DATE:

1. to add, subtract, multiply, divide arguments element-wise

Element-wise arithmetic requires the arrays to have the same shape, or shapes that NumPy can
broadcast to a common shape (a scalar, for example, combines with an array of any shape). NumPy
provides both operators and functions for these operations.

To perform each operation, we can use either the associated operator or the built-in function.
For example, to perform addition, we can use either the + operator or the add() function.

import numpy as np

print("Add:")

print(np.add(1.0, 4.0))

print("Subtract:")

print(np.subtract(1.0, 4.0))

print("Multiply:")

print(np.multiply(1.0, 4.0))

print("Divide:")

print(np.divide(1.0, 4.0))

OUTPUT:

Add:


5.0

Subtract:

-3.0

Multiply:

4.0

Divide:

0.25
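
The same four functions also work element by element on whole arrays. A minimal sketch with two
made-up arrays, x and y:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

# Each function has an equivalent operator.
total = np.add(x, y)            # same as x + y
difference = np.subtract(x, y)  # same as x - y
product = np.multiply(x, y)     # same as x * y
quotient = np.divide(x, y)      # same as x / y

# A scalar is broadcast across every element of the array.
doubled = x * 2

print(total, difference, product, quotient, doubled)
```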

2. to round elements of the array to the nearest integer

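The numpy.rint() function rounds each element of an array to the nearest integer. Values exactly
halfway between two integers are rounded to the nearest even integer. The result keeps the
floating-point type of the input.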

import numpy as np

x = np.array([-.7, -1.5, -1.7, 0.3, 1.5, 1.8, 2.0])

print("Original array:")

print(x)

x = np.rint(x)

print("Round elements of the array to the nearest integer:")

print(x)

OUTPUT

Original array:


[-0.7 -1.5 -1.7 0.3 1.5 1.8 2. ]

Round elements of the array to the nearest integer:

[-1. -2. -2. 0. 2. 2. 2.]
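
How rint() treats values that lie exactly halfway between two integers can be checked with a few
made-up inputs. The conversion with astype(int) is an optional extra step, shown here only to
obtain an integer dtype.

```python
import numpy as np

halves = np.array([0.5, 1.5, 2.5, -0.5, -2.5])

# rint rounds halfway cases to the nearest even integer.
rounded = np.rint(halves)

# astype(int) converts the rounded floats to an integer dtype.
as_int = rounded.astype(int)

print(rounded)
print(as_int)
```

Here 0.5 and 2.5 round to 0 and 2 rather than to 1 and 3, because 0 and 2 are the even
neighbours.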


PRACTICAL-5

AIM: Write a NumPy program to implement the following operation.


DATE:

1. to find the maximum and minimum value of a given flattened array


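We create a NumPy array of integers and find its largest and smallest elements. Called without
an axis argument, np.amax() (an alias of np.max()) and np.amin() (an alias of np.min()) treat the
array as flattened and return a single value for the whole array.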

import numpy as np

a = np.arange(4).reshape((2,2))

print("Original flattened array:")

print(a)

print("Maximum value of the above flattened array:")

print(np.amax(a))

print("Minimum value of the above flattened array:")

print(np.amin(a))

OUTPUT:

Original flattened array:
[[0 1]
 [2 3]]
Maximum value of the above flattened array:
3
Minimum value of the above flattened array:
0
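
For comparison, the sketch below flattens an array explicitly and also takes the maximum and
minimum along one axis. The array values are made up for this illustration.

```python
import numpy as np

a = np.array([[7, 2, 9],
              [4, 8, 1]])

flat = a.flatten()         # 1-D copy of the data: [7 2 9 4 8 1]

overall_max = flat.max()   # largest value in the whole array
overall_min = flat.min()   # smallest value in the whole array
row_max = a.max(axis=1)    # largest value in each row
col_min = a.min(axis=0)    # smallest value in each column

print(overall_max, overall_min, row_max, col_min)
```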

2. to compute the mean, standard deviation, and variance of a given array along the second
axis

In NumPy, we can compute the mean, standard deviation, and variance of a given array along the
second axis by two approaches: first is by using inbuilt functions and second is by the formulas
of the mean, standard deviation, and variance.

import numpy as np

x = np.arange(6)
print("\nOriginal array:")
print(x)

# mean: built-in function checked against np.average
r1 = np.mean(x)
r2 = np.average(x)
assert np.allclose(r1, r2)
print("\nMean: ", r1)

# standard deviation: built-in function checked against the formula
r1 = np.std(x)
r2 = np.sqrt(np.mean((x - np.mean(x)) ** 2))
assert np.allclose(r1, r2)
print("\nstd: ", r1)

# variance: built-in function checked against the formula
r1 = np.var(x)
r2 = np.mean((x - np.mean(x)) ** 2)
assert np.allclose(r1, r2)
print("\nvariance: ", r1)

OUTPUT

Original array:
[0 1 2 3 4 5]

Mean:  2.5

std:  1.707825127659933

variance:  2.9166666666666665
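
The program above works on a 1-D array, which has only one axis. To compute the statistics along
the second axis specifically, a 2-D array and axis=1 are needed. A minimal sketch, assuming the
same six values reshaped into 2 rows of 3:

```python
import numpy as np

x = np.arange(6).reshape(2, 3)   # [[0 1 2], [3 4 5]]

# axis=1 is the second axis, so each statistic is computed per row.
mean_rows = np.mean(x, axis=1)
std_rows = np.std(x, axis=1)
var_rows = np.var(x, axis=1)

# Cross-check the built-in results against the formulas.
centered = x - mean_rows[:, np.newaxis]
assert np.allclose(var_rows, np.mean(centered ** 2, axis=1))
assert np.allclose(std_rows, np.sqrt(var_rows))

print(mean_rows, std_rows, var_rows)
```

Each result has one entry per row, because the second axis is the one that is collapsed.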


PRACTICAL-6

AIM: Write a Pandas program to implement the following operation.


DATE:

1. to convert a NumPy array to a Pandas series

Given a one-dimensional NumPy array with a few values, we can convert it to a pandas Series with
the pandas.Series() constructor. The output shows the original array followed by the converted
Series object.

import numpy as np

import pandas as pd

np_array = np.array([10, 20, 30, 40, 50])

print("NumPy array:")

print(np_array)

new_series = pd.Series(np_array)

print("Converted Pandas series:")

print(new_series)

OUTPUT

NumPy array:

[10 20 30 40 50]

Converted Pandas series:

0    10
1    20
2    30
3    40
4    50
dtype: int64
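
The Series constructor also accepts an explicit index, and a Series can be converted back to a
NumPy array. In the sketch below, the index labels and the name 'values' are made up for
illustration.

```python
import numpy as np
import pandas as pd

np_array = np.array([10, 20, 30, 40, 50])

# Default integer index 0..4, as in the program above.
default_series = pd.Series(np_array)

# An explicit index (labels chosen for illustration) and a name.
labeled = pd.Series(np_array, index=['a', 'b', 'c', 'd', 'e'], name='values')

# to_numpy() converts the Series back to a NumPy array.
back = labeled.to_numpy()

print(labeled['c'])
print(back)
```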

2. To convert the first column of a DataFrame as a Series

In pandas, any column of a DataFrame can be extracted as a Series. This is often needed when a
single column has to be analyzed on its own.

# Importing pandas module
import pandas as pd

# Creating a dictionary
dit = {'August': [10, 25, 34, 4.85, 71.2, 1.1],
       'September': [4.8, 54, 68, 9.25, 58, 0.9],
       'October': [78, 5.8, 8.52, 12, 1.6, 11],
       'November': [100, 5.8, 50, 8.9, 77, 10]}

# Converting it to data frame
df = pd.DataFrame(data=dit)

# Original DataFrame
print(df)

# Selecting the first column by position returns a Series
first_column = df.iloc[:, 0]
print(type(first_column))
print(first_column)

OUTPUT


PRACTICAL-7

AIM: Write a Pandas program to implement the following operation.


DATE:

1. to create a dataframe from a dictionary and display it.

A Python dict (dictionary), which stores key-value pairs, can be used to create a pandas
DataFrame. In practice, we mostly create a pandas DataFrame by reading a CSV file or another data
source; however, sometimes you may need to create it from a dict object. Each key becomes a
column name and each list of values becomes that column's data.

# import pandas library
import pandas as pd

# dictionary with list object in values
details = {
    'Name': ['Ankit', 'Aishwarya', 'Shaurya', 'Shivangi'],
    'Age': [23, 21, 22, 21],
    'University': ['BHU', 'JNU', 'DU', 'BHU'],
}

# creating a Dataframe object
df = pd.DataFrame(details)
df


OUTPUT

2. to sort the DataFrame first by 'name' in ascending order.

# importing pandas library
import pandas as pd

# creating and initializing a nested list
age_list = [['Afghanistan', 1952, 8425333, 'Asia'],
            ['Australia', 1957, 9712569, 'Oceania'],
            ['Brazil', 1962, 76039390, 'Americas'],
            ['China', 1957, 637408000, 'Asia'],
            ['France', 1957, 44310863, 'Europe'],
            ['India', 1952, 3.72e+08, 'Asia'],
            ['United States', 1957, 171984000, 'Americas']]

# creating a pandas dataframe
df = pd.DataFrame(age_list, columns=['Country', 'Year',
                                     'Population', 'Continent'])

# sorting by the name column in ascending order; in this dataset the
# names are in 'Country' (assumed here to be the intended 'name' column)
df = df.sort_values(by='Country', ascending=True)
df

OUTPUT

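Sorting "first by" one column and then by another can be sketched with sort_values() on a small
table. The names and scores below are made up for illustration and deliberately out of order.

```python
import pandas as pd

# Small made-up table, deliberately out of order.
df = pd.DataFrame({
    'name': ['Meera', 'Arjun', 'Kiran', 'Arjun'],
    'score': [82, 91, 77, 65],
})

# Sort by 'name' in ascending order.
by_name = df.sort_values(by='name', ascending=True)

# Sort first by 'name', then by 'score' within equal names.
by_name_then_score = df.sort_values(by=['name', 'score'],
                                    ascending=[True, True])

print(by_name)
print(by_name_then_score)
```

sort_values() returns a new DataFrame and leaves the original unchanged unless the result is
assigned back.
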

PRACTICAL-8

AIM: Write a Pandas program to create a line plot of the opening, closing stock prices of a given
company between two specific dates.
DATE:
A line plot is a graph that shows the relationship between variables, or changes in data over
time, using points ordered by their x-axis value and connected by straight line segments. The
independent variable is represented on the x-axis, while the y-axis represents the dependent
variable, i.e. the data that changes with the x-axis variable.

To generate a line plot with pandas, we typically create a DataFrame with the dataset to be
plotted. Then, the plot.line() method is called on the DataFrame.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("alphabet_stock_data.csv")
start_date = pd.to_datetime('2020-4-1')
end_date = pd.to_datetime('2020-4-30')
df['Date'] = pd.to_datetime(df['Date'])

# boolean mask selecting the rows between the two dates (inclusive)
new_df = (df['Date'] >= start_date) & (df['Date'] <= end_date)
df1 = df.loc[new_df]
df2 = df1.set_index('Date')

# line plot of the opening and closing prices; the column names 'Open'
# and 'Close' are assumed from the price columns named in Practical 9
ax = df2[['Open', 'Close']].plot.line(figsize=(6, 6))
plt.suptitle('Opening/Closing stock prices of Alphabet Inc.,\n01-04-2020 to 30-04-2020',
             fontsize=16, color='black')
ax.set_xlabel("Date", fontsize=12, color='black')
ax.set_ylabel("Price", fontsize=12, color='black')
plt.show()

OUTPUT

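The date-filtering step of this practical can be tried without the CSV file. The sketch below
uses ten days of made-up prices (not real stock data) and compares the boolean mask with the
equivalent Series.between() call.

```python
import pandas as pd

# Made-up prices for ten consecutive days (not real stock data).
df = pd.DataFrame({
    'Date': pd.date_range('2020-03-28', periods=10, freq='D'),
    'Open': [100 + i for i in range(10)],
    'Close': [101 + i for i in range(10)],
})

start_date = pd.to_datetime('2020-4-1')
end_date = pd.to_datetime('2020-4-3')

# Boolean mask, as in the program above.
mask = (df['Date'] >= start_date) & (df['Date'] <= end_date)
by_mask = df.loc[mask]

# Equivalent selection with Series.between (inclusive on both ends by default).
by_between = df[df['Date'].between(start_date, end_date)]

print(by_mask)
```

Both selections keep the rows whose dates fall within the two limits, including the limits
themselves.
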

PRACTICAL-9

AIM: Write a Pandas program to create a plot of Open, High, Low, Close, Adjusted Closing
prices and Volume of given company between two specific dates.

DATE:

What is Time Series Data

Time series data is a sequence of data points in chronological order that is used by businesses to
analyze past data and make future predictions. These data points are a set of observations at
specified times and equal intervals, typically with a datetime index and corresponding value.
Common examples of time series data in our day-to-day lives include:

● Measuring weather temperatures

● Measuring the number of taxi rides per month

● Predicting a company’s stock prices for the next day
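
A minimal sketch of a time series with a datetime index at equal (daily) intervals. The
temperature values are hypothetical and serve only to show the structure.

```python
import pandas as pd

# Hypothetical daily temperatures indexed by date.
dates = pd.date_range(start='2020-04-01', periods=5, freq='D')
temperatures = pd.Series([30.5, 31.0, 29.8, 32.1, 33.4], index=dates)

# The datetime index allows selection by a single date or a date range.
one_day = temperatures.loc[pd.Timestamp('2020-04-03')]
first_three = temperatures.loc['2020-04-01':'2020-04-03']

print(temperatures)
print(one_day)
```

Unlike positional slicing, a date-range slice on a datetime index includes both end dates.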

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("alphabet_stock_data.csv")
start_date = pd.to_datetime('2020-4-1')
end_date = pd.to_datetime('2020-9-30')
df['Date'] = pd.to_datetime(df['Date'])

# keep only the rows between the two dates (inclusive)
new_df = (df['Date'] >= start_date) & (df['Date'] <= end_date)
df1 = df.loc[new_df]
stock_data = df1.set_index('Date')

# one subplot per column of the DataFrame
stock_data.plot(subplots=True, figsize=(8, 8))
plt.legend(loc='best')
plt.suptitle('Open,High,Low,Close,Adj Close prices & Volume of Alphabet Inc., '
             'From 01-04-2020 to 30-09-2020', fontsize=12, color='black')
plt.show()

OUTPUT


PRACTICAL-10

AIM: Write a Pandas program to implement the following operation.


DATE:

1. to find and drop the missing values from the given dataset.

To drop null values from a DataFrame, we use the dropna() function. Depending on its arguments,
it drops the rows or the columns that contain null values. Missing values can be located
beforehand with isnull().
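
The different ways of finding and dropping null values can be sketched on a small made-up table.
The column names A, B, and C are chosen only for this illustration.

```python
import numpy as np
import pandas as pd

# Small made-up table with missing entries.
df = pd.DataFrame({
    'A': [1.0, np.nan, 3.0],
    'B': [4.0, 5.0, np.nan],
    'C': [7.0, 8.0, 9.0],
})

missing_per_column = df.isnull().sum()   # count of nulls in each column

rows_dropped = df.dropna()               # drop rows containing any null
cols_dropped = df.dropna(axis=1)         # drop columns containing any null
all_null_only = df.dropna(how='all')     # drop rows only if every value is null

print(missing_per_column)
print(rows_dropped)
print(cols_dropped)
```

dropna() returns a new DataFrame in each case; the original df keeps its missing values.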

# importing pandas as pd
import pandas as pd

# importing numpy as np
import numpy as np

# dictionary of lists (named 'scores' to avoid shadowing the built-in dict)
scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, np.nan, 45, 56],
          'Third Score': [52, 40, 80, 98],
          'Fourth Score': [np.nan, np.nan, np.nan, 65]}

# creating a dataframe from dictionary
df = pd.DataFrame(scores)

# finding the missing values: count of nulls in each column
print(df.isnull().sum())

# using dropna() function to drop every row that has a null value
df.dropna()

OUTPUT

2. to Remove Duplicates from Given Dataset

The drop_duplicates() method removes duplicate rows. Use the subset parameter if only some
specified columns should be considered when looking for duplicates.

import pandas as pd

# World alcohol consumption data

w_a_con = pd.read_csv('world_alcohol.csv')

print("World alcohol consumption sample data:")

print(w_a_con.head())

print("\nAfter removing the duplicates of WHO region column:")

print(w_a_con.drop_duplicates('WHO region'))


OUTPUT

World alcohol consumption sample data:

   Year       WHO region  ... Beverage Types Display Value
0  1986  Western Pacific  ...           Wine          0.00
1  1986         Americas  ...          Other          0.50
2  1985           Africa  ...           Wine          1.62
3  1986         Americas  ...           Beer          4.27
4  1987         Americas  ...           Beer          1.98

[5 rows x 5 columns]

After removing the duplicates of WHO region column:
    Year             WHO region  ... Beverage Types Display Value
0   1986        Western Pacific  ...           Wine          0.00
1   1986               Americas  ...          Other          0.50
2   1985                 Africa  ...           Wine          1.62
13  1984  Eastern Mediterranean  ...          Other          0.00
18  1984                 Europe  ...        Spirits          1.62
20  1986        South-East Asia  ...           Wine          0.00

[6 rows x 5 columns]
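
The effect of the subset and keep parameters can be seen on a small made-up table that does not
need the CSV file. The column names 'region' and 'year' are hypothetical.

```python
import pandas as pd

# Made-up rows; the second and fourth rows are exact duplicates.
df = pd.DataFrame({
    'region': ['Asia', 'Europe', 'Asia', 'Europe'],
    'year': [1986, 1985, 1987, 1985],
})

full_row = df.drop_duplicates()                    # compares whole rows
by_region = df.drop_duplicates(subset='region')    # keeps first row per region
keep_last = df.drop_duplicates(subset='region', keep='last')

print(full_row)
print(by_region)
print(keep_last)
```

Without subset, only rows that match in every column count as duplicates; with subset, a row is
dropped as soon as the listed columns repeat.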
