MACHINE LEARNING
LAB MANUAL (R22)
B.Tech III Year – II Semester
ACADEMIC YEAR (2024-2025)
Department of AI&DS
VISION
To empower female students with professional education using creative & innovative technical
practices of global competence and research aptitude to become competitive engineers with ethical
values and entrepreneurial skills.
MISSION
To impart value-based professional education through a creative and innovative teaching-learning
process to face the global challenges of new-era technology.
To inculcate research aptitude and bring out creativity in students by imparting engineering
knowledge and interpersonal skills, thereby promoting innovation, research and entrepreneurship.
Vision:
To be a leading department of Artificial Intelligence and Data Science that provides
cutting-edge education, research, and innovation in the field, and prepares graduates
to become globally competitive professionals, researchers, and entrepreneurs.
Mission:
DM1: Providing comprehensive education and training in the principles, tools, and
applications of Artificial Intelligence and Data Science, to prepare graduates for a
wide range of careers and research opportunities.
PSO2: Ability to apply advanced techniques such as machine learning, deep learning, and
natural language processing to design and develop intelligent systems that can learn
from data and adapt to changing environments.
Program Outcomes :
PO2 : Problem Analysis: Identify, formulate, review research literature, and analyze
complex engineering problems reaching substantiated conclusions using first
principles of mathematics, natural sciences and Engineering sciences.
PO5 : Modern Tool Usage: Create, select and apply appropriate techniques,
resources and modern engineering and IT tools including prediction and modeling to
complex engineering activities with an understanding of the limitations.
PO6 : The Engineer and Society: Apply reasoning informed by the contextual
knowledge to assess societal, health, safety, legal and cultural issues and the
consequent responsibilities relevant to the professional engineering practice.
PO8 : Ethics: Apply ethical principles and commit to professional ethics and
responsibilities and norms of the engineering practice.
PO12 : Life-Long Learning: Recognize the need for and have the preparation and
ability to engage in independent and lifelong learning in the broadest context of
technological change.
Course Structure
Course Code:
Programme: B.Tech III-II
Course Structure: Practical
L T P Credits
0 0 3 1.5
COURSE OBJECTIVES
S. No. Course Objectives
1. The objective of this lab is to get an overview of the various machine learning techniques and to demonstrate them using Python.
Course Outcomes
CO2: Select data, perform model selection, assess model complexity, and identify the trends.
ML LAB
SCHEME OF EVALUATION
Total Marks for Each Student to be Evaluated in Lab: 100 Marks
1. Compute Central Tendency Measures (Mean, Median, Mode) and Measures of Dispersion (Variance, Standard Deviation)
Code:
import numpy as np
from scipy import stats

data = [12, 15, 14, 10, 8, 15, 16, 21, 18, 18]

# Central Tendency
mean = np.mean(data)
median = np.median(data)
mode = stats.mode(data)

# Measures of Dispersion
variance = np.var(data, ddof=1) # Sample variance
std_deviation = np.std(data, ddof=1) # Sample standard deviation

print("Mean:", mean, "Median:", median)
print("Mode:", mode)
print("Variance:", variance, "Standard Deviation:", std_deviation)
Output:
Descriptive statistics is about describing and summarizing data. It uses two main approaches: the quantitative approach describes and summarizes data numerically, while the visual approach illustrates data with charts, plots, histograms, and other graphs.
There are many Python statistics libraries out there for you to work with, but in this tutorial,
you’ll be learning about some of the most popular and widely used ones:
Python’s statistics is a built-in Python library for descriptive statistics. You can use it if your
datasets are not too large or if you can’t rely on importing other libraries.
NumPy is a third-party library for numerical computing, optimized for working with
single- and multi-dimensional arrays. Its primary type is the array type called ndarray. This
library contains many routines for statistical analysis.
SciPy is a third-party library for scientific computing based on NumPy. It offers
additional functionality compared to NumPy, including scipy.stats for statistical analysis.
pandas is a third-party library for numerical computing based on NumPy. It excels in handling
labeled one-dimensional (1D) data with Series objects and two-dimensional (2D) data
with DataFrame objects.
Matplotlib is a third-party library for data visualization. It works well in combination
with NumPy, SciPy, and pandas.
Note that, in many cases, Series and DataFrame objects can be used in place of NumPy arrays.
Often, you might just pass them to a NumPy or SciPy statistical function. In addition, you can
get the unlabeled data from a Series or DataFrame as a np.ndarray object by calling .values or
.to_numpy().
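For example, the following short sketch (a small made-up Series, not one of the lab datasets) shows how labeled pandas data can be handed to NumPy functions or converted to a plain array:
import numpy as np
import pandas as pd

s = pd.Series([12, 15, 14, 10, 8])   # labeled one-dimensional data
arr = s.to_numpy()                   # unlabeled values as a np.ndarray
print(type(arr))                     # <class 'numpy.ndarray'>
print(np.mean(s))                    # NumPy functions also accept the Series directly: 11.8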
The built-in Python statistics library has a relatively small number of the most important
statistics functions. The official documentation is a valuable resource to find the details. If
you’re limited to pure Python, then the Python statistics library might be the right choice.
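As a quick illustration, the pure-Python statistics module can reproduce the measures computed earlier with NumPy and SciPy (same sample data; shown only as a sketch):
import statistics

data = [12, 15, 14, 10, 8, 15, 16, 21, 18, 18]
print(statistics.mean(data))      # 14.7
print(statistics.median(data))    # 15.0
print(statistics.mode(data))      # 15 (first of the most common values, Python 3.8+)
print(statistics.variance(data))  # sample variance
print(statistics.stdev(data))     # sample standard deviation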
A good place to start learning about NumPy is the official User Guide, especially the quickstart
and basics sections. The official reference can help you refresh your memory on specific
NumPy concepts. While you read this tutorial, you might want to check out the statistics
section and the official scipy.stats reference as well.
ii)Math
To carry out calculations with real numbers, the Python language contains many additional
functions collected in a library (module) called math.
To use these functions, you need to import the math library at the beginning of the program,
which is done with the command
import math
Python provides various operators for performing basic calculations, such as * for
multiplication, % for modulo, and / for division. If you are developing a program in
Python that needs trigonometric functions or complex numbers, you cannot use such
functions directly; you can access them by importing the math module, which gives access
to hyperbolic, trigonometric and logarithmic functions for real numbers. For complex
numbers, there is the companion module cmath. When comparing math with NumPy, the math
library is more lightweight and can be used for extensive scalar computation as well.
The Python Math Library is the foundation for the rest of the math libraries that are written on
top of its functionality and functions defined by the C standard. Please refer to the python
math examples for more information.
This part of the mathematical library is designed to work with numbers and their
representations. It allows you to effectively carry out the necessary transformations with
support for NaN (not a number) and infinity and is one of the most important sections of the
Python math library. Below is a short list of these functions for Python 3; a more detailed
description can be found in the documentation for the math library.
math.ceil(x) – return the ceiling of x, the smallest integer greater than or equal to x
math.comb(n, k) – return the number of ways to choose k items from n items without
repetition and without order
math.copysign(x, y) – return a float with the magnitude (absolute value) of x but the sign of
y. On platforms that support signed zeros, copysign(1.0, -0.0) returns -1.0
math.fabs(x) – return the absolute value of x
math.floor(x) – return the floor of x, the largest integer less than or equal to x
math.frexp(x) – return the mantissa and exponent of x as the pair (m, e). m is a float and e is
an integer such that x == m * 2**e exactly
math.isclose(a, b, *, rel_tol=1e-09, abs_tol=0.0) – return True if the values a and b are close
to each other and False otherwise
math.isfinite(x) – return True if x is neither infinity nor a NaN, and False otherwise (note that
0.0 is considered finite)
math.isqrt(n) – return the integer square root of the nonnegative integer n. This is the floor of
the exact square root of n, or equivalently the greatest integer a such that a² ≤ n
math.modf(x) – return the fractional and integer parts of x. Both results carry the sign of x
and are floats
math.perm(n, k=None) – return the number of ways to choose k items from n items without
repetition and with order
math.prod(iterable, *, start=1) – calculate the product of all the elements in the input
iterable. The default start value for the product is 1
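The following short demo (values chosen only for illustration; comb, perm, isqrt and prod require Python 3.8+) exercises several of the functions listed above:
import math

print(math.ceil(3.2), math.floor(3.8))   # 4 3
print(math.comb(5, 2), math.perm(5, 2))  # 10 20
print(math.copysign(3.0, -0.0))          # -3.0
print(math.fabs(-7.5))                   # 7.5
print(math.frexp(8.0))                   # (0.5, 4), since 0.5 * 2**4 == 8.0
print(math.isclose(0.1 + 0.2, 0.3))      # True
print(math.isqrt(17))                    # 4
print(math.modf(-2.75))                  # (-0.75, -2.0)
print(math.prod([1, 2, 3, 4]))           # 24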
The power and logarithmic functions section is responsible for exponential and logarithmic
calculations, which are important in many areas of mathematics, engineering, and statistics.
These functions work with natural logarithms and exponentials, base-2 logarithms, and
logarithms to arbitrary bases.
math.exp(x) – return e raised to the power x, where e = 2.718281… is the base of natural
logarithms
math.expm1(x) – return e raised to the power x, minus 1. Here e is the base of natural logarithms
math.log(x[, base]) – with one argument, return the natural logarithm of x (to base e). With
two arguments, return the logarithm of x to the given base, calculated as log(x)/log(base)
math.log1p(x) – return the natural logarithm of 1+x (base e). The result is calculated in a way
that is accurate for x near zero
math.log2(x) – return the base-2 logarithm of x. This is usually more accurate than log(x, 2)
math.log10(x) – return the base-10 logarithm of x. This is usually more accurate than log(x,
10)
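For example (approximate values shown in the comments):
import math

print(math.exp(1))                      # 2.718281828459045
print(math.expm1(1e-10))                # ~1.00000000005e-10, accurate for small x
print(math.log(8, 2))                   # 3.0 (logarithm to an arbitrary base)
print(math.log2(8), math.log10(1000))   # 3.0 3.0
print(math.log1p(1e-10))                # ~1e-10, accurate near zero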
Trigonometric functions, direct and inverse, are widely represented in the Python
Mathematical Library. They work with radian values, which is important. It is also possible to
carry out Euclidean distance calculations.
math.atan2(y, x) – return atan(y / x), in radians. The result is between -pi and pi
math.dist(p, q) – return the Euclidean distance between two points p and q, each given as a
sequence (or iterable) of coordinates. The two points must have the same dimension
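A brief example (math.dist requires Python 3.8+):
import math

print(math.sin(math.pi / 2))       # 1.0
print(math.atan2(1, 1))            # 0.7853981633974483, i.e. pi/4 radians
print(math.dist((0, 0), (3, 4)))   # 5.0, the Euclidean distance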
Angular conversion
Converting degrees to radians and vice versa is a fairly common operation, so the developers
added these conversions to the Python library. This allows you to write compact and
understandable code.
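For example:
import math

print(math.degrees(math.pi))   # 180.0
print(math.radians(180))       # 3.141592653589793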
Hyperbolic functions
Hyperbolic functions are analogs of trigonometric functions that are based on hyperbolas
instead of circles.
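A quick check of the fundamental hyperbolic identity cosh²(x) − sinh²(x) = 1 (up to floating-point error):
import math

x = 1.0
print(math.cosh(x)**2 - math.sinh(x)**2)   # approximately 1.0
print(math.tanh(0.0))                      # 0.0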
Special functions
The special functions section covers the error function and the gamma functions. These are
needed often enough that they were implemented in the standard Python mathematical
library.
math.lgamma(x) – Return the natural logarithm of the absolute value of the Gamma function
at x
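For example:
import math

print(math.erf(0.0))    # 0.0, the error function
print(math.gamma(5))    # 24.0, since Gamma(5) == 4!
print(math.lgamma(5))   # 3.178053830347946, i.e. log(24)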
Constants
The constants section provides ready-made values for basic constants, written with the
necessary accuracy for a given hardware platform, which is important for Python’s portability
as a cross-platform language. The very important values infinity and “not a number” (NaN) are
also defined in this section of the Python library.
math.inf – a floating-point positive infinity. (For negative infinity, use -math.inf.) Equivalent
to the output of float('inf')
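A short demonstration of the constants:
import math

print(math.pi, math.e, math.tau)   # 3.141592653589793 2.718281828459045 6.283185307179586
print(math.inf > 10**308)          # True
print(math.nan == math.nan)        # False: NaN is not equal to anything, even itself
print(math.isnan(math.nan))        # True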
iii)Scipy
SciPy is a library for the open-source Python programming language, designed to perform
scientific and engineering calculations.
The SciPy ecosystem includes general and specialized tools for data management and
computation, productive experimentation, and high-performance computing. Below, we
overview some key packages, though there are many more relevant packages.
Main components of SciPy
scikit-learn is a collection of algorithms and tools for machine learning
h5py and PyTables can both access data stored in the HDF5 format
IPython, a rich interactive interface, letting you quickly process data and test ideas
The Jupyter notebook provides IPython functionality and more in your web browser, allowing
you to document your computation in an easily reproducible form
Cython extends Python syntax so that you can conveniently build C extensions, either to speed
up critical code or to integrate with C/C++ libraries
Dask, Joblib or IPyParallel for distributed processing with a focus on numeric data
Quality assurance:
nose, a framework for testing Python code, being phased out in preference for pytest
numpydoc, a standard and library for documenting Scientific Python libraries
SciPy provides a very wide and sought-after feature set:
Constants (scipy.constants)
Interpolation (scipy.interpolate)
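A small sketch using two of the subpackages listed above (toy values, assuming SciPy and NumPy are installed):
import numpy as np
from scipy import constants, interpolate

# Physical and mathematical constants (scipy.constants)
print(constants.c)    # speed of light in vacuum, m/s
print(constants.pi)   # same value as math.pi

# One-dimensional interpolation (scipy.interpolate)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = x ** 2
f = interpolate.interp1d(x, y)   # piecewise-linear interpolant by default
print(f(1.5))                    # 2.5, the linear estimate between (1, 1) and (2, 4)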
In the tutorial Basic functions — SciPy v1.4.1 Reference Guide, you can find how to work
with polynomials, their derivatives, and integrals. With a single line of code, the poly1d
class computes the derivative or integral of a polynomial in coefficient form. Imagine how
many lines of code you would need to do this without it. This is why this library is
valuable in Python:
>>> from numpy import poly1d
>>> p = poly1d([3, 4, 5])
>>> print(p)
   2
3 x + 4 x + 5
>>> print(p * p)
   4      3      2
9 x + 24 x + 46 x + 40 x + 25
>>> print(p.integ(k=6))
   3     2
1 x + 2 x + 5 x + 6
>>> print(p.deriv())
6 x + 4
Applications:
In early 2005, programmer and data scientist Travis Oliphant wanted to unite the community
around one project and created the NumPy library to replace the Numeric and NumArray
libraries. NumPy was created based on the Numeric code. The Numeric code was rewritten to
be easier to maintain, and new features could be added to the library. NumArray features have
been added to NumPy. NumPy was originally part of the SciPy library. To allow other projects
to use the NumPy library, its code was placed in a separate package.
The source code for NumPy is publicly available. NumPy is licensed under the BSD license.
As described in the NumPy documentation, “NumPy gives you an enormous range of fast and
efficient ways of creating arrays and manipulating numerical data inside them. While a Python
list can contain different data types within a single list, all of the elements in a NumPy array
should be homogenous. The mathematical operations that are meant to be performed on arrays
would be extremely inefficient if the arrays weren’t homogenous.” Numpy provides the
following features to the user:
Array objects
Constants
Universal functions (ufunc)
Routines
Packaging (numpy.distutils)
NumPy Distutils – Users Guide
NumPy C-API
NumPy internals
NumPy and SWIG
NumPy basics:
Data types
Array creation
I/O with NumPy
Indexing
Broadcasting
Byte-swapping
Structured arrays
Writing custom array containers
Subclassing ndarray
One of the main objects of NumPy is ndarray. It allows you to create multidimensional data
arrays of the same type and perform operations on them with great speed. Unlike sequences in
Python, arrays in NumPy have a fixed size, and the elements of the array must be of the same type.
You can apply various mathematical operations to arrays, which are performed more
efficiently than for Python sequences. The next example shows how to do linear algebra
with NumPy. It is simple and easy to understand for Python users.
>>> import numpy as np
>>> a = np.array([[1.0, 2.0], [3.0, 4.0]])
>>> print(a)
[[ 1.  2.]
 [ 3.  4.]]
>>> a.transpose()
array([[ 1.,  3.],
       [ 2.,  4.]])
>>> np.linalg.inv(a)
array([[-2. ,  1. ],
       [ 1.5, -0.5]])
>>> u = np.eye(2)  # unit 2x2 matrix
>>> u
array([[ 1.,  0.],
       [ 0.,  1.]])
>>> j = np.array([[0.0, -1.0], [1.0, 0.0]])
>>> j @ j  # matrix product
array([[-1.,  0.],
       [ 0., -1.]])
>>> y = np.array([[5.], [7.]])
>>> np.linalg.solve(a, y)
array([[-3.],
       [ 4.]])
>>> np.linalg.eig(j)
(array([ 0.+1.j, 0.-1.j]), array([[ 0.70710678+0.j, 0.70710678-0.j],
       [ 0.00000000-0.70710678j, 0.00000000+0.70710678j]]))
Applications:
Pandas
Pandas is primarily designed to perform data manipulation and analysis. It is known
that dataset preparation is essential before the training phase. The Pandas library
comes in handy in such a scenario as it provides a variety of data structures,
functions, and components that help in data extraction and preparation tasks. Data
preparation refers to data organization, wherein various methods are employed to group,
combine, reshape, and filter different datasets.
Key advantages of the Pandas library include:
Valid data frames: While the Pandas library has more utility for data
analysis, it is also used to handle machine learning operations through
data frames. Data frames refer to two-dimensional data similar to what is
used in SQL tables or spreadsheets. It enables programmers to get an
overview of the data, thereby improving the software product’s quality.
Easy dataset handling: The Pandas library is typically helpful for
professionals intending to handle (structure, sort, reshape, filter) large
datasets with ease.
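As a brief sketch of the ideas above (a hypothetical toy dataset, reusing the house-price values from the exercise later in this manual):
import pandas as pd

# Two-dimensional labeled data, similar to a spreadsheet or SQL table
df = pd.DataFrame({'Size': [1400, 1600, 1700, 1875],
                   'Bedrooms': [3, 3, 3, 4],
                   'Price': [245000, 312000, 279000, 308000]})

print(df.describe())                           # quick statistical overview
print(df[df['Bedrooms'] == 3])                 # filtering rows
print(df.sort_values('Price'))                 # sorting
print(df.groupby('Bedrooms')['Price'].mean())  # grouping and aggregation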
Matplotlib
Similar to the Pandas library, Matplotlib is not a machine-learning-heavy library. It is
typically used for data visualization, where developers can derive insights from the
visualized data patterns. Some of its modules, such as Pyplot, provide functionalities
to control line styles, manage fonts, and more while plotting 2D graphs and plots.
The features offered by Matplotlib are in line with those of MATLAB, and the library is
freely available as a Python package.
Key reasons for the popularity of Matplotlib include:
Wide range of plotting tools: Using the Matplotlib library, plotting
various 2D charts, 3D diagrams, histograms, error charts, bar charts,
and graphs is possible. It allows experts to perform detailed data
analysis.
Builds reliable ML models: Several plots allow thorough data analysis,
which further ensures that the developers have enough relevant data to
build reliable ML models.
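A minimal plotting sketch illustrating the line-style and labeling features mentioned above (any simple data would do):
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)

plt.plot(x, np.sin(x), label='sin(x)', linestyle='--')   # dashed line style
plt.plot(x, np.cos(x), label='cos(x)')
plt.xlabel('x (radians)')
plt.ylabel('value')
plt.title('Simple 2D plot with Matplotlib (pyplot)')
plt.legend()
plt.show()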
4. Simple Linear Regression
Code:
import numpy as np
from sklearn.linear_model import LinearRegression

# Sample data
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([1.5, 3.7, 2.6, 4.9, 6.3])
# Model
model = LinearRegression()
model.fit(X, y)
# Predictions
predictions = model.predict(X)
print(f"Coefficients: {model.coef_}")
print(f"Intercept: {model.intercept_}")
print(f"Predictions: {predictions}")
Output:
Coefficients: [1.16]
Intercept: 0.8999999999999995
Predictions: [2.06 3.22 4.38 5.54 6.7 ]
5. Multiple Linear Regression for House Price Prediction
Code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
# Sample data
data = {'Size': [1400, 1600, 1700, 1875],
'Bedrooms': [3, 3, 3, 4],
'Age': [10, 15, 20, 10],
'Price': [245000, 312000, 279000, 308000]}
df = pd.DataFrame(data)

# Features and target
X = df[['Size', 'Bedrooms', 'Age']]
y = df['Price']

# Train/test split (parameters chosen for illustration)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Model
model = LinearRegression()
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)
print(f"Coefficients: {model.coef_}")
print(f"Intercept: {model.intercept_}")
print(f"Mean Squared Error: {mean_squared_error(y_test, y_pred)}")
Output:
6. Implementation of Decision Tree Classifier with Hyperparameter Tuning (GridSearchCV)
Code:
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Load data
data = load_iris()
X, y = data.data, data.target

# Hyperparameter tuning
param_grid = {'max_depth': [3, 5, None], 'min_samples_split': [2, 5, 10]}
grid_search = GridSearchCV(DecisionTreeClassifier(), param_grid, cv=5)
grid_search.fit(X, y)

print("Best parameters:", grid_search.best_params_)
print("Best cross-validation score:", grid_search.best_score_)
Output:
7. Implementation of K-Nearest Neighbors (KNN) Classifier using sklearn
Code:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

data = load_iris()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)  # split chosen for illustration
model = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
print("Accuracy:", model.score(X_test, y_test))
Output:
Accuracy: 1.0
8. Implementation of Logistic Regression using sklearn
Code:
from sklearn.datasets import load_iris

data = load_iris()
X, y = data.data, data.target
Output:
Accuracy: 1.0
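The listing above stops after loading the data. A minimal sketch of the remaining steps that would yield an accuracy of this kind (the split parameters and max_iter value are assumptions for illustration, not part of the original listing):
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

data = load_iris()
X, y = data.data, data.target

# Assumed split and solver settings, chosen only for illustration
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression(max_iter=200)   # higher max_iter avoids convergence warnings on iris
model.fit(X_train, y_train)
print("Accuracy:", model.score(X_test, y_test))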
9. Implementation of K-Means Clustering
Code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Sample data
X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])

# Fit K-Means with two clusters (random_state chosen for reproducibility)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(X)
# Plotting
plt.scatter(X[:, 0], X[:, 1], c=kmeans.labels_, cmap='viridis')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=300, c='red',
label='Centroids')
plt.legend()
plt.show()