Artificial Intelligence and Machine Learning Laboratory - Lab Manual

JSS MAHAVIDYAPEETHA
JSS ACADEMY OF TECHNICAL EDUCATION
Department of Information Science and Engineering
JSSATE Campus, Dr. Vishnuvardhan Road, Bengaluru – 560 060
Phone: 080-28611902, 28612797 Fax: 080-28612706 www.jssateb.ac.in

VII SEMESTER
[As per Choice Based Credit System (CBCS) scheme]
(Effective from the academic year 2018 -2019)

Artificial Intelligence and Machine Learning Laboratory


[18CSL76]


JSS MAHAVIDYAPEETHA
JSS ACADEMY OF TECHNICAL EDUCATION
Department of Information Science and Engineering
JSSATE Campus, Dr. Vishnuvardhan Road, Bangalore – 560 060
Phone: 080-28611902, 28612797 Fax: 080-28612706 www.jssateb.ac.in

VII SEMESTER
Artificial Intelligence and Machine Learning Laboratory
[18CSL76]

Compiled By:

Dr. Malini M Patil, Associate Professor, Dept. of Information Science and Engineering, JSSATEB
Mrs. Nagashree S, Assistant Professor, Dept. of Information Science and Engineering, JSSATEB

Signature of the Faculty Signature of the HOD


JSS MAHAVIDYAPEETHA
JSS ACADEMY OF TECHNICAL EDUCATION
Department of Information Science and Engineering
JSSATE Campus, Dr.Vishnuvardhan Road, Bangalore – 560 060
Phone: 080-28611902, 28612797 Fax: 080-28612706 www.jssateb.ac.in

VISION
To emerge as a centre for achieving academic excellence, by producing competent professionals
to meet the global challenges in the field of Information Science and Technology.

MISSION
M1: To prepare the students as competent professionals to meet the advancements in the
industry and academia by imparting quality technical education.

M2: To enrich the technical ability of students to face the world with confidence, commitment
and teamwork.

M3: To inculcate and practice strong techno-ethical values to serve the society.

Program Educational Objectives (PEOs):


PEO1: To demonstrate analytical and technical problem-solving abilities.

PEO2: To be conversant with the developments in Information Science and Engineering, leading
towards employability and higher studies.

PEO3: To engage in research and development leading to new innovations and products.

Program Specific Outcomes (PSOs):

PSO1: Apply the mathematical concepts for solving engineering problems by using appropriate
Programming constructs.

PSO2: Adaptability to software development methodologies.

PSO3: Demonstrate the knowledge towards the domain specific initiatives of Information
Science and Engineering


Program Outcomes (POs):


Information Science and Engineering Graduates will be able to:
PO1 Apply the knowledge of mathematics, science, engineering fundamentals, and an
Engineering specialization to the solution of complex engineering problems.
PO2 Identify, formulate, review research literature, and analyze complex engineering
problems reaching substantiated conclusions using first principles of mathematics,
natural sciences, and engineering sciences.
PO3 Design solutions for complex engineering problems and design system components or
processes that meet the specified needs with appropriate consideration for the public
health and safety, and the cultural, societal, and environmental considerations.
PO4 Use research-based knowledge and research methods including design of experiments,
analysis and interpretation of data, and synthesis of the information to provide valid
conclusions.
PO5 Create, select, and apply appropriate techniques, resources, and modern engineering
and IT tools including prediction and modeling to complex engineering activities with
an understanding of the limitations.
PO6 Apply reasoning informed by the contextual knowledge to assess societal, health, safety,
legal and cultural issues and the consequent responsibilities relevant to the
professional engineering practice.
PO7 Understand the impact of the professional engineering solutions in societal and
environmental contexts, and demonstrate the knowledge of, and need for sustainable
development.
PO8 Apply ethical principles and commit to professional ethics and responsibilities and
norms of the engineering practice.
PO9 Function effectively as an individual, and as a member or leader in diverse teams, and in
multidisciplinary settings.
PO10 Communicate effectively on complex engineering activities with the engineering
community and with society at large, such as, being able to comprehend and write
effective reports and design documentation, make effective presentations, and give and
receive clear instructions.
PO11 Demonstrate knowledge and understanding of the engineering and management
principles and apply these to one’s own work, as a member and leader in a team, to
manage projects and in multidisciplinary environments.
PO12 Recognize the need for, and have the preparation and ability to engage in independent
and life-long learning in the broadest context of technological change.


ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING LABORATORY


[As per Choice Based Credit System (CBCS) scheme]
(Effective from the academic year 2018 -2019)

SEMESTER – VII
Subject Code: 18CSL76                      IA Marks: 40
Number of Lecture Hours/Week: 01I + 02P    Exam Marks: 60
Total Number of Lecture Hours: 40          Exam Hours: 03
CREDITS – 02

Course objectives: This course will enable students to


1. Make use of Data sets in implementing the machine learning algorithms
2. Implement the machine learning concepts and algorithms in any suitable language of choice.

Description (If any):


1. The programs can be implemented in either JAVA or Python.
2. For Problems 1 to 6 and 10, programs are to be developed without using the built-in classes
or APIs of Java/Python.
3. Data sets can be taken from standard repositories
(https://archive.ics.uci.edu/ml/datasets.html) or constructed by the students.

Laboratory Experiments:
1. Implement A* algorithm.
2. Implement AO* algorithm.
3. For a given set of training data examples stored in a .CSV file, implement and
demonstrate the Candidate-Elimination algorithm to output a description of the set of
all hypotheses consistent with the training examples.
4. Write a program to demonstrate the working of the decision tree based ID3 algorithm.
Use an appropriate data set for building the decision tree and apply this knowledge to
classify a new sample.
5. Build an Artificial Neural Network by implementing the Back propagation algorithm
and test the same using appropriate data sets.
6. Write a program to implement the naïve Bayesian classifier for a sample training data
set stored as a .CSV file. Compute the accuracy of the classifier, considering few test data
sets.
7. Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set
for clustering using k-Means algorithm. Compare the results of these two algorithms
and comment on the quality of clustering. You can add Java/Python ML library
classes/API in the program.
8. Write a program to implement k-Nearest Neighbour algorithm to classify the iris data
set. Print both correct and wrong predictions. Java/Python ML library classes can be
used for this problem.
9. Implement the non-parametric Locally Weighted Regression algorithm in order to fit
data points. Select appropriate data set for your experiment and draw graphs.


Laboratory Outcomes:
1. Implement and demonstrate AI and ML algorithms
2. Evaluate different algorithms

Conduction of Practical Examination:


1. Experiment Distribution
• For laboratories having only one part: Students are allowed to pick one experiment
from the lot with equal opportunity.
• For laboratories having part A and part B: Students are allowed to pick one
experiment from part A and one experiment from part B with equal opportunity.
2. Change of experiment is allowed only once, and the marks allotted to the procedure part will
be made zero for the changed program only.
3. Marks distribution: Procedure + Conduction + Viva: 15 + 70 +15 (100)

Note: In the examination each student picks one question from a lot of all the 10
questions.


COURSE OUTCOMES
SUBJECT: Artificial Intelligence and Machine Learning Laboratory
SUB CODE: 18CSL76 SEM:7 COURSE CODE: C413

At the end of the course the student will be able to:

CO.NO. COURSE OUTCOMES BLL


C413.1 Make use of BFS Control strategy to implement the A* and AO* algorithms L3
C413.2 Build supervised classification models to predict the outcome L3

C413.3 Analyze & compare the unsupervised machine learning algorithms L4

CO-PO Mapping (High: 3, Medium:2, Low:1)

CO \ PO    PO1  PO2  PO3  PO4  PO5  PO6  PO7  PO8  PO9  PO10  PO11  PO12
C413.1 2 - 1 - 3 - - 1 1 1 - 1
C413.2 2 1 2 1 3 1 - 2 1 1 - 2
C413.3 2 1 2 1 3 1 - 1 1 1 - 2
AVG 2 1 1.67 1 3 1 - 1.3 1 1 - 1.67

CO-PSO Mapping (High: 3, Medium: 2, Low: 1)

CO \ PSO    PSO1  PSO2  PSO3

C413.1 2 - -
C413.2 2 1 1
C413.3 2 1 1
AVG 2 1 1


Justification and rationale of mapping: CO AND PO


SUBJECT: Artificial Intelligence and Machine Learning Laboratory        SUB CODE: 18CSL76
COs: Course Outcomes, POs: Program Outcomes, CRLs: Correlation Levels
COs      POs    CRLs   Justification

C413.1   PO1    2      Apply the knowledge of mathematics & engineering to solve problems using BFS algorithms.
         PO3    1      For performing BFS tasks, analysis of the problem & dataset is essential.
         PO5    3      For programs on BFS, modern Python tools such as Anaconda Navigator & Jupyter are used.
         PO8    1      Apply rules to avoid plagiarism while writing the report, record or programs. Acknowledge the sources from which help was taken to prepare any content related to BFS methods.
         PO9    1      Every individual executes the lab cycle programs provided by VTU and formulates solutions to real-world problems in the form of projects in groups or teams.
         PO10   1      Make use of professional and personal development skills for effective communication during lab sessions, tests and exams.
         PO12   1      Machine learning is a part of artificial intelligence which covers data analysis & feature engineering for any real-world application.

C413.2   PO1    2      Apply the knowledge of mathematics & engineering to solve problems using supervised learning algorithms.
         PO2    1      Make use of first principles of mathematics & higher-order statistics to solve problems of classification.
         PO3    2      For performing supervised learning tasks, analysis of the problem & dataset is essential.
         PO4    1      For any given supervised problem, conduct an investigation, synthesize data and provide valid conclusions in the form of prediction accuracy, precision, recall, etc.
         PO5    3      For programs on supervised learning, modern Python tools such as Anaconda Navigator, Jupyter & graphical representation of results are used.
         PO6    1      By making use of supervised machine learning algorithms, predictions can be made for test samples to identify the type of disease.
         PO8    2      Apply rules to avoid plagiarism while writing the report, record or programs. Acknowledge the sources from which help was taken to prepare any content related to concept learning.
         PO9    1      Every individual executes the lab cycle programs provided by VTU and formulates solutions to real-world problems in the form of projects in groups or teams.
         PO10   1      Make use of professional and personal development skills for effective communication during lab sessions, tests and exams.
         PO12   2      Machine learning is a part of artificial intelligence which covers data analysis & feature engineering for any real-world supervised problem.

C413.3   PO1    2      Apply the knowledge of mathematics & engineering to solve problems using unsupervised learning algorithms.
         PO2    1      Make use of first principles of mathematics & higher-order statistics to solve problems of clustering & regression.
         PO3    2      For performing unsupervised learning tasks, analysis of the problem & dataset is essential.
         PO4    1      For any given unsupervised problem, conduct an investigation, synthesize data and provide valid conclusions in the form of prediction accuracy, precision, recall, etc.
         PO5    3      For programs on unsupervised learning, modern Python tools such as Anaconda Navigator, Jupyter & graphical representation of results are used.
         PO6    1      By making use of unsupervised machine learning algorithms, grouping can be done without any class labels for medical or any other data.
         PO8    1      Apply rules to avoid plagiarism while writing the report, record or programs. Acknowledge the sources from which help was taken to prepare any content related to concept learning.
         PO9    1      Every individual executes the lab cycle programs provided by VTU and formulates solutions to real-world problems in the form of projects in groups or teams.
         PO10   1      Make use of professional and personal development skills for effective communication during lab sessions, tests and exams.
         PO12   2      Machine learning is a part of artificial intelligence which covers data analysis & feature engineering for any real-world unsupervised problem.

Justification and rationale of mapping: CO AND PSO

COs: Course Outcomes, PSOs: Program Specific Outcomes, CRLs: Correlation Levels

COs      PSOs   CRLs   Justification

C413.1   PSO1   2      Apply the algorithmic and programming constructs to write programs on concept learning.

C413.2   PSO1   2      Apply the algorithmic and programming constructs to write programs on supervised learning.
         PSO2   1      Supervised learning has been successfully applied in many areas of software engineering, ranging from behaviour extraction to testing to bug fixing.
         PSO3   1      Supervised machine learning algorithms can be used for classification and regression tasks in information processing.

C413.3   PSO1   2      Apply the algorithmic and programming constructs to write programs on unsupervised learning.
         PSO2   1      The field of software engineering provides a basis where many software development tasks can be formulated as learning problems and approached in terms of unsupervised learning algorithms.
         PSO3   1      Unsupervised machine learning tries to find hidden structure/patterns in unlabeled data, which can be used to build various applications.


CONTENTS

Sl.No. Contents Page No.


1. Software Requirement Specifications

2. Instructions for setting up Python in the Windows environment for conducting experiments

3. Overview of AI and ML

4. About Dataset

5. Lab Cycle Programs

Program-1: A* Algorithm

Program-2: AO* Algorithm

Program-3: Candidate Elimination Algorithm

Program-4: ID3 algorithm

Program-5: Back propagation algorithm

Program-6: Naïve Bayesian classifier

Program-7: EM & k-Means algorithm

Program-8: k-Nearest Neighbour algorithm

Program-9: Locally Weighted Regression algorithm

6. Viva Questions

7. Artificial Intelligence and Machine Learning Glossary


Instructions to write the record

1. Write the complete question with question number


2. Write Algorithm / method / formulas on the blank sheet
3. Write the table with at least 6 rows along with headings for the dataset
4. Write the program with proper indentation and spacing
5. Use comments wherever essential to make the code understandable for reader
6. Write all intermediate and final results or output at appropriate place
7. If any graphical representation is obtained, take the printout of same & attach
8. All confusion matrix & classification reports to be written on blank sheet.
9. Mention experiment number & date for each program. Submit the record after filling the
index sheet & front page.


1. Software Requirement Specifications

1.1 Recommended System Requirements

➢ Processors:
o Intel® Core™ i5 processor 4300M at 2.60 GHz or 2.59 GHz (1 socket, 2 cores, 2 threads
per core), 8 GB of DRAM
o Intel® Xeon® processor E5-2698 v3 at 2.30 GHz (2 sockets, 16 cores each, 1 thread per
core), 64 GB of DRAM
o Intel® Xeon Phi™ processor 7210 at 1.30 GHz (1 socket, 64 cores, 4 threads per core), 32
GB of DRAM, 16 GB of MCDRAM (flat mode enabled)
➢ Disk space: 2 to 3 GB
➢ Operating systems: Windows® 10, macOS*, and Linux*

1.2 Minimum System Requirements

➢ Processors: Intel Atom® processor or Intel® Core™ i3 processor


➢ Disk space: 1 GB
➢ Operating systems: Windows* 7 or later, macOS, and Linux
➢ Python* versions: 2.7.X, 3.6.X
➢ Included development tools: conda*, conda-env, Jupyter Notebook* (IPython)
➢ Compatible tools: Microsoft Visual Studio*, PyCharm*
➢ Included Python packages: NumPy, SciPy, scikit-learn*, pandas, Matplotlib, Numba*, Intel®
Threading Building Blocks, pyDAAL, Jupyter, mpi4py, PIP*, and others.

1.3 Software

➢ PIP and NumPy: Installed with PIP, Ubuntu*, Python 3.6.2, NumPy 1.13.1, scikit-learn 0.18.2
➢ Windows: Python 3.6.2, PIP and NumPy 1.13.1, scikit-learn 0.18.2
➢ Intel® Distribution for Python* 2018

1.4 Some Open-Source Tools / Software

Weka, CNTK, KNIME, RapidMiner, Deeplearning4j, R, Mahout, H2O, GNU Octave, MOA (Massive Online
Analysis), Tanagra, Orange, Python, Shogun, TensorFlow, Torch, etc.

1.5 Some Proprietary Software

MATLAB, Neural Designer, Mathematica, Google Prediction API, Oracle Data Mining, NeuroSolutions,
Amazon Machine Learning, Microsoft Azure Machine Learning, STATISTICA Data Miner, Splunk,
SAS Enterprise Miner, etc.


2. Instructions for setting up Python in the Windows environment for conducting experiments

Step 1: Download Python from the Internet

Step 2: Double click on downloaded Python .exe file


Step 3: Double click on Install Now


Steps for installing Anaconda on the Windows platform

Step 1: Download Anaconda from the internet


Step 2: Click on Download

Step 3: Double click on downloaded anaconda .exe file


Step 4: Click on Next

Step 5: Click on I Agree


Step 6: Click on Next

Step 7: Install Anaconda3 by setting a path


Step 8: Continue the same

Step 9: Continue the same and click Finish.


3. Overview of AI and ML
Machine learning is a branch of artificial intelligence that allows computer systems to learn directly from
examples, data, and experience. Through enabling computers to perform specific tasks intelligently,
machine learning systems can carry out complex processes by learning from data, rather than following
pre-programmed rules.

Recent years have seen exciting advances in machine learning, which have raised its capabilities across a
suite of applications. Increasing data availability has allowed machine learning systems to be trained on
a large pool of examples, while increasing computer processing power has supported the analytical
capabilities of these systems. Within the field itself there have also been algorithmic advances, which
have given machine learning greater power. As a result of these advances, systems which only a few
years ago performed at noticeably below-human levels can now outperform humans at some specific
tasks.

Many people now interact with systems based on machine learning every day, for example in image
recognition systems, such as those used on social media; voice recognition systems, used by virtual
personal assistants; and recommender systems, such as those used by online retailers. As the field
develops further, machine learning shows promise of supporting potentially transformative advances in
a range of areas, and the social and economic opportunities which follow are significant. In healthcare,
machine learning is creating systems that can help doctors give more accurate or effective diagnoses for
certain conditions. In transport, it is supporting the development of autonomous vehicles, and helping to
make existing transport networks more efficient. For public services it has the potential to target
support more effectively to those in need, or to tailor services to users. And in science, machine learning
is helping to make sense of the vast amount of data available to researchers today, offering new insights
into biology, physics, medicine, the social sciences, and more.

The word ‘machine’ in machine learning means computer, as you would expect. So how does a machine
learn?
Given data, we can do all kinds of magic with statistics, and so can computer algorithms. These algorithms can
solve problems including prediction, classification and clustering. A machine learning algorithm will
learn from new data.

Definition: “A computer program is said to 'learn' from experience E with respect to some class of
tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves
with experience E.”

The following components are part of any learning problem:


Task
The behaviour or task that is being improved

Data
The experiences that are used to improve performance in the task

Measure of improvement
How the improvement is measured - for example, new skills that were not present initially, increasing
accuracy in prediction, or improved speed


Figure 1. Timeline of machine learning

Data : Facts and statistics collected together for reference or analysis. It is a set of values of
qualitative or quantitative variables. Numbers, characters or images that designate an attribute of a
phenomenon

Facts :
• Facts are statements that can be proved true or false.
• Facts tell what actually happened.
• Facts tell what is happening now.
• Facts state something that can be easily observed or verified.

Opinions:
• Opinions are statements that cannot be proved true or false because they express a person's thoughts,
beliefs, feelings, or estimates.
• Opinions express worth or value.
• Opinions express what the author or speaker think should or should not be thought or done.
• Opinions are based on what seems true or probable.

Example 1:
Fact: We use AI in many ways today from computer games to digital personal assistants to self driving
cars.
Opinion: Greater use of AI will be even more beneficial to humanity.
In this example, the opinion speculates about a future outcome that cannot yet be known.

Example 2:
Fact: Some jobs have been lost through automation in the past.
Opinion: The use of intelligent machines will replace human jobs and drive down wages for human
workers.
In this example, the author is generalizing without substantiating evidence


4. About Dataset
Before going deeply into machine learning, we first describe the notion of a dataset, which will
be used throughout the semester as well as in the lab sessions. There are two general types of
dataset: datasets constructed by the students themselves, and high-quality, real-world, well-understood
datasets taken from a standard repository that you can use to practice applied machine learning.
The best-known such repository is the UCI Machine Learning Repository, and you can use it to
structure a self-study program and build a solid foundation in machine learning.

Why Do We Need Practice Datasets?


If you are interested in practicing applied machine learning, you need datasets on which to
practice.
This problem can stop you dead.

➢ Which dataset should you use?


➢ Should you collect your own or use one off the shelf?
➢ Which one and why?

What is the UCI Machine Learning Repository?


The UCI Machine Learning Repository is a database of machine learning problems that you can
access for free. It is hosted and maintained by the Center for Machine Learning and Intelligent
Systems at the University of California, Irvine. It was originally created by David Aha as a
graduate student at UC Irvine. For more than 25 years it has been the go-to place for machine
learning researchers and machine learning practitioners that need a dataset.

UCI Machine Learning Repository

Each dataset gets its own webpage that lists all the details known about it including any
relevant publications that investigate it. The datasets themselves can be downloaded as ASCII
files, often in the useful CSV format.


Benefits of the Repository

Some beneficial features of the library include:


✓ Almost all datasets are drawn from real domains (as opposed to being synthetic), meaning
that they have real-world qualities.
✓ Datasets cover a wide range of subject matter from biology to particle physics.
✓ The details of datasets are summarized by aspects like attribute types, number of instances,
number of attributes and year published that can be sorted and searched.
✓ Datasets are well studied which means that they are well known in terms of interesting
properties and expected “good” results. This can provide a useful baseline for comparison.
✓ Most datasets are small (hundreds to thousands of instances), meaning that you can readily
load them in a text editor or MS Excel and review them, and you can also model them
quickly on your workstation.

For each problem, the student is advised to work on it systematically from end-to-end, for
example, go through the following steps in the applied machine learning process:
1. Define the problem
2. Prepare data
3. Evaluate algorithms
4. Improve results
5. Write-up results
Select a systematic and repeatable process that you can use to deliver results consistently.
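For steps 2 and 3, a typical workflow in Python uses pandas to prepare the data and scikit-learn to evaluate an algorithm (both appear among the included Python packages listed earlier). The snippet below is only a minimal sketch of that workflow, assuming a local file named iris.csv whose label column is called 'class'; it is not one of the graded lab cycle programs.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Prepare data: load a local copy of a UCI-style CSV (assumed file name and column names)
df = pd.read_csv('iris.csv')
X = df.drop('class', axis=1)            # feature columns
y = df['class']                         # target column
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# Evaluate one algorithm: k-Nearest Neighbours
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))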


5. Lab Cycle Programs


1. Implement A* Algorithm
The A* algorithm is a searching algorithm that searches for the shortest path between the initial
and the final state. It is used in various applications, such as maps. In maps the A* algorithm is
used to calculate the shortest distance between the source (initial state) and the destination
(final state). The A* algorithm is very useful in graph traversals as well. In the following steps, you
will see how the algorithm moves to reach its goal state.
Consider the following graph-

Figure 2: Input graph for A* algorithm


The numbers written on edges represent the distance between the nodes.
The numbers written on nodes represent the heuristic value.
A* evaluates each node n using f(n) = g(n) + h(n), where g(n) is the cost of the path from the start node to n and h(n) is the heuristic estimate of the cost from n to the goal.
Find the most cost-effective path to reach from start state A to final state J using the A* algorithm.
Solution-
Step-01:
• We start with node A.
• Node B and Node F can be reached from node A.
A* Algorithm calculates f(B) and f(F).
• f(B) = 6 + 8 = 14
• f(F) = 3 + 6 = 9
Since f(F) < f(B), the algorithm decides to go to node F.
Path- A → F
Step-02:


Node G and Node H can be reached from node F.


A* Algorithm calculates f(G) and f(H).
• f(G) = (3+1) + 5 = 9
• f(H) = (3+7) + 3 = 13
Since f(G) < f(H), the algorithm decides to go to node G.

Path- A → F → G
Step-03:
Node I can be reached from node G.
A* Algorithm calculates f(I).
f(I) = (3+1+3) + 1 = 8
It decides to go to node I.

Path- A → F → G → I
Step-04:
Node E, Node H and Node J can be reached from node I.
A* Algorithm calculates f(E), f(H) and f(J).
• f(E) = (3+1+3+5) + 3 = 15
• f(H) = (3+1+3+2) + 3 = 12
• f(J) = (3+1+3+3) + 0 = 10
Since f(J) is the least, the algorithm decides to go to node J.

Path- A → F → G → I → J
This is the required shortest path from node A to node J.


Figure 3: Output graph of the A* algorithm (path A → F → G → I → J)

Important note:
It is important to note that:
• The A* algorithm is one of the best path-finding algorithms.
• But it does not always produce the shortest path.
• This is because it heavily depends on heuristics.

Code for A* Algorithm

#!/usr/bin/env python
# coding: utf-8

def aStarAlgo(start_node, stop_node):
    open_set = set(start_node)
    closed_set = set()
    g = {}               # store distance from starting node
    parents = {}         # parents contains an adjacency map of all nodes

    # distance of starting node from itself is zero
    g[start_node] = 0
    # start_node is the root node, i.e. it has no parent nodes,
    # so start_node is set to its own parent node
    parents[start_node] = start_node

    while len(open_set) > 0:
        n = None


        # node with lowest f() is found
        for v in open_set:
            if n == None or g[v] + heuristic(v) < g[n] + heuristic(n):
                n = v

        if n == stop_node or Graph_nodes[n] == None:
            pass
        else:
            for (m, weight) in get_neighbors(n):
                # nodes 'm' not in the open and closed sets are added to the open set
                # and n is set as their parent
                if m not in open_set and m not in closed_set:
                    open_set.add(m)
                    parents[m] = n
                    g[m] = g[n] + weight

                # for each node m, compare its distance from start, i.e. g(m),
                # to the distance from start through node n
                else:
                    if g[m] > g[n] + weight:
                        # update g(m)
                        g[m] = g[n] + weight
                        # change parent of m to n
                        parents[m] = n

                        # if m is in the closed set, remove it and add it to open
                        if m in closed_set:
                            closed_set.remove(m)
                            open_set.add(m)

        if n == None:
            print('Path does not exist!')
            return None

        # if the current node is the stop_node
        # then we begin reconstructing the path from it to the start_node
        if n == stop_node:
            path = []

            while parents[n] != n:
                path.append(n)
                n = parents[n]

            path.append(start_node)
            path.reverse()
            print('Path found: {}'.format(path))
            return path


        # remove n from the open list, and add it to the closed list
        # because all of its neighbours were inspected
        open_set.remove(n)
        closed_set.add(n)

    print('Path does not exist!')
    return None


# define a function to return the neighbours of the passed node
# together with their distances
def get_neighbors(v):
    if v in Graph_nodes:
        return Graph_nodes[v]
    else:
        return None


# for simplicity we consider the heuristic distances as given;
# this function returns the heuristic distance for all nodes
def heuristic(n):
    H_dist = {
        'A': 10,
        'B': 8,
        'C': 5,
        'D': 7,
        'E': 3,
        'F': 6,
        'G': 5,
        'H': 3,
        'I': 1,
        'J': 0
    }
    return H_dist[n]


# describe your graph here
Graph_nodes = {
    'A': [('B', 6), ('F', 3)],
    'B': [('C', 3), ('D', 2)],
    'C': [('D', 1), ('E', 5)],
    'D': [('C', 1), ('E', 8)],
    'E': [('I', 5), ('J', 5)],
    'F': [('G', 1), ('H', 7)],
    'G': [('I', 3)],
    'H': [('I', 2)],
    'I': [('E', 5), ('J', 3)],
}

aStarAlgo('A', 'J')
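With the graph and heuristic values given above, running this program should print a line of the form

Path found: ['A', 'F', 'G', 'I', 'J']

which matches the path A → F → G → I → J traced in the worked example.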


2. Implement AO* algorithm

AO* Algorithm
The AO* algorithm is basically based on problem decomposition (breaking a problem down into small
pieces). When a problem can be divided into a set of sub-problems, where each sub-problem can
be solved separately and a combination of these solutions is a solution to the whole problem, AND-OR
graphs or AND-OR trees are used for representing the solution. The decomposition of the problem, or
problem reduction, generates AND arcs.

AND-OR Graph

Figure 3: AND-OR graph

The figure shows an AND-OR graph


1. To pass any exam, we have two options: either cheating or hard work.
2. In this graph we are given two choices: first, do cheating (the red line), or work hard
and (the arc) pass.
3. When we have more than one choice and we have to pick one, we apply an OR
condition to choose one (that is what we did here).
4. Basically, the arc here denotes an AND condition.
5. Here the arc connects 'work hard' and 'pass' because, by doing hard work, the
possibility of passing the exam is greater than by cheating.

How AO* works


Let's try to understand it with the following diagram.
The algorithm always moves towards a lower cost value.
Basically, we calculate the cost function F(n) = G(n) + H(n), where
H(n) is the heuristic (estimated) value of a node and G(n) is the actual cost or edge value (here a unit value).
Here every edge value is taken as 1, meaning we have to focus mainly on the heuristic values.
1. The Purple color values are edge values (here all are same that is one).
2. The Red color values are Heuristic values for nodes.
3. The Green color values are New Heuristic values for nodes.


Procedure:
1. In the above diagram we have two ways from A: to D, or to B-C (because of the AND
condition). Calculate the cost to select a path.
2. F(A-D) = 1 + 10 = 11 and F(A-BC) = 1 + 1 + 6 + 12 = 20.
3. As we can see, F(A-D) is less than F(A-BC), so the algorithm chooses the path A-D.
4. From D we have one choice, that is F-E.
5. F(A-D-FE) = 1 + 1 + 4 + 4 = 10.
6. Basically, 10 is the cost of reaching FE from D, and the heuristic value of node D also
denotes the cost of reaching FE from D. So, the new heuristic value of D is 10.
7. And the cost from A to D remains the same, that is 11.
Suppose we have searched this path and reached the goal state; then we will never explore
the other path. (This is what AO* says, but here we are going to explore the other path as well to see
what happens.)

Figure 4: Exploring the paths using AO* algorithm

Let's Explore the other path:


1. In the above diagram we have two ways from A: to D, or to B-C (because of the AND
condition). Calculate the cost to select a path.
2. F(A-D) = 1 + 10 = 11 and F(A-BC) = 1 + 1 + 6 + 12 = 20.
3. As we know, the cost of F(A-BC) is higher, but let's take a look anyway.
4. Now from B we have two paths, G and H; let's calculate the cost.
5. F(B-G) = 5 + 1 = 6 and F(B-H) = 7 + 1 = 8.
6. So, since the cost F(B-H) is more than F(B-G), we will take the path B-G.
7. The heuristic value of node I is 1; let's calculate the cost from G to I.


8. F(G-I) = 1 + 1 = 2, which is less than the heuristic value 5. So, the new heuristic value
from G to I is 2.
9. If this is a new value, then the cost from B to G must also have changed. Let's see the
new cost from B to G.
10. F(B-G) = 1 + 2 = 3. This means the new heuristic value of B is 3.
11. But A is associated with both B and C.
12. As we can see from the diagram, C has only one choice, or one node to explore,
that is J. The heuristic value of C is 12.
13. Cost from C to J: F(C-J) = 1 + 1 = 2, which is less than the heuristic value.
14. Now the new heuristic value of C is 2.
15. And the new cost from A to B-C is F(A-BC) = 1 + 1 + 2 + 3 = 7, which is less than
F(A-D) = 11.
16. In this case, choosing the path A-BC is more cost-effective than A-D.
But this would only be discovered if the algorithm explored this path as well. According to the
algorithm, this path will not be explored (here we have explored it just to see how the
other path can also turn out to be better).
However, this does not happen in every case; in some cases the algorithm will still get the
optimal solution.

# Recursive implementation of the AO* algorithm

class Graph:
    def __init__(self, graph, heuristicNodeList, startNode):
        # instantiate graph object with graph topology, heuristic values and start node
        self.graph = graph
        self.H = heuristicNodeList
        self.start = startNode
        self.parent = {}
        self.status = {}
        self.solutionGraph = {}

    def applyAOStar(self):                  # starts the recursive AO* algorithm
        self.aoStar(self.start, False)

    def getNeighbors(self, v):              # gets the neighbours of a given node
        return self.graph.get(v, '')

    def getStatus(self, v):                 # returns the status of a given node
        return self.status.get(v, 0)

    def setStatus(self, v, val):            # sets the status of a given node
        self.status[v] = val


    def getHeuristicNodeValue(self, n):
        return self.H.get(n, 0)             # always return the heuristic value of a given node

    def setHeuristicNodeValue(self, n, value):
        self.H[n] = value                   # set the revised heuristic value of a given node

    def printSolution(self):
        print("FOR GRAPH SOLUTION, TRAVERSE THE GRAPH FROM THE START NODE:", self.start)
        print("------------------------------------------------------------")
        print(self.solutionGraph)
        print("------------------------------------------------------------")

    def computeMinimumCostChildNodes(self, v):
        # computes the minimum cost of the child nodes of a given node v
        minimumCost = 0
        costToChildNodeListDict = {}
        costToChildNodeListDict[minimumCost] = []
        flag = True
        for nodeInfoTupleList in self.getNeighbors(v):   # iterate over all the sets of child node/s
            cost = 0
            nodeList = []
            for c, weight in nodeInfoTupleList:
                cost = cost + self.getHeuristicNodeValue(c) + weight
                nodeList.append(c)

            if flag == True:        # initialize minimum cost with the cost of the first set of child node/s
                minimumCost = cost
                costToChildNodeListDict[minimumCost] = nodeList   # set the minimum cost child node/s
                flag = False
            else:                   # check the current set's cost against the minimum cost so far
                if minimumCost > cost:
                    minimumCost = cost
                    costToChildNodeListDict[minimumCost] = nodeList   # set the minimum cost child node/s

        # return the minimum cost and the minimum cost child node/s
        return minimumCost, costToChildNodeListDict[minimumCost]

    def aoStar(self, v, backTracking):      # AO* algorithm for a start node and backTracking status flag

        print("HEURISTIC VALUES  :", self.H)
        print("SOLUTION GRAPH    :", self.solutionGraph)


print("PROCESSING NODE :", v)


print("-----------------------------------------------------------------------------------------")

if self.getStatus(v) >= 0: # if status node v >= 0, compute Minimum


Cost nodes of v
minimumCost, childNodeList = self.computeMinimumCostChildNodes(v)
self.setHeuristicNodeValue(v, minimumCost)
self.setStatus(v,len(childNodeList))

solved=True # check the Minimum Cost nodes of v are


solved
for childNode in childNodeList:
self.parent[childNode]=v
if self.getStatus(childNode)!=-1:
solved=solved & False

if solved==True: # if the Minimum Cost nodes of v are solved,


set the current node status as solved(-1)
self.setStatus(v,-1)
self.solutionGraph[v]=childNodeList # update the solution graph with the solved nodes which
may be a part of solution

if v!=self.start: # check the current node is the start node for


backtracking the current node value
self.aoStar(self.parent[v], True) # backtracking the current node value with
backtracking status set to true

if backTracking==False: # check the current call is not for backtracking


for childNode in childNodeList: # for each Minimum Cost child node
self.setStatus(childNode,0) # set the status of child node to 0(needs
exploration)
self.aoStar(childNode, False) # Minimum Cost child node is further
explored with backtracking status as false

h1 = {'A': 1, 'B': 6, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 5, 'H': 7, 'I': 7, 'J': 1, 'T': 3}
graph1 = {
'A': [[('B', 1), ('C', 1)], [('D', 1)]],
'B': [[('G', 1)], [('H', 1)]],
'C': [[('J', 1)]],
'D': [[('E', 1), ('F', 1)]],
'G': [[('I', 1)]]
}
G1= Graph(graph1, h1, 'A')
G1.applyAOStar()
G1.printSolution()


h2 = {'A': 1, 'B': 6, 'C': 12, 'D': 10, 'E': 4, 'F': 4, 'G': 5, 'H': 7}   # Heuristic values of nodes
graph2 = {                                    # Graph of nodes and edges
    'A': [[('B', 1), ('C', 1)], [('D', 1)]],  # Neighbours of node 'A': B, C & D with respective weights
    'B': [[('G', 1)], [('H', 1)]],            # Neighbours are included in a list of lists
    'D': [[('E', 1), ('F', 1)]]               # Each sublist indicates an "OR" node or "AND" nodes
}

G2 = Graph(graph2, h2, 'A')   # Instantiate Graph object with graph, heuristic values and start node
G2.applyAOStar()              # Run the AO* algorithm

Output:
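The program prints the heuristic values, the solution graph and the node being processed at every recursive call; the full trace is lengthy and is not reproduced here. For graph1 above, the final solution graph printed by G1.printSolution() should be of the form

{'I': [], 'G': ['I'], 'B': ['G'], 'J': [], 'C': ['J'], 'A': ['B', 'C']}

i.e. the chosen AND arc at A is (B, C), with B solved through G → I and C solved through J.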


3. For a given set of training data examples stored in a .CSV file, implement and
demonstrate the Candidate-Elimination algorithm to output a description of the set of all
hypotheses consistent with the training examples.

Dataset: weather.csv (note: the program listing below reads weather2.csv; the training examples it contains are the ones shown in the sample output)

Outlook Temperature Humidity Wind Play


sunny hot high weak no
sunny hot high strong no
overcast hot high weak yes
rainy mild high weak yes
rainy cool normal weak yes
rainy cool normal strong no
overcast cool normal strong yes
sunny mild high weak no
sunny cool normal weak yes
rainy mild normal weak yes
sunny mild normal strong yes
overcast mild high strong yes
overcast hot normal weak yes
rainy mild high strong no
Algorithm:
G: maximally general hypotheses in H
S: maximally specific hypotheses in H
For each training example d= <x, c(x)>

Case 1 : If d is a positive example


Remove from G any hypothesis that is inconsistent with d
For each hypothesis s in S that is not consistent with d
• Remove s from S.
• Add to S all minimal generalizations h of s such that
  o h is consistent with d
  o Some member of G is more general than h
• Remove from S any hypothesis that is more general than another hypothesis in S
Case 2: If d is a negative example
Remove from S any hypothesis that is inconsistent with d
For each hypothesis g in G that is not consistent with d
• Remove g from G.
• Add to G all minimal specializations h of g such that
o h consistent with d
o Some member of S is more specific than h


• Remove from G any hypothesis that is less general than another hypothesis in G
Program:
import csv

hypo = []
data = []
temp = []
gen = ['?', '?', '?', '?', '?', '?']
sef = []

with open('weather2.csv') as csv_file:
    fd = csv.reader(csv_file)
    print("\nThe given training examples are:")
    for line in fd:
        print(line)
        temp.append(line)
        if line[-1] == "Yes":
            data.append(line)

print("\nThe positive examples are: Enjoy swimming")
for line in data:
    print(line)

row = len(data)
col = len(data[0])

print("\nThe final specific output......................")
for j in range(col - 1):
    hypo.append(data[0][j])
for i in range(row):
    for j in range(col - 1):
        if hypo[j] != data[i][j]:
            hypo[j] = '?'
print(hypo)

print("\nThe final Genralize output..................")
row = len(temp)
col = len(temp)
for i in range(row):
    if temp[i][-1] == "No":
        for j in range(col - 1):
            if temp[i][j] != hypo[j]:
                gen[j] = hypo[j]
                print(gen)
                gen[j] = '?'

print("\nThe positive examples are: Enjoy swimming");


for line in data:
print(line);
row= len(data);
col=len(data[0]);

Dept. of ISE, JSSATEB 37


Artificial and Machine Learning Laboratory Lab Manual

print("\nThe final specific output......................");


for j in range(col-1):
hypo.append(data[0][j]);
for i in range(row):
for j in range(col-1):
if (hypo[j]!=data[i][j]):
hypo[j]='?'

print(hypo)
print("\nThe final Genralize output..................");
row=len(temp)
col=len(temp)
for i in range(row):
if temp[i][-1]=="No":
for j in range(col-1):
if temp[i][j] !=hypo[j]:
gen[j]=hypo[j]
print(gen)
gen[j]='?'

Output:
The given training examples are:
['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same', 'Yes']
['Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same', 'Yes']
['Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change', 'No']
['Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change', 'Yes']

The positive examples are: Enjoy swimming


['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same', 'Yes']
['Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same', 'Yes']
['Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change', 'Yes']

The final specific output......................


['Sunny', 'Warm', '?', 'Strong', '?', '?']

The final Genralize output..................


['Sunny', '?', '?', '?', '?', '?']
['?', 'Warm', '?', '?', '?', '?']
['?', '?', '?', '?', '?', '?']


4. Write a program to demonstrate the working of the decision tree based ID3 algorithm.
Use an appropriate data set for building the decision tree and apply this knowledge to
classify a new sample.

Decision Tree is one of the most powerful and popular algorithms. The decision-tree algorithm falls
under the category of supervised learning algorithms. It works for both continuous as well as
categorical output variables. Decision tree algorithms are a method for approximating
discrete-valued target functions, in which the learned function is represented by a decision
tree. These kinds of algorithms are famous in inductive learning and have been
successfully applied to a broad range of tasks. Decision trees classify instances by sorting
them down the tree from the root to some leaf node, which provides the classification of the
instance. Each node in the tree specifies a test of some attribute of the instance and each
branch descending from that node corresponds to one of the possible values for this
attribute. Types of decision trees are CART, C4.5, & ID3.

ID3 is a non-incremental algorithm, meaning it derives its classes from a fixed set of training instances.
An incremental algorithm revises the current concept definition, if necessary, with a new sample. The
classes created by ID3 are inductive, that is, given a small set of training instances, the specific classes
created by ID3 are expected to work for all future instances. The distribution of the unknowns must be
the same as the test cases. Induction classes cannot be proven to work in every case since they may
classify an infinite number of instances. Note that ID3 (or any inductive algorithm) may misclassify
data. To get an intuition, think of a decision tree as a set of if-else rules, where each condition leads to a certain
answer at the end. You might have seen many online games which ask several questions and lead to
something you would have thought of at the end. A classic, famous example where a decision tree is used
is known as Play Tennis.

Figure 5. Decision Tree

Each nonleaf node is connected to a test that splits its set of possible answers into subsets
corresponding to different test results.


Each branch carries a particular test result's subset to another node.


Each node is connected to a set of possible answers.

Algorithm:
ID3 (Examples, Target_Attribute, Attributes)
    Create a root node for the tree.
    If all examples are positive, return the single-node tree Root, with label = +.
    If all examples are negative, return the single-node tree Root, with label = -.
    If the number of predicting attributes is empty, then return the single-node tree Root,
        with label = the most common value of the target attribute in the examples.
    Otherwise Begin
        A ← the attribute that best classifies the examples.
        Decision tree attribute for Root = A.
        For each possible value, vi, of A:
            Add a new tree branch below Root, corresponding to the test A = vi.
            Let Examples(vi) be the subset of examples that have the value vi for A.
            If Examples(vi) is empty
                Then below this new branch add a leaf node with label = the most common
                target value in the examples
                Else below this new branch add the subtree ID3(Examples(vi), Target_Attribute,
                Attributes – {A})
    End
    Return Root

Dataset:
outlook temperature humidity wind play
sunny hot high weak no
sunny hot high strong no
overcast hot high weak yes
rainy mild high weak yes
rainy cool normal weak yes
rainy cool normal strong no
overcast cool normal strong yes
sunny mild high weak no
sunny cool normal weak yes
rainy mild normal weak yes
sunny mild normal strong yes
overcast mild high strong yes
overcast hot normal weak yes
rainy mild high strong no


Entropy: It is a measure from information theory which characterizes the impurity of
an arbitrary collection of examples. If the target attribute takes on c different values, then
the entropy of S relative to this c-wise classification is defined as

Entropy(S) = − Σ (over i = 1..c) pi · log2(pi)

where pi is the proportion (probability) of examples in S belonging to class i. The logarithm is base 2 because


entropy is a measure of the expected encoding length measured in bits. For example, if the training data
has 14 instances with 9 positive and 5 negative instances, the entropy is calculated as

Entropy(Decision) = – p(Yes) . log2p(Yes) – p(No) . log2p(No)

Entropy(Decision) = – (9/14) . log2(9/14) – (5/14) . log2(5/14) = 0.940

A key point to note here is that the more uniform the probability distribution, the greater
its entropy.

Information gain: It measures the expected reduction in entropy obtained by partitioning the examples
according to a given attribute. The information gain, Gain(S, A), of an attribute A relative to the
collection of examples S, is defined as

Gain(S, A) = Entropy(S) − Σ (over v in Values(A)) (|Sv| / |S|) · Entropy(Sv)

where Values(A) is the set of all possible values for attribute A, and Sv is the subset of S for
which the attribute A has value v. We can use this measure to rank attributes and build the
decision tree where at each node is located the attribute with the highest information gain
among the attributes not yet considered in the path from the root.

Now, we need to find the most dominant attribute for the decision node.

Wind factor on decision


Gain(Decision, Wind) = Entropy(Decision) – ∑ [ p(Decision|Wind) . Entropy(Decision|Wind) ]

The Wind attribute has two labels, weak and strong; we reflect this in the formula.

Gain(Decision, Wind) = Entropy(Decision) – [ p(Decision|Wind=Weak) · Entropy(Decision|Wind=Weak) ]
                       – [ p(Decision|Wind=Strong) · Entropy(Decision|Wind=Strong) ]

Now, we need to calculate Entropy(Decision|Wind=Weak) and Entropy(Decision|Wind=Strong) respectively.
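Completing this calculation on the 14-example dataset above (a worked continuation; the numbers follow directly from the table): Wind = Weak covers 8 examples (6 yes, 2 no) and Wind = Strong covers 6 examples (3 yes, 3 no), so

Entropy(Decision|Wind=Weak)   = –(6/8)·log2(6/8) – (2/8)·log2(2/8) ≈ 0.811
Entropy(Decision|Wind=Strong) = –(3/6)·log2(3/6) – (3/6)·log2(3/6) = 1.000
Gain(Decision, Wind)          = 0.940 – (8/14)·0.811 – (6/14)·1.000 ≈ 0.048

Repeating this for the other attributes gives the largest gain for outlook, which is why outlook appears at the root of the tree produced by the program below.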


Program:

import csv
import pprint
from math import *

lines = list(csv.reader(open('weather.csv', 'r')))
data = lines.pop(0)
print(data)
print()
print(lines)

def entropy(pos, neg):
    if pos == 0 or neg == 0:
        return 0
    tot = pos + neg
    return -pos/tot*log(pos/tot, 2) - neg/tot*log(neg/tot, 2)

def gain(lines, attr, pos, neg):
    d, E, acu = {}, entropy(pos, neg), 0
    for i in lines:
        if i[attr] not in d:
            d[i[attr]] = {}
        d[i[attr]][i[-1]] = 1 + d[i[attr]].get(i[-1], 0)
    for i in d:
        tot = d[i].get('yes', 0) + d[i].get('no', 0)
        acu += tot/(pos+neg) * entropy(d[i].get('yes', 0), d[i].get('no', 0))
    return E - acu

def build(lines, data):
    pos = len([x for x in lines if x[-1] == 'yes'])
    sz = len(lines[0]) - 1
    neg = len(lines) - pos
    if neg == 0 or pos == 0:
        return 'yes' if neg == 0 else 'no'
    root = max([[gain(lines, i, pos, neg), i] for i in range(sz)])[1]
    fin, res = {}, {}
    uniq_attr = set([x[root] for x in lines])
    print(">>>", uniq_attr)

    for i in uniq_attr:
        res[i] = build([x[:root] + x[root+1:] for x in lines if x[root] == i],
                       data[:root] + data[root+1:])
    fin[data[root]] = res
    return fin

tree = build(lines, data)


pprint.pprint(tree)

def classify(instance, tree, default=None):
    attribute = next(iter(tree))
    if instance[attribute] in tree[attribute].keys():
        result = tree[attribute][instance[attribute]]
        if isinstance(result, dict):
            return classify(instance, result)
        else:
            return result
    else:
        return default

import pandas as pd
df_new = pd.read_csv('test.csv')
df_new['predicted'] = df_new.apply(classify, axis=1, args=(tree, '?'))
print(df_new)
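The program also expects a second file, test.csv, containing unlabelled samples with the same attribute columns. Judging from the sample output below it holds a single row; a file of the following form (an assumed reconstruction, not given in the original listing) would reproduce that result:

outlook,temperature,humidity,wind
overcast,cool,normal,strong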

Output:

[['outlook', 'temperature', 'humidity', 'wind', 'play'],


['sunny', 'hot', 'high', 'weak', 'no'],
['sunny', 'hot', 'high', 'strong', 'no'],
['overcast', 'hot', 'high', 'weak', 'yes'],
['rainy', 'mild', 'high', 'weak', 'yes'],
['rainy', 'cool', 'normal', 'weak', 'yes'],
['rainy', 'cool', 'normal', 'strong', 'no'],
['overcast', 'cool', 'normal', 'strong', 'yes'],
['sunny', 'mild', 'high', 'weak', 'no'],
['sunny', 'cool', 'normal', 'weak', 'yes'],
['rainy', 'mild', 'normal', 'weak', 'yes'],
['sunny', 'mild', 'normal', 'strong', 'yes'],
['overcast', 'mild', 'high', 'strong', 'yes'],
['overcast', 'hot', 'normal', 'weak', 'yes'],
['rainy', 'mild', 'high', 'strong', 'no']]

['outlook', 'temperature', 'humidity', 'wind', 'play']


[['sunny', 'hot', 'high', 'weak', 'no'], ['sunny', 'hot', 'high', 'strong', 'no'], ['overcast', 'hot', 'high',
'weak', 'yes'], ['rainy', 'mild', 'high', 'weak', 'yes'], ['rainy', 'cool', 'normal', 'weak', 'yes'], ['rainy',
'cool', 'normal', 'strong', 'no'], ['overcast', 'cool', 'normal', 'strong', 'yes'], ['sunny', 'mild', 'high',
'weak', 'no'], ['sunny', 'cool', 'normal', 'weak', 'yes'], ['rainy', 'mild', 'normal', 'weak', 'yes'],
['sunny', 'mild', 'normal', 'strong', 'yes'], ['overcast', 'mild', 'high', 'strong', 'yes'], ['overcast', 'hot',
'normal', 'weak', 'yes'], ['rainy', 'mild', 'high', 'strong', 'no']]


>>> {'overcast', 'rainy', 'sunny'}


>>> {'strong', 'weak'}
>>> {'high', 'normal'}
{'outlook': {'overcast': 'yes',
'rainy': {'wind': {'strong': 'no', 'weak': 'yes'}},
'sunny': {'humidity': {'high': 'no', 'normal': 'yes'}}}}

outlook temperature humidity wind predicted


0 overcast cool normal strong yes


5. Build an Artificial Neural Network by implementing the Backpropagation algorithm


and test the same using appropriate data sets.

Artificial Neural Networks Overview


Artificial neural networks (ANNs) are statistical models directly inspired by, and partially modeled on
biological neural networks. They are capable of modelling and processing nonlinear relationships
between inputs and outputs in parallel. The related algorithms are part of the broader field of machine
learning, and can be used in many applications as discussed. Artificial neural networks are characterized
by containing adaptive weights along paths between neurons that can be tuned by a learning algorithm
that learns from observed data in order to improve the model. In addition to the learning algorithm
itself, one must choose an appropriate cost function.

The cost function is what’s used to learn the optimal solution to the problem being solved. This involves
determining the best values for all of the tuneable model parameters, with neuron path adaptive weights
being the primary target, along with algorithm tuning parameters such as the learning rate. It’s usually
done through optimization techniques such as gradient descent or stochastic gradient descent.

These optimization techniques basically try to make the ANN solution be as close as possible to the
optimal solution, which when successful means that the ANN is able to solve the intended problem with
high performance.


Algorithm:
Phase 1: propagation
Each propagation involves the following steps:

1. Propagation forward through the network to generate the output value(s)


2. Calculation of the cost (error term)
3. Propagation of the output activations back through the network using the training pattern target to
generate the deltas (the difference between the targeted and actual output values) of all output and
hidden neurons.
Phase 2: weight update
For each weight, the following steps must be followed:

1. The weight's output delta and input activation are multiplied to find the gradient of the weight.
2. A ratio (percentage) of the weight's gradient is subtracted from the weight.
This ratio (percentage) influences the speed and quality of learning; it is called the learning rate. The greater
the ratio, the faster the neuron trains, but the lower the ratio, the more accurate the training is. The sign of
the gradient of a weight indicates whether the error varies directly with, or inversely to, the weight.
Therefore, the weight must be updated in the opposite direction, "descending" the gradient.
Learning is repeated (on new batches) until the network performs adequately.
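In symbols, the weight update described above is the usual gradient-descent step (stated generically, not for any one particular network):

w_new = w_old − η · (∂E/∂w)

where η is the learning rate and ∂E/∂w is the gradient of the error with respect to that weight; the minus sign is the "descending the gradient" mentioned above.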

Consider the neural network shown below.

This network contains the following:

• two inputs
• two hidden neurons
• two output neurons
• two biases


Below are the steps involved in Backpropagation:

• Step – 1: Forward Propagation


• Step – 2: Backward Propagation
• Step – 3: Putting all the values together and calculating the updated weight value

Step – 1: Forward Propagation

We will start by propagating forward.

We will repeat this process for the output layer neurons, using the output from the hidden layer
neurons as inputs.

Now, let’s see what is the value of the error:


Step – 2: Backward Propagation

Now, we will propagate backwards. This way we will try to reduce the error by changing the
values of weights and biases.

Consider W5, we will calculate the rate of change of error w.r.t change in weight W5.

Since we are propagating backwards, first thing we need to do is, calculate the change in total
errors w.r.t the output O1 and O2.

Now, we will propagate further backwards and calculate the change in output O1 w.r.t to its
total net input.

Now, let’s see how much the total net input of O1 changes w.r.t. W5.


Step – 3: Putting all the values together and calculating the updated weight value

Now, let’s put all the values together:

Let’s calculate the updated value of W5:

• Similarly, we can calculate the other weight values as well.


• After that we will again propagate forward and calculate the output. Again, we will
calculate the error.
• If the error is minimum we will stop right there, else we will again propagate backwards
and update the weight values.
• This process will keep on repeating until error becomes minimum.

Program:

import numpy as np
X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
# Features(Hours Slept, Hours Studied)
y = np.array(([92], [86], [89]), dtype=float)
# Labels(Marks obtained)
X = X/np.amax(X,axis=0)# Normalize
y = y/100

def sigmoid(x):
    return 1/(1 + np.exp(-x))

def sigmoid_grad(x):
    return x * (1 - x)

epoch=1000 #Setting training iterations


eta =0.2 #Setting learning rate (eta)
input_neurons = 2 #number of features in data set
hidden_neurons = 3 #number of hidden layers neurons
output_neurons = 1 #number of neurons at output layer

# Weight and bias - Random initialization


wh=np.random.uniform(size=(input_neurons,hidden_neurons)) # 2x3
bh=np.random.uniform(size=(1,hidden_neurons)) # 1x3
wout=np.random.uniform(size=(hidden_neurons,output_neurons)) # 3x1
bout=np.random.uniform(size=(1,output_neurons))

for i in range(epoch):
    # Forward Propagation
    h_ip = np.dot(X, wh) + bh        # Dot product + Bias
    h_act = sigmoid(h_ip)            # Activation function
    o_ip = np.dot(h_act, wout) + bout
    output = sigmoid(o_ip)

    # Backpropagation
    # Error at Output layer
    Eo = y - output                  # Error at o/p
    outgrad = sigmoid_grad(output)
    d_output = Eo * outgrad          # Errj = Oj(1-Oj)(Tj-Oj)

    # Error at Hidden layer
    Eh = d_output.dot(wout.T)        # .T means transpose
    hiddengrad = sigmoid_grad(h_act)
    # How much hidden layer weights contributed to error
    d_hidden = Eh * hiddengrad

    # Weight updates
    wout += h_act.T.dot(d_output) * eta
    # Dot product of next layer error and current layer o/p
    wh += X.T.dot(d_hidden) * eta
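    # Note: this listing updates only the weights each iteration; a common refinement
    # (our addition, not part of the printed program) would also update the biases:
    #   bout += np.sum(d_output, axis=0, keepdims=True) * eta
    #   bh += np.sum(d_hidden, axis=0, keepdims=True) * eta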

print("Normalized Input: \n" + str(X))


print("\nActual Output: \n" + str(y))
print("\nPredicted Output: \n" ,output)

Output:

Normalized Input:
[[0.66666667 1. ]
[0.33333333 0.55555556]
[1. 0.66666667]]
Actual Output:
[[0.92]
[0.86]
[0.89]]
Predicted Output:
[[0.83940487]
[0.82214181]
[0.84026615]]


6. Write a program to implement the naïve Bayesian classifier for a sample training data set
stored as a .CSV file. Compute the accuracy of the classifier, considering a few test data sets.

Naive Bayesian classifier is a statistical method that can predict class membership
probabilities such as the probability that a given tuple belongs to a particular class.
Bayesian classifier is based on the Bayes theorem and it assumes that the effect of an
attribute value on a given class is independent of the values of the other attributes. This
assumption is called class conditional independence. It is made to simplify the computations
involved and, in this sense, the classifier is called "naive".

The naive Bayesian classifier is fast and incremental, can deal with discrete and continuous
attributes, and has excellent performance in real-life problems. Here, the algorithm of the
naive Bayesian classifier is deployed successively, enabling it to solve classification problems
while retaining all the advantages of the naive Bayesian classifier. The comparison of
performance in various domains confirms the advantages of successive learning and suggests
its application to other learning algorithms.

Abstractly, naive Bayes is a conditional probability model: given a problem instance to be
classified, represented by a vector x = (x1, …, xn) of n features (independent variables), it
assigns to this instance the probabilities

p(Ck | x1, …, xn)

for each of K possible outcomes or classes Ck.

The problem with the above formulation is that if the number of features n is large or if a
feature can take on a large number of values, then basing such a model on probability tables is
infeasible. We therefore reformulate the model to make it more tractable. Using Bayes'
theorem, the conditional probability can be decomposed as

p(Ck | x1, …, xn) = p(Ck) p(x1, …, xn | Ck) / p(x1, …, xn)

In plain English, using Bayesian probability terminology, the above equation can be written as

posterior = (prior × likelihood) / evidence

In Gaussian Naive Bayes, continuous values associated with each feature are assumed to be
distributed according to a Gaussian distribution.


Working

Step 1: Convert the data set into a frequency table

Step 2: Create Likelihood table by finding the probabilities

Step 3: Now, use Naive Bayesian equation to calculate the posterior probability for each class.
The class with the highest posterior probability is the outcome of prediction.
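As a toy illustration of Step 3 (the counts below are made up for this sketch and are not taken from the lab data set), the following snippet scores two classes from prior and likelihood estimates and picks the larger posterior; the evidence term is the same for both classes, so it can be ignored when comparing:

# Hypothetical estimates for a single categorical feature value.
priors = {'yes': 9/14, 'no': 5/14}        # P(class)
likelihood = {'yes': 3/9, 'no': 4/5}      # P(feature value | class)

# Unnormalized posteriors: P(class | value) is proportional to P(value | class) * P(class)
scores = {c: likelihood[c] * priors[c] for c in priors}
prediction = max(scores, key=scores.get)
print(scores, "->", prediction)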

Pros:

1) It is easy and fast to predict the class of a test data set.
2) It also performs well in multi-class prediction.
3) When the assumption of independence holds, a Naive Bayes classifier performs better compared
to other models like logistic regression, and you need less training data.
4) It performs well in the case of categorical input variables compared to numerical variable(s). For
numerical variables, a normal distribution is assumed (bell curve, which is a strong
assumption).

Cons:

1) If a categorical variable has a category (in the test data set) which was not observed in the
training data set, then the model will assign a 0 (zero) probability and will be unable to make a
prediction. This is often known as "Zero Frequency". To solve this, we can use a smoothing
technique; one of the simplest smoothing techniques is called Laplace estimation.

2) On the other hand, naive Bayes is also known to be a bad estimator, so the probability outputs
from predict_proba are not to be taken too seriously.

3) Another limitation of Naive Bayes is the assumption of independent predictors. In real life, it
is almost impossible that we get a set of predictors which are completely independent.

----------------------------------Explanation-------------------------------------
It is a classification technique based on Bayes' Theorem with an assumption of
independence among predictors. In simple terms, a Naive Bayes classifier assumes that
the presence of a particular feature in a class is unrelated to the presence of any other
feature. For example, a fruit may be considered to be an apple if it is red, round, and
about 3 inches in diameter. Even if these features depend on each other or upon the
existence of the other features, all of these properties independently contribute to the
probability that this fruit is an apple and that is why it is known as "Naive".

1) Handling Of Data:
➢ Load the data from the CSV file and split in to training and test data set.
➢ The training data set is used by Naïve Bayes to make predictions.
➢ The test data set is used to evaluate the accuracy of the model.


Feature vectors represent the frequencies with which certain events have been generated by a
multinomial distribution.

2) Summarize Data:
The summary of the training data collected involves the mean and the standard deviation for
each attribute, by class value.
➢ These are required when making predictions to calculate the probability of
specific attribute values belonging to each class value.
➢ The summary data can be broken down into the following sub-tasks:

a) Separate Data By Class: The first task is to separate the training dataset instances by class
value so that we can calculate statistics for each class. We can do that by creating a map of each
class value to a list of instances that belong to that class and sorting the entire dataset of
instances into the appropriate lists.

b) Calculate Mean: We need to calculate the mean of each attribute for a class value. The mean
is the central tendency of the data, and we will use it as the middle of our Gaussian
distribution when calculating probabilities.

c) Calculate Standard Deviation: We also need to calculate the standard deviation of each
attribute for a class value. The standard deviation describes the variation or spread of the
data, and we will use it to characterize the expected spread of each attribute in our Gaussian
distribution when calculating probabilities.

d) Summarize Dataset: For a given list of instances (for a class value) we can calculate the
mean and the standard deviation for each attribute.
The zip function groups the values for each attribute across our data instances into
their own lists so that we can compute the mean and standard deviation values for the
attribute.

e) Summarize Attributes By Class: We can pull it all together by first separating our
training dataset into instances grouped by class, and then calculating the summaries for each
attribute.

3) Make Predictions:
• Making predictions involves calculating the probability that a given data instance
belongs to each class,
• then selecting the class with the largest probability as the prediction.
• Finally, estimation of the accuracy of the model by making predictions for each data
instance in the test dataset.

4) Evaluate Accuracy: The predictions can be compared to the class values in the test dataset,
and a classification accuracy can be calculated as an accuracy ratio between 0% and 100%.
Dataset: This problem is comprised of 768 observations of medical details for Pima
Indian patients. The records describe instantaneous measurements taken from the patient


such as their age, the number of times pregnant and blood workup. All patients are
women aged 21 or older. All attributes are numeric, and their units vary from attribute to
attribute.
Program:

import csv
import random
import math

def loadCsv(filename):
    lines = csv.reader(open(filename, "r"))
    dataset = list(lines)
    for i in range(len(dataset)):
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset

def splitDataset(dataset, splitRatio):
    trainSize = int(len(dataset) * splitRatio)
    trainSet = []
    copy = list(dataset)
    while len(trainSet) < trainSize:
        index = random.randrange(len(copy))
        trainSet.append(copy.pop(index))
    return [trainSet, copy]

def separateByClass(dataset):
    separated = {}
    for i in range(len(dataset)):
        vector = dataset[i]
        if (vector[-1] not in separated):
            separated[vector[-1]] = []
        separated[vector[-1]].append(vector)
    return separated

def mean(numbers):
    return sum(numbers)/float(len(numbers))

def stdev(numbers):
    avg = mean(numbers)
    variance = sum([pow(x-avg,2) for x in numbers])/float(len(numbers)-1)
    return math.sqrt(variance)

def summarize(dataset):
    summaries = [(mean(attribute), stdev(attribute)) for attribute in zip(*dataset)]
    del summaries[-1]
    return summaries


def summarizeByClass(dataset):
    separated = separateByClass(dataset)
    summaries = {}
    for classValue, instances in separated.items():
        summaries[classValue] = summarize(instances)
    return summaries

def calculateProbability(x, mean, stdev):
    exponent = math.exp(-(math.pow(x-mean,2)/(2*math.pow(stdev,2))))
    return (1 / (math.sqrt(2*math.pi) * stdev)) * exponent

def calculateClassProbabilities(summaries, inputVector):
    probabilities = {}
    for classValue, classSummaries in summaries.items():
        probabilities[classValue] = 1
        for i in range(len(classSummaries)):
            mean, stdev = classSummaries[i]
            x = inputVector[i]
            probabilities[classValue] *= calculateProbability(x, mean, stdev)
    return probabilities

def predict(summaries, inputVector):
    probabilities = calculateClassProbabilities(summaries, inputVector)
    bestLabel, bestProb = None, -1
    for classValue, probability in probabilities.items():
        if bestLabel is None or probability > bestProb:
            bestProb = probability
            bestLabel = classValue
    return bestLabel

def getPredictions(summaries, testSet):
    predictions = []
    for i in range(len(testSet)):
        result = predict(summaries, testSet[i])
        predictions.append(result)
    return predictions

def getAccuracy(testSet, predictions):
    correct = 0
    for i in range(len(testSet)):
        if testSet[i][-1] == predictions[i]:
            correct += 1
    return (correct/float(len(testSet))) * 100.0

def main():


    filename = 'diabetes.csv'
    splitRatio = 0.87
    dataset = loadCsv(filename)
    trainingSet, testSet = splitDataset(dataset, splitRatio)
    print("----------------------------------Output of naïve Bayesian classifier\n")
    print('Spliting {} rows into training={} and testing={} rows'.format(len(dataset),
          len(trainingSet), len(testSet)))
    # prepare model
    summaries = summarizeByClass(trainingSet)
    # test model
    predictions = getPredictions(summaries, testSet)
    accuracy = getAccuracy(testSet, predictions)
    print('Classification Accuracy: {}%'.format(accuracy))
    print("-----------------------------------------------------")

main()

Output:

Spliting 768 rows into training=668 and testing=100 rows


Classification Accuracy: 74.0%
-----------------------------------------------------


7. Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set
for clustering using k-Means algorithm. Compare the results of these two algorithms and
comment on the quality of clustering. You can add Java/Python ML library classes/API in
the program.

Clustering is a technique for finding similarity groups in a data, called clusters. It attempts to
group individuals in a population together by similarity, but not driven by a specific purpose.
Clustering is often called an unsupervised learning, as you don’t have prescribed labels in the
data and no class values denoting a priori grouping of the data instances are given.

Main objectives of clustering are:



• Intra-cluster distance is minimized.
• Inter-cluster distance is maximized.

Types of clustering:
Hierarchical clustering: Also known as 'nested clustering', as it allows clusters to exist within
bigger clusters, forming a tree.

Partition clustering: It is simply a division of the set of data objects into non-overlapping clusters
such that each object is in exactly one subset.

Exclusive Clustering: It assigns each object to a single cluster.

Overlapping Clustering: It is used to reflect the fact that an object can simultaneously belong to
more than one group.

Fuzzy clustering: Every object belongs to every cluster with a membership weight between 0 (it
absolutely does not belong to the cluster) and 1 (it absolutely belongs to the cluster).

Complete clustering: It performs a hierarchical clustering using a set of dissimilarities on the 'n'
objects being clustered. It tends to find compact clusters of approximately equal diameter.


Taxonomy of the different types of clustering analysis


EM algorithm
A Gaussian process is a collection of random variables, any finite number of which have a joint
Gaussian distribution. By assuming that the considered system is a Gaussian process,
predictions can be made by computing the conditional distribution p(y(x∗)|all the
observations), y(x∗) being the output for which we seek a prediction. This regression approach
is referred to as Gaussian process regression.

The Expectation-Maximization (EM) algorithm is an iterative procedure to estimate the maximum
likelihood of a mixture density distribution. As a note about the name, the Baum-Welch
algorithm used in Hidden Markov Models (HMMs) is a special case of EM.

The Expectation Maximization (EM) algorithm computes ML estimates of unknown parameters


in probabilistic models involving latent variables.

Strategy: Use structure inherent in the probabilistic model to separate the original ML problem
into two closely linked sub problems, each of which is hopefully in some sense more tractable
than the original problem.


The original motivation for the EM algorithm was to obtain a mixture probability density
distribution from 'incomplete' data samples.

EM is a general algorithm for dealing with hidden data, but we will study it first in the context of
unsupervised learning (hidden class labels = clustering).
• EM is an optimization strategy for objective functions that can be interpreted as
likelihoods in the presence of missing data.
• EM is much simpler than gradient methods: No need to choose step size.
• EM is an iterative algorithm with two linked steps:
o E-step: fill-in hidden values using inference
o M-step: apply standard MLE/MAP method to completed data
• We will prove that this procedure monotonically improves the likelihood (or leaves it
unchanged). EM always converges to a local optimum of the likelihood.
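To make the E-step and M-step concrete, here is a minimal sketch (our own illustration, not the lab program given below) of EM for a one-dimensional mixture of two Gaussians, assuming equal mixing weights:

import numpy as np

def em_1d_two_gaussians(x, iters=50):
    # Crude initialization of the two component means and variances (equal mixing weights assumed).
    mu = np.array([x.min(), x.max()], dtype=float)
    var = np.array([x.var(), x.var()])
    for _ in range(iters):
        # E-step: responsibility of each component for each point (from the Gaussian densities).
        dens = np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate means and variances from the weighted ("completed") data.
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return mu, var

x = np.concatenate([np.random.normal(0, 1, 200), np.random.normal(5, 1, 200)])
print(em_1d_two_gaussians(x))   # the estimated means should come out near 0 and 5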

Techniques used to form clusters include: 1. K-Means, 2. DBSCAN, 3. Hierarchical.

K-Means intends to partition n objects into k clusters in which each object belongs to the cluster
with the nearest mean. This method produces exactly k different clusters of greatest possible
distinction. The best number of clusters k leading to the greatest separation (distance) is not
known a priori and must be computed from the data. The objective of K-Means clustering is to
minimize the total intra-cluster variance, or, the squared error function:

J = Σ (j = 1..k) Σ (xi ∈ Cj) || xi − cj ||²,   where cj is the centroid (mean) of cluster Cj.

K-Means is relatively an efficient method. However, we need to specify the number of clusters,
in advance and the final results are sensitive to initialization and often terminates at a local
optimum. Unfortunately there is no global theoretical method to find the optimal number of
clusters. A practical approach is to compare the outcomes of multiple runs with different k and
choose the best one based on a predefined criterion. In general, a large k probably decreases
the error but increases the risk of overfitting.
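One way to put that advice into practice (our sketch, using scikit-learn's KMeans as in the program below) is to run K-Means for several values of k and compare the total within-cluster sum of squares exposed as inertia_:

from sklearn.cluster import KMeans
from sklearn import datasets

X = datasets.load_iris().data
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_ is the total intra-cluster squared distance; look for the "elbow"
    # where it stops dropping sharply as k grows.
    print(k, round(km.inertia_, 2))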


Advantages
1) Fast, robust and easier to understand.
2) Relatively efficient: O(tknd), where n is # objects, k is # clusters, d is # dimension of each
object, and t is # iterations. Normally, k, t, d << n.
3) Gives best result when data set are distinct or well separated from each other.

Disadvantages
1) The learning algorithm requires apriori specification of the number of cluster centers.
2) The use of Exclusive Assignment - If there are two highly overlapping data then k-means
will not be able to resolve that there are two clusters.
3) The learning algorithm is not invariant to non-linear transformations i.e. with different
representation of data we get different results (data represented in form of cartesian co-
ordinates and polar co-ordinates will give different results).
4) Euclidean distance measures can unequally weight underlying factors.
5) The learning algorithm provides the local optima of the squared error function.
6) A poor random choice of the initial cluster centers may not lead to a fruitful result.
7) Applicable only when mean is defined i.e. fails for categorical data.
8) Unable to handle noisy data and outliers.
9) Algorithm fails for non-linear data set.

Program:
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.cluster import KMeans
import sklearn.metrics as sm
import pandas as pd
import numpy as np
%matplotlib inline

iris=datasets.load_iris()
X=pd.DataFrame(iris.data)


X.columns=['Sepal_Length','Sepal_Width','Petal_Length','Petal_Width']
y=pd.DataFrame(iris.target)
y.columns=['Targets']

plt.figure(figsize=(14,7))

Output: <Figure size 1008x504 with 0 Axes>


colormap=np.array(['red','lime','black'])

plt.subplot(1,2,1)
plt.scatter(X.Sepal_Length,X.Sepal_Width,c=colormap[y.Targets],s=40)
plt.title('Sepal')
plt.subplot(1,2,2)
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[y.Targets],s=40)
plt.title('Petal')
Text(0.5,1,'Petal')

model=KMeans(n_clusters=3)
model.fit(X)

KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,


n_clusters=3, n_init=10, n_jobs=1, precompute_distances='auto',
random_state=None, tol=0.0001, verbose=0)

model.labels_

Output: array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 2, 2, 2, 2, 0, 2, 2, 2,
2, 2, 2, 0, 0, 2, 2, 2, 2, 0, 2, 0, 2, 0, 2, 2, 0, 0, 2, 2, 2, 2,
2, 0, 2, 2, 2, 2, 0, 2, 2, 2, 0, 2, 2, 2, 0, 2, 2, 0])


plt.figure(figsize=(14,7))

Output: <Figure size 1008x504 with 0 Axes>


colormap=np.array(['red','lime','black'])
plt.subplot(1,2,1)
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[y.Targets],s=40)
plt.title('Real Classification')

Output: Text(0.5,1,'Real Classification')

predY=model.labels_   # cluster assignments from K-Means (this definition is missing in the original listing)
plt.subplot(1,2,2)
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[predY],s=40)
plt.title('K Mean Classification')

Output: Text(0.5,1,'K Mean Classification')

sm.accuracy_score(y,model.labels_)


Output: 0.24
sm.confusion_matrix(y,model.labels_)

Output: array([[ 0, 50, 0],


[48, 0, 2],
[14, 0, 36]], dtype=int64)
from sklearn import preprocessing
scaler=preprocessing.StandardScaler()
scaler.fit(X)
xsa=scaler.transform(X)
xs=pd.DataFrame(xsa,columns=X.columns)
xs.sample(5)
from sklearn.mixture import GaussianMixture
gmm=GaussianMixture(n_components=3)
gmm.fit(xs)

Output: GaussianMixture(covariance_type='full', init_params='kmeans', max_iter=100,


means_init=None, n_components=3, n_init=1, precisions_init=None,
random_state=None, reg_covar=1e-06, tol=0.001, verbose=0,
verbose_interval=10, warm_start=False, weights_init=None)

y_cluster_gmm=gmm.predict(xs)
y_cluster_gmm
plt.subplot(1,2,1)
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[y_cluster_gmm],s=40)
plt.title('GMM Classification')

Output: Text(0.5,1,'GMM Classification')


plt.subplot(1,2,2)
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[predY],s=40)
plt.title('K Mean Classification')

Output: Text(0.5,1,'K Mean Classification')

sm.accuracy_score(y,y_cluster_gmm)
Output: 0.03333333333333333
sm.confusion_matrix(y,y_cluster_gmm)
Output: array([[ 0, 0, 50],
[45, 5, 0],
[ 0, 50, 0]], dtype=int64)


8. Write a program to implement k-Nearest Neighbour algorithm to classify the iris data set.
Print both correct and wrong predictions. Java/Python ML library classes can be used for this
problem.

The KNN algorithm is a robust and versatile classifier that is often used as a benchmark for
more complex classifiers such as Artificial Neural Networks (ANN) and Support Vector
Machines (SVM). Despite its simplicity, KNN can outperform more powerful classifiers and is
used in a variety of applications such as economic forecasting, data compression and genetics.
For example, KNN was leveraged in a 2006 study of functional genomics for the assignment of
genes based on their expression profiles.

Let’s first start by establishing some definitions and notations. We will use x to denote a feature
(aka. predictor, attribute) and y to denote the target (aka. label, class) we are trying to predict.

KNN falls in the supervised learning family of algorithms. Informally, this means that we are
given a labelled dataset consisting of training observations (x,y) and would like to capture the
relationship between x and y. More formally, our goal is to learn a function h:X→Y so that given
an unseen observation x, h(x) can confidently predict the corresponding output y.

The KNN classifier is also a non parametric and instance-based learning algorithm.

Non-parametric means it makes no explicit assumptions about the functional form of h,


avoiding the dangers of mismodeling the underlying distribution of the data. For example,
suppose our data is highly non-Gaussian but the learning model we choose assumes a Gaussian
form. In that case, our algorithm would make extremely poor predictions.
Instance-based learning means that our algorithm doesn’t explicitly learn a model. Instead, it
chooses to memorize the training instances which are subsequently used as “knowledge” for
the prediction phase. Concretely, this means that only when a query to our database is made
(i.e. when we ask it to predict a label given an input), will the algorithm use the training
instances to spit out an answer.
KNN is non-parametric, instance-based and used in a supervised learning setting.

It is worth noting that the minimal training phase of KNN comes both at a memory cost, since
we must store a potentially huge data set, as well as a computational cost during test time since
classifying a given observation requires a run down of the whole data set. Practically speaking,
this is undesirable since we usually want fast responses.

Example of k-NN classification: the test sample should be classified either to the
first class of blue squares or to the second class of red triangles. If k = 3 (inner circle) it is
assigned to the second class because there are 2 triangles and only 1 square inside the inner
circle. If, for example, k = 5 it is assigned to the first class (3 squares vs. 2 triangles inside the
outer circle).


Minimal training but expensive testing.

How does KNN work?


In the classification setting, the K-nearest neighbor algorithm essentially boils down to forming
a majority vote between the K most similar instances to a given “unseen” observation.
Similarity is defined according to a distance metric between two data points. A popular choice is
the Euclidean distance given by

but other measures can be more suitable for a given setting and include the Manhattan,
Chebyshev and Hamming distance.

More formally, given a positive integer K, an unseen observation x and a similarity metric d,
KNN classifier performs the following two steps:

It runs through the whole dataset computing d between x and each training observation. We’ll
call the K points in the training data that are closest to x the set A. Note that K is usually odd to
prevent tie situations.

It then estimates the conditional probability for each class, that is, the fraction of points in A
with that given class label. (Note I(x) is the indicator function which evaluates to 1 when the
argument x is true and 0 otherwise)

Finally, our input x gets assigned to the class with the largest probability.

KNN searches the memorized training observations for the K instances that most closely
resemble the new instance and assigns to it their most common class.
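The two steps above can be written in a few lines. The sketch below is only our own illustration with made-up points (the lab program later uses scikit-learn instead); it uses Euclidean distance and a simple majority vote:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # Step 1: distances from the query point x to every training observation.
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]          # indices of the K closest points (the set A)
    # Step 2: majority vote among the K nearest labels.
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [8.2, 7.9]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.1, 0.9])))   # -> 0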


An alternate way of understanding KNN is by thinking about it as calculating a decision
boundary (or boundaries, when there are more than 2 classes), which is then used to classify new points.

More on K
At this point, you’re probably wondering how to pick the variable K and what its effects are on
your classifier. Well, like most machine learning algorithms, the K in KNN is a hyperparameter
that you, as a designer, must pick in order to get the best possible fit for the data set. Intuitively,
you can think of K as controlling the shape of the decision boundary we talked about earlier.

When K is small, we are restraining the region of a given prediction and forcing our classifier to
be “more blind” to the overall distribution. A small value for K provides the most flexible fit,
which will have low bias but high variance. Graphically, our decision boundary will be more
jagged.
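A quick way to explore this effect on the iris data (our sketch, mirroring the scikit-learn classes used in the program below) is to score the classifier for several values of K:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
for k in (1, 3, 5, 15, 51):
    score = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train).score(X_test, y_test)
    # small K: flexible, jagged boundary (low bias, high variance); large K: smoother but may underfit
    print(k, round(score, 3))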

Pros:
• No assumptions about data — useful, for example, for nonlinear data
• Simple algorithm — to explain and understand/interpret
• High accuracy (relatively) — it is pretty high but not competitive in comparison to better
supervised learning models
• Versatile — useful for classification or regression

Cons:
• Computationally expensive — because the algorithm stores all of the training data
• High memory requirement
• Stores all (or almost all) of the training data
• Prediction stage might be slow (with big N)
• Sensitive to irrelevant features and the scale of the data

Applications of KNN
Credit ratings — collecting financial characteristics vs. comparing people with similar financial
features to a database. By the very nature of a credit rating, people who have similar financial
details would be given similar credit ratings. Therefore, they would like to be able to use this
existing database to predict a new customer’s credit rating, without having to perform all the
calculations.

Should the bank give a loan to an individual? Would an individual default on his or her loan? Is
that person closer in characteristics to people who defaulted or did not default on their loans?

In political science — classing a potential voter to a “will vote” or “will not vote”, or to “vote
Democrat” or “vote Republican”.
More advance examples could include handwriting detection (like OCR), image recognition and
even video recognition.


Program:
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
from sklearn.model_selection import train_test_split
iris_dataset=load_iris()
print("\nIRIS FEATURES \TARGET NAMES: \n",iris_dataset.target_names)
for i in range(len(iris_dataset.target_names)):
    print("\n[{0}]:[{1}]".format(i,iris_dataset.target_names[i]))
print("\nIRIS DATA :\n",iris_dataset["data"])
X_train,X_test,y_train,y_test=train_test_split(iris_dataset["data"],iris_dataset["target"],random_s
tate=0)
print("\nTarget :\n",iris_dataset["target"])
print("\nX TRAIN \n",X_train)
print("\nX TEST \n",X_test)
print("\nY TRAIN \n",y_train)
print("\nY TEST \n",y_test)
kn=KNeighborsClassifier(n_neighbors=1)
kn.fit(X_train,y_train)
x_new=np.array([[5,2.9,1,0.2]])
print("\nXNEW \n",x_new)
prediction=kn.predict(x_new)
print("\nPredicted target value: {}\n".format(prediction))
print("\nPredicted feature name:{}\n".format(iris_dataset["target_names"][prediction]))
i=1
x=X_test[i]
x_new=np.array([x])
print("\nXNEW \n",x_new)
for i in range(len(X_test)):
    x=X_test[i]
    x_new=np.array([x])
    prediction=kn.predict(x_new)
    print("\n Actual:[{0}][{1}] \t, Predicted:{2}{3}".format(y_test[i],
          iris_dataset["target_names"][y_test[i]], prediction,
          iris_dataset["target_names"][prediction]))
print("\nTEST SCORE[ACCURACY]: {:.2f}\n".format(kn.score(X_test,y_test)))

IRIS FEATURES \TARGET NAMES:


['setosa' 'versicolor' 'virginica']
[0]:[setosa]
[1]:[versicolor]
[2]:[virginica]

Target :


[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2]

X TEST
[[5.8 2.8 5.1 2.4]
[6. 2.2 4. 1. ]
[5.5 4.2 1.4 0.2]
[7.3 2.9 6.3 1.8]
[5. 3.4 1.5 0.2]
[6.3 3.3 6. 2.5]
[5. 3.5 1.3 0.3]
[6.7 3.1 4.7 1.5]
[6.8 2.8 4.8 1.4]
[6.1 2.8 4. 1.3]
[6.1 2.6 5.6 1.4]
[6.4 3.2 4.5 1.5]
[6.1 2.8 4.7 1.2]
[6.5 2.8 4.6 1.5]
[6.1 2.9 4.7 1.4]
[4.9 3.1 1.5 0.1]
[6. 2.9 4.5 1.5]
[5.5 2.6 4.4 1.2]
[4.8 3. 1.4 0.3]
[5.4 3.9 1.3 0.4]
[5.6 2.8 4.9 2. ]
[5.6 3. 4.5 1.5]
[4.8 3.4 1.9 0.2]
[4.4 2.9 1.4 0.2]
[6.2 2.8 4.8 1.8]
[4.6 3.6 1. 0.2]
[5.1 3.8 1.9 0.4]
[6.2 2.9 4.3 1.3]
[5. 2.3 3.3 1. ]
[5. 3.4 1.6 0.4]
[6.4 3.1 5.5 1.8]
[5.4 3. 4.5 1.5]
[5.2 3.5 1.5 0.2]
[6.1 3. 4.9 1.8]
[6.4 2.8 5.6 2.2]
[5.2 2.7 3.9 1.4]
[5.7 3.8 1.7 0.3]
[6. 2.7 5.1 1.6]]


Y TEST
[2 1 0 2 0 2 0 1 1 1 2 1 1 1 1 0 1 1 0 0 2 1 0 0 2 0 0 1 1 0 2 1 0 2 2 1 0
1]

XNEW
[[5. 2.9 1. 0.2]]

Predicted target value: [0]


Predicted feature name:['setosa']

XNEW
[[6. 2.2 4. 1. ]]

Actual:[2][virginica] , Predicted:[2]['virginica']

Actual:[1][versicolor] , Predicted:[1]['versicolor']

Actual:[0][setosa] , Predicted:[0]['setosa']

Actual:[2][virginica] , Predicted:[2]['virginica']

Actual:[0][setosa] , Predicted:[0]['setosa']

Actual:[2][virginica] , Predicted:[2]['virginica']

Actual:[0][setosa] , Predicted:[0]['setosa']

Actual:[1][versicolor] , Predicted:[1]['versicolor']

Actual:[1][versicolor] , Predicted:[1]['versicolor']

Actual:[1][versicolor] , Predicted:[1]['versicolor']

Actual:[2][virginica] , Predicted:[2]['virginica']

Actual:[1][versicolor] , Predicted:[1]['versicolor']

Actual:[1][versicolor] , Predicted:[1]['versicolor']

Actual:[1][versicolor] , Predicted:[1]['versicolor']

Actual:[1][versicolor] , Predicted:[1]['versicolor']

Actual:[0][setosa] , Predicted:[0]['setosa']


Actual:[1][versicolor] , Predicted:[1]['versicolor']

Actual:[1][versicolor] , Predicted:[1]['versicolor']

Actual:[0][setosa] , Predicted:[0]['setosa']

Actual:[0][setosa] , Predicted:[0]['setosa']

Actual:[2][virginica] , Predicted:[2]['virginica']

Actual:[1][versicolor] , Predicted:[1]['versicolor']

Actual:[0][setosa] , Predicted:[0]['setosa']

Actual:[0][setosa] , Predicted:[0]['setosa']

Actual:[2][virginica] , Predicted:[2]['virginica']

Actual:[0][setosa] , Predicted:[0]['setosa']

Actual:[0][setosa] , Predicted:[0]['setosa']

Actual:[1][versicolor] , Predicted:[1]['versicolor']

Actual:[1][versicolor] , Predicted:[1]['versicolor']

Actual:[0][setosa] , Predicted:[0]['setosa']

Actual:[2][virginica] , Predicted:[2]['virginica']

Actual:[1][versicolor] , Predicted:[1]['versicolor']

Actual:[0][setosa] , Predicted:[0]['setosa']

Actual:[2][virginica] , Predicted:[2]['virginica']

Actual:[2][virginica] , Predicted:[2]['virginica']

Actual:[1][versicolor] , Predicted:[1]['versicolor']

Actual:[0][setosa] , Predicted:[0]['setosa']

Actual:[1][versicolor] , Predicted:[2]['virginica']

TEST SCORE[ACCURACY]: 0.97


9. Implement the non-parametric Locally Weighted Regression algorithm in order to


fit data points. Select appropriate data set for your experiment and draw graphs.

Locally weighted learning is also known as memory-based learning, instance-based learning and
lazy learning, and is closely related to kernel density estimation, similarity searching and case-based reasoning.


Locally Weighted Learning is a class of function approximation techniques in which a prediction
is made by using an approximated local model around the current point of interest.

Locally weighted learning is simple but appealing, both intuitively and statistically. And it has
been around since the turn of the century. When you want to predict what is going to happen in
the future, you simply reach into a database of all your previous experiences, grab some similar
experiences, combine them (perhaps by a weighted average that weights more similar
experiences more strongly) and use the combination to make a prediction, do a regression, or
many other more sophisticated operations. We like this approach to learning, especially for
learning process dynamics or robot dynamics, because it is very flexible (low bias) so provided
we have plenty of data we will eventually get an accurate model.

The cost function in locally weighted linear regression is

J(θ) = Σ_i w(i) (y(i) − θᵀx(i))²

where x(i) is the ith instance, y(i) is its corresponding class label, θ are the model parameters,
and w(i) is the weight of the ith instance. Here, the w(i)'s are non-negative weights. Intuitively, if
w(i) is large for a particular value of i, then in picking θ we will try hard to make
(y(i) − θᵀx(i))² small. If w(i) is small, then the error term (y(i) − θᵀx(i))² will be pretty much
ignored in the fit. A fairly standard choice for the weights is

w(i) = exp( −(x(i) − x)² / (2τ²) )

where τ is the bandwidth, and x is the query point, which is fixed for a given regression model
and is typically one of the instances.

Locally weighted linear regression is a non-parametric method for fitting data points. What
does that mean?

Instead of fitting a single regression line, you fit many linear regression models. The final
resulting smooth curve is the product of all those regression models.
Obviously, we can't fit the same linear model again and again. Instead, for each linear model we
want to fit, we find a point x and use that for fitting a local regression model.
We find the points that are closest to x to fit each of our local regression models. That's why you'll
see the algorithm also referred to as a nearest-neighbours algorithm in the literature.
Now, if your data points have the x-values from 1 to 100: [1,2,3 ... 98, 99, 100]. The algorithm
would fit a linear model for 1,2,3...,98,99,100. That means, you'll have 100 regression models.

Again, when we fit each of the model, we can't just use all the data points in the sample. For
each of the model, we find the closest points and use that for fitting. For example, if the


algorithm wants to fit for x=50, it will put higher weight on [48,49,50,51,52] and less weight on
[45,46,47,53,54,55]. When it tries to fit for x=95, the points [92,93,95,96,97] will have higher
weight than any other data points.

Linear regression only gives you an overall prediction (a single line), so it is not very helpful for
real-world data. Locally weighted linear regression introduces some bias into our estimator. The
bias can be computed in many ways; in this case, we use the RBF equation to set it up. The RBF
equation, also called the RBF kernel, is a way to calculate the distance between one point and
the others. The feature of the RBF is to give a stronger bias to the data points that are near the
data we are interested in.

The code provided here involves a few steps:

1. Take a data point and calculate its bias with respect to the other points using the RBF kernel
(the weight formula given above). We define this bias as Wi.

2. Take the biases Wi into the normal equation, so the normal equation with local weighting
becomes β = (XᵀWX)⁻¹ XᵀWY.

If everything works out, you will get a coefficient vector β (beta in the program). This β is
specific to that point.

3. Finally, use β and the data point to test how good the estimator is.

Run steps 1 ~ 3 for every point, store the results in a vector and print it.

LWR uses only the points in a short distance to one another to estimate polynomial expression
of each point subject to smoothing. This method is effective with diverse time series data that
result from the data's varying linearity, periodicity and non‐linearity. While this method is quite
similar to multiple linear regression, the analysis takes place solely with the values of the
nearby data according to the conditions. Because this is a non‐parametric method, no
assumption was presented in the model.

Program:

import numpy as np
from ipywidgets import interact
from bokeh.plotting import figure, show, output_notebook
from bokeh.layouts import gridplot
from bokeh.io import push_notebook
output_notebook()

def local_regression(x0, X, Y, tau):
    x0 = np.r_[1, x0]
    X = np.c_[np.ones(len(X)), X]


    # fit model: normal equations with kernel
    xw = X.T * radial_kernel(x0, X, tau)
    beta = np.linalg.pinv(xw @ X) @ xw @ Y
    # predict value
    return x0 @ beta

def radial_kernel(x0, X, tau):
    return np.exp(np.sum((X - x0) ** 2, axis=1) / (-2 * tau * tau))
n = 1000
# generate dataset
X = np.linspace(-3, 3, num=n)
Y = np.log(np.abs(X ** 2 - 1) + .5)
# jitter X
X += np.random.normal(scale=.1, size=n)

def plot_lwr(tau):
    # prediction over a grid of query points
    domain = np.linspace(-3, 3, num=300)
    prediction = [local_regression(x0, X, Y, tau) for x0 in domain]
    plot = figure(plot_width=400, plot_height=400)
    plot.title.text = 'tau=%g' % tau
    plot.scatter(X, Y, alpha=.3)
    plot.line(domain, prediction, line_width=2, color='red')
    return plot

show(gridplot([
[plot_lwr(10.), plot_lwr(1.)],
[plot_lwr(0.1), plot_lwr(0.01)]
]))

# The two calls below belong to an optional interactive variant of this plot; they need a single
# plot handle and an update callback (neither is defined in this listing), so they are left
# commented out here:
# show(plot, notebook_handle=True)
# interact(interactive_update, tau=(0.01, 3., 0.01))


6. Viva Questions

1. Explain what is an algorithm in computing?


2. How do you handle missing or corrupted data in a dataset?
3. What is machine learning?
4. How do you know which Machine Learning model you should use?
5. What is batch normalization and why does it work?
6. How do deductive and inductive machine learning differ?
7. What is supervised versus unsupervised learning?
8. How do bias and variance play out in machine learning?
9. What’s the trade-off between bias and variance?
10. What is gradient descent?
11. What is the difference between stochastic gradient descent (SGD) and gradient descent
(GD)?
12. Explain over- and under-fitting and how to combat them?
13. If you split your data into train/test splits, is it still possible to overfit your model?
14. How much data should you allocate for your training, validation, and test sets?
15. How do you combat the curse of dimensionality?
16. What are some methods of reducing dimensionality?
17. What are 3 ways of reducing dimensionality?
18. What is regularization, why do we use it, and give some examples of common methods?
19. Show how the CANDIDATE – ELIMINATION algorithm is used to learn from training
examples and hypothesize new instances in Version Space?
20. Explain Principal Component Analysis (PCA)?
21. What is the ROC Curve and what is AUC (a.k.a. AUROC)?
22. Why is Area Under ROC Curve (AUROC) better than raw accuracy as an out-of- sample
evaluation metric?
23. How can you choose a classifier based on training set size?
24. Explain Kmeans clustering algorithm?
25. What is decision tree classification?


26. What are the advantages and disadvantages of decision trees?


27. How do classification and regression differ?
28. How do you choose an algorithm for a classification problem?
29. What are the advantages and disadvantages of neural networks?
30. Why is ReLU better and more often used than Sigmoid in Neural Networks?
31. What is a recommendation system?
32. What are some key business metrics for (Retail bank | e-Commerce site)?
33. Why are ensemble methods superior to individual models?
34. How can you help our marketing team be more efficient?
35. Show how the FIND–S algorithm can be used to classify new instances of target concepts.
Run the experiments to deduce instances and hypothesis consistently?
36. Demonstrate the use of ID3 algorithm for learning Boolean-valued functions for
classifying the training examples by searching through the space of a decision tree?
37. Demonstrate the back-propagation algorithm for learning the task of classification
involving applications like face-recognition?
38. Show the application of Naive Bayes algorithm for learning and classifying text
documents?
39. Demonstrate the use K-nearest neighbor algorithm for unsupervised learning task with
the help of a suitable example?


7. AI and ML Glossary
accuracy
The fraction of predictions that a classification model got right. In multi-class classification,
accuracy is defined as follows:
Accuracy = Correct Predictions / Total Number of Examples
In binary classification, accuracy has the following definition:
Accuracy = (True Positives + True Negatives) / Total Number of Examples
See true positive and true negative.

activation function
A function (for example, ReLU or sigmoid) that takes in the weighted sum of all of the inputs
from the previous layer and then generates and passes an output value (typically nonlinear) to
the next layer.

AUC (Area under the ROC Curve)


An evaluation metric that considers all possible classification thresholds.
The Area Under the ROC curve is the probability that a classifier will be more confident that a
randomly chosen positive example is actually positive than that a randomly chosen negative
example is positive.

backpropagation
The primary algorithm for performing gradient descent on neural networks. First, the output
values of each node are calculated (and cached) in a forward pass. Then, the partial
derivative of the error with respect to each parameter is calculated in a backward pass through
the graph.

baseline
A simple model or heuristic used as reference point for comparing how well a model is
performing. A baseline helps model developers quantify the minimal, expected performance on
a particular problem.

batch
The set of examples used in one iteration (that is, one gradient update) of model training.
See also batch size.

batch size
The number of examples in a batch. For example, the batch size of SGD is 1, while the batch size
of a mini-batch is usually between 10 and 1000. Batch size is usually fixed during training and
inference; however, TensorFlow does permit dynamic batch sizes.

bias (math)
An intercept or offset from an origin. Bias (also known as the bias term) is referred to
as b or w0 in machine learning models. For example, bias is the b in the following formula:
y′ = b + w1x1 + w2x2 + … + wnxn


Not to be confused with bias in ethics and fairness or prediction bias.

binary classification
A type of classification task that outputs one of two mutually exclusive classes. For example, a
machine learning model that evaluates email messages and outputs either "spam" or "not
spam" is a binary classifier.

candidate sampling
A training-time optimization in which a probability is calculated for all the positive labels, using,
for example, softmax, but only for a random sample of negative labels. For example, if we have
an example labeled beagle and dog, candidate sampling computes the predicted probabilities
and corresponding loss terms for the beagle and dog class outputs in addition to a random
subset of the remaining classes (cat, lollipop, fence). The idea is that the negative classes can
learn from less frequent negative reinforcement as long as positive classes always get proper
positive reinforcement, and this is indeed observed empirically. The motivation for candidate
sampling is a computational efficiency win from not computing predictions for all negatives.

categorical data
Features having a discrete set of possible values. For example, consider a categorical feature
named house style, which has a discrete set of three possible values: Tudor, ranch, colonial. By
representing house style as categorical data, the model can learn the separate impacts
of Tudor, ranch, and colonial on house price.
Sometimes, values in the discrete set are mutually exclusive, and only one value can be applied
to a given example. For example, a car maker categorical feature would probably permit only a
single value (Toyota) per example. Other times, more than one value may be applicable. A single
car could be painted more than one different color, so a car color categorical feature would
likely permit a single example to have multiple values (for example, red and white).
Categorical features are sometimes called discrete features.
Contrast with numerical data.

centroid
The center of a cluster as determined by a k-means or k-median algorithm. For instance, if k is
3, then the k-means or k-median algorithm finds 3 centroids.

class
One of a set of enumerated target values for a label. For example, in a binary
classification model that detects spam, the two classes are spam and not spam. In a multi-class
classification model that identifies dog breeds, the classes would be poodle, beagle, pug, and so
on.
classification model
A type of machine learning model for distinguishing among two or more discrete classes. For
example, a natural language processing classification model could determine whether an input
sentence was in French, Spanish, or Italian. Compare with regression model.


classification threshold
A scalar-value criterion that is applied to a model's predicted score in order to separate
the positive class from the negative class. Used when mapping logistic regression results
to binary classification. For example, consider a logistic regression model that determines the
probability of a given email message being spam. If the classification threshold is 0.9, then
logistic regression values above 0.9 are classified as spam and those below 0.9 are classified
as not spam.

clustering
Grouping related examples, particularly during unsupervised learning. Once all the examples
are grouped, a human can optionally supply meaning to each cluster.
Many clustering algorithms exist. For example, the k-means algorithm clusters examples based
on their proximity to a centroid, as in the following diagram:
[Figure: examples plotted by tree height vs. tree width, grouped around two centroids into cluster 1 and cluster 2]
A human researcher could then review the clusters and, for example, label cluster 1 as "dwarf
trees" and cluster 2 as "full-size trees."
As another example, consider a clustering algorithm based on an example's distance from a
center point, illustrated as follows:
[Figure: examples grouped by distance from a center point into cluster 1, cluster 2 and cluster 3]

collaborative filtering
Making predictions about the interests of one user based on the interests of many other users.
Collaborative filtering is often used in recommendation systems.

confirmation bias
The tendency to search for, interpret, favor, and recall information in a way that confirms one's
preexisting beliefs or hypotheses. Machine learning developers may inadvertently collect or
label data in ways that influence an outcome supporting their existing beliefs. Confirmation bias
is a form of implicit bias.
Experimenter's bias is a form of confirmation bias in which an experimenter continues
training models until a preexisting hypothesis is confirmed.

confusion matrix
An NxN table that summarizes how successful a classification model's predictions were; that
is, the correlation between the label and the model's classification. One axis of a confusion
matrix is the label that the model predicted, and the other axis is the actual label. N represents
the number of classes. In a binary classification problem, N=2. For example, here is a sample
confusion matrix for a binary classification problem:
                        Tumor (predicted)   Non-Tumor (predicted)
Tumor (actual)                 18                     1
Non-Tumor (actual)              6                   452


The preceding confusion matrix shows that of the 19 samples that actually had tumors, the
model correctly classified 18 as having tumors (18 true positives), and incorrectly classified 1
as not having a tumor (1 false negative). Similarly, of 458 samples that actually did not have
tumors, 452 were correctly classified (452 true negatives) and 6 were incorrectly classified (6
false positives).
The confusion matrix for a multi-class classification problem can help you determine mistake
patterns. For example, a confusion matrix could reveal that a model trained to recognize
handwritten digits tends to mistakenly predict 9 instead of 4, or 1 instead of 7.
Confusion matrices contain sufficient information to calculate a variety of performance metrics,
including precision and recall.
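For instance, using the (hypothetical) tumor counts above, precision and recall can be read off the matrix directly; the sketch below recomputes them with scikit-learn, which is our own choice of tool and not part of this glossary:

from sklearn.metrics import confusion_matrix, precision_score, recall_score

# 19 actual tumors (18 found, 1 missed) and 458 actual non-tumors (6 false alarms), as in the table above.
y_true = [1]*18 + [1]*1 + [0]*452 + [0]*6
y_pred = [1]*18 + [0]*1 + [0]*452 + [1]*6
print(confusion_matrix(y_true, y_pred))
print(precision_score(y_true, y_pred))   # 18 / (18 + 6) = 0.75
print(recall_score(y_true, y_pred))      # 18 / (18 + 1) ≈ 0.947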

continuous feature
A floating-point feature with an infinite range of possible values. Contrast with discrete
feature
DataFrame
A popular datatype for representing data sets in Pandas. A DataFrame is analogous to a table.
Each column of the DataFrame has a name (a header), and each row is identified by a number.

data set
A collection of examples.

decision boundary
The separator between classes learned by a model in a binary class or multi-class
classification problems. For example, in the following image representing a binary
classification problem, the decision boundary is the frontier between the orange class and the
blue class:

discrete feature
A feature with a finite set of possible values. For example, a feature whose values may only
be animal, vegetable, or mineral is a discrete (or categorical) feature. Contrast with continuous
feature

dynamic model
A model that is trained online in a continuously updating fashion. That is, data is continuously
entering the model.

ensemble
A merger of the predictions of multiple models. You can create an ensemble via one or more of
the following:
• different initializations
• different hyperparameters
• different overall structure
Deep and wide models are a kind of ensemble.


epoch
A full training pass over the entire data set such that each example has been seen once. Thus, an
epoch represents N/batch size training iterations, where N is the total number of examples.

Estimator
An instance of the tf.Estimator class, which encapsulates logic that builds a TensorFlow graph
and runs a TensorFlow session. You may create your own custom Estimators (as
described here) or instantiate premade Estimators created by others.

false negative (FN)


An example in which the model mistakenly predicted the negative class. For example, the
model inferred that a particular email message was not spam (the negative class), but that
email message actually was spam.

false positive (FP)


An example in which the model mistakenly predicted the positive class. For example, the
model inferred that a particular email message was spam (the positive class), but that email
message was actually not spam.

false positive rate (FP rate)


The x-axis in an ROC curve. The FP rate is defined as follows:
False Positive Rate = False Positives / (False Positives + True Negatives)

feature
An input variable used in making predictions.

feature cross
A synthetic feature formed by crossing (multiplying or taking a Cartesian product of)
individual features. Feature crosses help represent nonlinear relationships.

feature set
The group of features your machine learning model trains on. For example, postal code,
property size, and property condition might comprise a simple feature set for a model that
predicts housing prices.

gradient
The vector of partial derivatives with respect to all of the independent variables. In machine
learning, the gradient is the vector of partial derivatives of the model function. The gradient
points in the direction of steepest ascent.

gradient descent
A technique to minimize loss by computing the gradients of loss with respect to the model's
parameters, conditioned on training data. Informally, gradient descent iteratively adjusts
parameters, gradually finding the best combination of weights and bias to minimize loss.
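A minimal one-parameter sketch of that iterative adjustment (our illustration, with a made-up loss function):

# Minimize loss(w) = (w - 3)**2 by gradient descent; its gradient is 2*(w - 3).
w, learning_rate = 0.0, 0.1
for _ in range(100):
    gradient = 2 * (w - 3)
    w -= learning_rate * gradient     # step against the gradient
print(round(w, 4))                    # converges towards 3.0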


heuristic
A practical and nonoptimal solution to a problem, which is sufficient for making progress or for
learning from.

hidden layer
A synthetic layer in a neural network between the input layer (that is, the features) and
the output layer (the prediction). A neural network contains one or more hidden layers.

hyperplane
A boundary that separates a space into two subspaces. For example, a line is a hyperplane in
two dimensions and a plane is a hyperplane in three dimensions. More typically in machine
learning, a hyperplane is the boundary separating a high-dimensional space. Kernel Support
Vector Machines use hyperplanes to separate positive classes from negative classes, often in a
very high-dimensional space.

iteration
A single update of a model's weights during training. An iteration consists of computing the
gradients of the parameters with respect to the loss on a single batch of data.

learning rate
A scalar used to train a model via gradient descent. During each iteration, the gradient
descent algorithm multiplies the learning rate by the gradient. The resulting product is called
the gradient step. Learning rate is a key hyperparameter.

least squares regression
A linear regression model trained by minimizing L2 Loss.

linear regression
A type of regression model that outputs a continuous value from a linear combination of input
features.
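A brief sketch using scikit-learn's LinearRegression on made-up data:

import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: one input feature, continuous target (y = 3x + 2).
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([5.0, 8.0, 11.0, 14.0])

model = LinearRegression()
model.fit(X, y)

print(model.coef_, model.intercept_)   # learned weight and bias
print(model.predict([[5.0]]))          # continuous-valued prediction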

logistic regression
A model that generates a probability for each possible discrete label value in classification
problems by applying a sigmoid function to a linear prediction. Although logistic regression is
often used in binary classification problems, it can also be used in multi-class classification
problems (where it is called multi-class logistic regression or multinomial regression).
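A brief sketch of binary logistic regression with scikit-learn, on made-up one-feature data:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy binary-classification data (made up for illustration).
X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)

print(clf.predict([[2.0]]))         # predicted class label (0 or 1)
print(clf.predict_proba([[2.0]]))   # sigmoid-based probabilities for class 0 and class 1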

Mean Squared Error (MSE)
The average squared loss per example. MSE is calculated by dividing the squared loss by the
number of examples. The values that TensorFlow Playground displays for "Training loss" and
"Test loss" are MSE.

metric
A number that you care about. May or may not be directly optimized in a machine-learning
system. A metric that your system tries to optimize is called an objective.

neural network
A model that, taking inspiration from the brain, is composed of layers (at least one of which
is hidden) consisting of simple connected units or neurons followed by nonlinearities.

neuron
A node in a neural network, typically taking in multiple input values and generating one
output value. The neuron calculates the output value by applying an activation
function (nonlinear transformation) to a weighted sum of input values.

normalization
The process of converting an actual range of values into a standard range of values, typically -1
to +1 or 0 to 1. For example, suppose the natural range of a certain feature is 800 to 6,000.
Through subtraction and division, you can normalize those values into the range -1 to +1.
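A minimal sketch of min-max normalization with numpy, using the 800-to-6,000 range mentioned above:

import numpy as np

# Raw feature values with a natural range of roughly 800 to 6,000.
raw = np.array([800.0, 2400.0, 6000.0])

# Min-max scaling into the range 0 to 1.
scaled_01 = (raw - raw.min()) / (raw.max() - raw.min())

# Rescaling into the range -1 to +1.
scaled_pm1 = 2 * scaled_01 - 1

print(scaled_01)    # approximately [0.0, 0.308, 1.0]
print(scaled_pm1)   # approximately [-1.0, -0.385, 1.0]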

numpy
An open-source math library that provides efficient array operations in Python. pandas is built
on numpy.

outliers
Values distant from most other values. In machine learning, any of the following are outliers:
• Weights with high absolute values.
• Predicted values relatively far away from the actual values.
• Input data whose values are more than roughly 3 standard deviations from the mean.
Outliers often cause problems in model training.

output layer
The "final" layer of a neural network. The layer containing the answer(s).

overfitting
Creating a model that matches the training data so closely that the model fails to make correct
predictions on new data.

pandas
A column-oriented data analysis API. Many ML frameworks, including TensorFlow, support
pandas data structures as input. See pandas documentation.

parameter
A variable of a model that the ML system trains on its own. For example, weights are
parameters whose values the ML system gradually learns through successive training
iterations. Contrast with hyperparameter.

performance
Overloaded term with the following meanings:
• The traditional meaning within software engineering. Namely: How fast (or efficiently) does
this piece of software run?
• The meaning within ML. Here, performance answers the following question: How correct is
this model? That is, how good are the model's predictions?

precision
A metric for classification models. Precision identifies the frequency with which a model
was correct when predicting the positive class. That is:
Precision = True Positives / (True Positives + False Positives)

prediction
A model's output when provided with an input example.

recall
A metric for classification models that answers the following question: Out of all the possible
positive labels, how many did the model correctly identify? That is:
Recall = True Positives / (True Positives + False Negatives)
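A brief sketch computing both precision and recall with scikit-learn on made-up labels (positive class = 1):

from sklearn.metrics import precision_score, recall_score

y_true = [1, 1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 1]

# Precision = TP / (TP + FP);  Recall = TP / (TP + FN)
print(precision_score(y_true, y_pred))   # 3 / (3 + 1) = 0.75
print(recall_score(y_true, y_pred))      # 3 / (3 + 1) = 0.75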

Rectified Linear Unit (ReLU)
An activation function with the following rules:
• If input is negative or zero, output is 0.
• If input is positive, output is equal to input.
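A minimal numpy sketch of the ReLU rules above:

import numpy as np

def relu(x):
    # Output 0 for negative or zero input, the input itself otherwise.
    return np.maximum(0, x)

print(relu(np.array([-2.0, 0.0, 3.5])))   # [0.  0.  3.5]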

regression model
A type of model that outputs continuous (typically, floating-point) values. Compare
with classification models, which output discrete values, such as "day lily" or "tiger lily."

regularization
The penalty on a model's complexity. Regularization helps prevent overfitting. Different kinds
of regularization include:
• L1 regularization
• L2 regularization
• dropout regularization
• early stopping (this is not a formal regularization method, but can effectively limit
overfitting)

regularization rate
A scalar value, represented as lambda, specifying the relative importance of the regularization
function. The following simplified loss equation shows the regularization rate's influence:
minimize(loss function + λ(regularization function))
Raising the regularization rate reduces overfitting but may make the model less accurate.

ROC (receiver operating characteristic) Curve
A curve of true positive rate vs. false positive rate at different classification thresholds. See
also AUC.

scikit-learn
A popular open-source ML platform. See www.scikit-learn.org.

semi-supervised learning
Training a model on data where some of the training examples have labels but others don’t. One
technique for semi-supervised learning is to infer labels for the unlabeled examples, and then to
train on the inferred labels to create a new model. Semi-supervised learning can be useful if
labels are expensive to obtain but unlabeled examples are plentiful.

sigmoid function
A function that maps logistic or multinomial regression output (log odds) to probabilities,
returning a value between 0 and 1.
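A minimal numpy sketch of the sigmoid function:

import numpy as np

def sigmoid(z):
    # Maps any real-valued input (log odds) to a value between 0 and 1.
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-4.0, 0.0, 4.0])))   # approximately [0.018, 0.5, 0.982]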

sparsity
The number of elements set to zero (or null) in a vector or matrix divided by the total number
of entries in that vector or matrix. For example, consider a 10x10 matrix in which 98 cells
contain zero. The calculation of sparsity is as follows:
sparsity = 98 / 100 = 0.98
Feature sparsity refers to the sparsity of a feature vector; model sparsity refers to the
sparsity of the model weights.
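A minimal numpy sketch reproducing the 10x10 example above (the two nonzero values are made up):

import numpy as np

# 10x10 matrix in which 98 of the 100 cells are zero.
m = np.zeros((10, 10))
m[0, 0] = 4.2
m[5, 7] = 1.3

sparsity = np.count_nonzero(m == 0) / m.size
print(sparsity)   # 98 / 100 = 0.98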

squared loss
The loss function used in linear regression. (Also known as L2 loss.) This function calculates
the square of the difference between a model's predicted value for a labeled example and the
actual value of the label. Due to squaring, this loss function amplifies the influence of bad
predictions. That is, squared loss reacts more strongly to outliers than L1 loss.
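A small numeric sketch of squared (L2) loss and its average over examples (MSE), using made-up values:

import numpy as np

y_true = np.array([3.0, 5.0, 2.5])
y_pred = np.array([2.5, 5.0, 4.0])

# Squared (L2) loss for each example: the square of (prediction - label).
squared_loss = (y_pred - y_true) ** 2
print(squared_loss)           # [0.25 0.   2.25]

# Averaging the squared loss over the examples gives the MSE.
print(np.mean(squared_loss))  # 0.8333...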

static model
A model that is trained offline.

step
A forward and backward evaluation of one batch.

step size
Synonym for learning rate.

stochastic gradient descent (SGD)
A gradient descent algorithm in which the batch size is one. In other words, SGD relies on a
single example chosen uniformly at random from a data set to calculate an estimate of the
gradient at each step.
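A minimal sketch of SGD for a one-feature linear model with squared loss, updating on a single randomly chosen example per step; the toy data and learning rate are made up for illustration:

import numpy as np

rng = np.random.default_rng(0)         # seeded for repeatability
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([5.0, 8.0, 11.0, 14.0])   # exactly y = 3*x + 2

w, b = 0.0, 0.0
learning_rate = 0.01                   # hypothetical value for this sketch

for _ in range(10000):
    i = rng.integers(len(x))           # one example chosen uniformly at random
    error = (w * x[i] + b) - y[i]      # prediction error on that single example
    # Gradient of the squared loss on this one example.
    w -= learning_rate * 2 * error * x[i]
    b -= learning_rate * 2 * error

print(round(w, 2), round(b, 2))        # approaches 3 and 2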

target
Synonym for label.

temporal data
Data recorded at different points in time. For example, winter coat sales recorded for each day
of the year would be temporal data.

true positive (TP)
An example in which the model correctly predicted the positive class. For example, the model
inferred that a particular email message was spam, and that email message really was spam.

true positive rate (TP rate)
Synonym for recall. That is:
True Positive Rate = True Positives / (True Positives + False Negatives)
True positive rate is the y-axis in an ROC curve.

unlabeled example
An example that contains features but no label. Unlabeled examples are the input to inference.
In semi-supervised and unsupervised learning, unlabeled examples are used during training.

unsupervised machine learning
Training a model to find patterns in a data set, typically an unlabeled data set.

validation set
A subset of the data set, disjoint from the training set, that you use to adjust hyperparameters.
Contrast with the training set and the test set.

weight
A coefficient for a feature in a linear model, or an edge in a deep network. The goal of training a
linear model is to determine the ideal weight for each feature. If a weight is 0, then its
corresponding feature does not contribute to the model.
