The document is a lab manual for the AIML course at Vignan's Lara Institute of Technology & Science, detailing a series of experiments focused on machine learning using tools like Weka and Python. It includes instructions for data preprocessing, building classification models, and exploring various machine learning algorithms. Additionally, it covers the use of decision trees, neural networks, and data visualization techniques, along with practical steps for using the Weka toolkit.

VIGNAN’S LARA INSTITUTE OF TECHNOLOGY & SCIENCE

(Affiliated to JNTU, Kakinada)

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

AIML

LAB MANUAL

III [Link] II SEM


(R20 REGULATION)
List of Experiments

1. Data Preprocessing with Weka or Python


2. Building Decision Trees for Soybean classification model using Weka or Python
3. Generating association rules on Weather data using Weka or Python
4. Exploring machine learning models including classification and clustering using scikit-learn,
Weka, or Python
5. Build Neural Network Classifier using Weka or Python
6. Supervisely - Perform Data Labeling for various images using object recognition
7. Image Classifier using TensorFlow or OpenCV
8. Automatic Facial recognition using Microsoft Azure or OpenCV
EXPERIMENT- 1
Explore machine learning tool “WEKA”
 Explore WEKA Data Mining/Machine Learning Toolkit.
 Downloading and/or installation of WEKA data mining toolkit.
 Understand the features of WEKA toolkit such as Explorer, Knowledge Flow interface,
Experimenter, command-line interface.
 Navigate the options available in the WEKA (ex. Select attributes panel, Preprocess
panel, Classify panel, Cluster panel, Associate panel and Visualize panel)
 Study the ARFF file format.
 Explore the available data sets in WEKA. Load a data set (e.g.,
Weather dataset, Iris dataset, etc.)
 Load each dataset and observe the following:
1. List the attribute names and their types
2. Number of records in each dataset
3. Identify the class attribute (if any)
4. Plot Histogram
5. Determine the number of records for each class.
6. Visualize the data in various dimensions

(i) Downloading and/or installation of WEKA data mining toolkit.


Ans: Install steps for WEKA, a data mining tool
1. Download the software, as per your requirements, from the link given below.
[Link]
2. Java is mandatory for installing WEKA, so if you already have Java on your
machine download only WEKA; otherwise download the version bundled with a JVM.
3. Then open the file location and double-click on the file.
4. Click Next.

5. Click I Agree.
6. Make any necessary changes to the settings and click Next. "Full" and
"Associate files" are the recommended settings.

7. Change to your desired installation location.

8. If you want a shortcut, check the box and click Install.
9. The installation will start; wait a while, it will finish within a minute.

10. After the installation completes, click Next.


11. That's all! Click Finish, take a shovel, and start mining. Best of luck.

This is the GUI you get when WEKA starts. You have four options: Explorer, Experimenter,
KnowledgeFlow and Simple CLI.
(ii) Understand the features of WEKA tool kit such as Explorer, Knowledge flow interface,
Experimenter, command-line interface.

Ans: WEKA
Weka was created by machine learning researchers at the University of Waikato, Hamilton,
New Zealand (with contributions such as Alex Seewald's original command-line primer and
David Scuse's original Experimenter tutorial).

It is a Java-based application.


It is a collection of open source machine learning algorithms.
The routines (functions) are implemented as classes and logically arranged in packages.
It comes with an extensive GUI interface.
Weka routines can also be used standalone via the command-line interface.
The Graphical User Interface:
The Weka GUI Chooser (class weka.gui.GUIChooser) provides a starting point for launching
Weka's main GUI applications and supporting tools. If one prefers an MDI ("multiple
document interface") appearance, this is provided by an alternative launcher called
"Main" (class weka.gui.Main). The GUI Chooser consists of four buttons, one for each of the
four major Weka applications, and four menus.

The buttons can be used to start the following applications:


1. Explorer. An environment for exploring data with WEKA (the rest of this documentation
deals with this application in more detail).
2. Experimenter. An environment for performing experiments and conducting statistical tests
between learning schemes.
3. KnowledgeFlow. This environment supports essentially the same functions as the Explorer,
but with a drag-and-drop interface. One advantage is that it supports incremental learning.
4. SimpleCLI. Provides a simple command-line interface that allows direct execution of WEKA
commands, for operating systems that do not provide their own command-line interface.

1. Explorer: the graphical user interface
1.1 Section Tabs
At the very top of the window, just below the title bar, is a row of tabs. When the Explorer is first
started only the first tab is active; the others are grayed out. This is because it is necessary to
open (and potentially pre-process) a data set before starting to explore the data.
The tabs are as follows:
1. Preprocess. Choose and modify the data being acted on.
2. Classify. Train and test learning schemes that classify or perform regression.
3. Cluster. Learn clusters for the data.
4. Associate. Learn association rules for the data.
5. Select attributes. Select the most relevant attributes in the data.
6. Visualize. View an interactive 2D plot of the data.

Once the tabs are active, clicking on them flicks between different screens, on which the
respective actions can be performed. The bottom area of the window (including the status box,
the log button, and the Weka bird) stays visible regardless of which section you are in. The
Explorer can be easily extended with custom tabs. The Wiki article “Adding tabs in the
Explorer” explains this in detail.
2. Weka Experimenter:-
The Weka Experiment Environment enables the user to create, run, modify, and analyze
experiments in a more convenient manner than is possible when processing the schemes
individually. For example, the user can create an experiment that runs several schemes against
a series of datasets and then analyze the results to determine if one of the schemes is
(statistically) better than the other schemes.

The Experiment Environment can be run from the command line using the Simple CLI. For
example, commands could be typed into the CLI to run the OneR scheme on the Iris dataset
using a basic train-and-test process. (Note that such a command would be typed on one line
into the CLI.) While commands can be typed directly into the CLI, this technique is not
particularly convenient and the experiments are not easy to modify.

The Experimenter comes in two flavors: either with a simple interface that provides most
of the functionality one needs for experiments, or with an interface with full access to the
Experimenter's capabilities. You can choose between the two with the Experiment
Configuration Mode radio buttons:
Knowledge Flow
Introduction
The Knowledge Flow provides an alternative to the Explorer as a graphical front end to WEKA’s
core algorithms. The Knowledge Flow presents a data-flow inspired interface to WEKA. The
user can select WEKA components from a palette, place them on a layout canvas and connect
them together in order to form a knowledge flow for processing and analyzing data. At present,
all of WEKA’s classifiers, filters, clusterers, associators, loaders and savers are available in the
Knowledge Flow along with some extra tools.

The Knowledge Flow offers the following features:


1. Intuitive data flow style layout.
2. Process data in batches or incrementally
3. Process multiple batches or streams in parallel (each separate flow executes in its own
thread).
4. Process multiple streams sequentially via a user-specified order of execution.
5. Chain filters together.
6. View models produced by classifiers for each fold in a cross validation.
7. Visualize performance of incremental classifiers during processing (scrolling plots of
classification accuracy, RMS error, predictions etc.).
8. Plugin “perspectives” that add major new functionality (e.g. 3D data visualization, time
series forecasting environment etc.).

4. Simple CLI
The Simple CLI provides full access to all Weka classes, i.e., classifiers, filters, clusterers, etc.,
but without the hassle of the CLASSPATH (it uses the CLASSPATH with which Weka was started).
It offers a simple Weka shell with separate command-line and output areas.
(iii) Navigate the options available in WEKA (e.g., Select attributes panel, Preprocess
panel, Classify panel, Cluster panel, Associate panel and Visualize panel)

Ans: Steps for identify options in WEKA


1. Open WEKA Tool.
2. Click on WEKA Explorer.
3. Click on Preprocessing tab button.
4. Click on open file button.
5. Choose WEKA folder in C drive.
6. Select and Click on data option button.
7. Choose iris data set and open file.
8. All panel tabs are now available in the WEKA Explorer window.

(iv) Study the ARFF file format


Ans: ARFF File Format
An ARFF (Attribute-Relation File Format) file is an ASCII text file that describes a list of
instances sharing a set of attributes. ARFF is not the only format Weka can load; any file
that can be converted with Weka's "core converters" can also be loaded. The following
formats are currently supported:
ARFF (+ compressed)
C4.5
CSV
libsvm
binary serialized instances
XRFF (+ compressed)
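To make the structure of the format concrete, here is a minimal, hand-rolled ARFF parser sketch in standard-library Python. It handles only the basics illustrated by the embedded sample file (a shortened, hypothetical version of the bundled weather data); real projects would use Weka itself or a dedicated ARFF library.

```python
# A minimal ARFF parser sketch (standard library only). It shows the
# three parts of the format: @relation, @attribute declarations, and
# an @data section of comma-separated instances. '%' starts a comment.

ARFF = """\
@relation weather.symbolic
@attribute outlook {sunny, overcast, rainy}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,FALSE,no
overcast,FALSE,yes
rainy,TRUE,no
"""

def parse_arff(text):
    relation, attributes, data = None, [], []
    in_data = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('%'):   # skip blanks and comments
            continue
        lower = line.lower()
        if lower.startswith('@relation'):
            relation = line.split(None, 1)[1]
        elif lower.startswith('@attribute'):
            attributes.append(line.split(None, 2)[1])   # attribute name
        elif lower.startswith('@data'):
            in_data = True
        elif in_data:
            data.append([v.strip() for v in line.split(',')])
    return relation, attributes, data

relation, attributes, data = parse_arff(ARFF)
print(relation)      # weather.symbolic
print(attributes)    # ['outlook', 'windy', 'play']
print(len(data))     # 3 instances
```

This sketch ignores attribute types, quoting, and sparse instances, which a full ARFF reader must handle.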
(v) Explore the available data sets in WEKA.
Ans: Steps for identifying data sets in WEKA
1. Open WEKA Tool.
2. Click on WEKA Explorer.
3. Click on open file button.
4. Choose WEKA folder in C drive.
5. Select and Click on data option button.

Sample Weka Data Sets


Weka ships with a number of sample data sets in ARFF format in its data folder, for example:
iris.arff
weather.nominal.arff
weather.numeric.arff
glass.arff
soybean.arff
contact-lenses.arff
diabetes.arff
vote.arff
labor.arff
segment-challenge.arff
supermarket.arff
cpu.arff
breast-cancer.arff

(vi) Load a data set (e.g., Weather dataset, Iris dataset, etc.)


Ans: Steps to load the Weather data set:
1. Open WEKA Tool.
2. Click on WEKA Explorer.
3. Click on open file button.
4. Choose WEKA folder in C drive.
5. Select and Click on data option button.
6. Choose the weather.nominal.arff file and open it.

Steps to load the Iris data set:


1. Open WEKA Tool.
2. Click on WEKA Explorer.
3. Click on open file button.
4. Choose WEKA folder in C drive.
5. Select and Click on data option button.
6. Choose the iris.arff file and open it.
(vii) Load each dataset and observe the following: (vii.i) List attribute names and their
types
Ans: Example: weather.nominal.arff
The attribute names (all nominal in this dataset):
1. outlook (nominal)
2. temperature (nominal)
3. humidity (nominal)
4. windy (nominal)
5. play (nominal)

(vii.ii) Number of records in each dataset.
Ans: The weather.nominal data set has 14 records (instances):

@relation weather.symbolic
@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no

(vii.iii) Identify the class attribute (if any)
Ans: The class attribute is play, with the two class values:
1. yes
2. no

(vii.iv) Plot Histogram


Ans: Steps to plot histograms:
1. Open WEKA Tool.
2. Click on WEKA Explorer and load a data set.
3. In the Preprocess panel, click an attribute name to see its histogram in the lower-right area.
4. Click the Visualize All button to see histograms for all attributes at once.
(vii.v) Determine the number of records for each class
Ans: For weather.nominal, counting the last (class) value of each record gives
play = yes: 9 records, and play = no: 5 records.
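The per-class counts can be verified with a few lines of standard-library Python; the rows below are the @data section of weather.nominal copied verbatim, with the last value of each row being the class attribute play.

```python
# Count instances per class for weather.nominal (standard library only).
from collections import Counter

rows = """\
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
""".strip().splitlines()

counts = Counter(line.split(',')[-1] for line in rows)
print(len(rows))   # 14 records in total
print(counts)      # 9 'yes' and 5 'no'
```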

(vii.vi) Visualize the data in various dimensions

Click on Visualize All button in WEKA Explorer.


EXPERIMENT 2
Demonstrate performing classification on data sets
 Extract IF-THEN rules from the decision tree generated by the classifier and observe the
confusion matrix.

Ans: A decision tree is a structure that includes a root node, branches, and leaf nodes. Each
internal node denotes a test on an attribute, each branch denotes the outcome of a test, and each
leaf node holds a class label. The topmost node in the tree is the root node.

The following decision tree is for the concept buys_computer, which indicates whether a customer
at a company is likely to buy a computer or not. Each internal node represents a test on an
attribute. Each leaf node represents a class.

The benefits of having a decision tree are as follows −

1. It does not require any domain knowledge.


2. It is easy to comprehend.
3. The learning and classification steps of a decision tree are simple and fast.

IF-THEN Rules:
A rule-based classifier makes use of a set of IF-THEN rules for classification. We can express a
rule in the following form:

IF condition THEN conclusion


Let us consider a rule R1:
R1: IF age = youth AND student = yes
THEN buys_computer = yes

Points to remember:
The IF part of the rule is called the rule antecedent or precondition.
The THEN part of the rule is called the rule consequent.
The antecedent part (the condition) consists of one or more attribute tests, and these tests
are logically ANDed.
The consequent part consists of the class prediction.

Note: We can also write rule R1 as follows:

R1: (age = youth) ∧ (student = yes) ⇒ (buys_computer = yes)


If the condition holds true for a given tuple, then the antecedent is satisfied.

Rule Extraction
Here we will learn how to build a rule-based classifier by extracting IF-THEN rules from a
decision tree.

Points to remember −

One rule is created for each path from the root to the leaf node. To form a rule antecedent, each
splitting criterion is logically ANDed. The leaf node holds the class prediction, forming the rule
consequent.
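The path-to-rule procedure above can be sketched in a few lines of Python. The tree below is a hypothetical buys_computer tree encoded as nested dicts (this encoding is an assumption for illustration, not Weka's internal representation): internal nodes map an attribute to its branches, and leaves are class labels.

```python
# Rule extraction sketch: one IF-THEN rule per root-to-leaf path, with
# the split tests along the path ANDed into the rule antecedent and the
# leaf's class label as the rule consequent.

tree = {"age": {
    "youth":       {"student": {"yes": "buys_computer=yes",
                                "no":  "buys_computer=no"}},
    "middle_aged": "buys_computer=yes",
    "senior":      {"credit": {"fair":      "buys_computer=yes",
                               "excellent": "buys_computer=no"}},
}}

def extract_rules(node, conditions=()):
    if isinstance(node, str):                  # leaf: emit one rule
        antecedent = " AND ".join(conditions) or "TRUE"
        return [f"IF {antecedent} THEN {node}"]
    rules = []
    (attribute, branches), = node.items()      # single-key internal node
    for value, subtree in branches.items():
        rules += extract_rules(subtree, conditions + (f"{attribute}={value}",))
    return rules

for rule in extract_rules(tree):
    print(rule)
```

The tree has five leaves, so exactly five rules are produced, one per path from the root.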

Rule Induction Using Sequential Covering Algorithm

A sequential covering algorithm can be used to extract IF-THEN rules directly from the training
data. We do not need to generate a decision tree first. In this algorithm, each rule for a given
class covers many of the tuples of that class.

Some of the sequential covering algorithms are AQ, CN2, and RIPPER. As per the general
strategy, the rules are learned one at a time. Each time a rule is learned, the tuples covered
by the rule are removed and the process continues for the rest of the tuples.

Note: Decision tree induction can be considered as learning a set of rules simultaneously,
because the path to each leaf in a decision tree corresponds to a rule.
The following is the sequential learning algorithm, where rules are learned for one class at a
time. When learning a rule for a class Ci, we want the rule to cover all the tuples from class Ci
only and no tuple from any other class.
Algorithm: Sequential Covering

Input:

D, a data set of class-labeled tuples; Att_vals, the set of all attributes and their possible values.

Rule Pruning
A rule is pruned for the following reason:
The assessment of quality is made on the original set of training data. The rule may perform
well on training data but less well on subsequent data. That is why rule pruning is required.
A rule is pruned by removing a conjunct. The rule R is pruned if the pruned version of R has
greater quality than R, as assessed on an independent set of tuples (the pruning set).
FOIL is one simple and effective method for rule pruning. For a given rule R,

FOIL_Prune(R) = (pos - neg) / (pos + neg)

where pos and neg are the numbers of positive and negative tuples covered by R, respectively.
Note: This value increases with the accuracy of R on the pruning set. Hence, if the
FOIL_Prune value is higher for the pruned version of R, then we prune R.
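The pruning decision can be made concrete with a short sketch. The coverage counts below are hypothetical, chosen only to illustrate the comparison between a rule and its pruned version.

```python
# FOIL_Prune comparison sketch: compute (pos - neg) / (pos + neg) for a
# rule before and after removing a conjunct, using hypothetical counts
# from a pruning (validation) set. Keep the pruned rule if its score rises.

def foil_prune(pos, neg):
    """FOIL_Prune = (pos - neg) / (pos + neg) on the pruning set."""
    return (pos - neg) / (pos + neg)

original = foil_prune(pos=40, neg=10)   # rule R covers 40 pos, 10 neg
pruned   = foil_prune(pos=45, neg=9)    # pruned R covers 45 pos, 9 neg

print(original)                # 0.6
print(pruned)                  # ~0.667, higher: so we keep the pruned rule
print(pruned > original)       # True
```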

Steps to run decision tree algorithms in WEKA


1. Open WEKA Tool.
2. Click on WEKA Explorer.
3. Click on Preprocessing tab button.
4. Click on open file button.
5. Choose WEKA folder in C drive.
6. Select and Click on data option button.
7. Choose iris data set and open file.
8. Click on the classify tab, choose a decision tree algorithm (e.g., trees > J48), and select
the cross-validation test option with folds value 10.
9. Click on start button.
EXPERIMENT - 3
Perform data preprocessing tasks and demonstrate performing association rule mining on data
sets
 Load weather.nominal, Iris, Glass datasets into Weka and run the Apriori algorithm with
different support and confidence values.
A. Load each dataset into Weka and run the Apriori algorithm with different support and
confidence values. Study the rules generated.
Ans:
Steps to run the Apriori algorithm in WEKA:
1. Open WEKA Tool.
2. Click on WEKA Explorer.
3. Click on Preprocessing tab button.
4. Click on open file button.
5. Choose WEKA folder in C drive.
6. Select and Click on data option button.
7. Choose Weather data set and open file.
8. Click on the Associate tab and choose the Apriori algorithm.
9. Click on start button.

Association Rule:
An association rule has two parts, an antecedent (if) and a consequent (then). An antecedent is an
item found in the data. A consequent is an item that is found in combination with the antecedent.
Association rules are created by analyzing data for frequent if/then patterns and using the criteria
support and confidence to identify the most important relationships. Support is an indication of
how frequently the items appear in the database. Confidence indicates the number of times
the if/then statements have been found to be true.
In data mining, association rules are useful for analyzing and predicting customer behavior. They
play an important part in shopping basket data analysis, product clustering, catalog design and
store layout.

Support and Confidence values:

Support count: The support count of an itemset X, denoted by X.count, in a data set T is the
number of transactions in T that contain X. Assume T has n transactions. Then, for a rule
X → Y:

support = (X ∪ Y).count / n

confidence = (X ∪ Y).count / X.count = support(X ∪ Y) / support(X)
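The formulas above can be computed directly from transaction counts. The transactions below are a hypothetical hand-made subset of weather.nominal, with items written as attribute=value pairs; this is an illustrative sketch, not Weka's Apriori implementation.

```python
# Support and confidence sketch for a rule X -> Y:
#   support    = count(X ∪ Y) / n
#   confidence = count(X ∪ Y) / count(X)

transactions = [
    {"outlook=sunny", "humidity=high", "play=no"},
    {"outlook=sunny", "humidity=high", "play=no"},
    {"outlook=overcast", "humidity=high", "play=yes"},
    {"outlook=rainy", "humidity=high", "play=yes"},
    {"outlook=rainy", "humidity=normal", "play=yes"},
    {"outlook=rainy", "humidity=normal", "play=no"},
]

def support(itemset):
    # Fraction of transactions that contain every item in the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    return support(antecedent | consequent) / support(antecedent)

X = {"outlook=sunny"}
Y = {"play=no"}
print(support(X | Y))     # 2 of 6 transactions contain both items
print(confidence(X, Y))   # 1.0: every sunny transaction here has play=no
```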
Experiment-4
4. Demonstrate performing clustering of data sets

A. Load each dataset into Weka and run the simple k-means clustering algorithm with different
values of k (number of desired clusters). Study the clusters formed. Observe the sum of
squared errors and centroids, and derive insights.

Ans: Steps to run the k-means clustering algorithm in WEKA:

1. Open WEKA Tool.

2. Click on WEKA Explorer.
3. Click on Preprocessing tab button.
4. Click on open file button.
5. Choose WEKA folder in C drive.
6. Select and Click on data option button.
7. Choose iris data set and open file.
8. Click on the cluster tab, choose SimpleKMeans, and select the "Use training set" test option.
9. Click on start button.

B. Explore other clustering techniques available in Weka.

Ans: Other clustering algorithms and techniques available in WEKA include EM, Cobweb,
FarthestFirst, Canopy, and HierarchicalClusterer.

Experiment-5
Build neural network classifier using Weka

Weka is a popular machine learning tool that can be used to build a neural network
classifier. To do so, the Multilayer Perceptron algorithm can be used. The following steps
can be followed to build a neural network classifier in Weka:

1. Load the dataset: The first step is to load the dataset into Weka. This can be
done by clicking on the "Explorer" tab and then clicking on "Open file" to select
the dataset.
2. Select the Multilayer Perceptron algorithm: Click on the "Classify" tab and
then click on the "Choose" button in the "Classifier" section. Under the
"functions" folder, select the "MultilayerPerceptron" item.
3. Set the test options: In the "Test options" section, select the "Percentage split"
option and set it to 80%. This tells Weka to use 80% of the dataset to train the
neural network and the remaining 20% to evaluate its accuracy.
4. Build the neural network: Click the "Start" button to build the neural network.
5. Evaluate the accuracy: Once the neural network is built, its accuracy can be
evaluated by looking at the classification accuracy and the confusion matrix in
the output.

Conclusion
Building a neural network classifier in Weka requires some knowledge of machine
learning and neural networks. It is recommended to have a basic understanding of
these concepts before attempting to build a neural network classifier in Weka.
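To build intuition for what a multilayer perceptron computes, here is a forward pass through a tiny two-layer network with hand-picked weights that realizes XOR, a function no single-layer perceptron can learn. The weights are illustrative assumptions, not values Weka would learn; training would find them by backpropagation.

```python
# Forward pass of a tiny multilayer perceptron computing XOR.
# Hidden units build intermediate features (roughly OR and AND of the
# inputs); the output unit combines them as "OR and not AND", i.e. XOR.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mlp_xor(x1, x2):
    h1 = sigmoid(20 * x1 + 20 * x2 - 10)   # hidden unit ~ OR(x1, x2)
    h2 = sigmoid(20 * x1 + 20 * x2 - 30)   # hidden unit ~ AND(x1, x2)
    return sigmoid(20 * h1 - 20 * h2 - 10) # output ~ h1 AND NOT h2

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(mlp_xor(x1, x2)))  # prints the XOR truth table
```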
