Data Base Management Key Points

Weka is a collection of machine learning algorithms for data mining tasks such as classification, regression, clustering, and association rule mining. It contains tools for data pre-processing, classification, regression, clustering, visualization, and feature selection. Weka also supports various data formats including ARFF, CSV and databases through ODBC. It provides a graphical user interface and Java APIs for incorporating machine learning into applications.

Uploaded by

vishnu vardhan
Copyright
© All Rights Reserved

Name: vishnu vardhan P

Roll no: 45
DWDM ASSIGNMENT

INTRODUCTION TO WEKA- A TOOLKIT FOR MACHINE LEARNING

Machine learning is a type of artificial intelligence that enables computers to
learn from data without being explicitly programmed. Machine learning systems
search through the data to find patterns and, when these are found, adjust the
program’s actions accordingly.
Data mining analyses data from different perspectives and summarises it into
useful information. Its methods are similar to those of machine learning. The
difference is that data mining systems extract patterns for human
comprehension, whereas machine learning systems use them to adjust their own
behaviour. Data mining uses machine learning to find valuable information in
large volumes of data.

WEKA:

Weka is data mining software that provides a collection of machine learning
algorithms. These algorithms can be applied directly to a data set or called
from your own Java code.
Weka is a collection of tools for:

● Regression
● Clustering
● Association
● Data pre-processing
● Classification
● Visualisation

FEATURES OF WEKA:

1. Platform Independent
2. Open source and free
3. Different Machine Learning Algorithms for Data Mining
4. Easy to use
5. Data Preprocessing tools
6. Flexibility for scripting experiments
7. Graphical user interface

WEKA’S APPLICATION INTERFACE:

Weka provides a total of five application interfaces: Explorer, Experimenter,
KnowledgeFlow, Workbench and Simple CLI. When we open Weka, it starts the Weka
GUI Chooser screen, from which we can open any of these interfaces.

WEKA DATA FORMATS:


By default, Weka uses the Attribute-Relation File Format (ARFF) for data
analysis. Data can be imported from the following formats:

● CSV
● ARFF
● Database using ODBC

Attribute-Relation File Format (ARFF): An ARFF file has two parts:

1) The header section defines the relation (data set) name and the name and
type of each attribute.
2) The data section lists the data instances.
An ARFF file therefore requires three declarations: the relation, the
attributes and the data. The figure below shows an example of an ARFF file.
· @relation: This is the first declaration in any ARFF file, written in the
header section and followed by the relation (data set) name. The relation name
must be a string; if it contains spaces, it must be enclosed in quotes.
· @attribute: Each attribute is declared in the header section with its name
and its type or range of values. Weka supports the following data types for
attributes:

● numeric
● <nominal-specification>
● string
● date

· @data: This marks the start of the data section and is followed by the list
of data instances, one per line.

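As a rough illustration of the layout described above, the Python sketch below
embeds a small hand-written ARFF file and splits it into its header and data
sections. The relation name, attribute names and values are invented for this
example, and the parser only handles the simple case shown (no quoted values or
sparse rows), so it is a teaching aid rather than a substitute for Weka's own
loader.

```python
# Hypothetical ARFF content; names and values are invented for illustration.
ARFF_TEXT = """\
@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute play {yes, no}

@data
sunny,85,no
overcast,83,yes
rainy,70,yes
"""


def parse_arff(text):
    """Split simple ARFF text into (relation, attributes, rows)."""
    relation = None
    attributes = []   # (name, type) pairs from the header section
    rows = []         # value lists from the data section
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):   # skip blanks and comments
            continue
        if in_data:
            rows.append([v.strip() for v in line.split(",")])
            continue
        keyword = line.split(None, 1)[0].lower()
        if keyword == "@relation":
            relation = line.split(None, 1)[1].strip("'\"")
        elif keyword == "@attribute":
            _, name, attr_type = line.split(None, 2)
            attributes.append((name, attr_type))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(ARFF_TEXT)
print(relation)
for name, attr_type in attributes:
    print(name, "->", attr_type)
print(len(rows), "instances")
```

Everything before @data is header, and every non-empty line after it is one
instance whose values follow the order of the @attribute declarations.
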
WEKA EXPLORER:

The Weka Explorer is illustrated in the figure below and contains a total of
six tabs.
The tabs are as follows.
1) Preprocess: This allows us to choose the data file and prepare it.
2) Classify: This allows us to apply and experiment with different algorithms
on preprocessed data files.
3) Cluster: This allows us to apply different clustering algorithms, which
identify clusters within the data.
4) Associate: This allows us to learn association rules, which identify
associations between attributes in the data.
5) Select attributes: This allows us to see the effect of including or
excluding attributes from the experiment.
6) Visualize: This allows us to view the data set in 2D, as scatter plots and
bar graphs.
The other tabs cannot be used until a data set has been loaded in the
Preprocess tab.

Preprocessing: Data must be loaded and preprocessed before anything else can
be done. There are three ways to load the data for preprocessing:

● Open File – enables the user to select a file on the local machine
● Open URL – enables the user to load a data file from a remote location
● Open Database – enables the user to retrieve data from a database source

Classification: Weka’s classifiers predict nominal or numeric quantities.
Available learning schemes include decision trees and lists, support vector
machines, instance-based classifiers, logistic regression and Bayes’ nets. Once
the data has been loaded, all the tabs are enabled. Depending on the
requirements, we can find by trial and error the algorithm that is most
suitable and produces an easily understandable representation of the data.
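The trial-and-error comparison mentioned above can be sketched in a few lines
of Python. This is only an illustration under stated assumptions: the data
points are made up, the train/test split is fixed by hand, and the two
"classifiers" (a majority-class baseline and a one-nearest-neighbour rule) are
simplified stand-ins, not Weka's implementations.

```python
import math
from collections import Counter


def majority_classifier(train):
    """Baseline: always predict the most common class in the training set."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def nearest_neighbour_classifier(train):
    """Instance-based: predict the class of the closest training point."""
    return lambda x: min(train, key=lambda item: math.dist(x, item[0]))[1]


def accuracy(model, test):
    """Fraction of test instances whose predicted class is correct."""
    return sum(model(x) == y for x, y in test) / len(test)


# Made-up two-class data: (features, label) pairs.
data = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
        ((1.1, 1.3), "a"),
        ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b")]
train = data[0:3] + data[4:6]      # fixed hold-out split for repeatability
test = [data[3], data[6]]

for name, build in [("majority", majority_classifier),
                    ("1-NN", nearest_neighbour_classifier)]:
    print(name, accuracy(build(train), test))
```

Training each candidate on the same split and scoring it on the same unseen
instances is what makes the comparison fair.
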

Clustering: The Cluster tab enables the user to identify groups of similar
instances within the data set, which the user can then analyse. Four
evaluation modes are available: use training set, supplied test set, percentage
split and classes to clusters evaluation. The user can also ignore some
attributes of the data set, based on the requirements. Available clustering
schemes in Weka include k-Means, EM, Cobweb, X-means and Farthest First.
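To make the idea behind k-Means concrete, here is a minimal Python sketch of
the basic algorithm. It is not Weka's implementation: the points are invented,
the first k points are used as starting centroids (an assumption made so the
run is repeatable), and the number of iterations is fixed.

```python
import math


def k_means(points, k, iterations=10):
    """Plain k-means on 2-D points; the first k points seed the centroids."""
    centroids = [points[i] for i in range(k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return centroids, assignment


# Made-up data with two visually obvious groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0),
          (9.0, 8.5), (1.0, 0.5), (8.5, 9.0)]
centroids, assignment = k_means(points, k=2)
print(centroids)
print(assignment)
```

The two alternating steps, assigning points and then recomputing centroids,
are repeated until the grouping stops changing.
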

Association: The main scheme for association in Weka is the Apriori
algorithm. It identifies statistical dependencies between groups of attributes
and works only with discrete (nominal) data. Apriori computes all the rules
that have at least a minimum support and exceed a given confidence level.
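The roles of minimum support and confidence can be shown with a short Python
sketch. It performs a level-wise search in the style of Apriori on a made-up
set of transactions; the item names and thresholds are assumptions chosen for
the example, rules are kept when their confidence is at least the threshold,
and the code favours clarity over the optimisations a real implementation
would use.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise search: grow itemsets only from frequent smaller ones."""
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items]
    frequent = {}
    while current:
        level = {s: support(s) for s in current}
        level = {s: sup for s, sup in level.items() if sup >= min_support}
        frequent.update(level)
        # Candidates for the next level: unions that are one item larger.
        current = list({a | b for a in level for b in level
                        if len(a | b) == len(a) + 1})
    return frequent


def rules(frequent, min_confidence):
    """Yield (antecedent, consequent, support, confidence) tuples."""
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for left in map(frozenset, combinations(sorted(itemset), size)):
                confidence = sup / frequent[left]
                if confidence >= min_confidence:
                    yield left, itemset - left, sup, confidence


# Made-up shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = frequent_itemsets(transactions, min_support=0.5)
found = list(rules(frequent, min_confidence=0.9))
for left, right, sup, conf in found:
    print(sorted(left), "=>", sorted(right), "support", sup, "confidence", conf)
```

Support measures how often the items occur together, while confidence measures
how often the right-hand side appears when the left-hand side does.
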

Attribute selection: Attribute selection searches through the possible
combinations of attributes in the data to decide which subset of attributes
works best for prediction. The attribute selection method has two parts.

● Search method: best-first, forward selection, random, exhaustive, genetic
algorithm, ranking
● Evaluation method: correlation-based, wrapper, information gain,
chi-squared

By default, all the available attributes are used in the evaluation of the
data set, but users can exclude some of them if they want to.
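As an example of one evaluation method combined with a ranking search, the
Python sketch below scores each attribute by information gain and sorts the
attributes by that score. The tiny data set and its attribute names are made
up, and the code is a simplified illustration rather than Weka's evaluator.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus its expected entropy after splitting."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Made-up data: "outlook" determines the class, "windy" tells us nothing.
rows = [
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
]
gains = {a: information_gain(rows, a, "play") for a in ("outlook", "windy")}
ranking = sorted(gains, key=gains.get, reverse=True)
print(gains)
print(ranking)
```

An attribute with a high gain separates the classes well on its own, so it is
ranked ahead of attributes that leave the class distribution unchanged.
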

Visualization: The Visualize tab shows a 2D representation of the data, which
helps to judge the difficulty of the learning problem. We can visualize single
attributes (1D) and pairs of attributes (2D), and rotate 3D visualizations in
Weka. The Jitter option adds small random offsets to the plotted points, which
helps with nominal attributes and reveals ‘hidden’ data points that lie on top
of each other.
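The effect of jitter can be imitated in a few lines of Python. This is a
sketch of the general idea only: the integer codes stand for the values of a
hypothetical nominal attribute, and the noise range and random seed are
arbitrary choices, not Weka settings.

```python
import random


def jitter(values, amount, seed=0):
    """Add small uniform noise so identical values no longer overlap."""
    rng = random.Random(seed)
    return [v + rng.uniform(-amount, amount) for v in values]


# A nominal attribute encoded as integer codes; many points coincide exactly.
codes = [0, 0, 0, 1, 1, 2]
plotted = jitter(codes, amount=0.2)
print(len(set(codes)), "distinct positions before jitter")
print(len(set(plotted)), "distinct positions after jitter")
```

Because the noise is small compared with the gap between codes, each point
stays close to its true value while becoming individually visible.
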

IMPLEMENTATION OF WEKA SOFTWARE:

Weka is an open-source software solution developed by the international
scientific community and distributed under the free GNU General Public License
(GPL).
The software is written entirely in the Java programming language. It expects
the source data to be presented as a feature matrix, that is, a single table
in which each row describes one object. Weka provides access to SQL databases
using Java Database Connectivity (JDBC) and can use the result of an SQL query
as the source of data. It does not support processing of related tables;
however, there are many tools that combine separate tables into a single
table, which can then be loaded straight into Weka.
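Combining related tables into a single table can often be done with an
ordinary SQL query. The Python sketch below uses the built-in sqlite3 module
as a stand-in for a real database reached over JDBC; the table names, column
names and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, age INTEGER, city TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 34, 'Pune'), (2, 51, 'Delhi');
    INSERT INTO orders VALUES (1, 20.0), (1, 35.0), (2, 12.5);
""")

# One row per customer: the join and aggregation flatten the two related
# tables into a single feature matrix.
query = """
    SELECT c.age AS age, c.city AS city,
           COUNT(o.amount) AS n_orders, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.id
"""
cursor = conn.execute(query)
header = [col[0] for col in cursor.description]
matrix = cursor.fetchall()
conn.close()

print(header)
for row in matrix:
    print(row)
```

The result has one row per object and one column per feature, which is the
shape a single-table tool expects.
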

NATIVE REGRESSION TOOLS OF WEKA:


Weka has a large number of regression and classification tools. Native packages
are those included in the Weka distribution itself, while non-native ones can
be downloaded and used within the Weka environment. Among the native packages,
the best-known tool is the M5P model tree package. Some of the regression
tools are:

● M5Rules (the M5' algorithm expressed as a list of rules with linear
functions rather than as a tree)
● Decision Stump (a one-level tree: a single split with a single number
output on each side)
● M5P (splits the domain into successive binary regions and then fits linear
models to each tree node)
● Random Forest (several randomised trees combined)
● REPTree (a single fast regression tree, pruned with reduced-error pruning)
● ZeroR (the average value of the outputs)
● Decision Rules (splits the data into several regions based on a single
independent variable and provides a single output value for each range)
● Linear Regression
● SMOreg (support vector regression)
● Simple Linear Regression (uses an intercept and only one input variable,
even for multivariate data)
● Multilayer Perceptron (neural network)
● Gaussian Processes
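Two of the simplest entries in the list, ZeroR and Simple Linear Regression,
are easy to sketch by hand. The Python code below is an illustration of the
underlying ideas on made-up data, not Weka's code: the baseline always
predicts the mean output, and the line is fitted by ordinary least squares on
a single input variable.

```python
def zero_r(targets):
    """ZeroR-style baseline: always predict the mean of the training outputs."""
    mean = sum(targets) / len(targets)
    return lambda x: mean


def simple_linear_regression(xs, ys):
    """Least-squares fit of y = intercept + slope * x for one input variable."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def mean_absolute_error(model, xs, ys):
    """Average absolute difference between predictions and true outputs."""
    return sum(abs(model(x) - y) for x, y in zip(xs, ys)) / len(xs)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]   # made-up data lying on the line y = 1 + 2x
baseline = zero_r(ys)
line = simple_linear_regression(xs, ys)

print("ZeroR error:", mean_absolute_error(baseline, xs, ys))
print("Linear error:", mean_absolute_error(line, xs, ys))
```

A baseline such as ZeroR is useful mainly as a reference point: any other
regression tool should achieve a lower error than it does.
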
