Data Base Management Key Points

Weka is a collection of machine learning algorithms for data mining tasks such as classification, regression, clustering, and association rule mining. It contains tools for data pre-processing, classification, regression, clustering, visualization, and feature selection. Weka also supports various data formats including ARFF, CSV and databases through ODBC. It provides a graphical user interface and Java APIs for incorporating machine learning into applications.

Name: vishnu vardhan P

roll no:45
DWDM ASSIGNMENT

INTRODUCTION TO WEKA- A TOOLKIT FOR MACHINE LEARNING

Machine learning is a branch of artificial intelligence that enables computers
to learn from data without being explicitly programmed. Machine learning
systems search through the data for patterns and, when patterns are found,
adjust the program's actions accordingly.
Data mining analyses data from different perspectives and summarises it into
useful information. Machine learning is similar to data mining; the difference
is that data mining systems extract information for human comprehension. Data
mining uses machine learning to find valuable information in large volumes of
data.

WEKA:

Weka is data mining software that provides a collection of machine learning
algorithms. These algorithms can be applied directly to a data set or called
from Java code.
Weka is a collection of tools for:

● Regression
● Clustering
● Association
● Data pre-processing
● Classification
● Visualisation

FEATURES OF WEKA:

1. Platform Independent
2. Open source and free
3. Different Machine Learning Algorithms for Data Mining
4. Easy to use
5. Data Preprocessing tools
6. Flexibility for scripting experiments
7. Graphical user interface

WEKA’S APPLICATION INTERFACE:

Weka provides five application interfaces. When we start Weka, it opens the
Weka GUI Chooser screen, from which we can launch each of them.

WEKA DATA FORMATS:

By default, Weka uses the Attribute-Relation File Format (ARFF) for data
analysis. Data can be imported from the following formats:

● CSV
● ARFF
● Database using ODBC

Attribute-Relation File Format (ARFF): An ARFF file has two parts:

1) The header section defines the relation (data set) name and the name and
type of each attribute.
2) The data section lists the data instances.
An ARFF file requires the declaration of the relation, the attributes and the
data. The figure below is an example of an ARFF file.
· @relation: This is the first declaration in any ARFF file, written in the
header section and followed by the relation (data set) name. The relation name
must be a string; if it contains spaces, it must be enclosed in quotes.
· @attribute: Attributes are declared in the header section with their names
and their type or range. Weka supports the following data types for attributes:

● numeric
● <nominal-specification>
● string
● date

· @data: This marks the start of the data section and is followed by the list
of all data instances.
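
The header and data sections described above can be illustrated with a small reader written in Python. This is only a sketch under simplifying assumptions: the toy `weather` relation and its values are hypothetical, and the function `parse_arff` is not part of Weka. It ignores quoted names, sparse rows and date formats, all of which a real ARFF reader has to handle.

```python
# Minimal ARFF reader: an illustrative sketch, not Weka's own parser.
# Assumption: no quoted names, no sparse rows, no date formats.

SAMPLE_ARFF = """\
% Hypothetical toy data set
@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute play {yes, no}

@data
sunny,85,no
overcast,83,yes
rainy,70,yes
"""


def parse_arff(text):
    """Return (relation_name, attributes, rows) from simple ARFF text."""
    relation = None
    attributes = []  # list of (name, type) pairs
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        if in_data:
            # Every line after @data is one instance, comma separated.
            rows.append([value.strip() for value in line.split(",")])
            continue
        keyword = line.split(None, 1)[0].lower()
        if keyword == "@relation":
            relation = line.split(None, 1)[1]
        elif keyword == "@attribute":
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                # Nominal specification: the list of allowed values.
                spec = [v.strip() for v in spec.strip("{}").split(",")]
            attributes.append((name, spec))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(SAMPLE_ARFF)
print(relation)
print(attributes)
print(rows)
```

The header lines populate `relation` and `attributes`, and every line after `@data` becomes one instance in `rows`, mirroring the two-part layout of the format.
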
WEKA EXPLORER:

The Weka Explorer is illustrated in the figure below and contains six tabs.
The tabs are as follows.
1) Preprocess: This allows us to choose and load the data file.
2) Classify: This allows us to apply and experiment with different algorithms
on the preprocessed data.
3) Cluster: This allows us to apply different clustering tools, which identify
clusters within the data.
4) Associate: This allows us to learn association rules, which identify
associations within the data.
5) Select attributes: This allows us to see the effect of including or
excluding attributes from the experiment.
6) Visualize: This allows us to see the possible visualisations of the data
set in 2D, as scatter plots and bar graphs.
The user cannot move between the different tabs until the initial preprocessing
of the data set has been completed.

Preprocessing: Data preprocessing is a must. There are three ways to load the
data for preprocessing:

● Open File – enables the user to select a file from the local machine
● Open URL – enables the user to load a data file from a remote location
● Open Database – enables the user to retrieve data from a database source

Classification: Weka provides classifiers to predict nominal or numeric
quantities. Available learning schemes include decision trees and lists,
support vector machines, instance-based classifiers, logistic regression and
Bayes' nets. Once the data has been loaded, all the tabs are enabled. Based on
the requirements, and by trial and error, we can find the algorithm that
produces the most easily understandable representation of the data.

Clustering: The Cluster tab enables the user to identify groups of similar
instances within the data set. The resulting clusters give the user material to
analyse. Four evaluation modes are available: use training set, supplied test
set, percentage split, and classes to clusters evaluation. The user can also
ignore some attributes of the data set, based on the requirements. Available
clustering schemes in Weka include k-Means, EM, Cobweb, X-means and Farthest
First.
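
To make the idea of grouping similar instances concrete, here is a minimal k-means sketch in Python. It is not Weka's SimpleKMeans implementation: the points are hypothetical, the names `assign` and `k_means` are chosen for illustration, and the initial centroids are simply sampled at random from the data.

```python
import random

# Hypothetical 2D instances forming two visibly separate groups.
POINTS = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Put every point into the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        distances = [squared_distance(p, c) for c in centroids]
        clusters[distances.index(min(distances))].append(p)
    return clusters


def k_means(points, k, max_iterations=100, seed=0):
    """Plain k-means: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: random initial centroids
    for _ in range(max_iterations):
        clusters = assign(points, centroids)
        # New centroid = mean of the cluster; an empty cluster keeps its old one.
        updated = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
        if updated == centroids:  # nothing moved, so the result is stable
            break
        centroids = updated
    return centroids, assign(points, centroids)


centroids, clusters = k_means(POINTS, k=2)
print(centroids)
print(clusters)
```

On well-separated data like this, the loop settles on one centroid per group after a few iterations; on real data the result depends on the initial centroids, which is why the seed is exposed as a parameter.
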

Association: The main scheme for association in Weka is the Apriori algorithm.
It identifies statistical dependencies between groups of attributes, and works
only with discrete (nominal) data. The Apriori algorithm computes all rules
that have at least a minimum support and exceed a given confidence level.
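
The support and confidence thresholds can be sketched with a textbook version of Apriori in Python. This is an illustration under stated assumptions, not Weka's implementation: the transactions are hypothetical, and the functions `apriori` and `rules` are names chosen for this example.

```python
from itertools import combinations

# Hypothetical market-basket transactions.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    size = 1
    while candidates:
        # Keep the candidates that appear in enough transactions.
        level = {}
        for candidate in candidates:
            support = sum(1 for t in transactions if candidate <= t) / n
            if support >= min_support:
                level[candidate] = support
        frequent.update(level)
        size += 1
        # Join step: merge frequent itemsets into candidates one item larger.
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: every subset one item smaller must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        }
    return frequent


def rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    found = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for subset in combinations(itemset, r):
                antecedent = frozenset(subset)
                # confidence(A -> B) = support(A and B) / support(A)
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    found.append(
                        (antecedent, itemset - antecedent, support, confidence)
                    )
    return found


frequent = apriori(TRANSACTIONS, min_support=0.5)
found = rules(frequent, min_confidence=0.9)
print(found)
```

With these toy transactions, the only rule that clears both thresholds is that butter implies bread: both items appear together in half of the transactions, and every transaction containing butter also contains bread.
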

Attribute selection: Attribute selection searches through the possible
combinations of attributes in the data to decide which subset of attributes
works best for prediction. The attribute selection method contains two parts.

● Search method: Best-first, forward selection, random, exhaustive,
genetic algorithm, ranking algorithm
● Evaluation method: Correlation-based, wrapper, information gain,
chi-squared

By default, all the available attributes are used in the evaluation of the
data set, but users can exclude some of them if they want to.
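
As an illustration of one evaluation method from the list above, the following Python sketch ranks nominal attributes by information gain. The data rows and attribute names are hypothetical, and the code shows the underlying calculation only; it is not Weka's attribute evaluator.

```python
from collections import Counter
from math import log2

# Hypothetical instances: (outlook, windy, play); the last column is the class.
ATTRIBUTE_NAMES = ["outlook", "windy"]
ROWS = [
    ("sunny", "false", "no"),
    ("sunny", "true", "no"),
    ("overcast", "false", "yes"),
    ("overcast", "true", "yes"),
    ("rainy", "false", "yes"),
    ("rainy", "true", "no"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())


def information_gain(rows, attr_index, class_index=-1):
    """Class entropy minus the weighted entropy after splitting on one attribute."""
    labels = [row[class_index] for row in rows]
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attr_index], []).append(row[class_index])
    remainder = sum(len(part) / len(rows) * entropy(part)
                    for part in partitions.values())
    return entropy(labels) - remainder


# Ranking search: score each attribute on its own, then sort by score.
ranking = sorted(
    ((name, information_gain(ROWS, index))
     for index, name in enumerate(ATTRIBUTE_NAMES)),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

Here `outlook` ranks first because two of its three values determine the class completely, while `windy` barely changes the class distribution.
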

Visualization: The Visualize tab lets the user inspect the data set
graphically. It allows users to visualize a 2D representation of data
and is used to determine the difficulty of the learning problem. We can visualize
single attributes (1D) and pairs of attributes (2D), and rotate 3D visualizations
in Weka. It has the Jitter option to deal with nominal attributes and to detect
'hidden' data points that are plotted on top of one another.

IMPLEMENTATION OF WEKA SOFTWARE:

Weka is an open-source software solution developed by the international
scientific community and distributed under the free GNU GPL license.
The software is written entirely in the Java programming language. It expects
the source data to be presented as a single flat table: a feature matrix with
one row per object. Weka provides access to SQL databases using Java Database
Connectivity (JDBC) and can use the result of an SQL query as the source of
data. Weka does not support processing of multiple related tables; however,
there are many tools for combining separate tables into a single table, which
can then be loaded directly into Weka.

NATIVE REGRESSION TOOLS OF WEKA:

Weka has a large number of regression and classification tools. Native packages
are the ones included in the Weka distribution itself, while non-native ones
can be downloaded and used within the Weka environment. Among the native
packages, the best-known tool is the M5P model tree package. Some of the
regression tools are:

● M5Rules (the M5' algorithm, with the model presented as a list of rules
rather than a tree)
● Decision Stump (a one-level tree that splits on a single attribute and gives
a single output value on each side of the split)
● M5P (splits the domain into successive binary regions and then fits linear
models to the tree nodes)
● Random Forest (an ensemble of randomised trees whose predictions are combined)
● REPTree (a single fast tree learner that uses reduced-error pruning)
● ZeroR (predicts the average value of the outputs)
● Decision Rules (splits data into several regions based on a single
independent variable and provides a single output value for each range)
● Linear Regression
● SMOreg (support vector regression)
● Simple Linear Regression (uses an intercept and only one input variable, even
for multivariate data)
● Multilayer Perceptron (neural network)
● Gaussian Processes
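
Two of the simplest tools in the list above, ZeroR and Simple Linear Regression, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than Weka's code: the data values are hypothetical, and the functions `zero_r`, `fit_line` and `simple_linear_regression` are names chosen for this example.

```python
# Hypothetical multivariate data: two input columns and one numeric target.
INPUTS = [(1.0, 5.0), (2.0, 1.0), (3.0, 4.0), (4.0, 2.0)]
TARGETS = [3.0, 5.0, 7.0, 9.0]


def zero_r(targets):
    """ZeroR baseline: always predict the mean of the training targets."""
    return sum(targets) / len(targets)


def fit_line(xs, ys):
    """Least-squares slope, intercept and squared error for one input column."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0  # a constant column carries no signal
    intercept = mean_y - slope * mean_x
    error = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, error


def simple_linear_regression(inputs, targets):
    """Fit one line per column and keep the column with the lowest error."""
    best = None
    for index in range(len(inputs[0])):
        column = [row[index] for row in inputs]
        slope, intercept, error = fit_line(column, targets)
        if best is None or error < best[3]:
            best = (index, slope, intercept, error)
    return best


baseline = zero_r(TARGETS)
best_index, slope, intercept, error = simple_linear_regression(INPUTS, TARGETS)
print("ZeroR prediction:", baseline)
print("Best column:", best_index, "slope:", slope, "intercept:", intercept)
```

ZeroR is useful mainly as a baseline: any regression tool worth keeping should achieve a lower error than always predicting the mean.
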
