Weka Data Mining Overview and Techniques

Weka is open-source data mining software written in Java. It contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. Weka can import data from files or databases and provides filters for tasks such as discretization, normalization, and attribute selection. It builds models using classification algorithms such as decision trees, neural networks, and support vector machines (SVMs). Results can be evaluated using cross-validation in the Explorer GUI.


Introduction to Weka

Xingquan (Hill) Zhu

Slides copied from Jeffrey Junfeng Pan (UST)


Outline
Weka
Data Source
Feature selection
Model building
Classifier / Cross Validation
Result visualization
WEKA
[Link]
Data mining software in Java
Open source software

UCI Data Repository


[Link]
[Link]
Explorer: pre-processing the data

Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
Data can also be read from a URL or from an SQL database (using JDBC)
Pre-processing tools in WEKA are called filters
WEKA contains filters for:
Discretization, normalization, resampling, attribute selection, transforming and combining attributes, ...
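
To make two of the filter types listed above concrete, here is a minimal Python sketch of min-max normalization and equal-width discretization. It is not WEKA's implementation; the function names are hypothetical, and the input is the three known cholesterol values from the heart-disease sample used in these slides (the missing value is simply left out).

```python
def normalize(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize_equal_width(values, bins):
    """Map each numeric value to a bin index 0..bins-1 using equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), bins - 1) for v in values]


# Known cholesterol values from the sample rows (illustrative input only).
cholesterol = [233, 286, 229]
normalized = normalize(cholesterol)
binned = discretize_equal_width(cholesterol, 3)
print(normalized)
print(binned)
```

In WEKA itself these operations are applied through the filter interface in the Explorer rather than written by hand.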
WEKA only deals with flat files
@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
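
The structure of the ARFF file above (a header of @relation and @attribute lines, then @data rows) can be illustrated with a small Python reader. This is a sketch that assumes only the features the sample uses: numeric and nominal attributes, comma-separated rows, and "?" for a missing value. The function name parse_arff is hypothetical, and this is not WEKA's own loader.

```python
def parse_arff(text):
    """Parse a simple ARFF document into (relation, attributes, rows).

    Handles only numeric and nominal attributes, comma-separated data
    rows, and '?' for missing values.
    """
    relation = None
    attributes = []  # (name, kind): kind is "numeric" or a list of labels
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blanks and comments
            continue
        lower = line.lower()
        if in_data:
            cells = [c.strip() for c in line.split(",")]
            row = []
            for (name, kind), cell in zip(attributes, cells):
                if cell == "?":
                    row.append(None)          # missing value
                elif kind == "numeric":
                    row.append(float(cell))
                else:
                    row.append(cell)          # nominal label kept as text
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                kind = [label.strip() for label in spec.strip("{}").split(",")]
            else:
                kind = spec.lower()
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


SAMPLE = """\
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""

relation, attributes, rows = parse_arff(SAMPLE)
print(relation, len(attributes), len(rows))
```

Each row is one flat record with one value per declared attribute, which is what "flat file" means here.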
Explorer: attribute selection
Panel that can be used to investigate which (subsets of) attributes are the most predictive ones
Attribute selection methods contain two parts:
A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking
An evaluation method: correlation-based, wrapper, information gain, chi-squared, ...
Very flexible: WEKA allows (almost) arbitrary combinations of these two
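
As one illustration of pairing a search method with an evaluation method, the sketch below combines ranking with information gain in plain Python. It is not WEKA's implementation, and the function names are hypothetical. The data are the nominal columns of the four heart-disease sample rows shown earlier, so the scores are only meaningful as a demonstration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr_index, class_index=-1):
    """Class entropy minus the weighted entropy after splitting on one attribute."""
    labels = [row[class_index] for row in rows]
    groups = {}
    for row in rows:
        groups.setdefault(row[attr_index], []).append(row[class_index])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Nominal attributes and class from the four sample rows.
rows = [
    ("male", "typ_angina", "no", "not_present"),
    ("male", "asympt", "yes", "present"),
    ("male", "asympt", "yes", "present"),
    ("female", "non_anginal", "no", "not_present"),
]
names = ["sex", "chest_pain_type", "exercise_induced_angina"]

# "Ranking" search: score every attribute independently, then sort by score.
ranking = sorted(
    ((information_gain(rows, i), name) for i, name in enumerate(names)),
    reverse=True,
)
for gain, name in ranking:
    print(f"{name}: {gain:.3f}")
```

Swapping information_gain for a different scoring function, or the sort for a subset search, mirrors the way the two parts can be combined independently.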
Explorer: building classifiers

Classifiers in WEKA are models for predicting nominal or numeric quantities
Implemented learning schemes include:
Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes nets, ...
Meta-classifiers include:
Bagging, boosting, stacking, error-correcting output codes, locally weighted learning, ...
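
To show what the simplest of these schemes does, here is a minimal Python sketch of an instance-based (1-nearest-neighbour) classifier predicting a nominal class. It is not WEKA's implementation. The training pairs are the age and cholesterol values of the three complete sample rows shown earlier, and the two query points are hypothetical values chosen for illustration.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def nearest_neighbor_predict(train, query):
    """Return the label of the training instance closest to `query`."""
    features, label = min(train, key=lambda item: dist(item[0], query))
    return label


# (age, cholesterol) -> class, taken from the complete sample rows.
train = [
    ((63, 233), "not_present"),
    ((67, 286), "present"),
    ((67, 229), "present"),
]

# Hypothetical query instances, not part of the original data.
prediction_a = nearest_neighbor_predict(train, (62, 235))
prediction_b = nearest_neighbor_predict(train, (65, 280))
print(prediction_a, prediction_b)
```

The sketch uses raw, unscaled features, so cholesterol dominates the distance; normalizing the attributes first would change that.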
Problem with Running Weka
Problem: out of memory for large data sets
Solution: raise the JVM maximum heap size, e.g. java -Xmx1000m -jar [Link]


