Data Base Management Key Points
roll no:45
DWDM ASSIGNMENT
WEKA:
● Regression
● Clustering
● Association
● Data pre-processing
● Classification
● Visualisation
FEATURES OF WEKA:
1. Platform Independent
2. Open source and free
3. Different Machine Learning Algorithms for Data Mining
4. Easy to use
5. Data Preprocessing tools
6. Flexibility for scripting experiments
7. Graphical user interface
Weka provides five application interfaces. Opening Weka starts the Weka GUI
Chooser screen, from which we can open any of the application interfaces.
Data can be loaded from the following sources:
● CSV
● ARFF
● Database using ODBC
An ARFF file declares each attribute with one of the data types below and then
lists the data:
● Numeric
● <nominal-specification>
● String
● date
● @data – marks the start of the Data section and is followed by the list of
all data instances
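To make the ARFF layout concrete, here is a minimal sketch of a reader for it. It is not Weka's own parser. The sample file contents and the function name parse_arff are invented for illustration, and the sketch assumes no quoted values containing commas and no sparse-format rows.

```python
# Minimal ARFF reader: a sketch only, not Weka's parser.
# Assumes no quoted values containing commas and no sparse-format rows.

SAMPLE_ARFF = """\
% Hypothetical weather data, invented for illustration
@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute note string
@attribute recorded date "yyyy-MM-dd"

@data
sunny,85,hot,2020-01-01
overcast,83,mild,2020-01-02
rainy,?,cool,2020-01-03
"""


def parse_arff(text):
    """Return (relation, attributes, rows) parsed from ARFF text."""
    relation = None
    attributes = []   # list of (name, type) pairs, in declaration order
    rows = []         # one list of raw string values per instance
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):   # skip blank lines and comments
            continue
        lower = line.lower()                   # ARFF keywords are case-insensitive
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, attr_type = line.split(None, 2)
            attributes.append((name, attr_type))
        elif lower.startswith("@data"):
            in_data = True                     # everything after this is data
    return relation, attributes, rows


relation, attributes, rows = parse_arff(SAMPLE_ARFF)
print("relation:", relation)
for name, attr_type in attributes:
    print(name, "->", attr_type)
print(len(rows), "instances")
```

The four @attribute lines show the numeric, nominal, string and date types in turn. The "?" in the last row is how ARFF marks a missing value.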
WEKA EXPLORER:
The Weka Explorer, illustrated in the figure below, contains six tabs:
1) Preprocess: This allows us to choose the data file.
2) Classify: This allows us to apply and experiment with different algorithms on
preprocessed data files.
3) Cluster: This allows us to apply different clustering tools, which identify
clusters within the data file.
4) Associate: This allows us to apply association rules, which identify
associations within the data.
5) Select attributes: This allows us to see the effect of including or
excluding attributes from the experiment.
6) Visualize: This allows us to see the visualisations available for the data
set in 2D, as scatter plots and bar graphs.
The user cannot move between the different tabs until the initial preprocessing
of the data set has been completed.
● Open File – enables the user to select the file from the local machine
● Open URL – enables the user to load the data file from a remote location
● Open Database – enables the user to retrieve a data file from a database
source
Clustering: The Cluster tab enables the user to identify similarities or groups
of occurrences within the data set, which the user can then analyse. The
available cluster modes are use training set, supplied test set, percentage
split and classes to clusters evaluation. The user can ignore some attributes
from the data set, based on the requirements. Clustering schemes available in
Weka include k-Means, EM, Cobweb, X-means and Farthest First.
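As an illustration of what a clustering scheme such as k-Means does, here is a minimal sketch in plain Python. It is not Weka's implementation. The function name kmeans, its parameters and the sample points are all invented for this example, and it assumes numeric attributes compared by squared Euclidean distance.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Return (centroids, labels) for a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # start from k distinct data points

    def nearest(point):
        # Index of the centroid with the smallest squared Euclidean distance.
        distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
        return distances.index(min(distances))

    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [nearest(p) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for index in range(k):
            members = [p for p, label in zip(points, labels) if label == index]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[index])   # keep a centroid with no members
        if updated == centroids:                   # no movement: converged
            break
        centroids = updated
    return centroids, [nearest(p) for p in points]


# Hypothetical data: two visibly separate groups of instances.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
for point, label in zip(points, labels):
    print(point, "-> cluster", label)
```

Ignoring an attribute, as the Cluster tab allows, would correspond here to dropping that position from every tuple before calling kmeans.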
Association: The main scheme for association in Weka is the Apriori algorithm.
It identifies statistical dependencies between groups of attributes, and only
works with discrete data. The Apriori algorithm computes all the rules that
have at least a minimum support and exceed a given confidence level.
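The support and confidence thresholds can be made concrete with a small sketch of the Apriori idea. This is an illustration, not Weka's code. The function names, the shopping-basket data and the threshold values are invented, and both thresholds are treated as "greater than or equal to", which is an assumption.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise search: return {itemset: support} for all frequent itemsets."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    size = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        level = {}
        for candidate in candidates:
            support = sum(1 for t in transactions if candidate <= t) / n
            if support >= min_support:
                level[candidate] = support
        frequent.update(level)
        size += 1
        # Join step: merge frequent itemsets into candidates one item larger.
        joined = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: keep a candidate only if all its smaller subsets are frequent.
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, size - 1))]
    return frequent


def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) for every rule that passes."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                confidence = support / frequent[lhs]   # support(lhs and rhs) / support(lhs)
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, support, confidence))
    return rules


# Hypothetical discrete data: each transaction is a set of items.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
    frozenset({"bread", "milk"}),
]
frequent = frequent_itemsets(transactions, min_support=0.4)
for lhs, rhs, support, confidence in association_rules(frequent, min_confidence=0.7):
    print(sorted(lhs), "=>", sorted(rhs),
          "support=%.2f confidence=%.2f" % (support, confidence))
```

The prune step is what keeps the search manageable: an itemset is only counted if every smaller subset of it already met the minimum support.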
Visualization: The Visualize tab shows the data set graphically, the final
piece of the puzzle after the earlier steps. It gives a 2D representation of
the data and is used to judge the difficulty of the learning problem. We can
visualize single attributes (1D) and pairs of attributes (2D), and rotate 3D
visualizations in Weka. The Jitter option helps with nominal attributes and
reveals ‘hidden’ data points that are plotted on top of one another.
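The effect of jitter can be sketched in a few lines. With nominal attributes, many instances land on exactly the same plot position, and a small random offset separates them. This is a sketch of the general idea and not Weka's implementation. The function names, the sample columns and the amount 0.2 are invented for illustration.

```python
import random


def encode(column):
    """Map each nominal value to an integer index (its plot coordinate)."""
    categories = sorted(set(column))
    return [categories.index(value) for value in column]


def jitter(values, amount, seed=0):
    """Add uniform random noise in [-amount, +amount] to each coordinate."""
    rng = random.Random(seed)
    return [v + rng.uniform(-amount, amount) for v in values]


# Hypothetical nominal data: five instances, but only two distinct value pairs.
outlook = ["sunny", "sunny", "sunny", "rainy", "rainy"]
play = ["yes", "yes", "yes", "no", "no"]

x, y = encode(outlook), encode(play)
before = set(zip(x, y))                 # overlapping points collapse together

x_jittered = jitter(x, 0.2, seed=1)
y_jittered = jitter(y, 0.2, seed=2)
after = set(zip(x_jittered, y_jittered))

print(len(before), "distinct plot positions before jitter")
print(len(after), "distinct plot positions after jitter")
```

The noise changes only where each point is drawn, not the underlying data, so it should be kept small relative to the spacing between categories.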