GroupAssignment KnowledgeDiscovery TEB1213 Sept2022
PREPARED BY:
Data mining is a powerful technique that helps businesses focus on the most important
information in their data. Data mining (DM) tools can forecast future trends and behaviors
and answer questions that would otherwise take too long to resolve. Data mining tools are
among the many analytical tools used for data analysis: they allow users to classify and
summarize data and to analyze it from different perspectives.
WEKA
WEKA is a Java-based system that provides a collection of machine learning and data
mining methods commonly used for classification, clustering, association rule mining
and evaluation. Its user interface lets users apply data mining techniques directly to
datasets, and users can also add custom Java code specific to their own project
within WEKA.
1.0 Description
Firstly, we chose to investigate a dataset on the total number of graduates from 2016
to 2020. The data comes from the official website of the Malaysia Open Data Portal,
which reports the number of graduates by age group (for example, 25 to 34). The
dataset also records each group's education level (diploma or degree) and gender
(male or female). We chose this dataset because the data for each attribute is easy to
analyze. We applied two classification methods to this dataset and present their
results here, followed by a comparison of their performance based on the experimental
results.
2.0 Implementation
Sex     Education level  Age group  Year  No. of graduates
Male    Degree           25 - 34    2016  399.1
Male    Diploma          25 - 34    2016  405
Female  Degree           25 - 34    2016  576.7
Female  Diploma          25 - 34    2016  466.5
Male    Degree           25 - 34    2017  447.5
Male    Diploma          25 - 34    2017  410.2
Female  Degree           25 - 34    2017  640.4
Female  Diploma          25 - 34    2017  470.2
Male    Degree           25 - 34    2018  438.7
Male    Diploma          25 - 34    2018  439.4
Female  Degree           25 - 34    2018  669.9
Female  Diploma          25 - 34    2018  486.9
Male    Degree           25 - 34    2019  472.5
Male    Diploma          25 - 34    2019  440.7
Female  Degree           25 - 34    2019  677.5
Female  Diploma          25 - 34    2019  496.4
Male    Degree           25 - 34    2020  467
Male    Diploma          25 - 34    2020  466.5
Female  Degree           25 - 34    2020  697.9
Female  Diploma          25 - 34    2020  513.3
Male    Degree           35 - 44    2016  268
Male    Diploma          35 - 44    2016  212.8
Female  Degree           35 - 44    2016  300.4
Female  Diploma          35 - 44    2016  196.1
Male    Degree           35 - 44    2017  291.5
Male    Diploma          35 - 44    2017  224.8
Female  Degree           35 - 44    2017  319.1
Female  Diploma          35 - 44    2017  229.1
Male    Degree           35 - 44    2018  316.3
Male    Diploma          35 - 44    2018  258.3
Female  Degree           35 - 44    2018  357.6
Female  Diploma          35 - 44    2018  256.4
Male    Degree           35 - 44    2019  334.8
Male    Diploma          35 - 44    2019  260.5
Female  Degree           35 - 44    2019  402.1
Female  Diploma          35 - 44    2019  276.2
Male    Degree           35 - 44    2020  367.5
Male    Diploma          35 - 44    2020  283.2
Female  Degree           35 - 44    2020  429.8
Female  Diploma          35 - 44    2020  299
Male    Degree           ≥ 45       2016  297.3
Male    Diploma          ≥ 45       2016  191.3
Female  Degree           ≥ 45       2016  182.8
Female  Diploma          ≥ 45       2016  157.6
Male    Degree           ≥ 45       2017  325.2
Male    Diploma          ≥ 45       2017  211.3
Female  Degree           ≥ 45       2017  204.1
Female  Diploma          ≥ 45       2017  168.5
Male    Degree           ≥ 45       2018  346.5
Male    Diploma          ≥ 45       2018  209
The data above shows the number of graduates in Malaysia from 2016 to 2020 by education
level, age group and gender.
Sex     Education level  Age group  Year  No. of graduates
Male    Degree           25 - 34    2016  399
Female  Degree           25 - 34    2016  577
Male    Diploma          25 - 34    2016  405
Female  Diploma          25 - 34    2016  467
Male    Degree           35 - 44    2016  268
Female  Degree           35 - 44    2016  300
Male    Diploma          35 - 44    2016  213
Female  Diploma          35 - 44    2016  196
Male    Degree           ≥ 45       2016  297
Female  Degree           ≥ 45       2016  183
Male    Diploma          ≥ 45       2016  191
Female  Diploma          ≥ 45       2016  158
Male    Degree           ≤ 24       2016  60
Female  Degree           ≤ 24       2016  90
Male    Diploma          ≤ 24       2016  218
Female  Diploma          ≤ 24       2016  260
Male    Degree           25 - 34    2017  448
Female  Degree           25 - 34    2017  640
Male    Diploma          25 - 34    2017  410
Female  Diploma          25 - 34    2017  470
Male    Degree           35 - 44    2017  292
Female  Degree           35 - 44    2017  319
Male    Diploma          35 - 44    2017  225
Female  Diploma          35 - 44    2017  229
Male    Degree           ≥ 45       2017  325
Female  Degree           ≥ 45       2017  204
Male    Diploma          ≥ 45       2017  211
Female  Diploma          ≥ 45       2017  169
Male    Degree           ≤ 24       2017  59
Female  Degree           ≤ 24       2017  98
Male    Diploma          ≤ 24       2017  229
Female  Diploma          ≤ 24       2017  274
Male    Degree           25 - 34    2018  439
Female  Degree           25 - 34    2018  670
Male    Diploma          25 - 34    2018  439
Female  Diploma          25 - 34    2018  487
Male    Degree           35 - 44    2018  316
Female  Degree           35 - 44    2018  358
Male    Diploma          35 - 44    2018  258
Female  Diploma          35 - 44    2018  256
Male    Degree           ≥ 45       2018  347
Female  Degree           ≥ 45       2018  219
Male    Diploma          ≥ 45       2018  209
The goal of One Rule (1R) is to derive decision rules from a single feature. It selects
exactly one feature and assigns a class to each of that feature's values, so it acts as a
simple classifier. We converted the collected data to the WEKA format and ran the classifier
on the training and testing sets so that the results could be compared.
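The 1R idea can be sketched in plain Python. The report does not say which attribute was used as the class, so this sketch assumes a hypothetical one: each 2016 row from the table of whole-number counts is labelled "high" if it has at least 300 graduates (`THRESHOLD`) and "low" otherwise, with sex, education level and age group as the nominal attributes. It is a minimal illustration of the technique, not WEKA's OneR, which also handles numeric attributes.

```python
from collections import Counter, defaultdict

# 2016 rows from the table of whole-number counts:
# (sex, education level, age group, no. of graduates)
ROWS_2016 = [
    ("Male", "Degree", "25 - 34", 399), ("Female", "Degree", "25 - 34", 577),
    ("Male", "Diploma", "25 - 34", 405), ("Female", "Diploma", "25 - 34", 467),
    ("Male", "Degree", "35 - 44", 268), ("Female", "Degree", "35 - 44", 300),
    ("Male", "Diploma", "35 - 44", 213), ("Female", "Diploma", "35 - 44", 196),
    ("Male", "Degree", ">= 45", 297), ("Female", "Degree", ">= 45", 183),
    ("Male", "Diploma", ">= 45", 191), ("Female", "Diploma", ">= 45", 158),
    ("Male", "Degree", "<= 24", 60), ("Female", "Degree", "<= 24", 90),
    ("Male", "Diploma", "<= 24", 218), ("Female", "Diploma", "<= 24", 260),
]

# Hypothetical class: "high" if at least THRESHOLD graduates, else "low".
THRESHOLD = 300


def make_instances(rows, threshold=THRESHOLD):
    """Turn table rows into (nominal attributes, class label) pairs."""
    return [
        ({"sex": sex, "education_level": level, "age_group": age},
         "high" if count >= threshold else "low")
        for sex, level, age, count in rows
    ]


def one_r(instances):
    """Return (best attribute, value -> class rule, number of correct predictions)."""
    best = None
    for attr in instances[0][0]:
        # Class frequencies for each value of this attribute.
        counts = defaultdict(Counter)
        for features, label in instances:
            counts[features[attr]][label] += 1
        # Each value predicts its most frequent class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        correct = sum(c[rule[value]] for value, c in counts.items())
        # Keep the attribute whose rule makes the fewest errors.
        if best is None or correct > best[2]:
            best = (attr, rule, correct)
    return best


attr, rule, correct = one_r(make_instances(ROWS_2016))
print(attr, rule, f"{correct}/{len(ROWS_2016)}")
```

On these 16 rows the sketch prints `age_group {'25 - 34': 'high', '35 - 44': 'low', '>= 45': 'low', '<= 24': 'low'} 15/16`. Sex and education level each get only 11 rows right, so age group is chosen; its single miss is the Female, Degree, 35 - 44 row (300 graduates). This is training-set accuracy for a made-up class, not a reproduction of the report's results.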
Naïve Bayes
In contrast, Naïve Bayes is a probabilistic classifier that assigns each instance to the most
likely class given its attribute values, assuming that the attributes are independent of one
another within each class. Using the few attributes recorded in this report, we can examine
the number of graduates in the 25 to 34 age group.
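A minimal categorical Naïve Bayes sketch in plain Python makes the independence assumption concrete: the score of a class is its prior multiplied by one likelihood per attribute (summed as logarithms here). The report does not say which attribute served as the class, so the sketch assumes a hypothetical one: each 2016 row from the table of whole-number counts is labelled "high" if it has at least 300 graduates and "low" otherwise, with sex, education level and age group as attributes. Laplace (add-one) smoothing and evaluating on the training rows are choices made for this sketch, not necessarily WEKA's settings.

```python
import math
from collections import Counter, defaultdict

# 2016 rows: (sex, education level, age group, no. of graduates)
ROWS_2016 = [
    ("Male", "Degree", "25 - 34", 399), ("Female", "Degree", "25 - 34", 577),
    ("Male", "Diploma", "25 - 34", 405), ("Female", "Diploma", "25 - 34", 467),
    ("Male", "Degree", "35 - 44", 268), ("Female", "Degree", "35 - 44", 300),
    ("Male", "Diploma", "35 - 44", 213), ("Female", "Diploma", "35 - 44", 196),
    ("Male", "Degree", ">= 45", 297), ("Female", "Degree", ">= 45", 183),
    ("Male", "Diploma", ">= 45", 191), ("Female", "Diploma", ">= 45", 158),
    ("Male", "Degree", "<= 24", 60), ("Female", "Degree", "<= 24", 90),
    ("Male", "Diploma", "<= 24", 218), ("Female", "Diploma", "<= 24", 260),
]
THRESHOLD = 300  # hypothetical cut-off that defines the "high"/"low" class


def make_instances(rows, threshold=THRESHOLD):
    """Turn table rows into (nominal attributes, class label) pairs."""
    return [
        ({"sex": sex, "education_level": level, "age_group": age},
         "high" if count >= threshold else "low")
        for sex, level, age, count in rows
    ]


def train_naive_bayes(instances):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(label for _, label in instances)
    value_counts = defaultdict(Counter)  # (attribute, class) -> value frequencies
    domains = defaultdict(set)           # attribute -> observed values
    for features, label in instances:
        for attr, value in features.items():
            value_counts[(attr, label)][value] += 1
            domains[attr].add(value)
    return class_counts, value_counts, domains


def predict(model, features):
    """Return the class with the highest log posterior (up to a constant)."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in class_counts.items():
        score = math.log(n_label / total)  # log prior P(class)
        for attr, value in features.items():
            # Laplace-smoothed P(value | class); attributes treated as independent.
            k = len(domains[attr])
            p = (value_counts[(attr, label)][value] + 1) / (n_label + k)
            score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


instances = make_instances(ROWS_2016)
model = train_naive_bayes(instances)
correct = sum(predict(model, f) == label for f, label in instances)
print(f"{correct}/{len(instances)} correct on the training rows")
```

Here it classifies 15 of the 16 training rows correctly; the 25 - 34 rows are predicted "high" and all others "low". Like any toy run on a made-up class, this says nothing about the report's own results.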
4.0 Comparison of methods in table or graph
In summary, the table above compares Naïve Bayes and 1R. 1R classified 64 of 80 instances
correctly, whereas Naïve Bayes classified 52 of 60 instances correctly. The two methods
therefore produced different classification results.
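Because the two results use different denominators (80 and 60 instances), converting them to percentages makes them easier to compare. A small sketch of that arithmetic, using only the counts reported above:

```python
def accuracy(correct, total):
    """Percentage of instances classified correctly."""
    return 100 * correct / total


# Correct / total instances as reported for each method.
RESULTS = {"1R": (64, 80), "Naive Bayes": (52, 60)}

for name, (correct, total) in RESULTS.items():
    print(f"{name}: {correct}/{total} = {accuracy(correct, total):.1f}%")
```

This prints `1R: 64/80 = 80.0%` and `Naive Bayes: 52/60 = 86.7%`. Naïve Bayes has the higher percentage, although the different instance counts suggest the two methods may have been evaluated on different sets, which limits a direct comparison.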