GroupAssignment KnowledgeDiscovery TEB1213 Sept2022

The document describes a dataset pertaining to the total number of graduates in Malaysia from 2016 to 2020. It includes information on education level, gender, age group and year. Two classification methods are applied to the dataset in WEKA and the results are compared.

Uploaded by

Syamil Iman

TEB1213: KNOWLEDGE DISCOVERY

GROUP ASSIGNMENT : SEPTEMBER 2022

LECTURER: MADAM SHAKIRAH

PREPARED BY:

NO NAME STUDENT ID COURSE


1 SYAMIL IMAN BIN SAFARI 18002632 BBM
2 MUHAMED SHAMEER BIN UBAIDUR RAHMAN 20000578 BBM
3 MOHAMAD AFFEQ AZIM BIN MOHD AZLAN 20001420 BBM
Objective: To become familiar with WEKA as a tool for data mining and knowledge discovery

Data mining is a powerful technique developed to help businesses focus on the most important
data in their data centers. Data mining (DM) tools can forecast future trends and behaviors and
answer questions that would otherwise take too long to answer. They are among the many
analytical tools used for data analysis, allowing users to classify, summarize and analyze data
from various perspectives.

WEKA

WEKA is a Java-based system that includes a number of machine learning and data
mining methods commonly used for classification, clustering, association rule analysis
and evaluation. WEKA provides a user interface that lets users apply data mining
techniques directly to datasets, and it can also be called from custom Java code written
for a specific project.

1.0 Description
We chose to investigate a dataset on the total number of graduates in Malaysia from 2016
to 2020, obtained from the official website of the Malaysia Open Data Portal. The dataset
gives the number of graduates by age group (for example, 25 to 34), by education level
(diploma or degree) and by gender (male or female). We chose this dataset because the
data for each attribute is easy to analyze. We applied two classification methods to the
dataset and present the results here, followed by a comparison of their performance
based on the experimental results.
2.0 Implementation

Sex     Education level  Age Group                 Year   No. of graduates
Male    Degree           25 - 34                   2016   399.1
Male    Diploma          25 - 34                   2016   405
Female  Degree           25 - 34                   2016   576.7
Female  Diploma          25 - 34                   2016   466.5
Male    Degree           25 - 34                   2017   447.5
Male    Diploma          25 - 34                   2017   410.2
Female  Degree           25 - 34                   2017   640.4
Female  Diploma          25 - 34                   2017   470.2
Male    Degree           25 - 34                   2018   438.7
Male    Diploma          25 - 34                   2018   439.4
Female  Degree           25 - 34                   2018   669.9
Female  Diploma          25 - 34                   2018   486.9
Male    Degree           25 - 34                   2019   472.5
Male    Diploma          25 - 34                   2019   440.7
Female  Degree           25 - 34                   2019   677.5
Female  Diploma          25 - 34                   2019   496.4
Male    Degree           25 - 34                   2020   467
Male    Diploma          25 - 34                   2020   466.5
Female  Degree           25 - 34                   2020   697.9
Female  Diploma          25 - 34                   2020   513.3
Male    Degree           35 - 44                   2016   268
Male    Diploma          35 - 44                   2016   212.8
Female  Degree           35 - 44                   2016   300.4
Female  Diploma          35 - 44                   2016   196.1
Male    Degree           35 - 44                   2017   291.5
Male    Diploma          35 - 44                   2017   224.8
Female  Degree           35 - 44                   2017   319.1
Female  Diploma          35 - 44                   2017   229.1
Male    Degree           35 - 44                   2018   316.3
Male    Diploma          35 - 44                   2018   258.3
Female  Degree           35 - 44                   2018   357.6
Female  Diploma          35 - 44                   2018   256.4
Male    Degree           35 - 44                   2019   334.8
Male    Diploma          35 - 44                   2019   260.5
Female  Degree           35 - 44                   2019   402.1
Female  Diploma          35 - 44                   2019   276.2
Male    Degree           35 - 44                   2020   367.5
Male    Diploma          35 - 44                   2020   283.2
Female  Degree           35 - 44                   2020   429.8
Female  Diploma          35 - 44                   2020   299
Male    Degree           greater than or equal 45  2016   297.3
Male    Diploma          greater than or equal 45  2016   191.3
Female  Degree           greater than or equal 45  2016   182.8
Female  Diploma          greater than or equal 45  2016   157.6
Male    Degree           greater than or equal 45  2017   325.2
Male    Diploma          greater than or equal 45  2017   211.3
Female  Degree           greater than or equal 45  2017   204.1
Female  Diploma          greater than or equal 45  2017   168.5
Male    Degree           greater than or equal 45  2018   346.5
Male    Diploma          greater than or equal 45  2018   209

The data above shows the number of graduates in Malaysia by education level, age group and
gender from 2016 to 2020.
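WEKA's native data format is ARFF, which declares each attribute before listing the data rows. The following is a minimal sketch of converting rows like those in the table above into ARFF text, using only the first four rows. The relation name graduates and the attribute names (sex, education_level, age_group, year, graduates) are hypothetical, since this report does not reproduce its ARFF file.

```python
def to_arff(relation, rows):
    """Render (sex, education, age_group, year, graduates) rows as ARFF text.

    Attribute names are hypothetical: sex, education level and age group are
    declared nominal; year and number of graduates are declared numeric.
    """
    sexes = sorted({r[0] for r in rows})
    levels = sorted({r[1] for r in rows})
    ages = sorted({r[2] for r in rows})

    def nominal(values):
        # Quote every nominal value, because some (e.g. "25 - 34") contain spaces.
        return "{" + ",".join(f"'{v}'" for v in values) + "}"

    lines = [
        f"@relation {relation}",
        "",
        f"@attribute sex {nominal(sexes)}",
        f"@attribute education_level {nominal(levels)}",
        f"@attribute age_group {nominal(ages)}",
        "@attribute year numeric",
        "@attribute graduates numeric",
        "",
        "@data",
    ]
    for sex, level, age, year, grads in rows:
        lines.append(f"'{sex}','{level}','{age}',{year},{grads}")
    return "\n".join(lines) + "\n"


# First four rows of the table above.
rows = [
    ("Male", "Degree", "25 - 34", 2016, 399.1),
    ("Male", "Diploma", "25 - 34", 2016, 405),
    ("Female", "Degree", "25 - 34", 2016, 576.7),
    ("Female", "Diploma", "25 - 34", 2016, 466.5),
]
print(to_arff("graduates", rows))
```

Declaring year as numeric keeps it available for the discretization step described later; declaring it nominal instead would be an equally valid design choice.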

After Binning by Bin Boundaries

Sex     Education level  Age Group                 Year   No. of graduates
Male    Degree           25 - 34                   2016   399
Female  Degree           25 - 34                   2016   577
Male    Diploma          25 - 34                   2016   405
Female  Diploma          25 - 34                   2016   467
Male    Degree           35 - 44                   2016   268
Female  Degree           35 - 44                   2016   300
Male    Diploma          35 - 44                   2016   213
Female  Diploma          35 - 44                   2016   196
Male    Degree           greater than or equal 45  2016   297
Female  Degree           greater than or equal 45  2016   183
Male    Diploma          greater than or equal 45  2016   191
Female  Diploma          greater than or equal 45  2016   158
Male    Degree           less than or equal 24     2016   60
Female  Degree           less than or equal 24     2016   90
Male    Diploma          less than or equal 24     2016   218
Female  Diploma          less than or equal 24     2016   260
Male    Degree           25 - 34                   2017   448
Female  Degree           25 - 34                   2017   640
Male    Diploma          25 - 34                   2017   410
Female  Diploma          25 - 34                   2017   470
Male    Degree           35 - 44                   2017   292
Female  Degree           35 - 44                   2017   319
Male    Diploma          35 - 44                   2017   225
Female  Diploma          35 - 44                   2017   229
Male    Degree           greater than or equal 45  2017   325
Female  Degree           greater than or equal 45  2017   204
Male    Diploma          greater than or equal 45  2017   211
Female  Diploma          greater than or equal 45  2017   169
Male    Degree           less than or equal 24     2017   59
Female  Degree           less than or equal 24     2017   98
Male    Diploma          less than or equal 24     2017   229
Female  Diploma          less than or equal 24     2017   274
Male    Degree           25 - 34                   2018   439
Female  Degree           25 - 34                   2018   670
Male    Diploma          25 - 34                   2018   439
Female  Diploma          25 - 34                   2018   487
Male    Degree           35 - 44                   2018   316
Female  Degree           35 - 44                   2018   358
Male    Diploma          35 - 44                   2018   258
Female  Diploma          35 - 44                   2018   256
Male    Degree           greater than or equal 45  2018   347
Female  Degree           greater than or equal 45  2018   219
Male    Diploma          greater than or equal 45  2018   209
Female  Diploma          greater than or equal 45  2018   185
Male    Degree           less than or equal 24     2018   59
Female  Degree           less than or equal 24     2018   114
Male    Diploma          less than or equal 24     2018   231
Female  Diploma          less than or equal 24     2018   267
Male    Degree           25 - 34                   2019   473
Female  Degree           25 - 34                   2019   678
Male    Diploma          25 - 34                   2019   441
Female  Diploma          25 - 34                   2019   496
Male    Degree           35 - 44                   2019   335
Female  Degree           35 - 44                   2019   402
Male    Diploma          35 - 44                   2019   261
Female  Diploma          35 - 44                   2019   276
Male    Degree           greater than or equal 45  2019   362
Female  Degree           greater than or equal 45  2019   255
Male    Diploma          greater than or equal 45  2019   246
Female  Diploma          greater than or equal 45  2019   201
Male    Degree           less than or equal 24     2019   66
Female  Degree           less than or equal 24     2019   124
Male    Diploma          less than or equal 24     2019   242
Female  Diploma          less than or equal 24     2019   275
Male    Degree           25 - 34                   2020   467
Female  Degree           25 - 34                   2020   698
Male    Diploma          25 - 34                   2020   467
Female  Diploma          25 - 34                   2020   513
Male    Degree           35 - 44                   2020   368
Female  Degree           35 - 44                   2020   430
Male    Diploma          35 - 44                   2020   283
Female  Diploma          35 - 44                   2020   299
Male    Degree           greater than or equal 45  2020   395
Female  Degree           greater than or equal 45  2020   304
Male    Diploma          greater than or equal 45  2020   256
Female  Diploma          greater than or equal 45  2020   219
Male    Degree           less than or equal 24     2020   66
Female  Degree           less than or equal 24     2020   101
Male    Diploma          less than or equal 24     2020   234
Female  Diploma          less than or equal 24     2020   258
In smoothing by bin boundaries, the minimum and maximum values in each bin are taken as the
bin boundaries, and every value in the bin is replaced by whichever boundary is closer. This is
what we did with the data we gathered. Smoothing reduces noise, so that numerical data that
has been grouped into ranges of possible values can be analyzed by frequency.
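The textbook procedure just described can be sketched in Python as follows. This is an illustration under stated assumptions: it uses equal-frequency (equal-depth) bins, a bin size of 4, and the 2016 figures for the 25 - 34 and 35 - 44 age groups from the raw table; the bin size and the choice of values are illustrative and are not necessarily the settings used to produce the binned table above.

```python
def smooth_by_bin_boundaries(values, bin_size):
    """Smooth values by bin boundaries using equal-frequency bins.

    Values are sorted and split into consecutive bins of `bin_size`; each
    value is replaced by the nearer of its bin's minimum and maximum
    (ties go to the minimum). The result is returned in sorted order.
    """
    data = sorted(values)
    smoothed = []
    for start in range(0, len(data), bin_size):
        bin_ = data[start:start + bin_size]
        low, high = bin_[0], bin_[-1]  # the bin boundaries
        for v in bin_:
            smoothed.append(low if v - low <= high - v else high)
    return smoothed


# 2016 figures for the 25 - 34 and 35 - 44 age groups (raw table above).
values = [399.1, 405, 576.7, 466.5, 268, 212.8, 300.4, 196.1]
print(smooth_by_bin_boundaries(values, bin_size=4))
```

Here the first bin is [196.1, 212.8, 268, 300.4], so 212.8 moves to the lower boundary 196.1 while 268 moves to the upper boundary 300.4.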

After Discretization (Year and No. of graduates)


Sex     Education level  Age Group                 Year   No. of graduates
Male    Degree           25 - 34                   <2018  >303
Female  Degree           25 - 34                   <2018  >303
Male    Diploma          25 - 34                   <2018  >303
Female  Diploma          25 - 34                   <2018  >303
Male    Degree           35 - 44                   <2018  <303
Female  Degree           35 - 44                   <2018  <303
Male    Diploma          35 - 44                   <2018  <303
Female  Diploma          35 - 44                   <2018  <303
Male    Degree           greater than or equal 45  <2018  <303
Female  Degree           greater than or equal 45  <2018  <303
Male    Diploma          greater than or equal 45  <2018  <303
Female  Diploma          greater than or equal 45  <2018  <303
Male    Degree           less than or equal 24     <2018  <303
Female  Degree           less than or equal 24     <2018  <303
Male    Diploma          less than or equal 24     <2018  <303
Female  Diploma          less than or equal 24     <2018  <303
Male    Degree           25 - 34                   <2018  >303
Female  Degree           25 - 34                   <2018  >303
Male    Diploma          25 - 34                   <2018  >303
Female  Diploma          25 - 34                   <2018  >303
Male    Degree           35 - 44                   <2018  <303
Female  Degree           35 - 44                   <2018  >303
Male    Diploma          35 - 44                   <2018  <303
Female  Diploma          35 - 44                   <2018  <303
Male    Degree           greater than or equal 45  <2018  >303
Female  Degree           greater than or equal 45  <2018  <303
Male    Diploma          greater than or equal 45  <2018  <303
Female  Diploma          greater than or equal 45  <2018  <303
Male    Degree           less than or equal 24     <2018  <303
Female  Degree           less than or equal 24     <2018  <303
Male    Diploma          less than or equal 24     <2018  <303
Female  Diploma          less than or equal 24     <2018  <303
Male    Degree           25 - 34                   ≤2018  >303
Female  Degree           25 - 34                   ≤2018  >303
Male    Diploma          25 - 34                   ≤2018  >303
Female  Diploma          25 - 34                   ≤2018  >303
Male    Degree           35 - 44                   ≤2018  >303
Female  Degree           35 - 44                   ≤2018  >303
Male    Diploma          35 - 44                   ≤2018  <303
Female  Diploma          35 - 44                   ≤2018  <303
Male    Degree           greater than or equal 45  ≤2018  >303
Female  Degree           greater than or equal 45  ≤2018  <303
Male    Diploma          greater than or equal 45  ≤2018  <303
Female  Diploma          greater than or equal 45  ≤2018  <303
Male    Degree           less than or equal 24     ≤2018  <303
Female  Degree           less than or equal 24     ≤2018  <303
Male    Diploma          less than or equal 24     ≤2018  <303
Female  Diploma          less than or equal 24     ≤2018  <303
Male    Degree           25 - 34                   >2018  >303
Female  Degree           25 - 34                   >2018  >303
Male    Diploma          25 - 34                   >2018  >303
Female  Diploma          25 - 34                   >2018  >303
Male    Degree           35 - 44                   >2018  >303
Female  Degree           35 - 44                   >2018  >303
Male    Diploma          35 - 44                   >2018  <303
Female  Diploma          35 - 44                   >2018  <303
Male    Degree           greater than or equal 45  >2018  >303
Female  Degree           greater than or equal 45  >2018  <303
Male    Diploma          greater than or equal 45  >2018  <303
Female  Diploma          greater than or equal 45  >2018  <303
Male    Degree           less than or equal 24     >2018  <303
Female  Degree           less than or equal 24     >2018  <303
Male    Diploma          less than or equal 24     >2018  <303
Female  Diploma          less than or equal 24     >2018  <303
Male    Degree           25 - 34                   >2018  >303
Female  Degree           25 - 34                   >2018  >303
Male    Diploma          25 - 34                   >2018  >303
Female  Diploma          25 - 34                   >2018  >303
Male    Degree           35 - 44                   >2018  >303
Female  Degree           35 - 44                   >2018  >303
Male    Diploma          35 - 44                   >2018  <303
Female  Diploma          35 - 44                   >2018  <303
Male    Degree           greater than or equal 45  >2018  >303
Female  Degree           greater than or equal 45  >2018  >303
Male    Diploma          greater than or equal 45  >2018  <303
Female  Diploma          greater than or equal 45  >2018  <303
Male    Degree           less than or equal 24     >2018  <303
Female  Degree           less than or equal 24     >2018  <303
Male    Diploma          less than or equal 24     >2018  <303
Female  Diploma          less than or equal 24     >2018  <303

Discretization converts continuous variables, models or functions into a discrete form. To
achieve this, we build a set of contiguous intervals (or bins) that span the range of the variable,
model or function we want to study. To make the data easier to evaluate and manage, we
applied this technique to the Year and No. of graduates attributes, replacing their many
distinct values with a small number of intervals.
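A minimal sketch of interval-based discretization with user-supplied cut points follows. The function name and label strings are illustrative. The cut point of 303 for No. of graduates is read off the discretized table above, and the cut points 2018 and 2019 for Year are an assumption chosen because they reproduce that table's three year labels; how the cut points were originally chosen is not stated here.

```python
from bisect import bisect_right


def discretize(value, cut_points, labels):
    """Map a numeric value to the label of the interval containing it.

    cut_points must be sorted and len(labels) == len(cut_points) + 1.
    A value equal to a cut point falls into the upper interval.
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    return labels[bisect_right(cut_points, value)]


# No. of graduates (from the binned table), split at 303.
graduates = [399, 268, 292, 325, 304, 158]
print([discretize(g, [303], ["<303", ">303"]) for g in graduates])

# Years 2016-2020 mapped onto the table's three year labels.
print([discretize(y, [2018, 2019], ["<2018", "≤2018", ">2018"])
       for y in range(2016, 2021)])
```

Using bisect keeps the lookup correct for any number of sorted cut points, so the same function covers both the two-interval and three-interval cases.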
3.0 Results and Analysis
1R

1R (One Rule) builds a set of decision rules for each attribute: for every value of the attribute,
it predicts the class that occurs most often with that value. It then keeps the single attribute
whose rules make the fewest errors on the training data. It therefore classifies instances using
exactly one attribute and its values, acting as a simple classifier. We ran it in WEKA on the data
in WEKA format and recorded the results on the training and testing sets for comparison.
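The 1R procedure can be sketched in Python as below. This is a simplified version for nominal attributes only; WEKA's own OneR also handles numeric attributes, which this sketch does not. Treating No. of graduates as the class attribute is an assumption made for illustration, since the class attribute is not named here, and the example data are the 2016 rows of the discretized table.

```python
from collections import Counter, defaultdict


def one_r(instances, class_index):
    """Simplified 1R for nominal attributes.

    For each attribute, every value predicts its most frequent class; the
    attribute whose rules make the fewest training errors is kept.
    Returns (attribute_index, {value: predicted_class}, training_errors).
    """
    best = None
    for a in range(len(instances[0])):
        if a == class_index:
            continue
        counts = defaultdict(Counter)  # attribute value -> class frequencies
        for inst in instances:
            counts[inst[a]][inst[class_index]] += 1
        rules = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(sum(c.values()) - c[rules[v]] for v, c in counts.items())
        if best is None or errors < best[2]:
            best = (a, rules, errors)
    return best


def predict(model, instance):
    """Apply the learned rule; returns None for an unseen attribute value."""
    attr, rules, _ = model
    return rules.get(instance[attr])


# 2016 rows of the discretized table:
# (sex, education level, age group, year, no. of graduates).
# In 2016 only the 25 - 34 group has more than 303 graduates.
AGES = ["25 - 34", "35 - 44", "greater than or equal 45", "less than or equal 24"]
rows_2016 = [
    (sex, level, age, "<2018", ">303" if age == "25 - 34" else "<303")
    for age in AGES
    for level in ("Degree", "Diploma")
    for sex in ("Male", "Female")
]

model = one_r(rows_2016, class_index=4)  # class = no. of graduates (assumed)
print(model)
print(predict(model, ("Female", "Diploma", "35 - 44", "<2018", None)))
```

On these 16 rows the age group alone separates the two classes with no training errors, so the sketch selects attribute 2 (age group), while sex, education level and year each make 4 errors.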
Naïve Bayes

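Naïve Bayes, by contrast, is a probabilistic classifier that uses all of the listed attributes to
distinguish between classes. It applies Bayes' theorem under the assumption that the attributes
are independent of one another given the class, and predicts the class with the highest posterior
probability. Using the few attributes recorded in this report, we can examine the number of
graduates in the 25 to 34 age group.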
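The calculation behind Naïve Bayes can be sketched for nominal attributes as below. This is a simplified illustration, not WEKA's NaiveBayes implementation: it uses add-one (Laplace) smoothing, treats No. of graduates as the class purely for illustration (an assumption), and reuses the 2016 rows of the discretized table.

```python
import math
from collections import Counter, defaultdict


def train_nb(instances, class_index):
    """Collect the counts a categorical Naive Bayes model needs."""
    class_counts = Counter(inst[class_index] for inst in instances)
    value_counts = defaultdict(Counter)  # (attribute, class) -> value counts
    domains = defaultdict(set)           # attribute -> values seen in training
    for inst in instances:
        c = inst[class_index]
        for a, v in enumerate(inst):
            if a != class_index:
                value_counts[(a, c)][v] += 1
                domains[a].add(v)
    return class_counts, value_counts, domains, class_index


def predict_nb(model, instance):
    """Return the class with the highest log posterior score."""
    class_counts, value_counts, domains, class_index = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)  # log prior P(c)
        for a, v in enumerate(instance):
            if a == class_index:
                continue
            # Add-one (Laplace) smoothed likelihood P(value | c)
            score += math.log((value_counts[(a, c)][v] + 1) / (n_c + len(domains[a])))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# 2016 rows of the discretized table (class in the last position).
AGES = ["25 - 34", "35 - 44", "greater than or equal 45", "less than or equal 24"]
rows_2016 = [
    (sex, level, age, "<2018", ">303" if age == "25 - 34" else "<303")
    for age in AGES
    for level in ("Degree", "Diploma")
    for sex in ("Male", "Female")
]

model = train_nb(rows_2016, class_index=4)
print(predict_nb(model, ("Male", "Degree", "25 - 34", "<2018", None)))
print(predict_nb(model, ("Female", "Diploma", "35 - 44", "<2018", None)))
```

Working in log space avoids underflow when many small probabilities are multiplied, and smoothing keeps an unseen attribute value from forcing a class probability to zero.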
4.0 Comparison of methods in table or graph

Summary of each method             Naïve Bayes      1R

Correctly Classified Instances     52 (65 %)        64 (80 %)
Incorrectly Classified Instances   28 (35 %)        16 (20 %)
Kappa statistic                    0.3              0.6
Mean absolute error                0.4416           0.2
Root mean squared error            0.4697           0.4472
Relative absolute error            88.3293 %        40 %
Root relative squared error        93.9405 %        89.4427 %
Total number of instances          80               80

In summary, the table above compares Naïve Bayes and 1R. 1R classified 64 of 80 instances
correctly (80 %), whereas Naïve Bayes classified 52 of 80 correctly (65 %). 1R also had the higher
Kappa statistic (0.6 versus 0.3) and lower error measures, so it performed better on this dataset.
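As a cross-check on the table, the percentages and Kappa values follow from the instance counts by simple arithmetic. The sketch below assumes a two-class target whose actual classes are split 50/50, so that chance agreement is 0.5 and WEKA's reference (prior) errors are 0.5. The class attribute is not named here, so this is an assumption, although it is consistent with the Kappa values reported. For 1R, which outputs hard 0/1 predictions, the same assumption also reproduces the error measures; the Naïve Bayes error measures depend on its predicted probabilities and cannot be recovered from counts alone.

```python
import math


def summarize(correct, total, chance_agreement=0.5):
    """Accuracy, error rate and Cohen's kappa from instance counts.

    chance_agreement is the expected agreement p_e; 0.5 holds when the actual
    classes are split 50/50 between two classes (an assumption here).
    """
    p_o = correct / total
    kappa = (p_o - chance_agreement) / (1 - chance_agreement)
    return {"accuracy_%": 100 * p_o, "error_%": 100 * (1 - p_o), "kappa": kappa}


def hard_prediction_errors(incorrect, total, prior_error=0.5):
    """Error measures for a two-class classifier with 0/1 predictions.

    Each wrong instance contributes 1 to both the absolute and the squared
    error (averaged over the two classes); prior_error = 0.5 assumes a
    50/50 class prior as the reference for the relative measures.
    """
    mae = incorrect / total
    rmse = math.sqrt(incorrect / total)
    return {
        "MAE": mae,
        "RMSE": rmse,
        "RAE_%": 100 * mae / prior_error,
        "RRSE_%": 100 * rmse / prior_error,
    }


for name, correct in (("Naive Bayes", 52), ("1R", 64)):
    print(name, {k: round(v, 4) for k, v in summarize(correct, 80).items()})
print("1R", {k: round(v, 4) for k, v in hard_prediction_errors(16, 80).items()})
```

Under these assumptions the script gives Kappa values of 0.3 and 0.6, and for 1R a mean absolute error of 0.2, root mean squared error of 0.4472, relative absolute error of 40 % and root relative squared error of 89.4427 %, matching the table.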
