Usage Apriori and Clustering Algorithms in WEKA Tools To Mining Dataset of Traffic Accidents
To cite this article: Faisal Mohammed Nafie Ali & Abdelmoneim Ali Mohamed Hamed
(2018) Usage Apriori and clustering algorithms in WEKA tools to mining dataset of
traffic accidents, Journal of Information and Telecommunication, 2:3, 231-245, DOI:
10.1080/24751839.2018.1448205
1. Introduction
Databases now store a significant amount of data, and with the rapid spread of data
warehouses it has become necessary to find techniques that extract information and
knowledge from these stored data for use in problem-solving and decision-making. Such
techniques rely on modern computer applications and on the smart technology known as
artificial intelligence. Data mining is an analytical process that combines artificial
intelligence, statistics, and machine learning. It is considered one step in the process of
knowledge discovery in databases. Data mining and machine learning are topics in artificial
intelligence that focus on pattern discovery, prediction, and forecasting based on
properties of gathered data (Witten, Frank, Hall, & Pal, 2016).
CONTACT Faisal Mohammed Nafie Ali [email protected] Department of Computer Science, College of Science and
Humanities at Alghat, Majmaah University, Majmaah 11952, Saudi Arabia
© 2018 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/),
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
232 F. M. NAFIE ALI AND A. A. MOHAMED HAMED
Data mining is an iterative process in which progress is defined by discovery, through
either automatic or manual methods. Data mining activities can be put into one of two
classes: predictive and descriptive. Predictive data mining produces a model of the system
described by the given data set. Descriptive data mining produces new, nontrivial
information based on the available data set (Han, Pei, & Kamber, 2011).
Several tools are used in data mining to extract information, such as R, SPSS, IBM
Clementine, WEKA, KNIME, and Orange. Table 1 presents a comparison of several data mining
tools and shows the advantages and disadvantages of these tools (Solanki, 2013).
This research aims to suggest an approach that employs association rule mining and
clustering algorithms, through a data mining tool, to derive new rules from a broad set of
discovered rules taken from traffic accident data in Alghat Province, KSA, covering
four years (1432, 1434, 1435, and 1436).
Clustering is the task of assigning a set of items to groups so that the elements in the
same cluster are more similar to each other than to those in other clusters. Clustering is
an essential task of exploratory data mining and a common technique for statistical data
analysis used in many fields, including machine learning, pattern recognition, image
analysis, information retrieval, and bioinformatics. It offers a better interface to the
user than other data mining tools (Han et al., 2011). In short, it is a technique for
grouping items that have similar features.
Association rules are applied to find connections between data items in a transactional
database. Association rule mining algorithms are used to discover frequent associations
(Amira, Pareek, & Araar, 2015).
Many algorithms are used to mine data. In this paper, the authors attempt to find the best
association rules using the WEKA data mining tool. Apriori and clustering are among the
best-known and most widely used algorithms. Apriori is a simple algorithm applied to mine
frequent patterns from a transaction database. Apriori achieves good performance by
reducing the size of candidate sets. However, in situations with very many frequent
itemsets, large itemsets, or a very low minimum support, it still suffers from the cost of
generating a massive number of candidate sets (Wu et al., 2008). The objective of using the
Apriori algorithm is to find frequent itemsets and associations between different itemsets,
that is, association rules. Apriori is easy to implement. The algorithm uses information
from previous steps to produce the frequent itemsets (Shweta & Garg, 2013). We executed the
Apriori algorithm for this study and used WEKA to carry out the process of association
rule mining. The benefits of the Apriori algorithm are that it uses the large itemset
property, is easily parallelized, and is simple to implement. Within these limits, Apriori
is an efficient algorithm for finding all frequent itemsets.
The EM algorithm is a general method of finding the maximum likelihood estimate of a data
distribution when data are partially missing or hidden (Prajwala & Sangeeta, 2014). An
advantage of the EM algorithm is that it gives useful results for real-world data sets.
Moreover, this algorithm is appropriate when you want to perform a cluster analysis of a
small scene or region of interest and are not satisfied with the results obtained from the
other algorithms (Sharma, Bajpai, & Litoriya, 2012). The EM algorithm is an essential
algorithm for data mining. We used it because we were not satisfied with the results of
other algorithms. EM was chosen to cluster the data for several reasons: first, it has a
robust statistical basis; second, it is linear in database size; third, it is robust to
noisy data; fourth, it can accept the desired number of clusters as input; fifth, it can
handle high dimensionality; and finally, it converges fast given a proper initialization
(Abbas, 2008). Further cited advantages are guarantees about convergence to a locally
optimal solution and easily explainable results (Ordonez & Omiecinski, 2002). EM also has
several disadvantages: the algorithm is highly complex, it is hard to initialize, and the
quality of the final solution depends on the quality of the initial solution (Slimani &
Lazzez, 2014).
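The alternation between estimating hidden cluster memberships and re-estimating the model can be made concrete with a small sketch. The Python code below fits a one-dimensional mixture of Gaussians by EM; it is an illustration under simplifying assumptions, not the WEKA implementation used in this study. The function names (`gaussian_pdf`, `em_gmm_1d`), the deterministic initialization from the sorted data, the variance floor, and the toy data are all assumptions made for the example.

```python
import math


def gaussian_pdf(x, mean, variance):
    """Density of a one-dimensional normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)


def em_gmm_1d(data, k=2, iterations=50, min_variance=1e-6):
    """Fit a 1-D Gaussian mixture with k components by EM (illustrative only)."""
    n = len(data)
    ordered = sorted(data)
    # Assumed initialization: spread the starting means over the sorted data.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    grand_mean = sum(data) / n
    grand_var = max(sum((x - grand_mean) ** 2 for x in data) / n, min_variance)
    variances = [grand_var] * k
    weights = [1.0 / k] * k
    log_likelihood = float("-inf")
    for _ in range(iterations):
        # E-step: soft membership of every point in every component.
        responsibilities = []
        log_likelihood = 0.0
        for x in data:
            joint = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            log_likelihood += math.log(total)
            responsibilities.append([p / total for p in joint])
        # M-step: re-estimate weights, means, and variances from the memberships.
        for j in range(k):
            mass = sum(r[j] for r in responsibilities)
            weights[j] = mass / n
            means[j] = sum(r[j] * x for r, x in zip(responsibilities, data)) / mass
            spread = sum(r[j] * (x - means[j]) ** 2
                         for r, x in zip(responsibilities, data)) / mass
            variances[j] = max(spread, min_variance)  # floor avoids a zero variance
    # The log-likelihood returned is the one evaluated in the last E-step.
    return weights, means, variances, log_likelihood


# Toy data with two well-separated groups (hypothetical values).
sample = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
weights, means, variances, log_likelihood = em_gmm_1d(sample)
print(means, log_likelihood)
```

The dependence on the starting point mentioned above shows up directly here: a different choice of initial `means` can lead the same loop to a different, possibly poorer, local optimum.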
2. WEKA
WEKA is a collection of modern machine learning algorithms and data preprocessing tools. It
is described as a set of machine learning approaches for data mining tasks (Seppelt,
Voinov, & Lange, 2012). It is designed so that users can quickly try out existing machine
learning methods on new datasets in very flexible ways (Frank et al., 2009). The workbench
contains techniques for the main data mining problems: regression, classification,
clustering, association rule mining, visualization, and attribute selection (Seppelt et al.,
2012). It is well suited for developing new machine learning methods (Hall et al., 2009).
The user can access its components through JAVA programming or command line interfaces. It
also provides graphical user interfaces in the form of the WEKA Knowledge Flow Environment,
which features visual programming, and the WEKA Explorer (Parikh & Tirkha, 2013).
Besides the Explorer, there are three additional graphical user interfaces to WEKA. The
Knowledge Flow interface allows the user to design configurations for streamed data
processing. WEKA's third interface, the Experimenter, is intended to help the user answer a
fundamental practical question when applying classification and regression methods: which
techniques and parameter values work best for the given problem? The fourth interface,
called the Workbench, is a unified graphical user interface that combines the other three
into one application (Witten et al., 2016). In this study, we chose WEKA over other software
tools on the market because it is a package that can be recommended to everyone from
beginners to very adept users. The software is robust and contains many built-in features
that require no programming or coding knowledge. WEKA has therefore become very common
among academic and industrial researchers and is also widely used for teaching. WEKA is
well suited for mining association rules and is powerful in machine learning techniques. It
is user-friendly, with a graphical interface that allows for fast setup and operation. WEKA
works on the assumption that the user's data are available as a flat file or relation. In
other words, each data object is described by a fixed number of attributes that are
ordinarily of a specified type, normally alphanumeric or numeric values (Ramamohan,
Vasantharao, Chakravarti, & Ratnam, 2012).
WEKA offers implementations of learning algorithms that you can easily apply to your
dataset. It contains a variety of tools for transforming datasets, such as algorithms
for discretization and sampling (Witten et al., 2016).
WEKA makes it easy to compare different solution strategies based on the same evaluation
method and identify the one that is most appropriate for the problem at hand. It is
implemented in JAVA and runs on almost any computing platform (Frank, Hall, Trigg,
Holmes, & Witten, 2004). WEKA provides implementations of learning algorithms that can
quickly be applied to any dataset. It also contains a variety of tools for transforming
datasets (Frank et al., 2009). WEKA is an open source software tool for implementing
machine-learning algorithms.
3. Related work
Bansal and Bhambhu (2013) reported that association rule mining deals with frequent
itemsets, as done by many association algorithms such as the Apriori algorithm, which is
widely used in real-life applications. The authors apply association rule mining to extract
patterns that occur frequently within a dataset and explain the implementation of the
Apriori algorithm in WEKA on a dataset of grievous crimes against women gathered from a
Session court. The paper studies two association rule algorithms, Apriori and Predictive
Apriori, and compares the results of both using the WEKA tool. The resulting rules clearly
show that the Apriori algorithm performs better and faster than the Predictive Apriori
algorithm. The paper indicates that the investigation of frequent pattern matching based on
support and confidence measures has produced excellent results in various fields.
The Apriori algorithm can be seen as a two-stage operation:
(I) Find all itemsets whose support is greater than or equal to the user-specified
minimum support.
(II) Generate all rules whose confidence is greater than or equal to the user-specified
minimum confidence.
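The two stages can be sketched in a few lines of Python. This is a minimal, unoptimized illustration of the level-wise idea, not the WEKA implementation; the function names (`apriori`, `generate_rules`) and the toy accident records are hypothetical, and the thresholds are chosen only for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Stage I: return {itemset: support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates are the single items.
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count each candidate and keep those reaching the minimum support.
        level = {}
        for candidate in current:
            support = sum(1 for t in transactions if candidate <= t) / n
            if support >= min_support:
                level[candidate] = support
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {c for c in joined
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


def generate_rules(frequent, min_confidence):
    """Stage II: return (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, support in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(itemset, size):
                antecedent = frozenset(antecedent)
                # Subsets of a frequent itemset are frequent, so the lookup succeeds.
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, support, confidence))
    return rules


# Hypothetical accident records, each described by a set of attribute values.
records = [
    {"night", "highway", "rollover"},
    {"night", "highway", "collision"},
    {"day", "highway", "collision"},
    {"night", "highway", "rollover"},
    {"day", "city", "collision"},
]
frequent = apriori(records, min_support=0.4)
rules = generate_rules(frequent, min_confidence=0.9)
for antecedent, consequent, support, confidence in rules:
    print(sorted(antecedent), "=>", sorted(consequent), support, confidence)
```

The prune step is where the candidate sets shrink: a candidate is discarded without counting as soon as one of its subsets is known to be infrequent.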
The research explained that association rules play a key role in mining data for frequent
itemsets. The Apriori algorithm is applied to find and understand the underlying patterns
in the court's records, drawing on the data contained in their various sections.
Amira, Pareek, and Araar (2015) observed that association rule-mining algorithms are
commonly applied to find all rules in a database that satisfy some minimum support and
minimum confidence constraints. To reduce the number of generated rules, past research
investigated adapting the association rule-mining algorithm to mine only a particular
subset of association rules in which the class attribute is assigned to the right-hand
side. In this study, a dataset about traffic accidents was gathered from the
Dubai Traffic Department, UAE. After data preprocessing, the Apriori and Predictive Apriori
association rule algorithms were applied to the dataset to explore the link between
recorded accident factors and accident severity in the Dubai area. Two sets of class
association rules were generated using the two algorithms and summarized to obtain the most
interesting rules using technical measures. Experimental results showed that the class
association rules generated by the Apriori algorithm were more effective than those
generated by the Predictive Apriori algorithm. More associations between accident factors
and accident severity level were explored when applying the Apriori algorithm. The paper
showed that when the rule cover method was applied to the class association rules generated
by both algorithms, many class association rules produced by the Apriori algorithm were
eliminated, and the remaining rules were more effective than those generated by the
Predictive Apriori algorithm.
Shrivastava and Panda (2014) explained that several algorithms have been developed to mine
association rules from huge databases. The authors presented the Apriori algorithm as the
most common algorithm for mining association rules from a dataset. Various tools exist to
execute the Apriori algorithm. WEKA is an open source software tool for implementing
machine-learning algorithms. The study defined WEKA as a collection of tools for performing
data mining, including the application of association rules. Association rules are formed
by analysing data from various samples and using the support and confidence criteria to
identify the most important relationships. They are divided into separate classes in data
mining and are used in WEKA to perform the operations. In the results, the Apriori
algorithm generates the best association rules for the dataset after running the WEKA tool.
The implementation of the Apriori algorithm could be made more compatible and purposeful in
future by implementing new association algorithms for other operations and analyses in
the WEKA tool.
Agrawal and Agrawal (2017) gave a detailed description of the analysis of clustering
algorithms in the WEKA tool. The paper defined clustering as a method used in several areas
such as image analysis, pattern recognition, and statistical data analysis. Clustering is a
partition of data into sets of similar items. Every cluster contains items that are similar
to one another and dissimilar to the items of other clusters. Several clustering algorithms
are available to produce clusters (Chauhan, Kaur, & Alam, 2010). The WEKA tool was used to
compare different clustering algorithms because it provides a better interface to the user
than other data mining tools. The paper analyses and compares various clustering algorithms
using the WEKA tool to find out which algorithm is more convenient for users who perform
clustering. It presents applications of the WEKA data mining tool, which can cluster huge
data sets, and notes that clustering lends a hand in optimizing search engines.
(FPTree) was introduced. FP-Growth was then developed based on this data structure and is
currently regarded as a benchmark and one of the fastest algorithms for mining frequent
itemsets (Lee, Kim, Cai, & Han, 2003). A benefit of FP-Growth is that it needs only two
scans of the transaction database. First, it scans the database to compute a list of
frequent items sorted in descending order of frequency and eliminates rare items. Then, it
scans the database again to compress it into an FP-Tree structure and mines the FP-Tree
recursively by constructing conditional FP-Trees.
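The two database scans can be sketched as follows. The Python code below only builds the tree; the recursive mining of conditional FP-Trees is deliberately left out. The class and function names (`FPNode`, `build_fp_tree`), the alphabetical tie-breaking rule, and the toy records are assumptions made for this illustration.

```python
from collections import Counter


class FPNode:
    """One node of the prefix tree: an item, its count, and its children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_count):
    # Scan 1: count the items, drop rare ones, and sort by descending frequency
    # (ties are broken alphabetically here so that the order is deterministic).
    counts = Counter(item for t in transactions for item in set(t))
    kept = {item: c for item, c in counts.items() if c >= min_count}
    order = sorted(kept, key=lambda item: (-kept[item], item))
    rank = {item: position for position, item in enumerate(order)}

    # Scan 2: insert every transaction, reduced to its frequent items and
    # arranged in the global order, so that shared prefixes share nodes.
    root = FPNode(None)
    for t in transactions:
        path = sorted((item for item in set(t) if item in rank), key=rank.get)
        node = root
        for item in path:
            if item not in node.children:
                node.children[item] = FPNode(item, node)
            node = node.children[item]
            node.count += 1
    return root, order


# Hypothetical accident records.
records = [
    {"night", "highway", "rollover"},
    {"night", "highway", "collision"},
    {"day", "highway", "collision"},
    {"night", "highway", "rollover"},
    {"day", "city", "collision"},
]
tree, item_order = build_fp_tree(records, min_count=2)
print(item_order)
```

Because frequent items come first in every path, transactions that share their most common items collapse onto the same branch, which is what makes the tree a compressed form of the database.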
Mansouri and Javad Kargar (2014) noted that driving accidents have always been counted as
one of the most likely causes of death in societies today. In their study, the rules and
factors behind road traffic accidents were mined, and a local data model was extracted,
after collecting data from a variety of sources followed by data integration, data
cleaning, and separation of inconsistent data. The study used data mining methods such as
clustering and decision trees. The objective of the research was to analyse and monitor
road traffic accidents on suburban roads in Isfahan Province using data mining techniques.
The results obtained are interesting and significant and can be considered by authorities
as valuable information for decreasing road accidents. Furthermore, five data mining
algorithms were used in the study for knowledge discovery on the accident dataset. The C5.0
decision tree algorithm proved to generate the best results and performance. Clustering of
the data was also performed later in the research but did not result in a separation into
clusters with a specific meaning. Based on the clustering results, it can be concluded that
each route follows its own pattern, and that differentiating the data by time, vehicle, and
road status is not generalizable to all of the routes. In determining the accident type as
casualty, fatal, or car crash, the most important characteristic was the type of vehicle.
Verma, Srivastava, Chack, Diswar, and Gupta (2012) presented data clustering as a way of
placing similar data into groups. The paper showed that a clustering algorithm divides a
data set into groups such that the similarity within a group is larger than the similarity
among groups. The study reviews six types of clustering techniques: k-means clustering,
hierarchical clustering, DBSCAN clustering, density-based clustering, OPTICS, and the EM
algorithm. These clustering techniques are implemented and analysed using the clustering
tool WEKA. The performance of the six techniques is presented and compared. The paper
reported several findings: the performance of the K-means algorithm is better than that of
the hierarchical clustering algorithm; all the algorithms show some ambiguity on some
(noisy) data when clustered; the K-means and EM algorithms are very sensitive to noise in a
dataset, and this noise makes it difficult for the algorithm to cluster data into suitable
clusters while affecting the outcome of the algorithm; the K-means algorithm is faster than
the other clustering algorithms and generates quality clusters; and the hierarchical
clustering algorithm is more sensitive to noisy data.
Prajwala and Sangeeta (2014) considered two clustering algorithms, EM and a density-based
algorithm. The EM algorithm is a common way of finding the maximum likelihood estimate of a
data distribution when data are missing or hidden. In density-based clustering, clusters
are dense areas in the data space, separated by regions of lower object density. The paper
used WEKA, an open source tool, to compare these two algorithms. In terms of likelihood,
the EM algorithm is better than the density-based algorithm, whereas the density-based
algorithm takes less time than the EM algorithm to build the model.
Kumar and Rukmani (2010) presented research on web usage mining that focuses in particular
on discovering the web usage patterns of websites from server log files. The study used
the Apriori algorithm and the Frequent Pattern Growth algorithm and evaluated their memory
usage and time usage. The characteristics of the Apriori algorithm are that it uses the
large itemset property, is easily parallelized, and is easy to implement. The research
showed some limitations of the Apriori algorithm: handling a huge number of candidate sets
is costly, and it is tedious to repeatedly scan the database and check a large set of
candidates by pattern matching, which is especially true for mining long patterns. The main
advantages of the FP-growth algorithm are that it uses a compressed data structure and
avoids repeated database scans. The main drawback of the FP-growth algorithm is that it
lacks a good candidate generation method. Future research can combine the FP-Tree with
Apriori candidate generation to overcome the drawbacks of both Apriori and FP-growth.
Krömer et al. (2013) investigated a data set describing traffic accidents in Ethiopia
and used a machine learning method based on artificial evolution and fuzzy systems to
mine symbolic descriptions of selected features of the data set. The paper noted that
there are simple fuzzy classifiers as well as complex rule-based fuzzy classification
systems that usually build and maintain sophisticated rule bases. The popularity of
fuzzy classifiers can be attributed to their ability to perform soft classification, to assign
multiple labels to data samples, and to the ease of their interpretation. This study com-
pared the ability of evolutionary fuzzy rules to evolve classifiers for binary and multi-
class attributes. While the rules for a binary attribute were successful, the artificial evol-
ution as implemented in this work was not able to find fuzzy rules that would accurately
classify data according to selected multi-class attributes.
Rai, Verma, and Thoke (2012) described MSApriori as an association rule mining algorithm
designed to mine frequent itemsets that include rare items and to give better performance
than approaches that employ a single minimum support. In association rule mining, the
MSApriori algorithm plays an important role because it considers rare itemsets. The paper
proposed a novel approach, the MSApriori-T algorithm, which uses a total support tree
(T-tree) structure to make the MSApriori algorithm more efficient. The T-tree stores each
item as a node, with links to its child nodes. To overcome the drawback that the MSApriori
algorithm has high storage and processing time requirements, the authors combined the
MSApriori algorithm with a total support tree storage structure, resulting in an algorithm
that is more efficient in terms of storage requirement and processing time.
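The effect of multiple minimum supports can be illustrated with a small sketch. It assumes the usual MSApriori convention that every item carries its own minimum item support (MIS) and that an itemset counts as frequent when its support reaches the lowest MIS among its items. The helper names (`support`, `is_frequent_ms`), the item names, and the MIS values are hypothetical, and the T-tree storage of MSApriori-T is not modelled.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= frozenset(t)) / len(transactions)


def is_frequent_ms(itemset, transactions, mis):
    """Frequency test with per-item minimum supports (assumed MSApriori rule)."""
    threshold = min(mis[item] for item in itemset)
    return support(itemset, transactions) >= threshold


# Ten hypothetical transactions: "tanker" is a rare item, "sedan" a common one.
transactions = [{"sedan"}] * 8 + [{"sedan", "tanker"}] * 2
mis = {"sedan": 0.5, "tanker": 0.1}  # hypothetical minimum item supports
rare_pair = {"sedan", "tanker"}

# Kept under item-specific thresholds, lost under a single threshold of 0.5.
print(is_frequent_ms(rare_pair, transactions, mis))
print(support(rare_pair, transactions) >= 0.5)
```

With a single high threshold the rare pair disappears, while lowering that threshold for all items would flood the result with combinations of common items; per-item thresholds avoid both problems.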
4. Methodology
The importance of this research lies in suggesting a way of using data mining algorithms to
determine the causes of accidents in terms of time, road, driver nationality, and type of
accident from a large set of discovered rules extracted from real traffic accident data for
Alghat. The study was based on traffic accident data taken from the public traffic
department in Alghat Province, KSA, covering four years (1432, 1434, 1435, and 1436). One
of the main obstacles the researchers faced when collecting data from the traffic
department was that the accident information in the traffic registration form is
incomplete. For this reason, many variables, such as driver age, driver health status,
driver behaviour, and weather conditions, had to be omitted from the analysis. WEKA tools
were used for preprocessing and analysing the data. In WEKA, we applied two tools: the
Apriori algorithm for association rules and the EM clustering algorithm. A comparison
between these algorithms (Apriori and clustering) was made to discover the factors that
caused accidents.
Figure 1 shows the Attribute-Relation File Format (ARFF) file for the traffic accident
dataset after converting it from an Excel file. The header of the file starts with the
name of the relation (traffic), followed by a block that declares the attributes (year,
type of accident, location, number of vehicles, driver, injured, and death). The @data line
is followed by the value entries for each attribute. The dataset was prepared in ARFF so
that it could be run in the WEKA interface. The ARFF format merely describes a dataset; it
does not specify which of the attributes is the one that is supposed to be predicted. The
same file can therefore be used with the different algorithms available in WEKA software.
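The layout of such a file can be sketched with a few lines of Python that write the header and the data section. The relation name and the attribute list follow the description above; everything else is an assumption made for illustration, including the helper `to_arff`, the underscore spelling of the attribute names, the attribute types, the nominal values, and the two data rows, none of which are taken from the actual dataset.

```python
def to_arff(relation, attributes, rows):
    """Render an ARFF document as a string.

    attributes is a list of (name, spec) pairs, where spec is either the
    string "numeric" or a list of nominal values.
    """
    lines = ["@relation " + relation, ""]
    for name, spec in attributes:
        if isinstance(spec, str):
            type_text = spec
        else:
            type_text = "{" + ",".join(spec) + "}"
        lines.append("@attribute " + name + " " + type_text)
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# Attribute names follow the paper; types and values are hypothetical.
attributes = [
    ("year", ["1432", "1434", "1435", "1436"]),
    ("accident_type", ["collision", "rollover"]),
    ("location", ["highway", "city"]),
    ("vehicles", "numeric"),
    ("driver", ["saudi", "non_saudi"]),
    ("injured", "numeric"),
    ("death", "numeric"),
]
rows = [
    ("1434", "collision", "highway", 2, "saudi", 1, 0),
    ("1435", "rollover", "city", 1, "non_saudi", 0, 1),
]
arff_text = to_arff("traffic", attributes, rows)
print(arff_text)
```

Nothing in the file marks a class attribute, which is why the same file can be passed to an association rule learner or to a clusterer without modification.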
Figure 2 displays the ARFF file for the traffic accident dataset during preprocessing in
the WEKA Explorer. The file contains 8 attributes and 946 instances. At this stage, the
data are ready for mining and information extraction using the various algorithms
supported by WEKA tools.
Figure 3 shows the use of the Apriori algorithm to find the best results, with
minSupport = 0.4 and minimum confidence = 0.9.
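For reference, the two thresholds can be read through the usual definitions of support and confidence for a rule X ⇒ Y over a set of instances D. The notation below is standard textbook notation and is not taken from the figure.

```latex
\[
\mathrm{supp}(X) = \frac{\left|\{\, t \in D : X \subseteq t \,\}\right|}{|D|},
\qquad
\mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)}
\]
\[
\text{A rule is reported when}\quad
\mathrm{supp}(X \cup Y) \ge 0.4
\quad\text{and}\quad
\mathrm{conf}(X \Rightarrow Y) \ge 0.9 .
\]
```

Assuming the algorithm was run on all 946 instances of the dataset, a support of 0.4 corresponds to 0.4 × 946 = 378.4, so an itemset has to occur in at least 379 instances to count as frequent, and a rule is kept only if at least 90% of the instances matching its left-hand side also match its right-hand side.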
Figure 4 shows the best results obtained by the EM clustering algorithm.
Table 6 presents the number of accidents that happened within the four years (1432, 1434,
1435, and 1436). The data show that the most accidents and injuries occurred in 1434.
The highest number of deaths occurred in 1435.
Table 7 summarizes the results obtained using the EM clustering algorithm.
The results of the current study showed that, for the EM clustering algorithm, the time
taken to build the model was 1.58 s and the log-likelihood was -7.46685. Log-likelihood
here measures how probable the observed data are under the fitted cluster model, so a
higher value indicates a better grouping of the data elements. The EM algorithm is a
general statistical method of maximum likelihood estimation. EM clustering may converge to
a poor locally optimal solution; it therefore needs an unknown number of iterations to
converge to a good solution (Ordonez & Omiecinski, 2002). When applying the Apriori
algorithm in this research, we obtained the best results, because the Apriori algorithm is
an efficient algorithm for finding all frequent itemsets. Therefore, the Apriori algorithm
was more effective than the EM clustering algorithm.
6. Conclusion
The aim of this study was to present the application of WEKA tools in data mining. The
Apriori and clustering algorithms were used to discover and understand the underlying
patterns in the traffic accident dataset from Alghat Province. The results of both
algorithms show that the Apriori algorithm performs better and faster than the clustering
algorithm. The paper shows that the Apriori algorithm is a simple and efficient tool for
analysing the dataset. In general, the WEKA interface is a very useful tool in data mining,
which allows the user to choose among many different algorithms and compare them to reach
the required results accurately.
Disclosure statement
No potential conflict of interest was reported by the authors.
Notes on contributors
Faisal Mohammed Nafie Ali received the B.Sc. from Omdurman Ahlia University, Faculty of Applied &
Computer Science, Sudan, in 2001. He received the M.Sc. and Ph.D. degrees in Computer Science in 2009
and 2014, respectively, from Alneelain University, Faculty of Computer Science and Information
Technology, Sudan. He worked in the field of education as a Computer Science teacher in Sudan from
2002 to 2013. He has worked as an Assistant Professor at Majmaah University, Saudi Arabia, from 2014
until now. He worked as an Oracle Database Administrator at the National Pensions Fund in Sudan from
2007 to 2014. He has experience in data mining using WEKA and Clementine. He holds many
certifications in Oracle database administration.
Abdelmoneim Ali Mohamed Hamed received the B.Sc. and M.Sc. in mathematics from Alnileen
University, Faculty of Science, Sudan, in 1989 and 2005, respectively. He received the Ph.D. in applied
statistics from Sudan University for Science and Technology, Sudan, in 2012. From 1989 to 2008, he worked
in the field of education as a mathematics teacher in Sudan and Saudi Arabia. From 2009 to 2013,
he worked as a lecturer at Al Ahfad University for Girls. From 2014 until now, he has worked as an Assistant
Professor at Al Majmaah University. He has experience in statistical analysis using SPSS and WEKA.
References
Abbas, O. A. (2008). Comparisons between data clustering algorithms. International Arab Journal of
Information Technology (IAJIT), 5(3), 320–325.
Agrawal, R., & Agrawal, J. (2017). Analysis of clustering algorithm of Weka tool on air pollution
dataset. International Journal of Computer Applications, 168(13), 1–5.
Agrawal, R., & Srikant, R. (1994, September). Fast algorithms for mining association rules. In Proc. 20th Int. Conf.
Very Large Data Bases, VLDB (Vol. 1215, pp. 487–499).
Amira, A., Vikas, P., & Abdelaziz, A. (2015). Applying Association Rules Mining Algorithms for Traffic
Accidents in Dubai. International Journal of Soft Computing and Engineering (IJSCE), 5(4), 1–12.
Bansal, D., & Bhambhu, L. (2013). Usage of Apriori algorithm of data mining as an application to grie-
vous crimes against women. International Journal of Computer Trends and Technology, 4(19), 3194–
3199.
Chauhan, R., Kaur, H., & Alam, M. A. (2010). Data clustering method for discovering clusters in spatial
cancer databases. International Journal of Computer Applications, 10(6), 9–14.
Frank, E., Hall, M., Holmes, G., Kirkby, R., Pfahringer, B., Witten, I. H., & Trigg, L. (2009). Weka-a machine
learning workbench for data mining. In Data mining and knowledge discovery handbook (pp.
1269–1277). Boston, MA: Springer.
Frank, E., Hall, M., Trigg, L., Holmes, G., & Witten, I. H. (2004). Data mining in bioinformatics using
Weka. Bioinformatics (Oxford, England), 20(15), 2479–2481.
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. H. (2009). The WEKA data
mining software: An update. ACM SIGKDD Explorations Newsletter, 11(1), 10–18.
Han, J., Pei, J., & Kamber, M. (2011). Data mining: Concepts and techniques. Elsevier.
Krömer, P., Beshah, T., Ejigu, D., Snášel, V., Platoš, J., & Abraham, A. (2013, April). Mining traffic accident
features by evolutionary fuzzy rules. In Computational Intelligence in Vehicles and Transportation
Systems (CIVTS), 2013 IEEE Symposium on (pp. 38–43). IEEE.
Kumar, B. S., & Rukmani, K. V. (2010). Implementation of web usage mining using APRIORI and FP
growth algorithms. International Journal of Advanced Networking and Applications, 1(06), 400–404.
Lee, Y. K., Kim, W. Y., Cai, Y. D., & Han, J. (2003, November). CoMine: Efficient mining of correlated patterns. In ICDM,
3, 581–584.
Mansouri, M., & Javad Kargar, M. (2014). Analysis and monitoring of the traffic suburban road acci-
dents using data mining techniques; a case study of Isfahan Province in Iran. The Open
Transportation Journal, 8(1), 39–49.
Novitasari, W., Hermawan, A., Abdullah, Z., Sembiring, R. W., & Herawan, T. (2015). A method of dis-
covering interesting association rules from student admission dataset. International Journal of
Software Engineering and Its Applications, 9(8), 51–66.
Ordonez, C., & Omiecinski, E. (2002, November). FREM: Fast and robust EM clustering for large data
sets. In Proceedings of the eleventh international conference on Information and knowledge management
(pp. 590–599). ACM.
Parikh, D., & Tirkha, P. (2013). Data mining & data stream mining—open source tools. International
Journal of Innovative Research in Science, Engineering and Technology, 2(10), 5234–5239.
Prajwala, T. R., & Sangeeta, V. I. (2014). Comparative analysis of EM clustering algorithm and density
based clustering algorithm using WEKA tool. International Journal of Engineering Research and
Development, 9(8), 19–24.
Rai, D., Verma, K., & Thoke, A. S. (2012). MSApriori using total support tree data structure. International
Journal of Computer Applications, 43(23), 45–49.
Ramamohan, Y., Vasantharao, K., Chakravarti, C. K., & Ratnam, A. S. K. (2012). A study of data mining
tools in knowledge discovery process. International Journal of Soft Computing and Engineering
(IJSCE) ISSN, 2(3), 2231–2307.
Seppelt, R., Voinov, A. A., & Lange, S. (2012). Tools for environmental data mining and intelligent
decision support. Iemss. Org.
Sharma, N., Bajpai, A., & Litoriya, M. R. (2012). Comparison the various clustering algorithms of WEKA
tools. Facilities, 4(7), 78–80.
Shrivastava, A. K., & Panda, R. N. (2014). Implementation of Apriori algorithm using WEKA. KIET
International Journal of Intelligent Computing and Informatics, 1(1), 4.
Shweta, M., & Garg, D. K. (2013). Mining efficient association rules through Apriori algorithm using
attributes and comparative analysis of various association rule algorithms. International Journal
of Advanced Research in Computer Science and Software Engineering, 3(6), 306–312.
Slimani, T., & Lazzez, A. (2014). Efficient analysis of pattern and association rule mining approaches.
Journal of Information Technology and Computer Science (IJITCS), 6(3), 70–81.
Solanki, H. (2013). Comparative study of data mining tools and analysis with unified data mining
theory. International Journal of Computer Applications, 75(16), 23–28.
Steinbach, M., Karypis, G., & Kumar, V. (2000, August). A comparison of document clustering tech-
niques. In KDD workshop on text mining (Vol. 400, No. 1, pp. 525–526).
Tanna, P., & Ghodasara, Y. (2014). Using Apriori with WEKA for frequent pattern mining. arXiv preprint
arXiv:1406.7371.
Verma, M., Srivastava, M., Chack, N., Diswar, A. K., & Gupta, N. (2012). A comparative study of various
clustering algorithms in data mining. International Journal of Engineering Research and
Applications (IJERA), 2(3), 1379–1384.
Witten, I. H., Frank, E., Hall, M. A., & Pal, C. J. (2016). Data mining: Practical machine learning tools and
techniques. Cambridge, MA: Morgan Kaufmann.
Wu, X., Kumar, V., Quinlan, J. R., Ghosh, J., Yang, Q., Motoda, H., & Zhou, Z. H. (2008). Top 10 algorithms
in data mining. Knowledge and Information Systems, 14(1), 1–37.