Decision Tree for Weather Forecasting
ABSTRACT
Predicting the class of a data instance is a challenging task, and the prediction of the dependent variable depends on various factors. Since decision tree evaluation can be quantified and is simple to use, the author proposes a decision tree model to predict events such as fog, rain and thunder from the average temperature, humidity and pressure. The model can be used by farmers, and by people from all walks of life, in taking intelligent decisions. It can be used in machine learning, and the proposed model has scope for improvement, since more relevant attributes can be added for predicting the dependent variable. A decision tree (decision stump) has been implemented in Weka to facilitate the forecasting of weather.

Keywords
Decision tree, Data mining, Classification, Genetic algorithm.

1. INTRODUCTION
Classification is a most important task in data mining for the purpose of machine learning. First a model is created; then the model is trained on a sample of data called the training set. The trained model is given unseen data, called the test set, to predict the classification of future events with some accuracy [10].

Some classification tasks have been in use for a long time, such as classifying tumor cells as benign or malignant, classifying credit card transactions as legitimate or fraudulent, classifying secondary structures of protein as alpha-helix, beta-sheet or random coil, and categorizing news stories as finance, entertainment or sports. The following classification methods are commonly used, depending on their forte.

Decision Tree based Methods
Rule-based Methods
Memory based reasoning
Neural Networks
Naïve Bayes and Bayesian Belief Networks
Support Vector Machines

The rest of the paper is organized as follows. Section 2 covers data mining. Section 3 covers the decision tree. Section 4 covers the analysis. Section 5 covers the conclusion. Section 6 covers the references.

2. DATA MINING
The manual extraction of patterns from data has been done by human beings for centuries, and it was like cleaning the Augean stables. The information technology revolution has dramatically increased data collection, storage and manipulation ability, and this has led to an increase in data set size and complexity. As data sets have grown in size and complexity, special tools such as neural networks, cluster analysis, genetic algorithms, decision trees and support vector machines have emerged for the analysis of data. Data mining is the process of applying these methods with the intention of uncovering hidden patterns in large data sets [22]. It uses statistical methods and artificial intelligence algorithms in indexing and storing databases so that the information retrieved from them can be used efficiently. The field of knowledge discovery in databases (KDD) is concerned with the development of methods and techniques for making sense of data [23]. KDD refers to the overall process of discovering useful knowledge from data, while data mining refers to a particular step in the KDD process: the application of specific algorithms for extracting patterns from huge data sets [23]. The following are the goals of data mining.

2.1 Classification [25]
In data analysis it is essential to put instances into a desired class; classification categorizes an instance into a particular category. The ability of a classifier refers to how correctly it classifies unseen data into a class. Bagging and boosting are techniques for improving classification accuracy. Bagging improves generalization performance by reducing the variance of the base classifier; if a base classifier is unstable, bagging helps to reduce the error associated with random fluctuations in the training data. In boosting, each classifier depends on the previous one and focuses on the previous errors by giving them more weight. The following methods are used in classification.

2.1.1 Rule based methods [25]
A data mining system learns from examples. It formulates classification rules in order to predict future instances. For instance, in the customer database of a bank, a query is made whether a new customer applying for a loan is a good investment. A typical rule, as may be produced by a rule based system, is:

if STATUS = married and INCOME > 10000 and HOUSE_OWNER = yes
then INVESTMENT_TYPE = good.
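A minimal sketch of how such a rule can be applied in code is given below; the Customer record, its field names and the test data are illustrative assumptions built around the example rule, not part of an actual rule induction system.

// Minimal illustration of a rule based classifier for the example rule above.
// The Customer record and the single hand-written rule are hypothetical; a real
// rule based system would induce many such rules from the training data.
public class RuleBasedExample {

    record Customer(String status, double income, boolean houseOwner) {}

    // Applies the example rule: married, income > 10000, owns a house -> "good".
    static String investmentType(Customer c) {
        if (c.status().equals("married") && c.income() > 10000 && c.houseOwner()) {
            return "good";
        }
        return "unknown";   // no rule fired
    }

    public static void main(String[] args) {
        Customer applicant = new Customer("married", 25000, true);
        System.out.println("INVESTMENT_TYPE = " + investmentType(applicant));
    }
}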
2.1.2 Neural Network
Neural networks can be used for classification. They simulate the human brain. An artificial neural network can be supervised or unsupervised, and is composed of many units called neurons. Artificial neural networks require a long training time
and are a black box that lacks explanation, but they have a high tolerance to noisy data, so they can classify untrained data [25].

2.1.3 Bayesian classification
Bayesian classification predicts class membership using Bayes' theorem, which is based on probability. Its performance is comparable to selected neural network and decision tree classifiers, and it can facilitate decision making even on computationally intractable problems [25].

2.1.4 Support Vector Machine [25]
A support vector machine can classify both linear and non-linear data. Data from two classes are separated by a hyperplane, which the support vector machine finds using the training data. Its training is slow, but its accuracy is very high, and an SVM can also model non-linear problems.

2.1.5 Genetic Algorithm [25]
The genetic algorithm takes its cue from natural evolution. An initial population is created from randomly generated rules, each rule represented by a string of bits. In the next generation, survival of the fittest selects the fittest rules. Crossover and mutation are used to produce offspring: in crossover, substrings of one rule are exchanged with substrings of another rule; in mutation, randomly selected bits are inverted. The process is iterative, and a rule obtains a position in the next generation if it crosses a fitness threshold. Genetic algorithms can be used for classification as well as for optimization.
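A minimal sketch of the crossover and mutation operators described above, applied to bit-string rules, is given below; the toy fitness function and all parameters are illustrative assumptions, not the method of any particular system.

import java.util.Random;

// Minimal sketch of the crossover and mutation operators described above,
// applied to bit-string rules of equal length. The fitness function and all
// parameters are illustrative assumptions.
public class GeneticOperatorsSketch {

    static final Random RNG = new Random(42);

    // Single-point crossover: exchange the substrings after a random cut point.
    static boolean[][] crossover(boolean[] a, boolean[] b) {
        int cut = 1 + RNG.nextInt(a.length - 1);
        boolean[] childA = a.clone(), childB = b.clone();
        for (int i = cut; i < a.length; i++) {
            childA[i] = b[i];
            childB[i] = a[i];
        }
        return new boolean[][] { childA, childB };
    }

    // Mutation: invert each bit with a small probability.
    static void mutate(boolean[] rule, double rate) {
        for (int i = 0; i < rule.length; i++) {
            if (RNG.nextDouble() < rate) rule[i] = !rule[i];
        }
    }

    // Toy fitness: number of set bits (a stand-in for classification accuracy).
    static int fitness(boolean[] rule) {
        int f = 0;
        for (boolean bit : rule) if (bit) f++;
        return f;
    }

    public static void main(String[] args) {
        boolean[] r1 = { true, false, true, false, true, false, true, false };
        boolean[] r2 = { false, true, false, true, false, true, false, true };
        boolean[][] children = crossover(r1, r2);
        mutate(children[0], 0.1);
        // A rule would survive into the next generation only if its fitness
        // crosses a chosen threshold.
        System.out.println("child fitness = " + fitness(children[0]));
    }
}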
2.1.6 Case Based Reasoning
Case based reasoning stores old instances in a database and classifies an unseen instance the same as a stored instance that matches it; if no identical instance exists, it searches for another very similar instance [25].
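A minimal sketch of this retrieval scheme is given below; the StoredCase type, the feature values and the use of squared Euclidean distance as the similarity measure are illustrative assumptions.

import java.util.ArrayList;
import java.util.List;

// Minimal sketch of case based reasoning as described above: stored cases are
// kept in memory, an unseen instance reuses the label of an identical stored
// case if one exists, otherwise the label of the most similar stored case.
public class CaseBasedReasoningSketch {

    record StoredCase(double[] features, String label) {}

    static final List<StoredCase> caseBase = new ArrayList<>();

    static String classify(double[] query) {
        StoredCase best = null;
        double bestDist = Double.MAX_VALUE;
        for (StoredCase c : caseBase) {
            double d = 0;
            for (int i = 0; i < query.length; i++) {
                double diff = query[i] - c.features()[i];
                d += diff * diff;                  // squared Euclidean distance
            }
            if (d == 0) return c.label();          // identical stored instance
            if (d < bestDist) { bestDist = d; best = c; }
        }
        return best == null ? "unknown" : best.label();  // most similar case
    }

    public static void main(String[] args) {
        caseBase.add(new StoredCase(new double[] { 20.0, 85.0 }, "Fog"));
        caseBase.add(new StoredCase(new double[] { 30.0, 40.0 }, "Clear"));
        System.out.println(classify(new double[] { 21.0, 80.0 }));  // -> Fog
    }
}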
2.2 Association [25]
Rules that associate one attribute of a relation to another attribute can be discovered efficiently with these approaches, for example in a supermarket database. If a certain percentage of all the records that contain items A and B also contain item C, that percentage of occurrences is the confidence factor of the rule. Association rule mining is useful for mining single-dimensional Boolean association rules from transactional databases, and it can be extended to mine multilevel rules from transactional databases [25].
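A minimal sketch of computing the confidence factor of a rule such as {A, B} -> C over a list of transactions is given below; the transaction data are purely illustrative.

import java.util.List;
import java.util.Set;

// Minimal sketch of the confidence factor described above: the fraction of
// transactions containing A and B that also contain C.
public class AssociationConfidenceSketch {

    static double confidence(List<Set<String>> transactions,
                             Set<String> antecedent, String consequent) {
        int containsAntecedent = 0, containsBoth = 0;
        for (Set<String> t : transactions) {
            if (t.containsAll(antecedent)) {
                containsAntecedent++;
                if (t.contains(consequent)) containsBoth++;
            }
        }
        return containsAntecedent == 0 ? 0.0
                                       : (double) containsBoth / containsAntecedent;
    }

    public static void main(String[] args) {
        List<Set<String>> transactions = List.of(
                Set.of("A", "B", "C"),
                Set.of("A", "B"),
                Set.of("A", "C"),
                Set.of("A", "B", "C", "D"));
        // Confidence of {A, B} -> C: two of the three transactions that contain
        // both A and B also contain C, i.e. about 0.67.
        System.out.println(confidence(transactions, Set.of("A", "B"), "C"));
    }
}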
2.3 Sequence/Temporal
Sequential pattern functions identify collections of related records and detect frequently occurring patterns over the period of time under study. The difference between sequence rules and other rules is the temporal factor. For example, a retailer's database can be used to discover the set of purchases that frequently precedes the purchase of a microwave oven, or the harvesting season.

3. DECISION TREE
Decision tree learning is a method commonly used in data mining. The goal is to create a model that predicts the value of a target parameter based on several input parameters. A tree can be made to learn by splitting the source data set into subsets based on an attribute value test [16]. This process is repeated on each derived subset in a recursive manner, called recursive partitioning. The recursion is completed when the subset at a node all has the same value of the target variable, or when splitting no longer adds value to the predictions. This process of top-down induction of decision trees (TDIDT) is an example of a greedy algorithm, and it is by far the most common strategy for learning decision trees from data in data mining. TDIDT systems for machine learning can be classified on the basis of the following [2]:

Learning strategy used.
Representation of the knowledge acquired by the system.
The application domain of the system.

Decision trees can be described as the combination of mathematical and computational techniques that aid the description, categorization and generalization of a given set of data, so as to facilitate machine learning. In a decision tree, the dependent variable is predicted from the independent variables. Decision trees used in data mining are generally of the following types [16]:

Classification tree analysis is used to predict the class of the data.
Regression tree analysis is used when the predicted outcome is a number (e.g. the price of a house, or a patient's length of stay in a hospital).

Classification And Regression Tree (CART) analysis is a term commonly used to refer to both of the above procedures [4]. Trees used for regression and trees used for classification resemble each other in procedure but differ in how a node is split. Decision tree learning is the construction of a decision tree from class-labeled training data. A decision tree is a flow chart in which each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node holds a class label or prediction. The topmost node in the tree is the root node.

There are many specific decision tree algorithms. A few of them are enumerated as follows [16]:

ID3 (Iterative Dichotomiser 3)
C4.5 (successor of ID3)
CART (Classification And Regression Tree)
CHAID (CHi-squared Automatic Interaction Detector)
MARS (extends decision trees to better handle numerical data)

A decision tree splits on the attributes by using a greedy search that optimizes a certain criterion. Test conditions are specified depending on the attribute type, whether nominal, ordinal or continuous. Determining the best split remains an issue. The greedy method advocates that nodes with a homogeneous class distribution are preferred, so a measure of node impurity is needed. Node impurity is measured by the following:

Gini index.
Entropy.
Misclassification error.

3.1 Gini index [19]
The Gini index is used by CART, the acronym of the Classification And Regression Tree algorithm. Gini impurity is a measure of how often a randomly chosen element from the set would be incorrectly labeled if it were labeled randomly according to the distribution of labels in the subset. It can be computed by summing the probability of each item being chosen times the probability of a mistake in categorizing that item, and it reaches its minimum value of zero when all cases in the node fall into a single target category. For a node t,

GINI(t) = 1 – Σj [p(j | t)]^2      (1)

where p(j | t) is the relative frequency of class j at node t.
For example, if a decision classifies 1 instance into class c1 and the remaining 5 instances into class c2, then the probabilities are p(c1) = 1/6 and p(c2) = 5/6, and

Gini = 1 – (1/6)^2 – (5/6)^2 = 0.278      (2)

After finding the above value, sort the values of the attribute, find the Gini index at each value of the attribute, and choose the position of the split at the least Gini index.
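A minimal sketch of this split-selection procedure for one numeric attribute is given below; the attribute values and binary class labels are illustrative assumptions.

// Minimal sketch of Gini-based split selection on one numeric attribute:
// compute the weighted Gini index of each candidate split and keep the split
// with the least value. The toy data and class labels are illustrative.
public class GiniSplitSketch {

    // GINI(t) = 1 - sum_j p(j|t)^2 for a node with the given class counts.
    static double gini(int[] classCounts, int total) {
        if (total == 0) return 0.0;
        double g = 1.0;
        for (int count : classCounts) {
            double p = (double) count / total;
            g -= p * p;
        }
        return g;
    }

    public static void main(String[] args) {
        // Attribute values already sorted, with the class (0 or 1) of each instance.
        double[] values = { 10, 20, 30, 40, 50, 60 };
        int[] labels    = { 0, 0, 1, 1, 1, 1 };

        double bestGini = Double.MAX_VALUE;
        double bestSplit = Double.NaN;
        // Candidate split between each pair of adjacent attribute values.
        for (int i = 1; i < values.length; i++) {
            double split = (values[i - 1] + values[i]) / 2.0;
            int[] left = new int[2], right = new int[2];
            for (int j = 0; j < values.length; j++) {
                if (values[j] <= split) left[labels[j]]++; else right[labels[j]]++;
            }
            int nLeft = left[0] + left[1], nRight = right[0] + right[1];
            double weighted = (nLeft * gini(left, nLeft) + nRight * gini(right, nRight))
                              / values.length;
            if (weighted < bestGini) { bestGini = weighted; bestSplit = split; }
        }
        System.out.println("best split at " + bestSplit + ", weighted Gini " + bestGini);
    }
}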
3.2 Information gain [19]
Entropy at a given node t is given by the formula

Entropy(t) = – Σj p(j | t) log2 p(j | t)      (3)

where p(j | t) is the relative frequency of class j at node t. Entropy measures the homogeneity of a node. Its maximum, log2(nc) for nc classes, occurs when the records are equally distributed among all classes, implying the least information; its minimum, 0.0, occurs when all records belong to one class, implying the most information. For example, suppose a decision is to be made that splits one instance into class c1 and 5 instances into class c2. Then, with P(c1) = 1/6 and P(c2) = 5/6, the entropy is calculated as

Entropy = – (1/6) log2 (1/6) – (5/6) log2 (5/6) = 0.65 bits

Information gain in the form of the gain of a split is then calculated as

GAINsplit = Entropy(p) – Σi=1..k (ni / n) Entropy(i)      (4)

where parent node p with n records is split into k partitions and ni is the number of records in partition i.
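A minimal sketch of the entropy and GAINsplit computations is given below; the class counts reproduce the worked example above, and the split used to illustrate the gain is an assumption.

// Minimal sketch of the entropy and GAIN_split computations above. The class
// counts reproduce the worked example (1 instance in c1, 5 in c2).
public class InformationGainSketch {

    // Entropy(t) = - sum_j p(j|t) log2 p(j|t)
    static double entropy(int[] classCounts) {
        int total = 0;
        for (int c : classCounts) total += c;
        double h = 0.0;
        for (int c : classCounts) {
            if (c == 0) continue;
            double p = (double) c / total;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    // GAIN_split = Entropy(parent) - sum_i (n_i / n) * Entropy(child i)
    static double gainSplit(int[] parent, int[][] children) {
        int n = 0;
        for (int c : parent) n += c;
        double gain = entropy(parent);
        for (int[] child : children) {
            int ni = 0;
            for (int c : child) ni += c;
            gain -= ((double) ni / n) * entropy(child);
        }
        return gain;
    }

    public static void main(String[] args) {
        int[] node = { 1, 5 };                                        // 1 in c1, 5 in c2
        System.out.printf("entropy = %.2f bits%n", entropy(node));    // about 0.65
        // Gain of a split that separates the node into two pure children.
        int[][] split = { { 1, 0 }, { 0, 5 } };
        System.out.printf("gain    = %.2f bits%n", gainSplit(node, split));
    }
}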
Overfitting can be removed by pruning, using a test set different from the training set to obtain the best pruned tree [25]. Bagging and boosting are techniques for improving classification accuracy. Bagging improves generalization performance by reducing the variance of the base classifier; if a base classifier is unstable, bagging helps to reduce the error associated with random fluctuations in the training data. In boosting, each classifier depends on the previous one and focuses on the previous errors by giving them more weight [25].
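A minimal sketch of applying bagging and boosting around a decision stump in the Weka toolkit (used for the analysis in the next section) is given below; the file name weather.arff and the cross-validation setup are illustrative assumptions, not the experiment reported in this paper.

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.meta.AdaBoostM1;
import weka.classifiers.meta.Bagging;
import weka.classifiers.trees.DecisionStump;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// Minimal sketch of bagging and boosting in Weka with a decision stump as the
// base classifier. The ARFF file name and evaluation setup are illustrative.
public class BaggingBoostingSketch {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("weather.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);    // last attribute is the class

        Bagging bagging = new Bagging();                 // variance-reducing ensemble
        bagging.setClassifier(new DecisionStump());

        AdaBoostM1 boosting = new AdaBoostM1();          // reweights previous errors
        boosting.setClassifier(new DecisionStump());

        // 10-fold cross-validation of each ensemble on the same data.
        for (Classifier c : new Classifier[] { bagging, boosting }) {
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(c, data, 10, new java.util.Random(1));
            System.out.println(c.getClass().getSimpleName()
                    + ": " + eval.pctCorrect() + "% correct, kappa " + eval.kappa());
        }
    }
}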
4. ANALYSIS
In this study, data was taken for one year from http://www.wundergrounds.com. A training set of 64 instances was prepared from the dataset; only three parameters were taken into account (average temperature, average humidity and sea level pressure). Then 72 test instances were prepared by taking data randomly. The Weka data mining tool was downloaded from http://www.kaz.dl.sourceforge.net/project/weka/weka3-6-windows-jre/3.6.9/weka-3-6-9jre.exe for the purpose of predicting an event with a decision tree on the basis of the above mentioned parameters. Data cleaning was done so that the classes of events are not complex. The results show that out of 72 test instances, 46 were classified correctly, giving a kappa statistic of 0.0584. The results can be improved further by taking more attributes into the model and by increasing the size of the training set.
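A minimal sketch of training and evaluating a decision stump with the Weka API on separate training and test sets, as described above, is given below; the ARFF file names and the position of the class attribute are assumptions.

import weka.classifiers.Evaluation;
import weka.classifiers.trees.DecisionStump;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// Minimal sketch of training and evaluating a decision stump in Weka on separate
// training and test sets, as in the analysis above. The ARFF file names and the
// assumption that the class is the last attribute are illustrative.
public class WeatherStumpSketch {
    public static void main(String[] args) throws Exception {
        Instances train = new DataSource("weather_train.arff").getDataSet(); // e.g. 64 instances
        Instances test  = new DataSource("weather_test.arff").getDataSet();  // e.g. 72 instances
        train.setClassIndex(train.numAttributes() - 1);
        test.setClassIndex(test.numAttributes() - 1);

        DecisionStump stump = new DecisionStump();       // one-level decision tree
        stump.buildClassifier(train);

        Evaluation eval = new Evaluation(train);
        eval.evaluateModel(stump, test);
        System.out.println(eval.toSummaryString());      // correctly classified, kappa, ...
        System.out.println(eval.toMatrixString("Confusion Matrix"));
    }
}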
Table 1. Confusion Matrix

 A   B   C   D   E   F   Classified As
46   0   0   0   1   0   A = Fog
 0   0   0   0   0   0   B = Null
22   0   0   0   3   0   C = Rain
 0   0   0   0   0   0   D = Thunderstorm
 0   0   0   0   0   0   E = Rain with other event
 0   0   0   0   0   0   F = Fog with other event
[6] Friedman, J. H. (1999). Stochastic gradient boosting. Stanford University.
[7] Hastie, T., Tibshirani, R., and Friedman, J. H. (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer Verlag.
[8] Rodriguez et al. (2006). Rotation forest: A new classifier ensemble method. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10).
[9] Pages downloaded from http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm
[10] Pages downloaded from http://www-users.cs.umn.edu/~kumar/papers/papers.html
[11] Hyafil, Laurent and Rivest, R. L. (1976). Constructing optimal binary decision trees is NP-complete. Information Processing Letters, 5(1): 15-17. doi:10.1016/0020-0190(76)90095-8.
[12] Shyam Boriah, Varun Chandola, and Vipin Kumar (2008). Similarity measures for categorical data. In Proceedings of the SIAM Data Mining Conference, Atlanta, GA.
[13] Principles of Data Mining (2007). doi:10.1007/978-1-84628-766-4. ISBN 978-1-84628-765-7.
[14] Horváth, Tamás and Yamamoto, Akihiro, eds. (2003). Inductive Logic Programming. Lecture Notes in Computer Science 2835. doi:10.1007/b13700. ISBN 978-3-540-20144-1.
[15] Anurag Srivastava et al. Parallel formulation of decision trees and classification algorithms. Kluwer Academic Publishers.
[16] Pages downloaded from http://en.wikipedia.org/wiki/Decision_tree_learning
[17] Varun Chandola, Shyam Boriah, and Vipin Kumar (2009). In Proceedings of the SIAM Data Mining Conference, Sparks.
[18] Breiman, L. (1996). Bagging predictors. Machine Learning, 24: 123-140.
[19] Pages downloaded from http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap4_basic_classification.pdf
[20] Anurag Srivastava, Eui-Hong Han, Vipin Kumar, and Vineet Singh. Parallel formulation of decision-tree classification algorithms. Kluwer Academic Publishers, Boston.
[21] Pages downloaded from http://citeseer.ist.psu.edu/oliver93decision.html
[22] Kantardzic, Mehmed (2003). Data Mining: Concepts, Models, Methods, and Algorithms. John Wiley & Sons. ISBN 0-471-22852-4. OCLC 50055336.
[23] Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth (1996). From Data Mining to Knowledge Discovery in Databases. AI Magazine, 17(3). American Association for Artificial Intelligence.
[24] Pages downloaded from http://www.cs.ccsu.edu/~markov/ccsu_courses/DataMining-1.html
[25] Jiawei Han and Micheline Kamber (2006). Data Mining: Concepts and Techniques, 2nd edition. Morgan Kaufmann.