Chp8 (Topic Not in Book) - Classification, Prediction + Issues
Databases are rich with hidden information that can be used for intelligent decision making. Classification and prediction are two forms of data analysis that can be used to extract models describing important data classes or to predict future data trends. Such analysis can help provide us with a better understanding of the data at large. Whereas classification predicts categorical (discrete, unordered) labels, prediction models continuous-valued functions. For example, we can build a classification model to categorize bank loan applications as either safe or risky, or a prediction model to predict the expenditures in dollars of potential customers on computer equipment given their income and occupation. Many classification and prediction methods have been proposed by researchers in machine learning, pattern recognition, and statistics. Most algorithms are memory resident, typically assuming a small data size. Recent data mining research has built on such work, developing scalable classification and prediction techniques capable of handling large, disk-resident data.
In this chapter, you will learn basic techniques for data classification, such as how to build decision tree classifiers, Bayesian classifiers, Bayesian belief networks, and rule-based classifiers. Backpropagation (a neural network technique) is also discussed, in addition to a more recent approach to classification known as support vector machines. Classification based on association rule mining is explored. Other approaches to classification, such as k-nearest-neighbor classifiers, case-based reasoning, genetic algorithms, rough sets, and fuzzy logic techniques, are introduced. Methods for prediction, including linear regression, nonlinear regression, and other regression-based models, are briefly discussed. Where applicable, you will learn about extensions to these techniques for their application to classification and prediction in large databases. Classification and prediction have numerous applications, including fraud detection, target marketing, performance prediction, manufacturing, and medical diagnosis.
Suppose that a marketing manager wants data analysis to help guess whether a customer with a given profile will buy a new computer. A medical researcher wants to analyze breast cancer data in order to predict which one of three specific treatments a patient should receive. In each of these examples, the data analysis task is classification, where a model or classifier is constructed to predict categorical labels, such as "safe" or "risky" for the loan application data; "yes" or "no" for the marketing data; or "treatment A," "treatment B," or "treatment C" for the medical data. The categories can be represented by discrete values, where the ordering among values has no meaning. For example, the values 1, 2, and 3 may be used to represent treatments A, B, and C, where there is no ordering implied among this group of treatment regimes.
Suppose that the marketing manager would like to predict how much a given customer will spend during a sale at AllElectronics. This data analysis task is an example of numeric prediction, where the model constructed predicts a continuous-valued function, or ordered value, as opposed to a categorical label. This model is a predictor. Regression analysis is a statistical methodology that is most often used for numeric prediction, hence the two terms are often used synonymously. We do not treat the two terms as synonyms, however, because several other methods can be used for numeric prediction, as we shall see later in this chapter. Classification and numeric prediction are the two major types of prediction problems. For simplicity, when there is no ambiguity, we will use the shortened term of prediction to refer to numeric prediction.
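The contrast can be sketched in a few lines of Python: a classifier returns a discrete, unordered label, while a numeric predictor returns a continuous value. The decision rule and the spending coefficient below are invented purely for illustration:

```python
# Illustrative sketch only: the threshold and coefficient are hypothetical.

def classify_loan(income, has_steady_job):
    """Classification: predict a discrete, unordered label."""
    if income >= 50000 and has_steady_job:
        return "safe"
    return "risky"

def predict_spending(income):
    """Numeric prediction: predict a continuous-valued amount."""
    # A toy linear model: spend roughly 2% of income during the sale.
    return 0.02 * income

print(classify_loan(60000, True))   # -> safe
print(predict_spending(60000))      # -> 1200.0
```

Note that the classifier's outputs ("safe", "risky") have no meaningful ordering, while the predictor's outputs are ordered real numbers.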
"How does classification work?" Data classification is a two-step process, as shown for the loan application data of Figure 6.1. (The data are simplified for illustrative purposes. In reality, we may expect many more attributes to be considered.) In the first step, a classifier is built describing a predetermined set of data classes or concepts. This is the learning step (or training phase), where a classification algorithm builds the classifier by analyzing or "learning from" a training set made up of database tuples and their associated class labels. A tuple, X, is represented by an n-dimensional attribute vector, X = (x1, x2, ..., xn), depicting n measurements made on the tuple from n database attributes, respectively, A1, A2, ..., An. Each tuple, X, is assumed to belong to a predefined class as determined by another database attribute called the class label attribute. The class label attribute is discrete-valued and unordered. It is categorical in that each value serves as a category or class. The individual tuples making up the training set are referred to as training tuples and are selected from the database under analysis. In the context of classification, data tuples can be referred to as samples, examples, instances, data points, or objects.
Because the class label of each training tuple is provided, this step is also known as supervised learning (i.e., the learning of the classifier is "supervised" in that it is told

[Footnote] Each attribute represents a "feature" of X. Hence, the pattern recognition literature uses the term feature vector rather than attribute vector. Since our discussion is from a database perspective, we propose the term "attribute vector." In our notation, any variable representing a vector is shown in bold italic font; measurements depicting the vector are shown in italic font, e.g., X = (x1, x2, x3).

[Footnote] In the machine learning literature, training tuples are commonly referred to as training samples. Throughout this text, we prefer to use the term tuples instead of samples, since we discuss the theme of classification from a database-oriented perspective.
6.1 What Is Classification? What Is Prediction?
Figure 6.1 The data classification process. (a) Learning: Training data are analyzed by a classification algorithm. Here, the class label attribute is loan_decision, and the learned model or classifier is represented in the form of classification rules. (b) Classification: Test data are used to estimate the accuracy of the classification rules. If the accuracy is considered acceptable, the rules can be applied to the classification of new data tuples.

Training data shown in the figure:
name          age          income   loan_decision
Juan Bello    senior       low      safe
Sylvia Crest  middle_aged  low      risky
Anne Yee      middle_aged  high     safe

New tuple to classify, shown in the figure: (John Henry, middle_aged, low) -> Loan decision?
to which class each training tuple belongs). It contrasts with unsupervised learning (or clustering), in which the class label of each training tuple is not known, and the number or set of classes to be learned may not be known in advance. For example, if we did not have the loan_decision data available for the training set, we could use clustering to try to
determine "groups of like tuples," which may correspond to risk groups within the loan application data. Clustering is the topic of Chapter 7.
This first step of the classification process can also be viewed as the learning of a mapping or function, y = f(X), that can predict the associated class label y of a given tuple X. In this view, we wish to learn a mapping or function that separates the data classes. Typically, this mapping is represented in the form of classification rules, decision trees, or mathematical formulae. In our example, the mapping is represented as classification rules that identify loan applications as being either safe or risky (Figure 6.1(a)). The rules can be used to categorize future data tuples, as well as provide deeper insight into the database contents. They also provide a compressed representation of the data.
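Concretely, such classification rules amount to a function from a tuple's attribute values to a class label. The IF-THEN conditions below are hypothetical stand-ins for the rules of Figure 6.1(a):

```python
# Hypothetical IF-THEN classification rules for the loan example;
# the conditions are invented for illustration.

def loan_decision(age, income):
    """Apply classification rules to a loan applicant's attribute values."""
    if age == "youth" and income == "low":
        return "risky"
    if income == "high":
        return "safe"
    return "risky"

print(loan_decision("middle_aged", "high"))  # -> safe
print(loan_decision("youth", "low"))         # -> risky
```

Each rule tests attribute values and, when its condition matches, assigns the class label on its right-hand side.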
"What about classification accuracy?" In the second step (Figure 6.1(b)), the model is used for classification. First, the predictive accuracy of the classifier is estimated. If we were to use the training set to measure the accuracy of the classifier, this estimate would likely be optimistic, because the classifier tends to overfit the data (i.e., during learning it may incorporate some particular anomalies of the training data that are not present in the general data set overall). Therefore, a test set is used, made up of test tuples and their associated class labels. These tuples are randomly selected from the general data set. They are independent of the training tuples, meaning that they are not used to construct the classifier.
The accuracy of a classifier on a given test set is the percentage of test set tuples that are correctly classified by the classifier. The associated class label of each test tuple is compared with the learned classifier's class prediction for that tuple. Section 6.13 describes several methods for estimating classifier accuracy. If the accuracy of the classifier is considered acceptable, the classifier can be used to classify future data tuples for which the class label is not known. (Such data are also referred to in the machine learning literature as "unknown" or "previously unseen" data.) For example, the classification rules learned in Figure 6.1(a) from the analysis of data from previous loan applications can be used to approve or reject new or future loan applicants.
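The accuracy computation described above is simple to state in code. The toy classifier and test set below are invented for illustration:

```python
def accuracy(classifier, test_tuples, test_labels):
    """Percentage of test tuples whose predicted label matches the known label."""
    correct = sum(1 for x, y in zip(test_tuples, test_labels)
                  if classifier(x) == y)
    return 100.0 * correct / len(test_labels)

# A toy rule-based classifier and a hypothetical test set.
def clf(x):
    return "safe" if x["income"] == "high" else "risky"

test_tuples = [{"income": "high"}, {"income": "low"}, {"income": "high"}]
test_labels = ["safe", "risky", "risky"]  # known class labels

print(accuracy(clf, test_tuples, test_labels))  # two of three correct
```

The key point is that the tuples passed to `accuracy` must come from the held-out test set, never from the training set.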
"How is (numeric) prediction different from classification?" Data prediction is a two-step process, similar to that of data classification as described in Figure 6.1. However, for prediction, we lose the terminology of "class label attribute" because the attribute for which values are being predicted is continuous-valued (ordered) rather than categorical (discrete-valued and unordered). The attribute can be referred to simply as the predicted attribute. Suppose that, in our example, we instead wanted to predict the amount (in dollars) that would be "safe" for the bank to loan an applicant. The data mining task becomes prediction, rather than classification. We would replace the categorical attribute, loan_decision, with the continuous-valued loan_amount as the predicted attribute, and build a predictor for our task.
Note that prediction can also be viewed as a mapping or function, y = f(X), where X is the input (e.g., a tuple describing a loan applicant), and the output y is a continuous or

[Footnote] We could also use this term for classification, although for that task the term "class label attribute" is more descriptive.
6.2 Issues Regarding Classification and Prediction
ordered value (such as the predicted amount that the bank can safely loan the applicant); that is, we wish to learn a mapping or function that models the relationship between X and y.
Prediction and classification also differ in the methods that are used to build their respective models. As with classification, the training set used to build a predictor should not be used to assess its accuracy. An independent test set should be used instead. The accuracy of a predictor is estimated by computing an error based on the difference between the predicted value and the actual known value of y for each of the test tuples, X. There are various predictor error measures (Section 6.12.2). General methods for error estimation are discussed in Section 6.13.
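Two common predictor error measures, mean absolute error and mean squared error, compute exactly this difference between predicted and actual values over the test tuples. The loan amounts below are invented for illustration:

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def mean_squared_error(predicted, actual):
    """Average squared difference; penalizes large errors more heavily."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical loan amounts: predicted vs. actual known values of y.
pred = [1200.0, 800.0, 1500.0]
true = [1000.0, 900.0, 1500.0]
print(mean_absolute_error(pred, true))  # -> 100.0
print(mean_squared_error(pred, true))
```

As with classification accuracy, these errors are meaningful only when computed on test tuples that were not used to build the predictor.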
Ideally, the time spent on relevance analysis, when added to the time spent on learning from the resulting "reduced" attribute (or feature) subset, should be less than the time that would have been spent on learning from the original set of attributes. Hence, such analysis can help improve classification efficiency and scalability.
Data transformation and reduction: The data may be transformed by normalization, particularly when neural networks or methods involving distance measurements are used in the learning step. Normalization involves scaling all values for a given attribute so that they fall within a specified range, such as -1.0 to 1.0, or 0.0 to 1.0. In methods that use distance measurements, for example, this would prevent attributes with initially large ranges (like, say, income) from outweighing attributes with initially smaller ranges (such as binary attributes).
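Min-max normalization, for example, rescales a given attribute linearly into the target range. A minimal sketch, with invented income values:

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Scale values linearly into the range [new_min, new_max]."""
    old_min, old_max = min(values), max(values)
    span = old_max - old_min
    return [(v - old_min) / span * (new_max - new_min) + new_min
            for v in values]

incomes = [30000, 45000, 90000]           # hypothetical attribute values
print(min_max_normalize(incomes))         # -> [0.0, 0.25, 1.0]
print(min_max_normalize(incomes, -1, 1))  # -> [-1.0, -0.5, 1.0]
```

After normalization, income contributes to a distance computation on the same scale as a binary attribute.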
The data can also be transformed by generalizing it to higher-level concepts. Concept hierarchies may be used for this purpose. This is particularly useful for continuous-valued attributes. For example, numeric values for the attribute income can be generalized to discrete ranges, such as low, medium, and high. Similarly, categorical attributes, like street, can be generalized to higher-level concepts, like city. Because generalization compresses the original training data, fewer input/output operations may be involved during learning.
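Such generalization can be sketched as a thresholding step for numeric attributes and a lookup for categorical ones. The income ranges and the street-to-city mapping below are hypothetical:

```python
def generalize_income(income):
    """Map a numeric income to a discrete range (thresholds are invented)."""
    if income < 30000:
        return "low"
    if income < 80000:
        return "medium"
    return "high"

# A tiny concept hierarchy for a categorical attribute (invented mapping).
street_to_city = {"Main St": "Springfield", "Oak Ave": "Shelbyville"}

print(generalize_income(45000))   # -> medium
print(street_to_city["Main St"])  # -> Springfield
```

After this step, many distinct raw tuples collapse into a single generalized tuple, which is the source of the compression mentioned above.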
Data can also be reduced by applying many other methods, ranging from wavelet transformation and principal components analysis to discretization techniques, such as binning, histogram analysis, and clustering.
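Equal-width binning, one of the discretization techniques mentioned, divides an attribute's range into intervals of equal size. A sketch, with invented age values:

```python
def equal_width_bins(values, n_bins):
    """Assign each value a bin index 0..n_bins-1 over equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # The max value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

ages = [22, 25, 38, 47, 61]           # hypothetical attribute values
print(equal_width_bins(ages, 3))      # -> [0, 0, 1, 1, 2]
```

The bin indices can then serve as a discretized version of the original continuous attribute.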
Data cleaning, relevance analysis (in the form of correlation analysis and attribute subset selection), and data transformation are described in greater detail in Chapter 2 of this book.
Speed: This refers to the computational costs involved in generating and using the given classifier or predictor.

Robustness: This is the ability of the classifier or predictor to make correct predictions given noisy data or data with missing values.

Scalability: This refers to the ability to construct the classifier or predictor efficiently given large amounts of data.

Interpretability: This refers to the level of understanding and insight that is provided by the classifier or predictor. Interpretability is subjective and therefore more difficult to assess. We discuss some work in this area, such as the extraction of classification rules from a "black box" neural network classifier called backpropagation (Section 6.6.4).

These issues are discussed throughout the chapter with respect to the various classification and prediction methods presented. Recent data mining research has contributed to the development of scalable algorithms for classification and prediction. Additional contributions include the exploration of mined "associations" between attributes and their use for effective classification. Model selection is discussed in Section 6.15.
[Figure: A decision tree predicting whether a customer at AllElectronics will buy a computer. The root node tests age?; the youth branch tests student? (no -> no, yes -> yes); the middle_aged branch leads directly to yes; the senior branch tests credit_rating? (excellent -> no, fair -> yes).]
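A decision tree of this shape, testing age, student status, and credit rating, can be read directly as nested conditionals. The branch outcomes below are reconstructed from the figure and should be treated as illustrative:

```python
def buys_computer(age, student, credit_rating):
    """Traverse the decision tree; returns the leaf label 'yes' or 'no'.

    Branch outcomes are reconstructed from the figure (illustrative only).
    """
    if age == "youth":
        return "yes" if student == "yes" else "no"
    if age == "middle_aged":
        return "yes"
    # Remaining branch: age == "senior"
    return "no" if credit_rating == "excellent" else "yes"

print(buys_computer("middle_aged", "no", "fair"))  # -> yes
print(buys_computer("youth", "yes", "fair"))       # -> yes
```

Each internal node becomes a test on one attribute, and each leaf becomes a returned class label, which is why decision trees are easily converted into classification rules.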