Classification and Prediction

Databases are rich with hidden information that can be used for intelligent decision making. Classification and prediction are two forms of data analysis that can be used to extract models describing important data classes or to predict future data trends. Such analysis can help provide us with a better understanding of the data at large. Whereas classification predicts categorical (discrete, unordered) labels, prediction models continuous-valued functions. For example, we can build a classification model to categorize bank loan applications as either safe or risky, or a prediction model to predict the expenditures in dollars of potential customers on computer equipment given their income and occupation. Many classification and prediction methods have been proposed by researchers in machine learning, pattern recognition, and statistics. Most algorithms are memory resident, typically assuming a small data size. Recent data mining research has built on such work, developing scalable classification and prediction techniques capable of handling large disk-resident data.

In this chapter, you will learn basic techniques for data classification, such as how to build decision tree classifiers, Bayesian classifiers, Bayesian belief networks, and rule-based classifiers. Backpropagation (a neural network technique) is also discussed, in addition to a more recent approach to classification known as support vector machines. Classification based on association rule mining is explored. Other approaches to classification, such as k-nearest-neighbor classifiers, case-based reasoning, genetic algorithms, rough sets, and fuzzy logic techniques, are introduced. Methods for prediction, including linear regression, nonlinear regression, and other regression-based models, are briefly discussed. Where applicable, you will learn about extensions to these techniques for their application to classification and prediction in large databases. Classification and prediction have numerous applications, including fraud detection, target marketing, performance prediction, manufacturing, and medical diagnosis.

6.1 What Is Classification? What Is Prediction?


A bank loans officer needs analysis of her data in order to learn which loan applicants are "safe" and which are "risky" for the bank. A marketing manager at AllElectronics needs data analysis to help guess whether a customer with a given profile will buy a new computer. A medical researcher wants to analyze breast cancer data in order to predict which one of three specific treatments a patient should receive. In each of these examples, the data analysis task is classification, where a model or classifier is constructed to predict categorical labels, such as "safe" or "risky" for the loan application data; "yes" or "no" for the marketing data; or "treatment A," "treatment B," or "treatment C" for the medical data. The categories can be represented by discrete values, where the ordering among values has no meaning. For example, the values 1, 2, and 3 may be used to represent treatments A, B, and C, where there is no ordering implied among this group of treatment regimes.

Suppose that the marketing manager would like to predict how much a given customer will spend during a sale at AllElectronics. This data analysis task is an example of numeric prediction, where the model constructed predicts a continuous-valued function, or ordered value, as opposed to a categorical label. This model is a predictor. Regression analysis is a statistical methodology that is most often used for numeric prediction; hence the two terms are often used synonymously. We do not treat the two terms as synonyms, however, because several other methods can be used for numeric prediction, as we shall see later in this chapter. Classification and numeric prediction are the two major types of prediction problems. For simplicity, when there is no ambiguity, we will use the shortened term of prediction to refer to numeric prediction.
"How does classification work? Data classificationçs atwo-step process, as shown for
the loan application data of Figure 6.1. (The data are simplified for illustrative pur
poses. In reality,we may expect many more attributes to be considered.) In the first step.
a classifier is built describing a predetermined set of data classes or concepts. This ia
the learning step (or training phase), where a classification algorithm builds the cas
sifier byanalyzing or "learning from" atraining set made up of database tuples and the
associated class labels. A tuple, X, is represented by an n-dimensional attribute vecto,
X = (X1 X2,...,), depicting n measurements made on the tuple from n database
attributes, respectively, A1, A2,..., A,.'Each tuple, X, is assumed to belong to a preo
fined class as determined by another database attribute called the class label attribue
The class label attribute is discrete-valued and unordered. It is categorical in that ead
valueserves as a category or class. The individual tuples making up the training set
referred toas training tuples and are selected from the database under analysis. In
context of clasification, data tuples can be referred to as samples, examples, instante
data points,or objects.?
¹ Each attribute represents a "feature" of X. Hence, the pattern recognition literature uses the term feature vector rather than attribute vector. Since our discussion is from a database perspective, we propose the term "attribute vector." In our notation, any variable representing a vector is shown in bold italic font; measurements depicting the vector are shown in italic font, e.g., X = (x1, x2, x3).

² In the machine learning literature, training tuples are commonly referred to as training samples. Throughout this text, we prefer to use the term tuples instead of samples, since we discuss the theme of classification from a database-oriented perspective.
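Before continuing, here is a minimal Python sketch (ours, not the book's) of how the training tuples of Figure 6.1 can be represented: each tuple is an attribute vector together with its class label, loan_decision.

from typing import NamedTuple

# A training tuple: attribute values plus the class label
# (loan_decision), following Figure 6.1(a).
class LoanTuple(NamedTuple):
    name: str           # identifying attribute (not used for learning)
    age: str            # categorical attribute A1
    income: str         # categorical attribute A2
    loan_decision: str  # class label attribute: "safe" or "risky"

training_set = [
    LoanTuple("Sandy Jones",  "youth",       "low",    "risky"),
    LoanTuple("Bill Lee",     "youth",       "low",    "risky"),
    LoanTuple("Caroline Fox", "middle_aged", "high",   "safe"),
    LoanTuple("Rick Field",   "middle_aged", "low",    "risky"),
    LoanTuple("Susan Lake",   "senior",      "low",    "safe"),
    LoanTuple("Claire Phips", "senior",      "medium", "safe"),
    LoanTuple("Joe Smith",    "middle_aged", "high",   "safe"),
]

print(training_set[0].loan_decision)  # -> risky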

Training data:

    name          age          income   loan_decision
    ------------  -----------  -------  -------------
    Sandy Jones   youth        low      risky
    Bill Lee      youth        low      risky
    Caroline Fox  middle_aged  high     safe
    Rick Field    middle_aged  low      risky
    Susan Lake    senior       low      safe
    Claire Phips  senior       medium   safe
    Joe Smith     middle_aged  high     safe

Learned classification rules:

    IF age = youth THEN loan_decision = risky
    IF income = high THEN loan_decision = safe
    IF age = middle_aged AND income = low THEN loan_decision = risky

(a)

Test data:

    name          age          income   loan_decision
    ------------  -----------  -------  -------------
    Juan Bello    senior       low      safe
    Sylvia Crest  middle_aged  low      risky
    Anne Yee      middle_aged  high     safe

New data: (John Henry, middle_aged, low) -> loan_decision? -> risky

(b)

Figure 6.1 The data classification process: (a) Learning: Training data are analyzed by a classification algorithm. Here, the class label attribute is loan_decision, and the learned model or classifier is represented in the form of classification rules. (b) Classification: Test data are used to estimate the accuracy of the classification rules. If the accuracy is considered acceptable, the rules can be applied to the classification of new data tuples.
Because the class label of each training tuple is provided, this step is also known as supervised learning (i.e., the learning of the classifier is "supervised" in that it is told to which class each training tuple belongs). It contrasts with unsupervised learning (or clustering), in which the class label of each training tuple is not known, and the number or set of classes to be learned may not be known in advance. For example, if we did not have the loan_decision data available for the training set, we could use clustering to try to determine "groups of like tuples," which may correspond to risk groups within the loan application data. Clustering is the topic of Chapter 7.
This first step of the classification process can also be viewed as the learning of a mapping or function, y = f(X), that can predict the associated class label y of a given tuple X. In this view, we wish to learn a mapping or function that separates the data classes. Typically, this mapping is represented in the form of classification rules, decision trees, or mathematical formulae. In our example, the mapping is represented as classification rules that identify loan applications as being either safe or risky (Figure 6.1(a)). The rules can be used to categorize future data tuples, as well as provide deeper insight into the database contents. They also provide a compressed representation of the data.
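As an illustration, the learned mapping y = f(X) of Figure 6.1(a) can be written directly as executable rules. The following minimal Python sketch (ours, not from the book) applies the three rules to the new tuple of Figure 6.1(b); the default class returned when no rule fires is an assumption of the sketch.

def loan_decision(age, income):
    # The classification rules learned in Figure 6.1(a).
    if age == "youth":
        return "risky"
    if income == "high":
        return "safe"
    if age == "middle_aged" and income == "low":
        return "risky"
    # Fallback when no rule fires (an assumption of this sketch).
    return "safe"

# New data tuple (John Henry, middle_aged, low) from Figure 6.1(b):
print(loan_decision("middle_aged", "low"))  # -> risky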
"What about classification accutacy?Inthe second step (Figure 6.1(b)),the model is
used for classification. Firfth predictive accuracy of the classifier is estimated. Ifwe were
touse the training set to measure ihe accuracy of the clasifier, this estimate would likely
beoptimistic, because the classifier tends to overfit the data (i.e., during learning it may
incorporate some particular anomaties of the training datathat are not present in the gen
eral data set overall). Therefore, a test set is used, made up of test tuples and their asso
ciated class labels. These tuples are randomly selected from the general data set. They are
independent of the training tuples, meaning that they are not used to construct the clas
sifier.
The accuracy of a classifier on a given test set is the percentage of test set tuples that
are correctly cdassified by the classifier. The associatedclass label of each test tuple is com
pared with the learned classifier's class prediction for that tuple. Section 6.13 describes
several methods for estimating classifier accuracy. If the accuracy of the classifier is con
sidered acceptable, the classifier can be used to classify future data tuples for which the
class label is not known. (Such data are also referred to in the machine learning literature
as "unknown" or "previously unseen" data.) For example, the classification rules learned
in Figure 6.l(a) from the analysis of datafrom previous loan applications can be used to
approve or reject new or future loan applicants.
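A minimal sketch of this accuracy computation, applying the rule classifier from the earlier sketch to the test tuples of Figure 6.1(b) and reporting the percentage correctly classified (the fallback class is, again, our assumption):

def loan_decision(age, income):
    # The learned rules of Figure 6.1(a), as in the earlier sketch.
    if age == "youth":
        return "risky"
    if income == "high":
        return "safe"
    if age == "middle_aged" and income == "low":
        return "risky"
    return "safe"  # fallback assumed for the sketch

test_set = [
    # (age, income, known class label) from Figure 6.1(b)
    ("senior",      "low",  "safe"),
    ("middle_aged", "low",  "risky"),
    ("middle_aged", "high", "safe"),
]

correct = sum(loan_decision(age, income) == label
              for age, income, label in test_set)
print(f"accuracy = {correct / len(test_set):.0%}")  # -> 100%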
"How is (numeric) prediction different from classification?" Data prediction is atwo
step process, similar to that of data classification as described in Figure 6.1. Howevek.
for prediction, we lose the terminology of "lass label attribute" because the attribute
for which values are beingpredicted is continuous-valued (odered) rather than cate
gorical (discrete-valued and-unordered). The attribute can be referred to simply as te
predicted attribute. Suppose that, in our example, we instead wanted to predict the
armount (in dollars) that would be "safe" for the bank to loan an applicant. The da
mining task becomes prediction, rather than classification.We would replace the cat;
gorical attribute, loan decision, with the continuous-valued loan amount as the predicte
attribute, and build a predictor for our task.
³ We could also use this term for classification, although for that task the term "class label attribute" is more descriptive.

Note that prediction can also be viewed as a mapping or function, y = f(X), where X is the input (e.g., a tuple describing a loan applicant) and the output y is a continuous or ordered value (such as the predicted amount that the bank can safely loan the applicant); that is, we wish to learn a mapping or function that models the relationship between X and y.
Prediction and classification also differ in the methods that are used to build their respective models. As with classification, the training set used to build a predictor should not be used to assess its accuracy. An independent test set should be used instead. The accuracy of a predictor is estimated by computing an error based on the difference between the predicted value and the actual known value of y for each of the test tuples, X. There are various predictor error measures (Section 6.12.2). General methods for error estimation are discussed in Section 6.13.
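As a sketch of this idea, here are two widely used error measures, mean absolute error and root mean squared error, computed over a handful of (predicted, actual) loan amounts; the dollar figures below are invented purely for illustration and are not from the book.

import math

# Hypothetical (predicted, actual) loan amounts in dollars
# for four test tuples; the numbers are made up for illustration.
pairs = [(42000, 40000), (15500, 18000), (60000, 61000), (9000, 7500)]

n = len(pairs)
mae = sum(abs(p - a) for p, a in pairs) / n
rmse = math.sqrt(sum((p - a) ** 2 for p, a in pairs) / n)

print(f"mean absolute error     = {mae:.2f}")
print(f"root mean squared error = {rmse:.2f}")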

6.2 Issues Regarding Classification and Prediction


This section describes issues regarding preprocessing the data for classification and prediction. Criteria for the comparison and evaluation of classification methods are also described.

6.2.1 Preparing the Data for Classification and Prediction


The following preprocessing steps may be applied to the data to help improve the accuracy, efficiency, and scalability of the classification or prediction process.

Data cleaning: This refers to the preprocessing of data in order to remove or reduce noise (by applying smoothing techniques, for example) and the treatment of missing values (e.g., by replacing a missing value with the most commonly occurring value for that attribute, or with the most probable value based on statistics). Although most classification algorithms have some mechanisms for handling noisy or missing data, this step can help reduce confusion during learning.
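A minimal sketch (ours, not the book's) of the mode-based imputation just described, filling a missing income value with the most commonly occurring value for that attribute:

from collections import Counter

incomes = ["low", "high", "low", None, "medium", "low", "high"]

# Most commonly occurring value among the known entries.
mode = Counter(v for v in incomes if v is not None).most_common(1)[0][0]

cleaned = [v if v is not None else mode for v in incomes]
print(cleaned)  # the missing entry is replaced by "low"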
Relevance analysis: Many of the attributes in the data may be redundant. Correlation analysis can be used to identify whether any two given attributes are statistically related. For example, a strong correlation between attributes A1 and A2 would suggest that one of the two could be removed from further analysis. A database may also contain irrelevant attributes. Attribute subset selection⁴ can be used in these cases to find a reduced set of attributes such that the resulting probability distribution of the data classes is as close as possible to the original distribution obtained using all attributes. Hence, relevance analysis, in the form of correlation analysis and attribute subset selection, can be used to detect attributes that do not contribute to the classification or prediction task. Including such attributes may otherwise slow down, and possibly mislead, the learning step.

⁴ In machine learning, this is known as feature subset selection.



Ideally, the time spent on relevance analysis, when added to the time spent on learning from the resulting "reduced" attribute (or feature) subset, should be less than the time that would have been spent on learning from the original set of attributes. Hence, such analysis can help improve classification efficiency and scalability.
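As an illustrative sketch (ours, with made-up numeric attribute values), a Pearson correlation between two attributes; a coefficient near 1 or -1 would flag one of the two attributes as a candidate for removal. The 0.9 threshold is an arbitrary choice for the sketch.

import numpy as np

# Made-up values of two numeric attributes A1 and A2 over six tuples.
a1 = np.array([20.0, 35.0, 52.0, 44.0, 61.0, 30.0])
a2 = np.array([21.0, 36.0, 50.0, 45.0, 60.0, 31.0])  # nearly redundant with a1

r = np.corrcoef(a1, a2)[0, 1]
print(f"correlation(A1, A2) = {r:.3f}")

# A strong correlation suggests one of the two attributes is redundant.
if abs(r) > 0.9:  # threshold chosen arbitrarily for this sketch
    print("A1 and A2 are strongly correlated; consider dropping one.")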
Data transformation and reduction: The data may be transformed by normalization, particularly when neural networks or methods involving distance measurements are used in the learning step. Normalization involves scaling all values for a given attribute so that they fall within a specified range, such as -1.0 to 1.0, or 0.0 to 1.0. In methods that use distance measurements, for example, this would prevent attributes with initially large ranges (like, say, income) from outweighing attributes with initially smaller ranges (such as binary attributes).

The data can also be transformed by generalizing it to higher-level concepts. Concept hierarchies may be used for this purpose. This is particularly useful for continuous-valued attributes. For example, numeric values for the attribute income can be generalized to discrete ranges, such as low, medium, and high. Similarly, categorical attributes, like street, can be generalized to higher-level concepts, like city. Because generalization compresses the original training data, fewer input/output operations may be involved during learning.

Data can also be reduced by applying many other methods, ranging from wavelet transformation and principal components analysis to discretization techniques, such as binning, histogram analysis, and clustering.
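A minimal sketch (ours) of two of the transformations named above: min-max normalization of income to the range 0.0 to 1.0, and generalization of the same values to the discrete ranges low, medium, and high. The cutoff values are arbitrary assumptions for the sketch.

incomes = [28000.0, 52000.0, 91000.0, 37000.0, 64000.0]

# Min-max normalization to the range [0.0, 1.0].
lo, hi = min(incomes), max(incomes)
normalized = [(v - lo) / (hi - lo) for v in incomes]
print([f"{v:.2f}" for v in normalized])

# Generalization to discrete ranges via a simple concept hierarchy;
# the cutoffs 40000 and 70000 are arbitrary for this sketch.
def income_level(v):
    if v < 40000:
        return "low"
    if v < 70000:
        return "medium"
    return "high"

print([income_level(v) for v in incomes])
# -> ['low', 'medium', 'high', 'low', 'medium']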
Data cleaning, relevance analysis (in the form of correlation analysis and attribute subset selection), and data transformation are described in greater detail in Chapter 2 of this book.

6.2.2 Comparing Classification and Prediction Methods


Classification and prediction methods can be compared and evaluated according to the following criteria:

Accuracy: The accuracy of a classifier refers to the ability of a given classifier to correctly predict the class label of new or previously unseen data (i.e., tuples without class label information). Similarly, the accuracy of a predictor refers to how well a given predictor can guess the value of the predicted attribute for new or previously unseen data. Accuracy measures are given in Section 6.12. Accuracy can be estimated using one or more test sets that are independent of the training set. Estimation techniques, such as cross-validation and bootstrapping, are described in Section 6.13; a small cross-validation sketch appears after this list. Strategies for improving the accuracy of a model are given in Section 6.14. Because the computed accuracy is only an estimate of how well the classifier or predictor will do on new data tuples, confidence limits can be computed to help gauge this estimate. This is discussed in Section 6.15.

Speed: This refers to the computational costs involved in generating and using the given classifier or predictor.

Robustness: This is the ability of the classifier or predictor to make correct predictions given noisy data or data with missing values.

Scalability: This refers to the ability to construct the classifier or predictor efficiently given large amounts of data.

Interpretability: This refers to the level of understanding and insight that is provided by the classifier or predictor. Interpretability is subjective and therefore more difficult to assess. We discuss some work in this area, such as the extraction of classification rules from a "black box" neural network classifier called backpropagation (Section 6.6.4).
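The cross-validation sketch promised above. This is our own minimal illustration of k-fold accuracy estimation, not a method specific to the book: the labeled tuples (pooled from Figure 6.1) are split into k folds, and in each round a classifier is trained on k-1 folds and tested on the held-out fold. The deliberately simple "learner" here just predicts the majority class of its training tuples; a real data set would have far more than ten tuples.

from collections import Counter

# Ten labeled tuples (age, income, loan_decision), pooled from Figure 6.1.
data = [
    ("middle_aged", "high",   "safe"),
    ("youth",       "low",    "risky"),
    ("senior",      "low",    "safe"),
    ("middle_aged", "low",    "risky"),
    ("senior",      "medium", "safe"),
    ("youth",       "low",    "risky"),
    ("middle_aged", "high",   "safe"),
    ("middle_aged", "low",    "risky"),
    ("senior",      "low",    "safe"),
    ("middle_aged", "high",   "safe"),
]

def train_majority(tuples):
    # A deliberately simple "learner": predict the majority class.
    return Counter(label for _, _, label in tuples).most_common(1)[0][0]

k = 5
fold_size = len(data) // k
accuracies = []
for i in range(k):
    test = data[i * fold_size:(i + 1) * fold_size]        # held-out fold
    train = data[:i * fold_size] + data[(i + 1) * fold_size:]
    predicted = train_majority(train)                     # "fit" on k-1 folds
    hits = sum(predicted == label for _, _, label in test)
    accuracies.append(hits / len(test))

print(f"estimated accuracy = {sum(accuracies) / k:.0%}")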

These issues are discussed throughout the chapter with respect to the various classification and prediction methods presented. Recent data mining research has contributed to the development of scalable algorithms for classification and prediction. Additional contributions include the exploration of mined "associations" between attributes and their use for effective classification. Model selection is discussed in Section 6.15.

6.3 Classification by Decision Tree Induction


Decision tree induction is the learning of decision trees from class-labeled training tuples. A decision tree is a flowchart-like tree structure, where each internal node (nonleaf node) denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (or terminal node) holds a class label. The topmost node in a tree is the root node.

    age?
     |-- youth:        student?
     |                   |-- no:  no
     |                   '-- yes: yes
     |-- middle_aged:  yes
     '-- senior:       credit_rating?
                         |-- excellent: no
                         '-- fair:      yes

Figure 6.2 A typical decision tree, indicating whether a customer at AllElectronics is likely to purchase a computer. Internal nodes test an attribute (age, student, credit_rating); leaf nodes hold the class label (yes or no).
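Read as code, the tree above is simply nested attribute tests ending in class labels. A minimal Python sketch (ours) of the same tree:

def buys_computer(age, student, credit_rating):
    # Root node tests the age attribute; each branch is one outcome.
    if age == "youth":
        return "yes" if student == "yes" else "no"
    if age == "middle_aged":
        return "yes"  # leaf: this branch needs no further test
    # age == "senior": test credit_rating next
    return "no" if credit_rating == "excellent" else "yes"

print(buys_computer("youth", "yes", "fair"))       # -> yes
print(buys_computer("senior", "no", "excellent"))  # -> no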
