Chapter 4: Classification & Prediction
Rule-Based Classification: using IF-THEN rules for classification
Example
R1: IF age=youth AND student=yes THEN buys_computer=yes
Terminology
- Rule antecedent (precondition): the IF part of the rule
- Rule consequent: the THEN part of the rule
- Two measures assess the quality of a rule: Coverage and Accuracy
Methodology
- A rule R is assessed against the tuples X of a class-labeled data set D
Consider:
- n_covers: the number of tuples covered by R
- n_correct: the number of tuples correctly classified by R
- |D|: the total number of tuples in D

$$coverage(R) = \frac{n_{covers}}{|D|} \qquad accuracy(R) = \frac{n_{correct}}{n_{covers}}$$
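To make the two measures concrete, here is a minimal Python sketch that evaluates rule R1 from the example above against a small made-up data set (the tuples are illustrative, not from the slides):

```python
# Minimal sketch: coverage and accuracy of rule
# R1: IF age=youth AND student=yes THEN buys_computer=yes
# The data set D below is illustrative only.
D = [
    {"age": "youth",       "student": "yes", "buys_computer": "yes"},
    {"age": "youth",       "student": "yes", "buys_computer": "no"},
    {"age": "youth",       "student": "no",  "buys_computer": "no"},
    {"age": "middle-aged", "student": "yes", "buys_computer": "yes"},
    {"age": "senior",      "student": "no",  "buys_computer": "no"},
]

def antecedent(t):                      # IF part of R1
    return t["age"] == "youth" and t["student"] == "yes"

covered   = [t for t in D if antecedent(t)]
n_covers  = len(covered)
n_correct = sum(t["buys_computer"] == "yes" for t in covered)

print("coverage(R1) =", n_covers / len(D))      # 2/5 = 0.4
print("accuracy(R1) =", n_correct / n_covers)   # 1/2 = 0.5
```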
Important
- Triggering vs. firing: a rule is triggered when a tuple satisfies its antecedent; it fires when it is actually used to assign the class label
Problems
- More than one rule is triggered (conflicting rules)
- No rule is triggered
Solution: use a default rule that fires, for example, the most frequent class
Conflicting Rules
X(age=youth, student=yes, income=low)
R1: IF age=youth AND student=yes THEN buys_computer=yes
R2: IF income=low THEN buys_computer=no
Conflict resolution: use size ordering (prefer the rule with the most attribute tests in its antecedent) or rule ordering (e.g. class-based ordering, as in C4.5)
Rule Extraction from a Decision Tree
- One rule is created for each path from the root to a leaf node
- Each splitting criterion along a given path is logically ANDed to form the rule antecedent (IF part)
- The leaf node holds the class prediction (the rule consequent)
Example:
R1: IF age=youth AND student=no THEN buys_computer=no
R2: IF age=youth AND student=yes THEN buys_computer=yes
R3: IF age=middle-aged THEN buys_computer=yes
R4: IF age=senior AND credit_rating=excellent THEN buys_computer=yes
R5: IF age=senior AND credit_rating=fair THEN buys_computer=no
[Decision tree figure: the root splits on age; the youth branch splits on student (no → buys_computer=no, yes → buys_computer=yes), middle-aged → buys_computer=yes, and the senior branch splits on credit_rating (excellent → buys_computer=yes, fair → buys_computer=no).]
Properties of rules extracted from a decision tree
- Mutually exclusive: no two rules can be triggered by the same tuple
- Exhaustive: there is one rule for each possible attribute-value combination, so every tuple is covered
Note: The order of the rules does not matter when they are extracted from a decision tree
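To illustrate the extraction procedure, here is a minimal Python sketch that walks a hand-coded tree (a nested-tuple representation of the example tree above, chosen only for brevity) and prints one IF-THEN rule per root-to-leaf path:

```python
# Minimal sketch: extract IF-THEN rules from a decision tree.
# Internal node: (attribute, {value: subtree}); leaf node: class label string.
tree = ("age", {
    "youth":       ("student", {"no": "buys_computer=no", "yes": "buys_computer=yes"}),
    "middle-aged": "buys_computer=yes",
    "senior":      ("credit_rating", {"excellent": "buys_computer=yes",
                                      "fair":      "buys_computer=no"}),
})

def extract_rules(node, conditions=()):
    """Return one rule per root-to-leaf path; the conditions are ANDed."""
    if isinstance(node, str):                       # leaf: emit the rule
        return [f"IF {' AND '.join(conditions)} THEN {node}"]
    attribute, branches = node
    rules = []
    for value, subtree in branches.items():
        rules += extract_rules(subtree, conditions + (f"{attribute}={value}",))
    return rules

for i, rule in enumerate(extract_rules(tree), start=1):
    print(f"R{i}: {rule}")      # reproduces rules R1..R5 above
```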
Pruning rules
- Any rule that does not improve the estimated accuracy of the rule set can be pruned
- Pruning may generate rules that are no longer mutually exclusive and no longer exhaustive; C4.5 therefore uses class-based ordering of the pruned rules
[Figure: the same tree drawn with the subtrees under the youth, middle-aged, and senior branches abbreviated as A, B, and C.]
Sequential covering (rule induction)
- Rules are learned one at a time; when a rule is learned, the tuples covered by the rule are removed (rules need to be accurate, but not necessarily of high coverage)
- The process stops when no tuples are left or when the quality measure of a rule falls below a threshold
- Each rule is grown in a general-to-specific manner
Example (learning one rule)
- Start with the most general rule, whose antecedent is empty: IF THEN loan_decision=accept
- Consider each possible attribute test that may be added to the rule
- Adopt a greedy depth-first strategy, choosing at each step the candidate rule with the highest quality (use beam search, where the k best candidate rules are maintained rather than only one)
- Repeat the process until the rule reaches an acceptable quality level, for example:
  IF income=high AND credit_rating=excellent THEN loan_decision=accept
[Figure: training tuples of class accept (a) and reject (r); rule R1 covers a region containing mostly accept tuples, and a second rule R2 is then learned from the remaining tuples.]
Rule quality measures
- Use entropy of the tuples covered by a candidate rule to decide which attribute test to add
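A minimal Python sketch of the greedy learn-one-rule step described above; for simplicity it uses rule accuracy on the covered tuples as the quality measure instead of entropy, and the loan data set is made up for illustration:

```python
# Illustrative class-labeled tuples (not from the slides).
D = [
    {"income": "high", "credit_rating": "excellent", "loan_decision": "accept"},
    {"income": "high", "credit_rating": "fair",      "loan_decision": "accept"},
    {"income": "high", "credit_rating": "fair",      "loan_decision": "reject"},
    {"income": "low",  "credit_rating": "excellent", "loan_decision": "reject"},
    {"income": "low",  "credit_rating": "fair",      "loan_decision": "reject"},
]
CLASS_ATTR, TARGET = "loan_decision", "accept"

def rule_accuracy(tuples):
    """Fraction of covered tuples that belong to the target class."""
    return sum(t[CLASS_ATTR] == TARGET for t in tuples) / len(tuples) if tuples else 0.0

def learn_one_rule(D):
    """Grow one rule general-to-specific, greedily adding the best attribute test."""
    conditions, covered, used = [], D, set()
    while rule_accuracy(covered) < 1.0:
        candidates = {(a, t[a]) for t in covered for a in t
                      if a != CLASS_ATTR and a not in used}
        if not candidates:
            break
        best = max(candidates,
                   key=lambda c: rule_accuracy([t for t in covered if t[c[0]] == c[1]]))
        conditions.append(best)
        used.add(best[0])
        covered = [t for t in covered if t[best[0]] == best[1]]
    antecedent = " AND ".join(f"{a}={v}" for a, v in conditions)
    return f"IF {antecedent} THEN {CLASS_ATTR}={TARGET}", covered

rule, covered = learn_one_rule(D)
print(rule)   # IF income=high AND credit_rating=excellent THEN loan_decision=accept
# In sequential covering, the covered tuples would now be removed and the
# process repeated on the remaining data.
```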
4.6 Prediction
4.7 How to Evaluate and Improve Classification
Lazy learners
- The learner waits until the last minute, before doing any model construction, in order to classify a given test tuple:
  - Store the training tuples
  - Wait for test tuples
  - Perform generalization based on the similarity between the test tuple and the stored training tuples
Eager learners vs. lazy learners
- Eager learners construct a classification model before any test tuple is presented
- Lazy learners do less work when the training tuples are presented and more work when a test tuple is presented
k-Nearest-Neighbor classifiers
- Each tuple is a point in an n-dimensional space: X1 = (x11, ..., x1n), X2 = (x21, ..., x2n)
- Closeness between two tuples is defined by a distance metric, e.g. the Euclidean distance:

$$dist(X_1, X_2) = \sqrt{\sum_{i=1}^{n} (x_{1i} - x_{2i})^2}$$
Classification
- The unknown tuple is assigned the most common class among its k nearest neighbors
- When k=1, the unknown tuple is assigned the class of the training tuple that is closest to it
- The 1-NN scheme has a misclassification probability that is no worse than twice that of the case where the exact probability density of each class is known
Prediction
- Nearest-neighbor classifiers can also be used for prediction, i.e. to return a real-valued output for an unknown tuple
- The predictor returns the average value of the real-valued labels associated with the k nearest neighbors of the unknown tuple
Example (riding-mower data)
24 households described by Income ($000s); class: Owners of a riding mower = 1, Non-Owners = 2.

RID  Income($000s)  Class     RID  Income($000s)  Class
1    60              1        13   75              2
2    85.5            1        14   52.8            2
3    64.8            1        15   64.8            2
4    61.5            1        16   43.2            2
5    87               1        17   84              2
6    110.1           1        18   49.2            2
7    108             1        19   59.4            2
8    82.8            1        20   66              2
9    69              1        21   47.4            2
10   93              1        22   33              2
11   51              1        23   51              2
12   81              1        24   63              2

We randomly divide the data into 18 training cases and 6 test cases (tuples 6, 7, 12, 14, 19, 20), then use the training cases to classify the test cases and compute the error rates.
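A minimal Python sketch of the k-nearest-neighbor classifier on this data, using Income as the only attribute (so distance is simply the absolute income difference) and the train/test split given above; since the original example may have used more information than income alone, the error rates printed here are only illustrative and need not match the table reported next:

```python
# Minimal k-NN sketch on the riding-mower data (Income only).
incomes = [60, 85.5, 64.8, 61.5, 87, 110.1, 108, 82.8, 69, 93, 51, 81,
           75, 52.8, 64.8, 43.2, 84, 49.2, 59.4, 66, 47.4, 33, 51, 63]
classes = [1] * 12 + [2] * 12                 # Owners = 1, Non-Owners = 2
data = list(zip(range(1, 25), incomes, classes))

test_ids = {6, 7, 12, 14, 19, 20}             # the 6 test cases from the slide
train = [(x, c) for rid, x, c in data if rid not in test_ids]
test  = [(x, c) for rid, x, c in data if rid in test_ids]

def knn_predict(x, train, k):
    """Majority class among the k training tuples closest to x."""
    neighbors = sorted(train, key=lambda t: abs(t[0] - x))[:k]
    labels = [c for _, c in neighbors]
    return max(set(labels), key=labels.count)

for k in (1, 3, 11, 13, 18):
    errors = sum(knn_predict(x, train, k) != c for x, c in test)
    print(f"k={k:2d}  misclassification error = {errors}/{len(test)}")
```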
Values of K
- On the 6 test cases, the misclassification error stays at 33% for small values of K, drops to 17% for K = 11 and K = 13, and rises to 50% for K = 18
- Choose the value of K that minimizes the misclassification error on the test cases: a very small K is sensitive to noise, while a very large K (here K = 18, i.e. all training tuples) ignores the local structure of the data
4.6.1 Definitions
Regression analysis
- A statistical methodology used to model the relationship between one or more independent (predictor) variables and a dependent (response) variable
Linear regression
- The response is modeled as a straight-line function of a single predictor variable:

$$y = b + wx$$

- b: the Y-intercept
- w: the slope of the line
- With regression coefficients this is written $y = \beta_0 + \beta_1 x$
Method of least squares
- Estimates the best-fitting straight line as the one that minimizes the error between the actual data and the estimate of the line
- Used to solve overdetermined systems (more equations than unknowns)
- Model: $y_i = f(x_i, \beta) = \beta_0 + \beta_1 x_i$
- Given the data set $(x_1, y_1), (x_2, y_2), \ldots, (x_{|D|}, y_{|D|})$, minimize the sum of squared residuals

$$S = \sum_{i=1}^{|D|} r_i^2, \qquad r_i = y_i - f(x_i, \beta)$$

- Setting the partial derivatives to zero,

$$\frac{\partial S}{\partial \beta_j} = 2 \sum_i r_i \frac{\partial r_i}{\partial \beta_j} = 0, \qquad j = 1, \ldots, m$$

  yields the least-squares estimates

$$\beta_1 = \frac{\sum_{i=1}^{|D|} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{|D|} (x_i - \bar{x})^2}, \qquad \beta_0 = \bar{y} - \beta_1 \bar{x}$$
Example
- Data points: (1, 6), (2, 5), (3, 7), (4, 10)
- Model: $y = f(x, \beta) = \beta_0 + \beta_1 x$, which gives the overdetermined system

$$\beta_0 + 1\beta_1 = 6, \quad \beta_0 + 2\beta_1 = 5, \quad \beta_0 + 3\beta_1 = 7, \quad \beta_0 + 4\beta_1 = 10$$

- Sum of squared residuals:

$$S = [6 - (\beta_0 + 1\beta_1)]^2 + [5 - (\beta_0 + 2\beta_1)]^2 + [7 - (\beta_0 + 3\beta_1)]^2 + [10 - (\beta_0 + 4\beta_1)]^2$$

- Minimizing S (here $\bar{x} = 2.5$, $\bar{y} = 7$) gives $\beta_1 = 7/5 = 1.4$ and $\beta_0 = 7 - 1.4 \times 2.5 = 3.5$, i.e. the best-fitting line $y = 3.5 + 1.4x$
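A short Python check of this example, computing the least-squares coefficients directly from the closed-form formulas above:

```python
# Least-squares fit of y = b0 + b1*x for the example points.
xs = [1, 2, 3, 4]
ys = [6, 5, 7, 10]

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar

print(b0, b1)   # 3.5 1.4  ->  best-fitting line y = 3.5 + 1.4x
```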
Multiple linear regression
- Involves more than one predictor variable; the training data have the form $(X_1, y_1), (X_2, y_2), \ldots, (X_{|D|}, y_{|D|})$, where each $X_i$ is a vector of predictor values

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_n x_n$$
Nonlinear (polynomial) regression
- A polynomial model such as

$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$$

  can be converted to a linear model by introducing new variables

$$x_1 = x, \quad x_2 = x^2, \quad x_3 = x^3 \quad \Rightarrow \quad y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$$

  which is then solvable by the method of least squares
Other regression-based models (generalized linear models) include:
- Poisson regression
- Logistic regression

Logistic Regression
- Based on the logistic (sigmoid) function, an S-shaped curve that maps any real value into the interval (0, 1):

$$f(x) = \frac{e^x}{1 + e^x}$$

- The input is a linear combination of the predictor variables, $x = \beta_0 + \beta_1 x_1 + \ldots + \beta_k x_k$, so the probability of the positive class is

$$P(Y = 1 \mid x_1, x_2, \ldots, x_k) = \frac{e^{\beta_0 + \beta_1 x_1 + \ldots + \beta_k x_k}}{1 + e^{\beta_0 + \beta_1 x_1 + \ldots + \beta_k x_k}}$$
Logistic Regression (parameter estimation)
- The coefficients $\beta_0, \beta_1, \ldots, \beta_p$ are estimated by maximum likelihood: writing $p_j = P(Y = 1 \mid x_{1j}, \ldots, x_{pj}) = \dfrac{e^{\beta_0 + \beta_1 x_{1j} + \ldots + \beta_p x_{pj}}}{1 + e^{\beta_0 + \beta_1 x_{1j} + \ldots + \beta_p x_{pj}}}$ for training tuple j, the likelihood of the data is

$$L(\beta) = \prod_{j=1}^{|D|} p_j^{\,y_j} (1 - p_j)^{1 - y_j}$$
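A minimal Python sketch of these formulas: the logistic function, the class-1 probability for a given coefficient vector, and the log of the likelihood L(β) on a tiny made-up data set (the data and the coefficient values are illustrative, not fitted values from the slides):

```python
import math

def logistic(z):
    # f(z) = e^z / (1 + e^z), computed in the equivalent 1 / (1 + e^-z) form
    return 1.0 / (1.0 + math.exp(-z))

def p_y1(x, beta):
    """P(Y=1 | x) for beta = (b0, b1, ..., bk) and x = (x1, ..., xk)."""
    z = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return logistic(z)

def log_likelihood(X, y, beta):
    """log L(beta) = sum_j [ y_j*log(p_j) + (1 - y_j)*log(1 - p_j) ]."""
    return sum(yj * math.log(p_y1(xj, beta)) + (1 - yj) * math.log(1 - p_y1(xj, beta))
               for xj, yj in zip(X, y))

# Illustrative data: one predictor, class labels 0/1, and a guessed coefficient vector.
X = [(1.0,), (2.0,), (3.0,), (4.0,)]
y = [0, 0, 1, 1]
beta = (-5.0, 2.0)

print(p_y1((3.0,), beta))          # ~0.73
print(log_likelihood(X, y, beta))  # higher (less negative) means a better fit
```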
Accuracy is better measured using test data that was not used to
build the classifier
Confusion matrix for m classes: entry CM(i,j) is the number of tuples of actual class i that the classifier labeled as class j.

                 Predicted Class1   Predicted Class2   ...   Predicted Classm
Actual Class1    CM(1,1)            CM(1,2)            ...   CM(1,m)
Actual Class2    CM(2,1)            CM(2,2)            ...   CM(2,m)
...
Actual Classm    CM(m,1)            CM(m,2)            ...   CM(m,m)
For a two-class problem (positive class C1, negative class C2):

             Predicted C1       Predicted C2
Actual C1    true positives     false negatives
Actual C2    false positives    true negatives

- Sensitivity (true positive rate), Specificity (true negative rate), and Precision:

$$sens = \frac{t\_pos}{pos} \qquad spec = \frac{t\_neg}{neg} \qquad precision = \frac{t\_pos}{t\_pos + f\_pos}$$

- Accuracy expressed in terms of sensitivity and specificity:

$$accuracy = sens \cdot \frac{pos}{pos + neg} + spec \cdot \frac{neg}{pos + neg}$$

  where pos and neg are the numbers of positive and negative tuples, t_pos and t_neg the numbers of correctly classified positive and negative tuples, and f_pos the number of negative tuples incorrectly labeled as positive.

Predictor error measures (y_i: actual value, y_i': predicted value)

- Mean absolute error: $\dfrac{\sum_{i=1}^{|D|} |y_i - y_i'|}{|D|}$

- Mean squared error: $\dfrac{\sum_{i=1}^{|D|} (y_i - y_i')^2}{|D|}$

- Relative absolute error: $\dfrac{\sum_{i=1}^{|D|} |y_i - y_i'|}{\sum_{i=1}^{|D|} |y_i - \bar{y}|}$

- Relative squared error: $\dfrac{\sum_{i=1}^{|D|} (y_i - y_i')^2}{\sum_{i=1}^{|D|} (y_i - \bar{y})^2}$
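A small Python sketch that evaluates both kinds of measures defined above: the classifier measures from hypothetical confusion-matrix counts, and the predictor error measures using the actual and fitted values from the earlier least-squares example (all numbers are illustrative):

```python
# Classifier measures from hypothetical confusion-matrix counts.
t_pos, f_neg = 90, 10        # actual positives: correctly / incorrectly classified
f_pos, t_neg = 20, 80        # actual negatives: incorrectly / correctly classified
pos, neg = t_pos + f_neg, f_pos + t_neg

sens      = t_pos / pos                                     # sensitivity
spec      = t_neg / neg                                     # specificity
precision = t_pos / (t_pos + f_pos)
accuracy  = sens * pos / (pos + neg) + spec * neg / (pos + neg)
print(sens, spec, precision, accuracy)                      # 0.9 0.8 ~0.818 0.85

# Predictor error measures, using the earlier regression example:
# actual values y and predictions from the fitted line y = 3.5 + 1.4x.
y      = [6, 5, 7, 10]
y_pred = [3.5 + 1.4 * x for x in (1, 2, 3, 4)]              # 4.9, 6.3, 7.7, 9.1
n, y_bar = len(y), sum(y) / len(y)

mae = sum(abs(a - p) for a, p in zip(y, y_pred)) / n        # mean absolute error
mse = sum((a - p) ** 2 for a, p in zip(y, y_pred)) / n      # mean squared error
rae = sum(abs(a - p) for a, p in zip(y, y_pred)) / sum(abs(a - y_bar) for a in y)
rse = sum((a - p) ** 2 for a, p in zip(y, y_pred)) / sum((a - y_bar) ** 2 for a in y)
print(mae, mse, rae, rse)                                   # ~1.0 ~1.05 ~0.67 ~0.30
```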
"
"
Methods for estimating accuracy: holdout, random subsampling, cross-validation, and bootstrap.

Holdout
- The data are randomly partitioned into two independent sets: a training set used to build the model and a test set used to estimate its accuracy
Random Subsampling
- The holdout method is repeated k times; the overall accuracy estimate is the average of the accuracies obtained over the iterations
Cross-validation (k-fold)
- The data are partitioned into k mutually exclusive folds of approximately equal size; training and testing are performed k times, with each fold used exactly once as the test set
- Each sample is used the same number of times for training and once for testing
Cross-validation variants
- Leave-one-out: k is set to the number of tuples, so a single tuple is held out for testing at each iteration
- Stratified cross-validation: the folds are stratified so that the class distribution in each fold is approximately the same as in the full data
Bootstrap
- The training set is sampled uniformly with replacement from the data; the tuples that are never selected form the test set
- Since each tuple has a probability of about $(1 - 1/|D|)^{|D|} \approx 0.368$ of not being chosen, roughly 63.2% of the original tuples end up in the bootstrap sample (the .632 bootstrap)
Bagging
Intuition
- Instead of asking one doctor for a diagnosis (how accurate is that single diagnosis?), the patient asks several doctors and receives diagnosis_1, diagnosis_2, diagnosis_3; the patient then chooses the diagnosis that occurs more often than any of the others (majority vote)
Bagging (bootstrap aggregation)
- From the data, k classifiers M1, M2, ..., Mk are built; to classify a new data sample, their votes are combined into a single prediction
- k iterations: at each iteration a training set Di is sampled with replacement from the original data, and a model Mi is learned from Di
- The combined model M* returns the most frequent class in the case of classification, and the average value in the case of prediction
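A minimal Python sketch of bagging with a deliberately simple base learner (a one-attribute threshold rule, standing in for any real classifier), showing the bootstrap sampling of each Di and the majority vote of M*; the data are made up for illustration:

```python
import random

def train_stump(sample):
    """Base learner: pick the threshold on x that best separates the two classes."""
    best = None
    for thr in sorted({x for x, _ in sample}):
        for lo, hi in ((1, 2), (2, 1)):
            acc = sum((lo if x <= thr else hi) == c for x, c in sample) / len(sample)
            if best is None or acc > best[0]:
                best = (acc, thr, lo, hi)
    _, thr, lo, hi = best
    return lambda x: lo if x <= thr else hi

def bagging(data, k, base_learner):
    """Train k models M1..Mk, each on a bootstrap sample drawn with replacement."""
    models = [base_learner([random.choice(data) for _ in range(len(data))])
              for _ in range(k)]
    def M_star(x):                                 # majority vote of the k models
        votes = [m(x) for m in models]
        return max(set(votes), key=votes.count)
    return M_star

# Illustrative data: class 1 tends to have larger x values than class 2.
data = [(x, 1) for x in (60, 85, 65, 62, 87, 110, 108, 83)] + \
       [(x, 2) for x in (53, 65, 43, 49, 59, 47, 33, 51)]
M = bagging(data, k=11, base_learner=train_stump)
print(M(100), M(40))   # expected: 1 2
```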
Boosting
Intuition
- As in bagging, the patient consults several doctors, but now each diagnosis carries a weight that reflects how accurate that doctor has been in the past, e.g. diagnosis_1 with weight 0.4, diagnosis_2 with weight 0.5, diagnosis_3 with weight 0.1; the final diagnosis is the weighted combination of the individual diagnoses
- In boosting, classifiers are learned iteratively; after classifier Mi is learned, the weights of the training tuples it misclassified are increased so that the next classifier pays more attention to them, and each classifier's vote is weighted by its accuracy
Example: AdaBoost Algorithm
- The error of model Mi is the weighted sum of the errors of the training tuples, where err(Xj) is 1 if tuple Xj is misclassified and 0 otherwise:

$$error(M_i) = \sum_j w_j \cdot err(X_j)$$

- The weight of model Mi's vote is

$$\log \frac{1 - error(M_i)}{error(M_i)}$$
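A compact Python sketch of the AdaBoost bookkeeping described above: the weighted error of each model, the log((1 − error)/error) vote weight, and the reweighting of misclassified tuples; the base learner is a simple threshold rule and the data are made up for illustration:

```python
import math

# Illustrative 1-D data: (x, class) with class labels +1 / -1.
data = [(1, 1), (2, 1), (3, 1), (4, -1), (5, -1), (6, 1), (7, -1), (8, -1)]

def weighted_stump(data, w):
    """Base learner: threshold rule minimizing the weighted training error."""
    best = None
    for thr in range(0, 9):
        for sign in (1, -1):
            err = sum(wi for (x, c), wi in zip(data, w)
                      if (sign if x <= thr else -sign) != c)
            if best is None or err < best[0]:
                best = (err, thr, sign)
    err, thr, sign = best
    return (lambda x: sign if x <= thr else -sign), err

def adaboost(data, rounds):
    n = len(data)
    w = [1.0 / n] * n                              # uniform tuple weights to start
    models = []
    for _ in range(rounds):
        model, error = weighted_stump(data, w)
        error = max(error, 1e-10)                  # guard against division by zero
        alpha = math.log((1 - error) / error)      # weight of this model's vote
        models.append((alpha, model))
        # Increase the weights of misclassified tuples, then renormalize.
        w = [wi * math.exp(alpha) if model(x) != c else wi
             for (x, c), wi in zip(data, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    def M_star(x):                                 # combined model: weighted vote
        return 1 if sum(a * m(x) for a, m in models) > 0 else -1
    return M_star

M = adaboost(data, rounds=5)
print([M(x) for x, _ in data])   # should recover most of the training labels
```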