5. Classification and Prediction
Classification:
Data mining is an interdisciplinary field. It draws on a range of disciplines
such as analytics, database systems, machine learning, simulation, and
information science. Classifying data mining systems helps users understand
a system and match their requirements to it. Classification is the task of
discovering a model that distinguishes data classes and concepts. The purpose
of such a model is to predict the class of objects whose class label is
unknown. The derived model is based on the analysis of a set of training data.
A classification task begins with a data set in which the class assignments
are known. For example, based on observed data for many loan borrowers over
a period of time, a classification model could be built that forecasts credit
risk. The data might track employment history, homeownership or leasing,
years of residence, number and type of deposits, historical credit rating,
and so on. Credit rating would be the target, the other attributes would be
the predictors, and the data for each customer would constitute a case.
With the help of the bank loan application discussed above, let us understand
the working of classification. The data classification process includes two
steps (a code sketch of both steps follows below):
Building a Classifier or Model
Using the Classifier for Classification
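As a minimal sketch of these two steps, assuming scikit-learn is available,
the example below builds a classifier on a hypothetical loan data set and
then uses it to classify new applicants. The feature names and values are
made up for illustration, not taken from a real bank data set.

    # A minimal sketch of the two-step classification process,
    # assuming scikit-learn. The loan data below is hypothetical.
    from sklearn.tree import DecisionTreeClassifier

    # Step 1: build a classifier from training data whose class
    # labels (credit ratings) are already known.
    # Features: [years_employed, owns_home (1/0), years_at_address]
    X_train = [[10, 1, 8], [1, 0, 1], [7, 1, 5], [2, 0, 2], [15, 1, 12]]
    y_train = ["good", "bad", "good", "bad", "good"]  # known ratings

    model = DecisionTreeClassifier(random_state=0)
    model.fit(X_train, y_train)

    # Step 2: use the classifier to predict the class of new
    # applicants whose credit rating is unknown.
    X_new = [[5, 1, 3], [1, 0, 0]]
    print(model.predict(X_new))  # e.g. ['good' 'bad']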
Prediction:
Prediction is a data mining technique that discovers the relationship between
independent variables and dependent variables. It uses regression analysis to
estimate inaccessible or missing numeric values in the data. When the class
label is absent, the prediction is performed using classification. Prediction
is common because of its relevance in business intelligence.
The following is an example of a case where the data analysis task is
prediction:
Suppose a marketing manager needs to predict how much a particular customer
will spend at his company during a sale. Here we are concerned with
forecasting a numeric value, so this data mining task is an example of
numeric prediction. In this case, a model or predictor will be constructed
that forecasts a continuous-valued or ordered-value function.
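As a minimal sketch of such a numeric predictor, assuming scikit-learn, the
example below fits a linear regression model. The customer features and spend
amounts are hypothetical, purely to illustrate the idea.

    # A minimal sketch of numeric prediction with linear regression,
    # assuming scikit-learn. All values below are hypothetical.
    from sklearn.linear_model import LinearRegression

    # Features: [past_annual_spend, visits_per_month]
    X_train = [[1200, 4], [300, 1], [800, 3], [2000, 6]]
    y_train = [150.0, 20.0, 90.0, 260.0]  # spend during past sales

    model = LinearRegression()
    model.fit(X_train, y_train)

    # Predict a continuous value (expected spend) for a new customer.
    print(model.predict([[1000, 3]]))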
For accuracy:
The precision metric shows the accuracy of the positive class. It measures
how likely a positive-class prediction is to be correct.
Sensitivity computes the ratio of positive classes correctly detected. This
metric shows how good the model is at recognizing the positive class.
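In terms of confusion-matrix counts (TP = true positives, FP = false
positives, FN = false negatives), these standard definitions are
Precision = TP / (TP + FP) and Sensitivity = TP / (TP + FN). A minimal
sketch computing both, with made-up counts:

    # A minimal sketch of precision and sensitivity (recall) from
    # confusion-matrix counts; the counts below are hypothetical.
    def precision(tp, fp):
        # Of all predicted positives, how many are truly positive?
        return tp / (tp + fp)

    def sensitivity(tp, fn):
        # Of all actual positives, how many did the model detect?
        return tp / (tp + fn)

    tp, fp, fn = 40, 10, 5  # hypothetical counts
    print(precision(tp, fp))    # 0.8
    print(sensitivity(tp, fn))  # 0.888...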
The conditional probability of X given Y is
P(X | Y) = P(X ⋂ Y) / P(Y),
where P(X ⋂ Y) is the joint probability of both X and Y being true and P(Y)
is the probability of Y. This rule underlies the Bayesian network described
next.
Bayesian Network:
A Bayesian network is represented by a Directed Acyclic Graph (DAG). Like any
other statistical graph, a DAG consists of a set of nodes and links, where
the links signify the connections between the nodes.
The nodes represent random variables, and the edges define the relationships
between these variables.
A DAG models the uncertainty of an event occurring based on the Conditional
Probability Distribution (CPD) of each random variable. A Conditional
Probability Table (CPT) is used to represent the CPD of each variable in the
network.
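As a minimal sketch, assuming a tiny hypothetical two-node network
Rain -> WetGrass, the CPTs can be stored as plain Python dictionaries and
combined by the chain rule P(Rain, WetGrass) = P(Rain) * P(WetGrass | Rain):

    # A minimal sketch of a two-node Bayesian network Rain -> WetGrass,
    # with CPTs stored as dictionaries. The structure and the
    # probabilities are hypothetical.
    p_rain = {True: 0.2, False: 0.8}  # CPT for Rain (no parents)

    # CPT for WetGrass conditioned on its parent Rain.
    p_wet_given_rain = {
        True:  {True: 0.9, False: 0.1},   # P(WetGrass | Rain=True)
        False: {True: 0.1, False: 0.9},   # P(WetGrass | Rain=False)
    }

    def joint(rain, wet):
        # Chain rule: P(Rain, WetGrass) = P(Rain) * P(WetGrass | Rain)
        return p_rain[rain] * p_wet_given_rain[rain][wet]

    print(joint(True, True))    # 0.18
    print(joint(False, True))   # 0.08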
Example: Suppose a candidate has a job offer and wants to decide whether to
accept it or not. To solve this problem, the decision tree starts with the
root node (the Salary attribute, chosen by an attribute selection measure,
ASM). The root node splits further into the next decision node (distance
from the office) and one leaf node based on the corresponding labels. The
next decision node splits further into one decision node (cab facility) and
one leaf node. Finally, that decision node splits into two leaf nodes
(Accepted offer and Declined offer). The structure of this tree is sketched
in code below.
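A minimal sketch of the decision logic described above, written as nested
conditionals; the attribute names follow the text, but the thresholds are
hypothetical, since the text names only the attributes:

    # A minimal sketch of the job-offer decision tree described above.
    # Attribute names follow the text; the thresholds are hypothetical.
    def decide(salary, distance_km, cab_facility):
        if salary < 50000:            # root node: Salary (chosen by ASM)
            return "Declined offer"   # leaf node
        if distance_km > 30:          # decision node: distance from office
            if cab_facility:          # decision node: cab facility
                return "Accepted offer"   # leaf node
            return "Declined offer"       # leaf node
        return "Accepted offer"       # leaf node

    print(decide(salary=60000, distance_km=40, cab_facility=True))   # Accepted offer
    print(decide(salary=60000, distance_km=40, cab_facility=False))  # Declined offer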
Concept of Entropy
Entropy controls how a Decision Tree decides to split the data. It actually
affects how a Decision Tree draws its boundaries.
The figure below depicts the splitting process. Red rings and blue crosses
symbolize elements with two different labels. The decision starts by
evaluating the feature values of the elements inside the initial set. Based
on their values, elements are put in Set 1 or Set 2. In this example, the
state seems tidier after the split: most of the red rings have been put in
Set 1, while a majority of the blue crosses are in Set 2.
So decision trees are there to tidy the data set by looking at the values of
the feature vector associated with each data point. Based on the values of
each feature, decisions are made that eventually lead to a leaf and an
answer.
At each step, at each branching, we want to decrease the entropy, so this
quantity is computed before the cut and after the cut. If it decreases, the
split is validated and we can proceed to the next step; otherwise, we must
try to split on another feature or stop this branch.
Before and after the decision, the sets are different and have different sizes. Still,
entropy can be compared between these sets, using a weighted sum, as we will
see in the next section.
Equation of Entropy:
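For a set S containing elements of c different classes, where p_i is the
proportion of elements belonging to class i, the Shannon entropy used by
decision trees is

Entropy(S) = -\sum_{i=1}^{c} p_i \log_2 p_i

A minimal sketch, computing the entropy of a set and the weighted entropy of
a split so the before/after comparison described above can be made; the
labels are hypothetical, standing in for the red rings and blue crosses:

    # A minimal sketch of entropy and the weighted post-split entropy
    # used to validate a decision tree split. Labels are hypothetical.
    from collections import Counter
    from math import log2

    def entropy(labels):
        # Entropy(S) = -sum(p_i * log2(p_i)) over the classes in S.
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def weighted_entropy(subsets):
        # Weighted sum of subset entropies; weights = relative sizes.
        n = sum(len(s) for s in subsets)
        return sum(len(s) / n * entropy(s) for s in subsets)

    before = ["ring"] * 6 + ["cross"] * 6   # initial mixed set
    set1   = ["ring"] * 5 + ["cross"] * 1   # mostly rings
    set2   = ["ring"] * 1 + ["cross"] * 5   # mostly crosses

    print(entropy(before))                  # 1.0 (maximally mixed)
    print(weighted_entropy([set1, set2]))   # ~0.65: entropy decreased,
                                            # so the split is validated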
Non-Linear Regression:
A scatter plot of a country's GDP against time shows that the relationship
is not linear. Instead, after 2005 the line starts to curve and no longer
follows a straight linear path. In such cases, a special estimation method
called non-linear regression is required.
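A minimal sketch of one common non-linear approach, fitting a quadratic
polynomial with NumPy; the GDP figures below are synthetic, purely for
illustration:

    # A minimal sketch of non-linear regression: fitting a quadratic
    # trend to synthetic GDP-vs-year data with NumPy polyfit.
    import numpy as np

    years = np.array([2000, 2002, 2004, 2006, 2008, 2010, 2012])
    gdp   = np.array([1.00, 1.10, 1.22, 1.45, 1.80, 2.30, 2.95])  # synthetic

    # A degree-2 polynomial captures the curvature that a straight
    # line misses after 2005.
    coeffs = np.polyfit(years, gdp, deg=2)
    model = np.poly1d(coeffs)

    print(model(2014))  # extrapolated GDP under the fitted curve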