Dr. Mahesh K C
Factor Analysis
• In real life, data tends to follow some patterns but the reasons are not apparent right
from the start of the data analysis.
• In a demographics-based survey, many people will answer questions in a particular
‘way’. For example, all married men will have higher expenses than single men but lower
than married men with children.
• In this case, the driving factor that makes them answer in a pattern is economic
status, but these answers may also depend on other factors such as level of education,
salary and locality or area.
• It becomes complicated to attribute answers to multiple underlying factors.
• One option is to automatically map all variables or answers into different new
categories with customized weights (or loadings) based on their influence on that
category.
• Factor analysis starts with the assumption of hidden latent variables which cannot be
observed directly but are reflected in the answers or variables of the data.
• It also makes the assumption that there are as many factors as there are variables.
An example
• Consider the following factor loadings from an airline customer satisfaction survey.
• Interpretation of factors.
Kaiser-Meyer-Olkin (KMO) Test and Bartlett’s Test for Sphericity
• KMO test: a measure of sampling adequacy used to examine the appropriateness of FA.
• The test measures the sampling adequacy of each variable in the data and an overall
sampling adequacy of the data.
• KMO values lie between 0 and 1; KMO < 0.5 indicates that FA may not be appropriate.
• Basically, KMO tests the level of correlation between the variables.
• Bartlett test for Sphericity: A test used to examine the hypothesis that the variables
are uncorrelated.
• Let ρ denote the correlation matrix of order m and I denote the identity matrix of
order m. Then the hypothesis to be tested is
H0: ρ = I against H1: ρ ≠ I
• If p-value < 0.01, reject the null hypothesis that no correlation exists among the
variables.
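As a minimal sketch of how these two tests can be run in practice, assuming the third-party factor_analyzer package and a hypothetical DataFrame of numeric survey items called survey (both are assumptions, not part of the original slides):

```python
# Minimal sketch: KMO and Bartlett's test of sphericity.
# Assumes the third-party `factor_analyzer` package and a hypothetical
# numeric DataFrame `survey` (rows = respondents, columns = items).
import pandas as pd
from factor_analyzer.factor_analyzer import calculate_kmo, calculate_bartlett_sphericity

survey = pd.read_csv("survey.csv")          # hypothetical file of survey items

chi_square, p_value = calculate_bartlett_sphericity(survey)
kmo_per_item, kmo_overall = calculate_kmo(survey)

print(f"Bartlett chi-square = {chi_square:.2f}, p-value = {p_value:.4f}")
print(f"Overall KMO = {kmo_overall:.3f}")   # < 0.5 suggests FA may not be appropriate
```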
Factor rotation: Varimax Method
• Solutions to factor analysis are non-unique without further constraints. This motivates
the need for factor rotation.
• Factor loadings represent the correlations between the factors and the variables. A
large absolute value indicates the factor and the variable are closely related.
• With un-rotated factor loadings, it is possible that a factor is correlated with several
variables, which makes interpretation difficult.
• By rotating factors, we would like each factor to have significant loadings with some of
the variables. Similarly, we would like each variable to have significant loadings with
only a few factors.
• Rotation does not affect communalities and percentage of total variance explained.
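A minimal sketch of a varimax-rotated factor analysis, again assuming the third-party factor_analyzer package, a hypothetical survey DataFrame and an assumed choice of three factors:

```python
# Minimal sketch: factor analysis with varimax rotation via `factor_analyzer`.
# The data file and the choice n_factors=3 are assumptions for illustration.
import pandas as pd
from factor_analyzer import FactorAnalyzer

survey = pd.read_csv("survey.csv")                   # hypothetical data
fa = FactorAnalyzer(n_factors=3, rotation="varimax")
fa.fit(survey)

loadings = pd.DataFrame(fa.loadings_, index=survey.columns)
print(loadings.round(2))            # rotated loadings: aim for a "simple structure"
print(fa.get_communalities())       # communalities are unchanged by rotation
print(fa.get_factor_variance())     # variance explained per factor
```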
References
• Shmueli, G., Bruce, P.C., Yahav, I., Patel, N.R., & Lichtendahl, K.C. (2018), Data Mining for Business Analytics, Wiley.
• Larose, D.T. & Larose, C.D. (2016), Data Mining and Predictive Analytics, 2nd edition, Wiley.
• Kumar, U.D. (2018), Business Analytics: The Science of Data-Driven Decision Making, 1st edition, Wiley.
Session 9: Simple Linear Regression
Revisited and Diagnostics
Introduction
• Managerial decisions are often based on the relationship between two or more variables.
• Examples:
• A company in the distribution business may be interested in the relationship between the price of crude oil and
the company’s transportation cost.
• A marketing executive might want to know how strong the relationship is between advertising
expenditure and sales.
• An economist may be interested in the relationship between the income and expenditure.
• A commercial airline may be interested in predicting the cost of flying based on the type of plane, distance,
number of passengers, etc.
Scatter Plot: Linear and Non-Linear
• A graphical way of identifying the relationship between two variables.
• Let x = student population (in 1000s) and y = sales (in $1000s). See Figure 1.
• Let x = months employed and y = items sold. See Figure 2.
Figure 1: Linear scatter plot of Sales against Population.
Figure 2: Non-linear scatter plot of items sold against Months employed.
Simple Linear Regression (SLR) Model
• Regression analysis is the process of constructing a mathematical model or
function that can be used to predict one variable by another variable or set of
variables.
• The variable being predicted is called the dependent variable (denoted by y) and
the variable being used to predict the dependent variable is called independent
variable (denoted by x).
• The equation that describes how y, the dependent variable, is related to x, the
independent variable, and an error term is called the regression model. In SLR, the
model used is:
Y = β0 + β1X + ε
where Y = (y1, y2,…, yn), X = (x1, x2,…, xn), β0 and β1 are referred to as the parameters of
the model, and ε (epsilon) is a random variable referred to as the error term.
• The error term accounts for the variability in y that can not be explained by the
linear relationship between x and y.
Principle of Least Squares and Estimated regression line
• Since β0 and β1 are parameters in the regression model, generally unknown,
one has to use their estimated values, say, b0 and b1.
• The estimated values are obtained using the principle of least squares which
states that the sum of squares of errors should be minimum.
• Using this principle the values of b0 and b1 are as follows:
b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²,  b0 = ȳ − b1x̄
• The estimated regression line is ŷ = b0 + b1x
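As a minimal sketch of the least-squares calculation, using NumPy on small illustrative arrays (the x, y values are assumptions, not data from the slides):

```python
# Minimal sketch: least-squares estimates b0, b1 computed directly with NumPy.
import numpy as np

x = np.array([2.0, 6.0, 8.0, 8.0, 12.0, 16.0, 20.0, 20.0, 22.0, 26.0])
y = np.array([58.0, 105.0, 88.0, 118.0, 117.0, 137.0, 157.0, 169.0, 149.0, 202.0])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x                 # estimated regression line

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
```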
How well the estimated regression line fits the data?
• Least Squares Regression Method can approximate linear relationship
between any two variables.
• How useful is the estimated regression line for making predictions?
• Coefficient of Determination (The r2 statistic) measures the estimated line’s
goodness of fit to data.
• The following rough guidelines can sometimes be useful in deciding the
goodness of fit of the model:
If r2 ≥0.8, regression is good,
If 0.5≤ r2 <0.8, regression is moderate and
If r2 < 0.5, regression is poor.
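A minimal sketch of computing r² from a fitted line; the x, y arrays are illustrative assumptions:

```python
# Minimal sketch: coefficient of determination r^2 for a simple linear regression.
import numpy as np

x = np.array([2.0, 6.0, 8.0, 8.0, 12.0, 16.0, 20.0, 20.0, 22.0, 26.0])
y = np.array([58.0, 105.0, 88.0, 118.0, 117.0, 137.0, 157.0, 169.0, 149.0, 202.0])

b1, b0 = np.polyfit(x, y, deg=1)            # least-squares slope and intercept
ss_res = np.sum((y - (b0 + b1 * x)) ** 2)   # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)        # total sum of squares
r2 = 1.0 - ss_res / ss_tot
print(f"r^2 = {r2:.3f}")  # >= 0.8 good, 0.5 to 0.8 moderate, < 0.5 poor (rough guide)
```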
Inference on Regression: The t-test and the F-test
• The inference on regression is done in two ways:
test of significance of predictor (the t-test) and
test of overall significance of the model (the F-test).
• Irrespective of the test, the hypothesis to be tested is H0: β1 = 0 against H1: β1 ≠ 0.
• Note that with one independent variable, the F-test will provide the same
conclusion as the t-test.
• But with more than one independent variable, only the F-test can be used to
test for an overall significant relationship. In this case, the t-test is used to
test the individual significance of each independent variable.
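A minimal sketch of both tests using statsmodels OLS; the x, y arrays are the same illustrative assumptions as in the earlier sketches:

```python
# Minimal sketch: t-test and F-test output from statsmodels OLS.
import numpy as np
import statsmodels.api as sm

x = np.array([2.0, 6.0, 8.0, 8.0, 12.0, 16.0, 20.0, 20.0, 22.0, 26.0])
y = np.array([58.0, 105.0, 88.0, 118.0, 117.0, 137.0, 157.0, 169.0, 149.0, 202.0])

results = sm.OLS(y, sm.add_constant(x)).fit()
print(results.summary())                      # t-tests for b0, b1 and the overall F-test
print("t p-value (slope):", results.pvalues[1])
print("F p-value:", results.f_pvalue)         # same conclusion as the t-test in SLR
```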
The Residual Analysis
• For model building and inference purposes, regression model assumptions
require validation.
• Estimated regression line: ŷ = b0 + b1x
• Residuals (actual value − estimated value): ei = yi − ŷi, i = 1, 2, …, n
Outliers and High Leverage Values
• Observations whose standardized residuals are very large in absolute value are outliers.
• Generally, observations with standardized residuals beyond ±2 are flagged as
outliers.
Outlier Example
Figure 1 (all data): y = -7.3305x + 64.958, R² = 0.4968
Figure 2 (outlier removed): y = -7.0283x + 60.425, R² = 0.8912
X  Y   SR
1  45  -1.06
1  55  -0.22
2  50  -0.02
3  75   2.68
3  40  -0.25
3  45   0.17
4  30  -0.47
4  35  -0.05
5  25  -0.28
6  15  -0.50
• In Figure 1, the point (3, 75) could be an outlier. The model fitted to all the data has r2 = 0.49.
• After removing the outlier there is a substantial improvement in the model fit (r2 = 0.89) (see Figure 2).
• The observation (3, 75) is flagged as an outlier since its standardized residual SR = 2.68 exceeds 2.
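A minimal sketch of flagging outliers with standardized residuals, using the X, Y values from the table above; assuming (as an interpretation of the slide) that SR refers to internally studentized residuals, so the printed values may differ slightly from the table:

```python
# Minimal sketch: flagging outliers via standardized residuals (|SR| > 2).
import numpy as np
import statsmodels.api as sm

X = np.array([1, 1, 2, 3, 3, 3, 4, 4, 5, 6], dtype=float)
Y = np.array([45, 55, 50, 75, 40, 45, 30, 35, 25, 15], dtype=float)

results = sm.OLS(Y, sm.add_constant(X)).fit()
sr = results.get_influence().resid_studentized_internal

for xi, yi, ri in zip(X, Y, sr):
    flag = "  <-- outlier" if abs(ri) > 2 else ""
    print(f"({xi:.0f}, {yi:.0f}): SR = {ri:5.2f}{flag}")
```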
Leverage Point Example
Figure 1 (all data): y = -0.4251x + 127.47, R² = 0.7989
Figure 2 (leverage point removed): y = -1.0909x + 138.18, R² = 0.8727
X   Y    Leverage
10  125  0.22
10  130  0.22
15  120  0.18
20  115  0.15
20  120  0.15
25  110  0.14
70  100  0.94
• The point (70, 100) could be a high leverage point, i.e., an outlier in the predictor
variable.
• Removing it changes the estimated line and the r2 value (see Figure 2).
• The leverage value 0.94 for the observation with X = 70 exceeds the cut-off 0.57 (the common 2(p + 1)/n rule of thumb with p = 1 predictor and n = 7 observations), so it is flagged as a high leverage point.
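A minimal sketch of computing the leverage (hat) values for this example with statsmodels; the 2(p + 1)/n cut-off is the assumed rule of thumb behind the 0.57 threshold:

```python
# Minimal sketch: leverage (hat) values for the example data.
import numpy as np
import statsmodels.api as sm

X = np.array([10, 10, 15, 20, 20, 25, 70], dtype=float)
Y = np.array([125, 130, 120, 115, 120, 110, 100], dtype=float)

results = sm.OLS(Y, sm.add_constant(X)).fit()
leverage = results.get_influence().hat_matrix_diag

p, n = 1, len(X)
cutoff = 2 * (p + 1) / n                      # = 0.571 here
for xi, hi in zip(X, leverage):
    flag = "  <-- high leverage" if hi > cutoff else ""
    print(f"X = {xi:.0f}: leverage = {hi:.2f}{flag}")
```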
Influential Observation
• An influential observation significantly alters the regression parameters depending on
its presence or absence in the data set.
• An outlier or a high leverage point may or may not be influential.
• Influential observations combine both the characteristics of large residual and
high leverage.
Influential Observation Example
Figure 1 (all data): y = 0.0981x + 119.05, R² = 0.0751
Figure 2 (influential point removed): y = -1.0909x + 138.18, R² = 0.8727
X   Y    Cook's D
10  125  0.07
10  130  0.29
15  120  0.00
20  115  0.06
20  120  0.00
25  110  0.21
70  130  35.19
• Consider the observation (70, 130) in Figure 1. Observe the estimated line and the
corresponding r2 (= 0.075).
• Once we remove this point, the r2 value increases substantially (to 0.873) and the entire regression
line changes drastically (compare the slopes in the two figures).
• The Cook’s distance D = 35.19 > 1 for the point (70, 130) confirms that it is an influential
observation.
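A minimal sketch of computing Cook's distance for this example with statsmodels; the data rows mirror the table above, including the point (70, 130):

```python
# Minimal sketch: Cook's distance for the influential-observation example.
import numpy as np
import statsmodels.api as sm

X = np.array([10, 10, 15, 20, 20, 25, 70], dtype=float)
Y = np.array([125, 130, 120, 115, 120, 110, 130], dtype=float)

results = sm.OLS(Y, sm.add_constant(X)).fit()
cooks_d, _ = results.get_influence().cooks_distance

for xi, yi, di in zip(X, Y, cooks_d):
    flag = "  <-- influential (D > 1)" if di > 1 else ""
    print(f"({xi:.0f}, {yi:.0f}): Cook's D = {di:.2f}{flag}")
```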
Verifying Regression Assumptions
• For inferential purposes, adherence to the regression assumptions is essential.
• Two graphical methods used to verify assumptions:
(1) Normal probability plot of residuals.
(2) Plot of standardized residuals (SR) against predicted values.
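A minimal sketch of producing both plots with statsmodels and matplotlib; the simulated data and the qqplot line option are assumptions for illustration:

```python
# Minimal sketch: the two graphical checks of the regression assumptions.
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 3 + 2 * x + rng.normal(0, 1, 50)          # illustrative data only
results = sm.OLS(y, sm.add_constant(x)).fit()

sr = results.get_influence().resid_studentized_internal

sm.qqplot(results.resid, line="s")            # (1) normal probability plot of residuals
plt.title("Normal probability plot of residuals")

plt.figure()
plt.scatter(results.fittedvalues, sr)         # (2) standardized residuals vs fitted
plt.axhline(0, linestyle="--")
plt.xlabel("Fitted values"); plt.ylabel("Standardized residuals")
plt.show()
```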
Verifying Regression Assumptions (cont’d)
• Method 2: Plot Standardized Residuals Against Predicted (fitted) Values
• Four commonly found patterns in residual-versus-fitted plots are shown (labelled A to D).
• Plot (C): displays a “funnel” pattern, which violates the constant variance assumption.
• Plot (D): shows a pattern increasing from left to right, which violates the zero-mean assumption.
What if graphical tests indicate regression assumption(s) violated?
• For example, suppose the constant variance assumption is violated.
• Transformation of the response and/or the predictor variable(s) may help.
• Frederick Mosteller and Tukey’s Bulging Rule (FMTB Rule)
• A “ladder of re-expressions” is proposed, which is essentially a sequence of power transformations.
• Compare the curve in the scatter plot to the curves in the bulging-rule diagram.
• Ladder of re-expressions: t^(-3), t^(-2), t^(-1), t^(-1/2), ln(t), t^(1/2), t^1, t^2, t^3
Transformations to Achieve Linearity: Two variable case
• Scrabble® is a game where players build crosswords by randomly selecting letter tiles.
Each tile has an associated point value, and the point value is roughly related to the letter’s
frequency.
• The scatter plot indicates that the relationship between the two variables is curvilinear rather than linear.
• Therefore, modelling a linear relationship directly is not appropriate.
FMTB Rule applied to Scrabble data
• The bulging rule says to move “x down, y down”.
• We should transform x, moving down one or more positions from x’s current
position t1 on the ladder. Similarly, transform y downward from its position t1.
• The bulging rule therefore suggests applying a square root or log transformation to x and y.
• The square root transformation is applied first, producing sqrt(points) and sqrt(frequency).
However, the scatter plot shows that the relationship remains non-linear.
• Next, x and y are transformed using the log transformation. The scatter plot of ln(points)
against ln(frequency) shows a somewhat acceptable level of linearity.
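A minimal sketch of applying the two re-expressions in NumPy; the points and freq arrays are illustrative assumptions, not the actual Scrabble data:

```python
# Minimal sketch: square-root and log re-expressions in the spirit of the Scrabble example.
import numpy as np

points = np.array([1, 1, 2, 3, 4, 5, 8, 10], dtype=float)   # illustrative only
freq = np.array([12, 9, 6, 4, 2, 2, 1, 1], dtype=float)     # illustrative only

sqrt_points, sqrt_freq = np.sqrt(points), np.sqrt(freq)      # first try: square root
ln_points, ln_freq = np.log(points), np.log(freq)            # next try: natural log

# A quick (rough) check of linearity: correlation on each transformed scale.
print("sqrt scale r:", np.corrcoef(sqrt_freq, sqrt_points)[0, 1])
print("log scale r: ", np.corrcoef(ln_freq, ln_points)[0, 1])
```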
Scatter plots: After square root and log transformations
Scatter plots after transformation: Psq against Fsq (square-root scale) and LogP against LogF (log scale).
References
• Shmueli, G., Bruce, P.C., Yahav, I., Patel, N.R., & Lichtendahl, K.C. (2018), Data Mining for Business Analytics, Wiley.
• Larose, D.T. & Larose, C.D. (2016), Data Mining and Predictive Analytics, 2nd edition, Wiley.
• Kumar, U.D. (2018), Business Analytics: The Science of Data-Driven Decision Making, 1st edition, Wiley.
Session 10: Multiple Linear Regression-
Model Building
Multiple Linear Regression (MLR)
• Simple linear regression examines the relationship between a single predictor and the
response.
• Multiple regression models the relationship between a set of predictors and a single continuous response.
• It can provide improved precision for estimation and prediction.
• The model uses a plane or hyperplane to approximate the relationship between the
predictor set and the single response.
The Multiple Regression Model
• Multiple Regression Model:
y = β0 + β1x1 + β2x2 + … + βpxp + ε
where β0, β1, β2, …, βp are model parameters whose true values remain unknown and ε
represents the error term.
• Model parameters are estimated from data set using method of least squares.
• The estimated regression plane: ŷ = b0 + b1x1 + b2x2 + … + bpxp
• We interpret the coefficient bi as “estimated change in response variable, for unit increase
in variable xi, when all remaining predictors held constant.”
• The quantity (y − ŷ) measures the error in prediction and is called the residual. The residual equals the
vertical distance between the data point and the regression plane (or hyperplane) in multiple
regression.
• Coefficient of Determination (R2): Represents proportion of variability in response
variable accounted for by linear relationship with predictor set.
An Example
• Consider a MLR model to estimate miles per gallon (mpg) based on
weight (wt) and displacement (disp).
• A 3D scatter plot of mpg against wt and disp helps visualize the fitted regression plane.
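A minimal sketch of fitting this model with the statsmodels formula API; "cars.csv" with columns mpg, wt and disp is a hypothetical file standing in for the data behind the plot:

```python
# Minimal sketch: fitting mpg on wt and disp with statsmodels.
import pandas as pd
import statsmodels.formula.api as smf

cars = pd.read_csv("cars.csv")                       # hypothetical data file
model = smf.ols("mpg ~ wt + disp", data=cars).fit()

print(model.params)                  # b0, b1 (wt), b2 (disp): the fitted regression plane
print(model.rsquared, model.rsquared_adj)
```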
Model Assumptions
• The assumptions are the same as in simple linear regression: linearity, independence, normality and constant variance (homoscedasticity) of the errors.
Coefficient of Determination (R2) and Adjusted R2
• Would we expect a higher R2 value when using two predictors rather than one?
• Yes; R2 never decreases when an additional predictor is included. When the new predictor is
useful, R2 increases substantially; otherwise, R2 may increase by only a small or negligible
amount.
• The largest R2 may therefore occur for the model with the most predictors, rather than the best
predictors.
• Adjusted R2 compensates for this by penalizing the addition of predictors that contribute little: R2adj = 1 − (1 − R2)(n − 1)/(n − p − 1).
Inference on Regression: The t-test and F-test
• The t-test is used to test the significance of individual predictors in the regression model.
• Hypothesis test for predictor xi: H0: βi = 0 against H1: βi ≠ 0 (i = 1, 2, 3,…, p).
• The F-test examines the overall significance of the model: H0: β1 = β2 = … = βp = 0 against H1: at least one βi ≠ 0.
• Reject the null hypothesis when the p-value < level of significance (α).
Multi-collinearity
• Multicollinearity is a condition in which two or more predictors are correlated with one another.
• This leads to instability in the solution space, with possibly incoherent results.
• A data set with severe multicollinearity may have a significant F-test while none
of the t-tests for the individual predictors is significant.
• Multicollinearity produces high variability in the coefficient estimates (b1, b2,…).
Multicollinearity Contd.’
• Consider a MLR with two predictors: ŷ = b0 + b1x1 + b2x2
• If the predictors x1 and x2 are not correlated, they are orthogonal. In such cases the predictors form a solid
basis upon which the response surface y rests firmly, providing stable coefficients b1 and b2
(see figure A) with small variability SE(b1) and SE(b2).
• If the predictors x1 and x2 are correlated (the multicollinear situation), then as one of them
increases, so does the other. In this case the predictors no longer form a solid basis on which
the response surface y rests firmly (it is unstable), producing highly variable coefficients b1 and b2
(see figure B) due to highly inflated values of SE(b1) and SE(b2).
Does a method exist to identify multicollinearity in a regression model?
• The Variance Inflation Factor (VIF) measures how strongly the ith predictor xi is related to the
remaining predictor variables. If Ri2 is the R2 obtained from regressing xi on the other predictors, then
VIFi = 1 / (1 − Ri2),  i = 1, 2, 3,…, p
• In general, VIFi > 5 indicates moderate multicollinearity and VIFi > 10 indicates severe
multicollinearity.
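A minimal sketch of computing VIFs with statsmodels; the predictor file and column layout are assumptions:

```python
# Minimal sketch: VIF for each predictor using statsmodels.
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = pd.read_csv("predictors.csv")                 # hypothetical predictor-only data
X_const = sm.add_constant(X)

vifs = pd.Series(
    [variance_inflation_factor(X_const.values, i) for i in range(1, X_const.shape[1])],
    index=X.columns,
)
print(vifs)   # > 5 moderate, > 10 severe multicollinearity (rough guide)
```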
Some Guidelines for model building using multiple linear regression
Step 1: Detect (using VIF criterion) and eliminate multicollinearity (if present) by
dropping variables. Drop one variable at a time until multicollinearity is eliminated.
Step 2: Run regression and check for influential observations, outliers and high
leverage observations.
Step 3: If one or more influential observations/outliers/high leverage observations
are present, delete one of them, rerun the regression, and go back to Step 2.
Step 4: Keep doing this until there are no further influential observations/outliers/
high leverage observations, or until 10% (in some cases 5%) of the data has been removed.
Step 5: Check for regression assumptions of linearity, normality, homoscedasticity and
independence of the residuals.
Step 6: If some of the assumptions in Step 5 are violated, then try using transformations.
If NO transformation can be found which corrects the violations, then STOP.
Step 7: When all the regression assumptions are met then look at the p-value of
the F-test. If it is not significant then STOP.
Step 8: If the p-value of the F-test is significant, then look at the p-values of the
individual coefficients. If some of the p-values are not significant, then
choose one of the variables with a non-significant p-value, drop it from the
model, and run the regression again.
Step 9: Repeat Step 8 until the p-values of all the coefficients are significant.
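As a minimal sketch of Step 1, a loop that repeatedly drops the predictor with the largest VIF; the cut-off of 10 and the function name are assumptions for illustration:

```python
# Minimal sketch of Step 1: drop one predictor at a time until all VIFs are acceptable.
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def drop_collinear(X: pd.DataFrame, cutoff: float = 10.0) -> pd.DataFrame:
    """Drop the worst-offending predictor until no VIF exceeds `cutoff`."""
    X = X.copy()
    while True:
        exog = sm.add_constant(X).values
        vifs = pd.Series(
            [variance_inflation_factor(exog, i) for i in range(1, exog.shape[1])],
            index=X.columns,
        )
        if vifs.max() <= cutoff or X.shape[1] == 1:
            return X
        X = X.drop(columns=vifs.idxmax())      # drop highest-VIF predictor, re-check
```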
Model Building: Health Care Revenue data
• These data were collected by the Department of Health and Social Services (DHSS)
of the state of New Mexico and cover 52 of the 60 licensed facilities in New
Mexico in 1998. Specific definitions of the variables are given below. The
location variable indicates whether the facility is in a rural or non-rural area.
Variable Definition
RURAL Rural home (1) and non-rural home (0)
BED Number of beds in home
MCDAYS Annual medical in-patient days (hundreds)
TDAYS Annual total patient days (hundreds)
PCREV Annual total patient care revenue ($100)
NSAL Annual nursing salaries ($100)
FEXP Annual facilities expenditure ($100)
• DHSS is interested in predicting patient care revenue based on the other facility
characteristics.
Model Building: HCR Data
• Objective: Build a model to predict patient care revenue.
• Response Variable: PCREV
• Continuous Predictors: BED, MCDAYS, TDAYS, NSAL, FEXP
• Categorical Predictor (dummy): RURAL
• Total records: 52
• Total Variables: 7 (6 predictors)
• No missing values
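A minimal sketch of fitting the initial full model with the statsmodels formula API; "hcr.csv" is a hypothetical file containing the seven variables listed above:

```python
# Minimal sketch: the initial full model for the HCR data.
import pandas as pd
import statsmodels.formula.api as smf

hcr = pd.read_csv("hcr.csv")                          # hypothetical data file
full = smf.ols("PCREV ~ BED + MCDAYS + TDAYS + NSAL + FEXP + C(RURAL)", data=hcr).fit()
print(full.summary())    # starting point for the model-building steps above
```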
Summary Results of Model 9
References
• Shmueli, G., Bruce, P.C., Yahav, I., Patel, N.R., & Lichtendahl, K.C. (2018), Data Mining for Business Analytics, Wiley.
• Larose, D.T. & Larose, C.D. (2016), Data Mining and Predictive Analytics, 2nd edition, Wiley.
• Kumar, U.D. (2018), Business Analytics: The Science of Data-Driven Decision Making, 1st edition, Wiley.
Sessions 11&12: Multiple Regression
with Categorical Predictors & Variable
Selection Method-Backward Elimination
Regression with Categorical Predictors Using Indicator Variables
• Categorical variables can be included in model through use of indicator variables.
• Example: Consider the Cars data set. The variables mpg, cylinders, cubicinches, hp, weightlbs and time.to.60
are continuous, and brand is a categorical variable with three levels: US, Japan and Europe.
The variable “year” is not considered.
• For regression, categorical variable with k categories transformed to k – 1 indicator (dummy)
variables. Indicator variable is binary, equals 1 when observation belongs to category, otherwise
equals 0.
• The brand variable is transformed into two indicator (dummy) variables:
C1 = 1 if brand is Japan, 0 otherwise;  C2 = 1 if brand is US, 0 otherwise
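A minimal sketch of creating the k − 1 indicator variables with pandas; "cars.csv" and the exact dummy column names are assumptions (the names depend on the alphabetical ordering of the levels):

```python
# Minimal sketch: creating k - 1 indicator variables for `brand` with pandas.
import pandas as pd

cars = pd.read_csv("cars.csv")                        # hypothetical data file
dummies = pd.get_dummies(cars["brand"], prefix="brand", drop_first=True)
cars = pd.concat([cars, dummies], axis=1)             # e.g. brand_Japan, brand_US
print(cars.head())
```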
Estimated Regression Equation with Categorical Predictors
• Including the indicator variables in the model produces the estimated regression equation:
mpg = b0 + b1(cylinders) + b2(cubicinches) + b3(hp) + b4(weightlbs) + b5(time.to.60) + b6C1 + b7C2
Variable Selection Methods
• Several variable selection methods available.
• Assist analyst in determining which variables to include in model.
• Algorithms help select predictors leading to optimal model.
• Four variable selection methods:
(1) Forward Selection
(2) Backwards Elimination
(3) Stepwise Selection
(4) Best Subsets
The Partial F-Test (Theory optional)
• Suppose model has x1,…,xp predictors and we consider adding additional predictor x*.
• Calculate sequential sum of squares from adding x*, given existing x1,…,xp in model.
• Full sum of squares SSFull: x1,…, xp and x* in the model.
• Reduced sum of squares SSReduced: x1,…, xp in the model.
• Therefore, the extra sum of squares, denoted SSExtra, is
SSExtra = SS(x* | x1, x2,…, xp) = SSFull − SSReduced
• Null hypothesis for Partial F-Test
– Ho: No, SSExtra associated with x* does not contribute significantly to model
– Ha: Yes, SSExtra associated with x* does contribute significantly to model
• Test statistic for the partial F-test:
F(x* | x1, x2,…, xp) = SSExtra / MSEFull
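A minimal sketch of the partial F-test as a comparison of nested models via statsmodels' anova_lm; the file and column names are assumptions borrowed from the Cars example:

```python
# Minimal sketch: partial F-test by comparing reduced and full nested OLS models.
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

df = pd.read_csv("cars.csv")                                        # hypothetical file
reduced = smf.ols("mpg ~ hp + weightlbs", data=df).fit()            # x1, ..., xp
full = smf.ols("mpg ~ hp + weightlbs + cubicinches", data=df).fit() # adds x*

print(anova_lm(reduced, full))    # F statistic and p-value for the extra SS of x*
```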
Backwards Elimination Procedure
• Procedure begins with all variables in model.
• Step 1:
Perform regression on full model with all variables
For example, assume model has x1,…,x4
• Step 2:
For each variable in model perform partial F-test
Select variable with smallest partial F-statistic, denoted Fmin
• Step 3:
If Fmin not significant, remove associated variable from model and return to Step 2
Otherwise, if Fmin significant, stop algorithm and report current model
If first pass, then current model is full model
If not first pass, then full set of predictors reduced by one or more variables
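A minimal sketch of the procedure, using coefficient p-values as the elimination criterion, which is equivalent to the partial F-test when a single term is dropped at a time (F = t²); the function name and alpha = 0.05 are assumptions:

```python
# Minimal sketch of backwards elimination driven by coefficient p-values.
import pandas as pd
import statsmodels.api as sm

def backward_eliminate(X: pd.DataFrame, y: pd.Series, alpha: float = 0.05):
    """Drop the least significant predictor until all remaining ones are significant."""
    X = X.copy()
    while True:
        model = sm.OLS(y, sm.add_constant(X)).fit()
        pvals = model.pvalues.drop("const")
        if pvals.max() <= alpha or X.shape[1] == 1:
            return model, list(X.columns)
        X = X.drop(columns=pvals.idxmax())     # remove weakest predictor, refit
```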
Backwards Elimination Applied to Cars Data Set
• We begin with all predictors (excluding the predictor “Year”) included in the model.
• Partial F-statistic calculated for each predictor. Smallest F-statistic Fmin (= 0.5132) associated
with cubicinches. Here, Fmin is not significant at 5%, therefore cubicinches is dropped.
• On second pass predictor cylinders is eliminated as its Fmin (= 0.4425) which is not significant
at 5%.
• On third pass predictor time.to.60 is dropped with Fmin (=1.7229) which is not significant at
5%.
• Finally, all predictors are significant at 5% level.
• The procedure terminates with model (B):
Model B: mpg = b0 + b1(hp) + b2(weightlbs) + b6(brand)
Backwards Elimination Applied to Cars Data Set
• Most of the time, variable selection methods take care of multicollinearity; still,
one may check for it in the final model.
• Based on Model B, check for influential observations, outliers and high leverage values.
• Check the regression assumptions. If they are violated, one may try a transformation
of the response variable, the predictors, or both.
References
• Shmueli, G., Bruce, P.C., Yahav, I., Patel, N.R., & Lichtendahl, K.C. (2018), Data Mining for Business Analytics, Wiley.
• Larose, D.T. & Larose, C.D. (2016), Data Mining and Predictive Analytics, 2nd edition, Wiley.
• Kumar, U.D. (2018), Business Analytics: The Science of Data-Driven Decision Making, 1st edition, Wiley.
Session 14: Linear Discriminant Analysis
(LDA)
LDA: Basic Concept and Objectives
• A technique for analyzing multivariate data when the response variable is categorical
and the predictors are interval in nature.
• In most cases the dependent variable consists of two groups or classifications, such as high
versus normal blood pressure, loan default versus non-default, or use versus non-use
of internet banking.
• The choice between three candidates, A, B or C in an election is an example where the
dependent variable consists of more than two groups.
• Objectives:
• Develop a discriminant function: A linear combination of predictors that will best
discriminate between the categories of response variable (groups).
• Examine whether significant differences exist between the groups with respect to the
predictors.
• Classify the cases to one of the groups based on the value of the predictors.
• Evaluate the accuracy of the classification.
Some examples of LDA
• The technique can be used to answer questions such as:
Fisher’s Linear Discriminant Function
• Introduced by R. A. Fisher in 1936, it is typically considered more of a statistical
classification method than a data mining method.
• The aim is to obtain a linear combination of the independent variables (known as the
discriminant function) that best discriminates between the groups of the dependent variable.
• The idea is to find linear functions of the measurements that maximize the ratio of
between-class variability to within-class variability; in other words, to obtain
groups that are internally homogeneous and differ the most from each other.
• For each record, these functions are used to compute scores that measure the
proximity of that record to each of the classes.
• A record is classified as belonging to the class for which it has the highest
classification score.
The LDA Model and assumptions
• Let X1, X2,…, Xk denote the predictors and let D denote the discriminant score. Then a linear
combination of the predictors is given by:
D = b1X1 + b2X2 + … + bkXk
where the bi are the discriminant coefficients or weights. Note that with g groups we
need at most g − 1 discriminant functions.
• Assumptions:
1) The groups must be mutually exclusive and have equal sample size.
2) Groups should have the same variance-covariance matrices on independent
variables.
3) The independent variables should be multivariate normally distributed.
• If assumption 3 is met, LDA is a more powerful tool than other classification
methods such as logistic regression (roughly 30% more efficient; see Efron, 1975).
• LDA performs better as the sample size increases.
Statistics associated with LDA
• Canonical correlation: measures the extent of association between the
discriminant function and the groups of the response variable.
• Hit ratio (accuracy): the sum of the diagonal elements of the confusion matrix divided by the total number of
cases.
• Eigenvalue: the ratio of the between-group to the within-group sum of squares. Large
eigenvalues imply a superior function.
The Iris Flower Data
• This famous (Fisher's or Anderson's) iris data set gives the measurements in
centimeters of the variables sepal length and width and petal length and width,
respectively, for 50 flowers from each of 3 species of iris. The species are Iris
setosa, Iris versicolor, and Iris virginica.
• Predictors: Sepal. Length, Sepal. Width, Petal. Length and Petal. Width
• Dependent Variable: Species with three levels Setosa, Versicolor and Virginica
• Total Observations: 150
• The group means (centroids) clearly show a separation between the groups on the
corresponding predictors.
• Since Species has three levels, we obtain two discriminant functions (LD1 and LD2) with
corresponding weights:
LD1 = 0.534(Sepal.Length) + 2.125(Sepal.Width) + 1.962(Petal.Length) + 3.561(Petal.Width)
LD2 = 0.294(Sepal.Length) + 1.933(Sepal.Width) + 1.143(Petal.Length) + 3.003(Petal.Width)
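A minimal sketch of fitting an LDA to the iris data with scikit-learn; the 70/30 split and random_state are assumptions, and the scalings it reports may differ in sign and scale from the LD1/LD2 weights above, which come from a different implementation and training split:

```python
# Minimal sketch: LDA on the iris data with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

lda = LinearDiscriminantAnalysis(n_components=2)
scores = lda.fit(X_train, y_train).transform(X_train)   # discriminant scores (LD1, LD2)

print(lda.scalings_[:, :2])              # weights of the two discriminant functions
print(lda.explained_variance_ratio_)     # analogue of the "proportion of trace"
```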
Matrix plot: Iris data
LDA of iris data Cont’d
• The proportion of trace for LD1 is 0.993 and for LD2 is 0.007; these values give the
proportion of between-group separation achieved by each discriminant function.
• The predicted classification in the training data is: Setosa = 35, Versicolor = 36 and Virginica = 35.
• The eigenvalues corresponding to LD1 and LD2 are 44.23 and 3.71; the higher the
eigenvalue, the better the separation.
• The discriminant scores are obtained by evaluating LD1 and LD2 at the predictor values
of each record.
Histogram of Discriminant Scores based on LD1 & LD2
Confusion Matrix & Accuracy
• Confusion matrix: a matrix that summarizes the correct and incorrect classifications
a classifier produced for a given dataset (see Table 1).
• Accuracy: the overall proportion of correct classifications. For the training data the
accuracy is 100% (106/106); for the test data it is about 95% (42/44).
Table 1: Confusion matrix for training data
Predicted    Actual: Setosa  Versicolor  Virginica
Setosa       35              0           0
Versicolor   0               36          0
Virginica    0               0           35
Table 2: Confusion matrix for test data
Predicted    Actual: Setosa  Versicolor  Virginica
Setosa       15              0           0
Versicolor   0               13          1
Virginica    0               1           14
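A minimal sketch of producing a confusion matrix and accuracy for an LDA classifier on iris; the train/test split and random_state are assumptions, so the counts will not exactly match the tables above:

```python
# Minimal sketch: confusion matrix and accuracy for an LDA classifier on iris.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)
lda = LinearDiscriminantAnalysis().fit(X_tr, y_tr)

print(confusion_matrix(y_tr, lda.predict(X_tr)))        # analogue of Table 1
print(confusion_matrix(y_te, lda.predict(X_te)))        # analogue of Table 2
print("train accuracy:", accuracy_score(y_tr, lda.predict(X_tr)))
print("test accuracy: ", accuracy_score(y_te, lda.predict(X_te)))
```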
References
• Shmueli, G., Bruce, P.C., Yahav, I., Patel, N.R., & Lichtendahl, K.C. (2018), Data Mining for Business Analytics, Wiley.
• Larose, D.T. & Larose, C.D. (2016), Data Mining and Predictive Analytics, 2nd edition, Wiley.
• Kumar, U.D. (2018), Business Analytics: The Science of Data-Driven Decision Making, 1st edition, Wiley.