BA-II Notes
Ramesh Kandela
[email protected]
Regression
• Regression analysis is a form of predictive modelling technique which investigates the
relationship between a dependent (target) variable and one or more independent (predictor) variables.
• The variable you want to predict is the dependent variable.
• The variable used for prediction is the independent variable.
• Regression: technique concerned with predicting some variables by knowing others. The
process of predicting variable Y using variable X.
Linear regression is used for finding a linear relationship between the target and one or more
predictors. There are two types of linear regression: simple and multiple.
In Simple Linear Regression only one independent variable is present and the model has to find the linear relationship with one dependent variable. In Multiple Linear Regression there is more than one independent variable for the model to find the relationship with one dependent variable.
Examples of Relationships
Independent Variable(s) (Numerical / Categorical) | Dependent Variable (Numerical)
Advertisement | Sales
Income | Expenditure
Years of experience, Years of education, and Gender | Salaries
Square footage of the house and Number of rooms | House price
Application of Regression Analysis in Business
Some potential uses of regression analysis in business :
• How does a company’s sales level depend on its advertising levels?
• How do wages of employees depend on years of experience, years of
education, and gender?
• How does the total cost of producing a batch of items depend on the total
quantity of items that have been produced?
• How does the selling price of a house depend on such factors as the
appraised value of the house, the square footage of the house, the number
of bedrooms in the house, and perhaps others?
Simple Linear Regression
• Simple linear regression is useful for finding relationship between two variables. One is
predictor or independent variable and other is response or dependent variable.
• The dependent variable is the one being explained, and the independent variable is the one
used to explain the variation in the dependent variable.
• The equation of a linear relationship between two variables x and y is written as
y = a + bx + e
• For example, X may represent TV advertising and Y may represent sales. Then we can regress
Sales onto TV advertising by fitting the model
• Sales ≈ a + b × TV Advertising + e
• a and b are two unknown constants that represent the intercept and slope terms in the
linear model. Together, a and b are known as the model coefficients or parameters.
Estimating the Coefficients (Estimating the Parameters of Regression)
• The goal is to find an intercept a and a slope b such that the resulting line is as close as possible to the
given data points. There are a number of ways of measuring closeness, but by far the
most common approach involves minimizing the least squares criterion.
• The least squares estimates are given by b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄.
• For the income and expenditure example used below, b = 0.252458 and a = 1.507333.
Interpretation of the regression coefficients
• a is the intercept term—that is, the expected value of Y when X = 0
• b= slope of regression line that represents the expected change in the value of y for unit
change in the value of x.
• b is the slope—the average increase in Y associated with a one-unit increase in X.
a) Note that when b is positive, an increase in x will lead to an increase in y,
and a decrease in x will lead to a decrease in y. In other words, when b is
positive, the movements in x and y are in the same direction. Such a
relationship between x and y is called a positive linear relationship. The
regression line in this case slopes upward from left to right.
b) On the other hand, if the value of b is negative, an increase in x will lead
to a decrease in y, and a decrease in x will cause an increase in y. The
changes in x and y in this case are in opposite directions. Such a
relationship between x and y is called a negative linear relationship. The
regression line in this case slopes downward from left to right.
Interpretation of a
• Consider a household with zero income.
• A household with no income is expected to spend $150.50 per month. Alternatively, we can
also state that the point estimate of the average monthly food expenditure for all
households with zero income is $150.50.
Interpretation of b
• The value of b in a regression model gives the change in y (dependent variable) due to a
change of one unit in x (independent variable).
• on average, a $1 increase in income of a household will increase the expenditure by $.2525.
We can also state that, on average, a $100 increase in income will result in a $25.25
increase in expenditure.
Fitted Value
After estimating the model coefficients a and b, we can predict future Expenditure (Sales) based
on a particular value of Income (Advertising) by computing
ŷ = a + bX
• where ŷ indicates a prediction of Y on the basis of X. Here we use a hat symbol, ˆ, to
denote the estimated value for an unknown parameter or coefficient, or to denote the
predicted value of the response.
• A fitted value is the predicted value of the dependent variable.
Fundamental Equation for Regression
Observed Value = Fitted Value + Residual
Y = a + bX + e, and the fitted line is ŷ = a + bX
Interpretation of the regression coefficients
• a is the intercept term—that is, the expected value of Y when X = 0 (the Y-intercept of the regression line).
• b is the slope of the regression line, b = Change in Y / Change in X, and represents the expected change in
the value of y for a unit change in the value of x.
• b is the slope—the average increase in Y associated with a one-unit increase in X.
Find the least squares regression line for the data on incomes and food expenditures of the seven
households given in the table below. Use income as the independent variable and food expenditure as the
dependent variable. What is the predicted expenditure value when x = 36?
Income(x) | Expenditure(y) | X−X̄ | Y−Ȳ | (X−X̄)(Y−Ȳ) | (X−X̄)²
55 | 14 | −0.14286 | −1.42857 | 0.204082 | 0.020408
83 | 24 | 27.85714 | 8.571429 | 238.7755 | 776.0204
38 | 13 | −17.1429 | −2.42857 | 41.63265 | 293.8776
61 | 16 | 5.857143 | 0.571429 | 3.346939 | 34.30612
33 | 9 | −22.1429 | −6.42857 | 142.3469 | 490.3061
49 | 15 | −6.14286 | −0.42857 | 2.632653 | 37.73469
67 | 17 | 11.85714 | 1.571429 | 18.63265 | 140.5918
ΣX = 386 | ΣY = 108 | | | Σ(X−X̄)(Y−Ȳ) = 447.5714 | Σ(X−X̄)² = 1772.857
x̄ = 55.14286, ȳ = 15.42857
b = Σ(X−X̄)(Y−Ȳ) / Σ(X−X̄)² = 447.5714 / 1772.857 = 0.252458
a = ȳ − b·x̄ = 15.42857 − 0.252458(55.14286) = 1.507333
Predicted expenditure when x = 36: ŷ = 1.5073 + 0.2525(36) ≈ 10.60
[Scatter plot of Expenditure against Income with the fitted line, Predicted Expenditure = 1.5073 + 0.2525 × Income]
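The calculation above can be reproduced with a minimal R sketch; the vectors income and expenditure simply hold the seven observations from the table (the names are illustrative):
income <- c(55, 83, 38, 61, 33, 49, 67)
expenditure <- c(14, 24, 13, 16, 9, 15, 17)
b <- sum((income - mean(income)) * (expenditure - mean(expenditure))) /
     sum((income - mean(income))^2)          # slope = Sxy / Sxx = 0.2525
a <- mean(expenditure) - b * mean(income)    # intercept = ybar - b*xbar = 1.5073
a + b * 36                                   # predicted expenditure when x = 36 (about 10.6)
summary(lm(expenditure ~ income))            # the same fit using lm()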
Residual(Error)
• The line will have a good fit if it minimizes the error between the estimated points on the line
and the actual observed points that were used to draw it.
• Using a and b, we write the estimated regression model as
ŷ = a + bX
• Where ŷ is the estimated or predicted value of y for a given value of x
• The values a and b must be chosen so that they minimize the error.
• The residual, denoted by e, is the difference between the observed value y and the fitted value ŷ.
• Then ei = yi − ŷi represents the ith residual—this is the difference between the ith actual
value and the ith value predicted by the linear model.
• We define the sum of squares of residuals/errors as SSE = Σ(yi − ŷi)² = Σ ei².
Income(x) | Expenditure(y) | X−X̄ | Y−Ȳ | (X−X̄)(Y−Ȳ) | (X−X̄)² | Predicted(ŷ) | Error(e = y − ŷ) | (y − ŷ)²
55 | 14 | −0.14286 | −1.42857 | 0.204082 | 0.020408 | 15.39251 | −1.39251 | 1.939073
83 | 24 | 27.85714 | 8.571429 | 238.7755 | 776.0204 | 22.46132 | 1.538678 | 2.367531
38 | 13 | −17.1429 | −2.42857 | 41.63265 | 293.8776 | 11.10073 | 1.899275 | 3.607245
61 | 16 | 5.857143 | 0.571429 | 3.346939 | 34.30612 | 16.90725 | −0.90725 | 0.823107
33 | 9 | −22.1429 | −6.42857 | 142.3469 | 490.3061 | 9.838437 | −0.83844 | 0.702976
49 | 15 | −6.14286 | −0.42857 | 2.632653 | 37.73469 | 13.87776 | 1.12224 | 1.259423
67 | 17 | 11.85714 | 1.571429 | 18.63265 | 140.5918 | 18.422 | −1.422 | 2.022079
ΣX = 386 | ΣY = 108 | | | 447.5714 | 1772.857 | | | SSE = 12.72143
x̄ = 55.14286, ȳ = 15.42857
b = 0.252458, a = 1.507333
ŷ = a + bX, so ŷ = 1.5073 + 0.2525X
Standard error of the regression (S)
• The standard error of the regression (S), also known as the standard error of the estimate,
represents the average distance that the actual (observed) values fall from the regression
line.
• Conveniently, it tells you how wrong the regression model is on average using the units of
the dependent variable. Smaller values are better because it indicates that the actual values
are closer to the fitted line.
• The standard error of the regression is computed as S (or Se) = √(SSE / (n − 2)).
• In this formula, n − 2 represents the degrees of freedom for the regression model. The reason
df = n − 2 is that two parameters (a and b) are estimated from the data.
• SSE = 12.72143
• The degrees of freedom for a simple linear regression model is df = n − 2 (n − 2 = 7 − 2 = 5)
• Se = √(12.72143/5) ≈ 1.595
Root Mean Square Error (RMSE)
• This is the root of the mean of the squared errors.
• Most popular (has same units as y)
• Root Mean Square Error (RMSE) is a standard way to measure the error of a
model in predicting quantitative data.
• The total sum of squares SST is a measure of the total variation in expenditures.
• The Regression sum of squares SSR is the portion of total variation explained by the regression model
(or by income), and the error sum of squares SSE is the portion of total variation not explained by the
regression model.
• Hence, we can state that 90% of the total variation in the expenditures of households occurs because
of the variation in their incomes, and the remaining 10% is due to randomness and other variables.
Hypothesis Test for Regression Coefficient (t-Test)
The null and alternative hypotheses for the Simple Linear Regression model can be stated as follows:
• H0: There is no relationship between X and Y
• Ha: There is a relationship between X and Y
Hypothesis testing is used to confirm whether our beta coefficients are significant in a linear regression model. Every time we run the linear regression model, we test whether the line is significant by checking whether the coefficient is significant.
Thus, the null and alternative hypotheses can be restated as follows:
• H0: b = 0
• Ha: b ≠ 0
• The value of the test statistic t is calculated as t = b / SE(b), with df = n − 2.
Make a Decision
• If the value of the test statistic t is greater than the critical value of t, it falls in the rejection region. Hence,
we reject the null hypothesis and conclude that there is a linear relationship between x and y.
• If the p-value is less than 0.05, we reject the null hypothesis and conclude that there is significant evidence
of a linear relationship between x and y.
Interpretation of b
If b > 0, then x (predictor) and y (target) have a positive relationship; that is, an increase in x will increase y.
If b < 0, then x (predictor) and y (target) have a negative relationship; that is, an increase in x will decrease y.
Hypothesis Test for Regression Coefficient (t-Test)
Using the residual table above: Σ(X−X̄)² = 1772.857, √Σ(X−X̄)² = √1772.857 = 42.105, SSE = 12.72143 and Se = 1.595.
Standard error of the regression coefficient: SE(b) = Se / √Σ(X−X̄)² = 1.595 / 42.105 = 0.0379
t = b / SE(b) = 0.2525 / 0.0379 ≈ 6.66
df = n − 2 = 7 − 2 = 5
The value of the test statistic t 6.66 is greater than the critical value of t 2.571, and
it falls in the rejection region. Hence, we reject the null hypothesis and conclude that x (income)
determines y (food expenditure) positively. That is, food expenditure increases with an
increase in income and it decreases with a decrease in income.
Standard error of the coefficient
• The standard error of an estimator reflects how it varies under repeated sampling.
SE(b) = 1.595/42.105 = 0.0379
• The standard error of the coefficient measures how precisely the model estimates the
coefficient's unknown value. The standard error of the coefficient is always positive.
• Use the standard error of the coefficient to measure the precision of the estimate of the
coefficient. The smaller the standard error, the more precise the estimate. Dividing the
coefficient by its standard error of the coefficient calculates a t-value
Confidence Intervals
• These standard errors can be used to compute confidence intervals. A 95% confidence
interval is defined as a range of values such that with 95% probability, the range will contain
the true unknown value of the parameter. It has the form b±2SE(b).
• That is, there is approximately a 95% chance that the interval [b-2SE(b), b+2SE(b)] will
contain the true value of b.
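For the household example above, b = 0.2525 and SE(b) ≈ 0.0379, so the approximate 95% interval is 0.2525 ± 2(0.0379) = [0.177, 0.328]; using the exact t multiplier for df = 5 (2.571) gives [0.155, 0.350]. Because the interval excludes 0, the slope is significantly different from zero.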
Simple Linear Regression with R
• # read in the first worksheet from the workbook Boston.xlsx; the first row contains variable names
• library(readxl)
• Boston <- read_excel("E:/Kandela/Analytics/IBS/BA-II/Data/Boston.xlsx")
• attach(Boston)   # so that rm and medv can be referred to directly
• plot(rm, medv)
• slr = lm(medv ~ rm)   # simple linear regression of medv on rm
• summary(slr)
• names(slr)
• coef(slr)
• confint(slr)
• abline(slr)   # add the fitted line to the scatter plot
• medv
• fitted.values(slr)
• Actual = medv
• Predicted = fitted.values(slr)
• Residual = residuals(slr)
• Error = data.frame(Actual, Predicted, Residual)
Regression with Excel
Regression in Excel using formulas
• The Excel SLOPE(known_y's, known_x's) and INTERCEPT(known_y's, known_x’s) functions return the slope and
intercept, respectively, of the least-squares line.
• RSQ function to determine the R2 value. & STEYX(known_y's, known_x’s) returns the standard error of the
predicted y-value for each x in the regression.
Regression in Excel using the Data Analysis ToolPak
• Select "Data" from the toolbar. The "Data" menu displays.
• Select "Data Analysis". The Data Analysis - Analysis Tools dialog box displays. From the menu, select "Regression" and click "OK".
• In the Regression dialog box, click the "Input Y Range" box and select the dependent variable data.
• Click the "Input X Range" box and select the independent variable data.
• Click "OK" to run the results.
SUMMARY OUTPUT
Regression Statistics: Multiple R 0.948054203 | R Square 0.898806772 | Adjusted R Square 0.878568127 | Standard Error 1.595082087 | Observations 7
ANOVA
 | df | SS | MS | F | Significance F
Regression | 1 | 112.9928514 | 112.9928514 | 44.41042122 | 0.001148513
Residual | 5 | 12.72143433 | 2.544286865 | |
Total | 6 | 125.7142857 | | |
Regression Statistics (Overhead example: two independent variables, 36 observations)
Multiple R 0.930819542 | R Square 0.866425021 | Adjusted R Square 0.858329567 | Standard Error 4108.99309 | Observations 36
ANOVA
 | df | SS | MS | F | Significance F
Regression | 2 | 3614020661 | 1807010330 | 107.0261279 | 3.75374E-15
Residual | 33 | 557166199.1 | 16883824.22 | |
Total | 35 | 4171186860 | | |
The percentage of error is Se/ȳ = 4108/99151 ≈ 0.04 = 4% (the standard error of the regression relative to the mean of the dependent variable).
Co-efficient of Multiple Determination (R-square) and Adjusted R-square
Adjusted R2 is a corrected goodness-of-fit (model accuracy) measure for linear models. It identifies the
percentage of variance in the target field that is explained by the independent variables.
• R2 value is adjusted by normalizing both SSE and SST with the corresponding degrees of freedom.
• The adjusted R-square with k predictors is given by
Adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), or equivalently Adjusted R² = 1 − [SSE/(n − k − 1)] / [SST/(n − 1)]
• n = the sample size
• k = the number of independent variables in the regression equation
• Every time you add an independent variable to a model, the R-square increases, even if the
independent variable is insignificant; it never declines. Adjusted R-square, in contrast, increases only
when the added independent variable is significant and affects the dependent variable.
• The adjusted R-square value is always less than or equal to the R-square value.
• No increase in adjusted R-square after adding a new predictor variable may indicate that the newly
added variable is not statistically significant, or that it does not explain any variation in the response
that is not already explained by the variables present in the model.
Find the Adjusted R-square
Regression Statistics
Multiple R 0.930819542 | R Square 0.866425021 | Adjusted R Square 0.858329567 | Standard Error 4108.99309 | Observations 36
ANOVA
 | df | SS | MS | F | Significance F
Regression | 2 | 3614020661 | 1807010330 | 107.0261279 | 3.75374E-15
Residual | 33 | 557166199.1 | 16883824.22 | |
Total | 35 | 4171186860 | | |
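Using this output, Adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1) = 1 − (1 − 0.866425)(36 − 1)/(36 − 2 − 1) = 1 − 0.133575 × (35/33) ≈ 0.8583, which matches the value reported above.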
Machine Hours and Production Runs combine to explain 85.83% of the variation in Overhead
Advertising dataset
• The Advertising data displays sales (in thousands of units) for a particular product as a
function of advertising budgets (in thousands of dollars) for TV, radio, and newspaper media.
• As a Business Analyst, to suggest based on this data, a marketing plan for next year will result
in high product sales.
ANOVA (Advertising data: sales regressed on TV, radio, and newspaper)
 | df | SS | MS | F | Significance F
Regression | 3 | 4860.323487 | 1620.107829 | 570.2707037 | 1.57523E-96
Residual | 196 | 556.8252629 | 2.840945219 | |
Total | 199 | 5417.14875 | | |
A related example uses a dummy (categorical) independent variable: salary regressed on a gender dummy (female = 1, male = 0).
• Now a can be interpreted as the average salary among males, a + b as the average salary among females, and b as the average difference in salary between females and males.
• Female Predicted Salary = 45505 − 8296(1) = 37209
• Output for this model: Adjusted R Square 0.115819379, Standard Error 10584.26048, Observations 208.
• The results in this table suggest that interactions are important. The p-value for the interaction
term TV×radio is extremely low, indicating that there is strong evidence for Ha: b3 ≠ 0.
• The R² for the interaction model is 96.8%, compared to only 89.7% for the model that predicts
sales using TV and radio without an interaction term.
In this situation, given a fixed budget of $100,000, spending half on radio and half on TV may
increase sales more than allocating the entire amount to either TV or to radio.
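As a rough illustration of fitting such an interaction term, the sketch below assumes the Advertising data is available as a CSV file with columns TV, radio and sales (the file name is an assumption):
Advertising <- read.csv("Advertising.csv")
main_only  <- lm(sales ~ TV + radio, data = Advertising)             # main effects only
with_inter <- lm(sales ~ TV + radio + TV:radio, data = Advertising)  # adds the TV x radio interaction
summary(main_only)$r.squared    # about 0.897 without the interaction
summary(with_inter)$r.squared   # about 0.968 with the interaction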
Interaction Effect
• Objective: To use multiple regression with an interaction variable to see whether the effect
of years of experience on salary is different across the two genders.
• Solution: The multiple regression output appears below.
• library(MASS)
• attach(Boston)
• LR=lm(medv~lstat)
• summary(LR)
• plot(LR)
• MLR=lm(medv~lstat+I(lstat^2))
• summary(MLR)
• plot(MLR)
• par(mfrow=c(2,2))
• #Change back to 1 x 1
• par(mfrow=c(1,1))
Assumptions of Multiple Regression Model
1.There should be a linear relationship between dependent (response) variable and independent (predictor)
variable(s). A linear relationship suggests that a change in response Y due to one unit change in X is
constant, regardless of the value of X.
2. The error terms must have constant variance. This phenomenon is known as homoscedasticity. The
presence of non-constant variance is referred to as heteroscedasticity.
3. The error terms must be normally distributed.
4. The independent variables should not be correlated. The presence of correlation among the independent
variables is known as multicollinearity.
5. There should be no correlation between the residual (error) terms. The presence of such correlation is
known as autocorrelation.
Transformation of Variables
• Transformations are applied to accomplish certain objectives such as to ensure linearity, to achieve
normality, or to stabilize the variance.
• The necessity for transforming the data arises because the original variables, or the model in terms of the
original variables, violates one or more of the standard regression assumptions.
• Some common transformations are log transformation, square root transformation ,reciprocal square root
transformation and polynomial transformation.
Non-Linearity of the data
• Nonlinear transformations of variables are often used because of curvature detected in scatter plots.
If non-linearity is ignored, the prediction accuracy of the model can be significantly reduced.
• Residual plots are a useful graphical tool for identifying non-linearity: plot the residuals versus the
predicted (fitted) values ŷi. If any pattern (for example, a parabolic shape) exists in this plot, consider
it a sign of non-linearity in the data.
• If the residual plot indicates that there are non-linear associations in the data, then a simple approach
is to use non-linear transformations of the dependent variable Y and/or of the explanatory variables X
in the regression model, such as the natural logarithm, the square root, the reciprocal, or the square.
If the errors are not normally distributed, non – linear transformation of the variables
(response or predictors) can bring improvement in the model.
Multicollinearity Test
• Multicollinearity means the presence of a strong (near-exact) linear relationship among the
independent variables; no two independent variables (regressors) should share a strong correlation.
Multicollinearity leads to less efficient estimates of the coefficients (Wooldridge, 2013). Correlation
matrices and the variance inflation factor (VIF) are used to test for the presence of multicollinearity.
• Collinearity reduces the accuracy of the estimates of the regression coefficients because it causes the
standard error for b̂ to grow. The t-statistic for each predictor is calculated by dividing b̂j by its
standard error. Consequently, collinearity results in a decline in the t-statistic. As a result, in the
presence of collinearity, we may fail to reject H0: bj = 0.
• A better way to assess multicollinearity is to compute the variance inflation factor (VIF).
• VIF determines the strength of the correlation between the independent variables. It is predicted by
taking a variable and regressing it against every other variable.
• The VIF for each variable can be computed using the formula VIFj = 1 / (1 − R²j), where R²j is the R²
obtained by regressing Xj on all the other predictors.
• As a rule of thumb, a VIF value that exceeds 5 or 10 indicates a problematic amount of collinearity.
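A minimal sketch of this calculation, assuming the Credit data used later in these notes has been loaded as a data frame with columns Age, Limit, Rating and Balance:
r2_limit  <- summary(lm(Limit ~ Age + Rating, data = Credit))$r.squared  # R^2 from regressing Limit on the other predictors
vif_limit <- 1 / (1 - r2_limit)                                          # VIF_j = 1 / (1 - R^2_j)
vif_limit                                                                # very large, reflecting the Limit-Rating collinearity
library(car)
vif(lm(Balance ~ Age + Limit + Rating, data = Credit))                   # the same idea via the car package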
Dealing with Multi-collinearity
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -259.51752 55.88219 -4.644 4.66e-06 ***
Age -2.34575 0.66861 -3.508 0.000503 ***
Limit 0.01901 0.06296 0.302 0.762830
Rating 2.31046 0.93953 2.459 0.014352 *
The Credit data, a regression of balance on age, rating, and limit indicates that the
predictors have VIF values of 1.01, 160.67, and 160.59. As we suspected, there is
considerable collinearity in the data
• Model 1 is a regression of balance on age and limit, and Model 2 a regression of balance on
rating and limit. The standard error of β̂limit increases 12-fold in the second regression,
due to collinearity.
• The first solution is to drop one of the problematic variables from the regression. Dropping limit
from the set of predictors effectively solves the collinearity problem without compromising the fit.
• The second solution is to combine the collinear variables together into a single predictor. For
instance, we might take the average of standardized versions of limit and rating in order to
create a new variable that measures credit worthiness.
Autocorrelation
• The presence of correlation in error terms drastically reduces model’s accuracy. This usually
occurs in time series models where the next instant is dependent on previous instant. If the
error terms are correlated, the estimated standard errors tend to underestimate the true
standard error.
• If this happens, confidence intervals and prediction intervals become narrower. A narrower
confidence interval means that a nominal 95% confidence interval would have a probability lower
than 0.95 of containing the actual value of the coefficients. Also, lower standard errors cause the
associated p-values to be lower than they should be, which can make us incorrectly conclude that a
parameter is statistically significant.
• The Durbin-Watson d statistic is applied to check for the autocorrelation problem in the regression model.
• DW = 2 → no autocorrelation
• 0 < DW < 2 → positive autocorrelation
• 2 < DW < 4 → negative autocorrelation
R code:
MLR = lm(Balance ~ Age + Rating)
summary(MLR)
library(lmtest)
library(zoo)
dwtest(MLR)
Assumptions with R
# Non-linearity check with the Boston data
library(MASS)
attach(Boston)
LR = lm(medv ~ lstat)
summary(LR)
plot(LR)                              # diagnostic (residual) plots
MLR = lm(medv ~ lstat + I(lstat^2))   # add a quadratic term
summary(MLR)
par(mfrow = c(2, 2))
plot(MLR)
# Change back to 1 x 1
par(mfrow = c(1, 1))

# The same idea with the mtcars data
library(readxl)
library(carData)
library(car)
data("mtcars")
head(mtcars)
View(mtcars)
attach(mtcars)
plot(hp, mpg)
LR = lm(mpg ~ hp)
summary(LR)
plot(LR)
MLR = lm(mpg ~ hp + I(hp^2))
summary(MLR)

# Multicollinearity and autocorrelation with the Credit data
Credit = read_excel('Credit.xlsx')
View(Credit)
attach(Credit)
MLR = lm(Balance ~ Age + Limit + Rating)
summary(MLR)
vif(MLR)                              # variance inflation factors
MLR = lm(Balance ~ Age + Rating)
summary(MLR)
library(lmtest)
library(zoo)
dwtest(MLR)                           # Durbin-Watson test
Include/Exclude Decisions
• The t-values of regression coefficients can be used to make include/exclude decisions for
explanatory variables in a regression equation.
• Always trying to get the best fit possible, but the principle of parsimony suggests using the
fewest number of variables.
• Look at a variable’s t-value and its associated p-value. If the p-value is above some accepted
significance level, such as 0.05, this variable is a candidate for exclusion.
• Check whether a variable’s t-value is less than 1 or greater than 1 in magnitude. If it is less
than 1, then it is a mathematical fact that se will decrease (and adjusted R2 will increase) if
this variable is excluded from the equation.
• Look at t-values and p-values, rather than correlations, when making include/exclude
decisions. An explanatory variable can have a fairly high correlation with the dependent
variable, but because of other variables included in the equation, it might not be needed.
Model Building
❖Model building is the process of deciding which independent variables to include in the model.
❖When building a model, it is best to start with a few IV’s and then begin adding other variables.
However, when adding a variable, check for:
• Improved prediction (increase in adjusted R2)
• Statistically significant estimated coefficients
• Do other coefficients change when adding the new one?
• Particularly look for sign changes for estimated coefficients.
There are three types of equation-building procedures:
Forward—begins with no explanatory variables in the equation and successively adds one at a time
until no remaining variables make a significant contribution.
Backward—begins with all potential explanatory variables in the equation and deletes them one at
a time until further deletion would do more harm than good.
Stepwise—is much like a forward procedure, except that it also considers possible deletions along
the way.
All of these procedures have the same basic objective—to find an equation with a small se and a large R2
(or adjusted R2).
Stepwise Regression
• Stepwise regression is a combination of forward selection and backward elimination procedure.
• Stepwise regression is a way to build a model by adding or removing predictor variables. It involves
adding or removing potential explanatory variables in succession and testing for statistical significance
after each iteration/R^2.
Stepwise Regression in R
Build a regression model from a set of candidate predictor variables by entering predictors based on p-values,
in a stepwise manner, until there is no variable left to enter. The full model supplied to the procedure should
include all the candidate predictor variables.
install.packages("olsrr")
library(olsrr)
#import credit data
library(readxl)
credit=read_excel('Credit.xlsx')
attach(credit)
MLR=lm(Balance~ Age+Rating+Limit)
summary(MLR)
# Stepwise Regression in R
ols_step_both_p(MLR)
ols_step_both_p(MLR,details = TRUE)
Task with Auto data set
This question involves the use of multiple linear regression on the Auto data set.
• (a) Produce a scatterplot matrix which includes all of the variables in the data set.
• (b) Compute the matrix of correlations between the variables using the function cor(). You will need to exclude the
name variable, which is qualitative.
• (c) Use the lm() function to perform a multiple linear regression with mpg as the response and all other variables
except name as the predictors. Use the summary() function to print the results. Comment on the output. For instance:
• ii. Which predictors appear to have a statistically significant relationship to the response?
• iii. What does the coefficient for the year variable suggest?
• (d) Use the plot() function to produce diagnostic (residual) plots of the linear regression fit. Comment on any
problems you see with the fit.
• (f) Try a few different transformations of the variables, such as log(X), √ X, X^2. Comment on your findings
Happy Analyzing
Logistic Regression
Ramesh Kandela
[email protected]
Logistic Regression
Logistic regression is the appropriate regression analysis to conduct when the dependent
variable is Qualitative/Category (binary). This type of statistical model is often used for
classification and predictive analytics.
The quantity p(X)/[1 − p(X)] is called the odds. The odds in favor of an event occurring are defined
as the probability the event will occur divided by the probability the event will not occur. (The odds range from 0 to ∞.)
Taking the logarithm of both sides gives the logit; the same form extends to Multiple Logistic Regression with several predictors.
• The logit is also called the log-odds, since it is the log of the ratio between the estimated probability
for the positive class and the estimated probability for the negative class. (It ranges from −∞ to ∞.)
• In a logistic regression model, increasing X by one unit changes the log odds by β1, or equivalently it
multiplies the odds by e^β1.
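For reference, the logistic regression model behind these quantities, written in the notation of the surrounding text, is
p(X) = e^(β0 + β1X) / (1 + e^(β0 + β1X))
so that the odds are p(X)/[1 − p(X)] = e^(β0 + β1X) and the log-odds (logit) is log{p(X)/[1 − p(X)]} = β0 + β1X.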
• The odds are calculated by dividing the probability of success by the probability of failure; the odds ratio
compares the odds at two values of a predictor (or for two groups): Odds Ratio = odds1/odds0. For a
one-unit increase in a predictor, Odds Ratio = e^β.
Interpret the logistic regression coefficients
• The interpretation of the odds ratio depends on whether the predictor(Independent
variable) is categorical or continuous.
• Odds ratios that are greater than 1 indicate that the event is more likely to occur as the
predictor increases. Odds ratios that are less than 1 indicate that the event is less likely to
occur as the predictor increases.
• Numerical feature: If you increase the value of feature xj by one unit, the estimated odds
change by a factor of exp(βj).
• Categorical feature: Changing the feature xj from the reference category to the other
category changes the estimated odds by a factor of exp(βj).
Odds Ratio Interpretation
exp(0.1229589) = 1.13
exp(0.979948) = 2.66
exp(0.0590632) = 1.06
• This fitted model says that, holding math and reading constant, the odds of getting into an honors class
for females (female = 1) over the odds of getting into an honors class for males (female = 0) is exp(0.979948)
= 2.66. In terms of percent change, we can say that the odds for females are 166% higher than the odds
for males.
• The coefficient for math says that, holding female and reading constant, we will see a 13% increase in
the odds of getting into an honors class for a one-unit increase in math score (exp(0.1229589) = 1.13).
• The coefficient for reading says that, holding female and math constant, we will see a 6% increase in
the odds of getting into an honors class for a one-unit increase in reading score (exp(0.0590632) = 1.06).
Estimating the Logistic Regression Coefficients (β0 and β1)
• Logistic regression uses method of maximum likelihood to find the best fitting model.
• Maximum likelihood to estimate the parameters β0 and β1
Most statistical packages can fit linear logistic regression models by maximum likelihood.
Variable | Coefficient (b) | S.E. | z value | P-value | Exp(b)
balance | 0.0055 | 0.0002 | 24.95 | <2e-16 | 1.006
Constant | −10.6513 | 0.361 | −29.49 | <2e-16 | 0.000
• If a coefficient b is positive, then as its X increases the log odds increase, so the probability
of being in category 1 increases. The opposite is true for a negative b.
• Xs with positive bs are positively associated with being in category 1, and Xs with negative bs are
positively associated with being in category 0.
Testing for Significance
Wald’s test :Wald’s test is used for checking statistical significance of individual predictor variables (equivalent to t-
test in MLR).
Likelihood Ratio (or Deviance) Test: The test for overall significance is based upon the value of the G test statistic. The
sampling distribution of G follows a chi-square distribution with degrees of freedom equal to the number of independent
variables (equivalent to the F-test in MLR).
Wald’s test
• Wald's test is used for checking the statistical significance of individual independent variables.
The null and alternative hypotheses for Wald's test are:
• H0: βk = 0
• Ha: βk ≠ 0
• P value<.05, reject H0. In other words, we conclude that there is indeed an association
between balance and probability of default.
• To be precise, a one-unit increase in balance is associated with an increase in the log odds of
default by 0.0055 units.
Testing the Global Null Hypothesis (chi-squared test for the complete regression model)
• Null Hypothesis: H0: β1 = β2 = … = βK = 0
• Alternative Hypothesis: H1: not all regression coefficients are zero.
The p-value in the table is 0.830; we retain the null hypothesis, that is, the logistic regression model fits the data.
Making Predictions
Estimated coefficients of the logistic regression model that predicts the probability of default
using balance.
• Once the coefficients have been estimated, we can compute the probability of default for
any given credit card balance.
• What is our estimated probability of default for someone with a balance of $1000?
• The predicted probability of default for an individual with a balance of $2,000 is 0.586.
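Using the coefficients above (a = −10.6513, b = 0.0055): for a balance of $1,000 the linear predictor is −10.6513 + 0.0055(1000) = −5.1513, so p = e^(−5.1513)/(1 + e^(−5.1513)) ≈ 0.006, i.e. well below 1%; for a balance of $2,000 the linear predictor is 0.3487, giving p = e^(0.3487)/(1 + e^(0.3487)) ≈ 0.586, as stated.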
Making Predictions
• For the Default data, estimated coefficients of the logistic regression model that predicts the
probability of default using balance, income, and student status. Student status is encoded
as a dummy variable student[Yes], with a value of 1 for a student and a value of 0 for a non-
student. In fitting this model, income was measured in thousands of dollars.
The negative coefficient for student in the multiple logistic regression indicates that for a
fixed value of balance and income, a student is less likely to default than a non-student.
For example, a student with a credit card balance of $1, 500 and an income of $40, 000
has an estimated probability of default of
A non-student with the same balance and income has an estimated probability of default of
Logistic Regression with R
# import the data
library(ISLR)
attach(Default)
summary(Default)

# simple logistic regression of default on balance
logreg = glm(default ~ balance, family = binomial("logit"))
summary(logreg)

# Predicting
prob_pred = predict(logreg, type = 'response', newdata = data.frame(balance = 1000))
prob_pred
y_pred = ifelse(prob_pred > 0.5, 1, 0)
y_pred
prob_pred = predict(logreg, type = 'response', newdata = data.frame(balance = 2000))
prob_pred
y_pred = ifelse(prob_pred > 0.5, 1, 0)
y_pred
prob_pred = predict(logreg, type = 'response', newdata = data.frame(balance = c(1000, 1500, 2000)))
prob_pred
y_pred = ifelse(prob_pred > 0.5, 1, 0)
y_pred

# multiple logistic regression with balance, income and student status
mlogreg = glm(default ~ balance + income + student, family = binomial("logit"))
summary(mlogreg)
prob_pred = predict(mlogreg, type = 'response',
                    newdata = data.frame(balance = 1500, income = 40000, student = c("Yes", "No")))
prob_pred
y_pred = ifelse(prob_pred > 0.5, 1, 0)
y_pred
Thank You
Time Series Analysis
Ramesh Kandela
[email protected]
TIME SERIES ANALYSIS
• A time series is a set of numerical values of some variable obtained at regular period over
time. The series is usually tabulated or graphed in a manner that readily conveys the
behaviour of the variable under study.
• A statistical technique that attempts to forecast future values of the time series by examining
past observations of the data only.
• Time series may be monthly, quarterly, or annual, and sometimes weekly, daily, or hourly (e.g., the study
of road traffic).
Examples of Forecasting Applications
• When a company plans its ordering or production schedule for a product it sells to the public,
it must forecast the customer demand for this product so that it can stock appropriate
quantities—neither too many nor too few.
• When an organization plans to invest in stocks, bonds, or other financial instruments, it
typically attempts to forecast movements in stock prices and interest rates.
• When government officials plan policy, they attempt to forecast movements in
macroeconomic variables such as inflation, interest rates, and unemployment.
Components of a Time Series
The various reasons or the forces which affect the values of an observation in a time series
are the components of a time series. The time-series data contain four components: trend,
seasonality, cyclicality and irregularity. Not all time-series have all these components.
• Trend: describes the long-term upward or downward movement of the series; the seasonal, cyclical and
irregular components account for the remaining variations.
[Figure: example time-series plots illustrating (a) trend and the other components]
• As a result, the average will change, or move, as new observations become available.
• To use moving averages to forecast, we must first select the order k, or number of time series values,
to be included in the moving average.
• A smaller value of k will track shifts in a time series more quickly than a larger value of k.
• If more past observations are considered relevant, then a larger value of k is better.
Moving Averages
Week | Sales | 3MA Forecast | Forecast Error | Absolute Error | Squared Error | Absolute % Error
1 | 17 | | | | |
2 | 21 | | | | |
3 | 19 | | | | |
4 | 23 | 19 | 4 | 4 | 16 | 17.39
5 | 18 | 21 | −3 | 3 | 9 | 16.67
6 | 16 | 20 | −4 | 4 | 16 | 25.00
7 | 20 | 19 | 1 | 1 | 1 | 5.00
8 | 18 | 18 | 0 | 0 | 0 | 0.00
9 | 22 | 18 | 4 | 4 | 16 | 18.18
10 | 20 | 20 | 0 | 0 | 0 | 0.00
11 | 15 | 20 | −5 | 5 | 25 | 33.33
12 | 22 | 19 | 3 | 3 | 9 | 13.64
Sum | | | | 24.00 | 92.00 | 129.21
MAE = 24/9 = 2.67
MSE = 92/9 = 10.22
RMSE = √MSE = 3.20
MAPE = 129.21/9 = 14.36%
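A minimal R sketch that reproduces the 3-week moving-average forecasts and the error measures above (the sales vector holds the 12 weekly values from the table):
sales <- c(17, 21, 19, 23, 18, 16, 20, 18, 22, 20, 15, 22)
n  <- length(sales)
fc <- rep(NA, n)
for (t in 4:n) fc[t] <- mean(sales[(t - 3):(t - 1)])   # forecast = average of the three previous weeks
err  <- sales - fc
MAE  <- mean(abs(err), na.rm = TRUE)                   # 2.67
MSE  <- mean(err^2, na.rm = TRUE)                      # 10.22
RMSE <- sqrt(MSE)                                      # 3.20
MAPE <- mean(abs(err) / sales * 100, na.rm = TRUE)     # 14.36
c(MAE = MAE, MSE = MSE, RMSE = RMSE, MAPE = MAPE)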
Weighted Moving Averages
• To use this method we must first select the number of data values to be included in the average.
• Next, we must choose the weight for each of the data values.
• The more recent observations are typically given more weight than older observations.
• For convenience, the weights should sum to 1.
An example of a 3-period weighted moving average (3WMA), using the same weekly sales data as above (weeks 1-3: 17, 21, 19):
• Using this weighted average, our forecast for week 4 is computed as follows:
• Forecast for week 4: 3WMA = .2(17) + .3(21) + .5(19) = 19.2
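The same 3WMA step in R (weights ordered from oldest to most recent):
sales <- c(17, 21, 19, 23, 18, 16, 20, 18, 22, 20, 15, 22)
w <- c(0.2, 0.3, 0.5)        # weights sum to 1, most recent week weighted highest
sum(w * sales[1:3])          # 3WMA forecast for week 4 = 19.2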
Exponential smoothing is a forecasting method for univariate time series data. This method
produces forecasts that are weighted averages of past observations where the weights of
older observations exponentially decrease.
Exponential Smoothing
• SES assumes a fairly steady time-series data with no significant trend, seasonal or cyclical component
• Level (or intercept) equation: F(t+1) = αYt + (1 − α)Ft
• The closer α is to 1, the more importance is given to recent values; the closer it is to 0, the more
importance is given to past values.
• The computer decides the optimal value of α that minimizes the RMSE.
• Since the model uses one smoothing constant, it is called single exponential smoothing.
Exponential Smoothing
α = 0.2
Week | Time Series Value | Forecast | Error | Squared Error
1 | 17 | #N/A | |
2 | 21 | 17.00 | 4.00 | 16
3 | 19 | 17.80 | 1.20 | 1.44
4 | 23 | 18.04 | 4.96 | 24.6016
5 | 18 | 19.03 | −1.03 | 1.065024
6 | 16 | 18.83 | −2.83 | 7.984015
7 | 20 | 18.26 | 1.74 | 3.02593
8 | 18 | 18.61 | −0.61 | 0.370131
9 | 22 | 18.49 | 3.51 | 12.34323
10 | 20 | 19.19 | 0.81 | 0.657128
11 | 15 | 19.35 | −4.35 | 18.93549
12 | 22 | 18.48 | 3.52 | 12.382
Sum | | | 10.92 | 98.80
MSE = 8.98, RMSE = 3.00

α = 0.3
Week | Time Series Value | Forecast | Error | Squared Error
1 | 17 | #N/A | |
2 | 21 | 17.00 | 4.00 | 16.00
3 | 19 | 18.20 | 0.80 | 0.64
4 | 23 | 18.44 | 4.56 | 20.79
5 | 18 | 19.81 | −1.81 | 3.27
6 | 16 | 19.27 | −3.27 | 10.66
7 | 20 | 18.29 | 1.71 | 2.94
8 | 18 | 18.80 | −0.80 | 0.64
9 | 22 | 18.56 | 3.44 | 11.83
10 | 20 | 19.59 | 0.41 | 0.17
11 | 15 | 19.71 | −4.71 | 22.23
12 | 22 | 18.30 | 3.70 | 13.69
Sum | | | 8.03 | 102.86
MSE = 9.35, RMSE = 3.06
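A minimal R sketch of the smoothing recursion F(t+1) = αYt + (1 − α)Ft that reproduces the α = 0.2 column above (the ses_forecast helper is illustrative; the ses() function from the forecast package, used later, does this automatically):
ses_forecast <- function(y, alpha) {
  f <- rep(NA, length(y))
  f[2] <- y[1]                     # initialise the first forecast with the first observation
  for (t in 2:(length(y) - 1)) f[t + 1] <- alpha * y[t] + (1 - alpha) * f[t]
  f
}
sales <- c(17, 21, 19, 23, 18, 16, 20, 18, 22, 20, 15, 22)
f   <- ses_forecast(sales, alpha = 0.2)
err <- sales - f
sqrt(mean(err^2, na.rm = TRUE))    # RMSE is about 3.0 for alpha = 0.2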
Trend Projection
Linear Trend
• Many time series follow a long-term trend except for random variation.
• This trend can be upward or downward.
• A straightforward way to model this trend is to estimate a regression equation for Yt, using time
t(X) as the single explanatory variable.
• If a time series exhibits a linear trend, the method of least squares may be used to determine a trend
line (projection) for future forecasts.
• A linear trend means that the time series variable changes by a constant amount each time period.
• The independent variable is the time period and the dependent variable is the actual observed value
in the time series.
• Using the method of least squares, the formula for the trend projection is:
Tt = b0 + b1t
: where Tt = linear trend forecast in period t
b0 = Y-intercept of the linear trend line
b1 = slope of the linear trend line
t = the time period
For the trend projection equation Tt = b0 + b1t:
b1 = Σ (t − t̄)(Yt − Ȳ) / Σ (t − t̄)²   (sums taken over t = 1, …, n)
b0 = Ȳ − b1·t̄
library(forecast)
# import data
AirPassengers

# Naive
naivem = naive(AirPassengers, h = 12, data = AirPassengers)
naivem
forecastnavie = forecast(naivem, 12)
forecastnavie
accuracy(naivem)
accuracy(forecastnavie)
plot(forecastnavie)

# Moving Average (the 'ma' object is assumed to have been created earlier; its definition is not shown on the slide)
accuracy(ma)
ma2 = forecast(ma, 12)
accuracy(ma2)
plot(ma2)

# Simple Exponential smoothing
es = ses(AirPassengers)
summary(es)
accuracy(es)
plot(es)

# Linear Trend Model
month = 1:144
month
ltm = tslm(AirPassengers ~ month)
summary(ltm)
accuracy(ltm)
summary(AirPassengers)

# Accuracy of all models
accuracy(naivem)      # Naive
accuracy(ma2)         # Moving Average
accuracy(es)          # Simple Exponential smoothing
accuracy(ltm)         # Linear Trend Model
Case Study: Marriott Rooms Forecasting
Factor Extraction
Rotation of Factors
Interpretation of Factors
Formulate the Problem
• The objectives of factor analysis should be identified.
• The variables to be included in the factor analysis should be specified based on the researcher's past
research, theory, and judgment. The variables must be appropriately measured on an interval or
ratio scale.
• An appropriate sample size should be used. As a rough guideline, there should be at least four or
five times as many observations (sample size) as there are variables.
Conducting Factor Analysis with Airline Passenger Satisfaction
Download the airline passenger satisfaction data using the below link
https://www.kaggle.com/datasets/teejmahal20/airline-passenger-satisfaction
Objective: Categorize the dimensions of passenger satisfaction into a smaller number of Factors.
Variables
1. Inflight wifi service
2. Departure/Arrival time convenient
3. Ease of Online booking
4. Gate location
5. Food and drink
6. Online boarding
7. Seat comfort
8. Inflight entertainment
9. On-board service
10. Leg room service
11. Baggage handling
12. Check-in service
13. Inflight service
14. Cleanliness
Construct the Correlation Matrix
The analytical process is based on a matrix of correlations between the variables.
Correlation matrix. A correlation matrix is a lower triangle matrix showing the simple correlations,
r, between all possible pairs of variables included in the analysis. The diagonal elements, which are
all 1, are usually omitted.
Bartlett's test of sphericity. Bartlett's test of sphericity is a test statistic used to examine the
hypothesis that the variables are uncorrelated in the population. In other words, the population
correlation matrix is an identity matrix; each variable correlates perfectly with itself (r = 1) but has
no correlation with the other variables (r = 0).
Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy. The Kaiser-Meyer-Olkin (KMO)
measure of sampling adequacy is an index used to examine the appropriateness of factor analysis.
High values (between 0.5 and 1.0) indicate factor analysis is appropriate. Values below 0.5 imply
that factor analysis may not be appropriate. Small values of the KMO statistic indicate that the correlations between pairs of variables cannot be explained by other variables and that factor analysis may not be appropriate.
KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy: .781
Bartlett's Test of Sphericity: Approx. Chi-Square = 601676.894, df = 91, Sig. = .000
Determine the Method of Factor Analysis
Factor Extraction
The primary objective of this stage is to determine the factors.
Estimates of initial factors are obtained using Principal components analysis and
common factor analysis.
• In principal components analysis, the total variance in the data is considered. The
diagonal of the correlation matrix consists of unities, and full variance is brought into the
factor matrix. Principal components analysis is recommended when the primary concern is
determining the minimum number of factors that will account for maximum variance in
the data for subsequent multivariate analysis. The factors are called principal components.
• In common factor analysis, the factors are estimated based only on the common variance.
Communalities are inserted in the diagonal of the correlation matrix. This method is
appropriate when the primary concern is to identify the underlying dimensions and the
common variance is of interest. This method is also known as principal axis factoring.
Factor Analysis Model
• Mathematically, each variable is expressed as a linear combination of underlying factors.
The amount of variance a variable shares with all other variables included in the analysis is
called communality.
• The covariation among the variables is described in terms of a small number of common
factors plus a unique factor for each variable. If the variables are standardized, the factor
model may be represented as:
Xi = Ai1F1 + Ai2F2 + Ai3F3 + . . . + AimFm + ViUi
where
Xi = i th standardized variable
Aij = standardized multiple regression coefficient of variable i on common factor j
F = common factor
Vi = standardized regression coefficient of variable i on unique factor i
Ui = the unique factor for variable i
m = number of common factors
Factor Analysis Model
The unique factors are uncorrelated with each other and with the common factors. The
common factors themselves can be expressed as linear combinations of the observed
variables.
Fi = Wi1X1 + Wi2X2 + Wi3X3 + . . . + WikXk
Where:
Fi = estimate of i th factor
Wi = weight or factor score coefficient
k = number of variables
• It is possible to select weights or factor score coefficients so that the first factor explains the
largest portion of the total variance.
• Then a second set of weights can be selected, so that the second factor accounts for most of
the residual variance, subject to being uncorrelated with the first factor.
• This same principle could be applied to selecting additional weights for the additional factors.
Principal Components Analysis
• The principal components analysis is the most commonly used extraction method
• In principal components analysis, linear combinations of the observed variables are
formed.
The 1st principal component is the combination that accounts for the largest
amount of variance in the sample (1st extracted factor).
The 2nd principal component accounts for the next largest amount of variance and
is uncorrelated with the first (2nd extracted factor).
Successive components explain progressively smaller portions of the total sample
variance, and all are uncorrelated with each other.
Component Matrix (Factor Matrix)
• Factor matrix. A factor matrix contains the factor loadings of all the variables on all the factors extracted.
• Factor loadings. Factor loadings are simple correlations between the variables and the factors.
• The components can be interpreted as the correlation of each variable (item) with the component. The square of each loading represents the proportion of variance (think of it as an R² statistic) explained by a particular component.
• Communality. Communality is the amount of variance a variable shares with all the other variables being considered. This is also the proportion of variance explained by the common factors.
• Summing the squared component loadings across the components (columns) gives the communality estimates for each item (variable).

Component Matrix using Principal Component Analysis (blank cells are small loadings omitted in the output)
Variables (Items) | 1 | 2 | 3 | 4 | h²
Inflight entertainment | 0.834 | −0.279 | −0.102 | 0.205 | 0.825
Cleanliness | 0.695 | −0.279 | −0.469 | 0.13 | 0.798
Seat comfort | 0.673 | −0.243 | −0.462 | | 0.725
Food and drink | 0.599 | −0.25 | −0.513 | 0.217 | 0.732
Checkin service | 0.356 | 0.196 | −0.294 | | 0.257
Ease of Online booking | 0.321 | 0.824 | −0.151 | | 0.811
Inflight wifi service | 0.462 | 0.689 | −0.221 | | 0.743
Gate location | 0.131 | 0.66 | 0.493 | | 0.7
Departure/Arrival time convenient | 0.201 | 0.64 | 0.369 | | 0.588
Inflight service | 0.517 | | 0.652 | 0.117 | 0.715
Baggage handling | 0.511 | | 0.638 | 0.1 | 0.685
On-board service | 0.541 | −0.105 | 0.563 | | 0.621
Leg room service | 0.433 | 0.441 | −0.119 | | 0.396
Online boarding | 0.549 | 0.23 | −0.242 | −0.618 | 0.795
Extraction Method: Principal Component Analysis

Example (Inflight entertainment):
Loading | 0.834 | −0.279 | −0.102 | 0.205 |
Squared Loading | 0.696 | 0.078 | 0.010 | 0.042 | Communality = 0.825
Results of Principal Components Analysis
• Eigenvalue. The eigenvalue represents the total variance explained by each factor.
• Eigenvalues are also the sum of squared component loadings across all items (rows) for each component, which represent the amount of variance in each item that can be explained by the principal component.
• Percentage of variance. The percentage of the total variance attributed to each factor, e.g. 3.8/14 = 27.14%, 2.362/14 = 16.87%, 2.166/14 = 15.47%, 1.063/14 = 7.59%.
• Scree plot. A scree plot is a plot of the eigenvalues against the number of factors in order of extraction.

Factor | Eigenvalue | % of Variance | Cumulative %
1 | 3.8 | 27.144 | 27.144
2 | 2.362 | 16.871 | 44.015
3 | 2.166 | 15.471 | 59.486
4 | 1.063 | 7.595 | 67.08
5 | 0.951 | 6.792 | 73.873
6 | 0.7 | 5.002 | 78.875
7 | 0.54 | 3.857 | 82.732
8 | 0.515 | 3.676 | 86.408
9 | 0.469 | 3.353 | 89.762
10 | 0.369 | 2.633 | 92.395
11 | 0.328 | 2.346 | 94.741
12 | 0.295 | 2.108 | 96.848
13 | 0.253 | 1.808 | 98.657
14 | 0.188 | 1.343 | 100
Results of Principal Components Analysis after removing Checkin service and Leg room service

Communalities
Inflight wifi service | 0.77
Departure/Arrival time convenient | 0.644
Ease of Online booking | 0.819
Gate location | 0.722
Food and drink | 0.726
Online boarding | 0.797
Seat comfort | 0.724
Inflight entertainment | 0.827
On-board service | 0.658
Baggage handling | 0.718
Inflight service | 0.751
Cleanliness | 0.8

Total Variance Explained
Component | Eigenvalue | % of Variance | Cumulative %
1 | 3.572 | 29.769 | 29.769
2 | 2.359 | 19.656 | 49.424
3 | 1.987 | 16.561 | 65.985
4 | 1.038 | 8.646 | 74.632
5 | .589 | 4.907 | 79.539
6 | .521 | 4.342 | 83.881
7 | .482 | 4.017 | 87.897
8 | .369 | 3.072 | 90.970
9 | .334 | 2.781 | 93.751
10 | .296 | 2.464 | 96.214
11 | .254 | 2.113 | 98.327
12 | .201 | 1.673 | 100.000

Component Matrix (blank cells are small loadings omitted in the output)
Variables | 1 | 2 | 3 | 4
Inflight entertainment | .849 | −.299 | | .128
Cleanliness | .741 | −.311 | −.364 | .148
Seat comfort | .714 | −.274 | −.372 |
Food and drink | .659 | −.286 | −.408 | .210
Ease of Online booking | .344 | .813 | −.177 |
Inflight wifi service | .483 | .676 | −.276 |
Gate location | .158 | .653 | .517 |
Departure/Arrival time convenient | .210 | .641 | .434 |
Inflight service | .444 | | .740 |
Baggage handling | .439 | | .722 |
On-board service | .476 | | .648 |
Online boarding | .568 | .213 | −.210 | −.621
Determine the Number of Factors
• A Priori Determination
• The rotation is called oblique rotation when the axes are not maintained at right angles, and the
factors are correlated. Sometimes, allowing for correlations among factors can simplify the factor
pattern matrix. Oblique rotation should be used when factors in the population are likely to be
strongly correlated; promax is a commonly used oblique rotation method.
Factor Matrix Before and After Rotation (schematic; X indicates a high loading)
(A) High loadings before rotation: variables 1, 3, and 6 each load highly on a single factor, while variables 2, 4, and 5 load highly on both Factor 1 and Factor 2.
(B) High loadings after rotation: each of the six variables loads highly on only one factor, which makes the factors easier to interpret.
Factor Matrix Before and After Rotation (Airline passenger satisfaction data)

Component Matrix (before rotation)
 | 1 | 2 | 3
Inflight entertainment | .888 | −.214 | −.009
Cleanliness | .752 | −.277 | −.393
Seat comfort | .704 | −.255 | −.385
Food and drink | .678 | −.255 | −.444
Ease of Online booking | .250 | .814 | −.169
Inflight wifi service | .399 | .689 | −.132
Gate location | .138 | .685 | −.175
Departure/Arrival time convenient | .186 | .678 | −.067
Inflight service | .495 | .042 | .708
Baggage handling | .486 | .048 | .691
On-board service | .512 | .014 | .627

Rotated Component Matrix (after rotation)
 | 1 | 2 | 3
Cleanliness | .891 | .015 | .048
Food and drink | .849 | .027 | −.034
Seat comfort | .841 | .021 | .030
Inflight entertainment | .795 | .036 | .448
Ease of Online booking | .018 | .868 | .010
Inflight wifi service | .162 | .782 | .115
Gate location | −.028 | .717 | −.057
Departure/Arrival time convenient | −.039 | .703 | .059
Inflight service | .046 | .034 | .863
Baggage handling | .045 | .041 | .844
On-board service | .108 | .027 | .802

Extraction Method: Principal Component Analysis
Rotation Method: Varimax.
Interpret Factors
Making final decisions
• The final decision about the number of factors to choose is the number of factors for the rotated solution that is most interpretable.
• A factor can then be interpreted in terms of the variables that load high on it.
• To identify factors, group variables that have large loadings for the same factor.
• Interpret factors according to the meaning of the variables.

Rotated Component Matrix
Variables | Comfort | Convenience | Service
Cleanliness | 0.891 | 0.015 | 0.048
Food and drink | 0.849 | 0.027 | −0.034
Seat comfort | 0.841 | 0.021 | 0.03
Inflight entertainment | 0.795 | 0.036 | 0.448
Ease of Online booking | 0.018 | 0.868 | 0.01
Inflight wifi service | 0.162 | 0.782 | 0.115
Gate location | −0.028 | 0.717 | −0.057
Departure/Arrival time convenient | −0.039 | 0.703 | 0.059
Inflight service | 0.046 | 0.034 | 0.863
Baggage handling | 0.045 | 0.041 | 0.844
On-board service | 0.108 | 0.027 | 0.802

Reference:
Malhotra Naresh, K. and Dash, S. (2015) Marketing Research, An Applied Orientation. 7th Edition, Pearson, India.
Exploratory Factor Analysis (PCA) with R
Formulate the Problem
# Import the data set
library(readxl)
APS <- read_excel("E:/Documents/APS.xlsx")
attach(APS)

Construction of the Correlation Matrix
# install.packages("psych")
library(psych)
# Bartlett's test of sphericity (cortest.bartlett from the psych package)
cortest.bartlett(cor(APS), n = nrow(APS))
# Kaiser-Meyer-Olkin measure
KMO(APS)

Determine the Method of Factor Analysis & Factor Extraction
# The princomp() function produces an unrotated principal component analysis.
PCA <- princomp(APS, cor = TRUE)   # cor = TRUE works from the correlation matrix (standardized variables)
summary(PCA)
loadings(PCA)

Determination of Number of Factors
ev <- eigen(cor(APS))   # get eigenvalues
ev$values
scree(APS, pc = FALSE)  # Use pc=FALSE for factor analysis

Rotation of Factors
# Varimax Rotated Principal Components, retaining 3 components
RPCA <- principal(APS, nfactors = 3, rotate = "varimax")
RPCA

Interpretation of Factors
Thank you
Cluster Analysis
Ramesh Kandela
[email protected]
Cluster Analysis
• Clustering refers to a very broad set of techniques for finding subgroups, or clusters, in a
data set. Clustering looks to find homogeneous subgroups among the observations.
• Cluster analysis is a class of techniques used to classify objects or cases into relatively
homogeneous groups called clusters. Objects in each cluster tend to be similar to each
other and dissimilar to objects in the other clusters. Cluster analysis is also called
classification analysis, or numerical taxonomy.
• In cluster analysis, there is no a priori information about the group or cluster membership
for any of the objects. Groups or clusters are suggested by the data, not defined a priori.
• Cluster analysis is therefore sometimes referred to as unsupervised classification.
• Unsupervised learning is a type of machine learning in which models are trained using
unlabeled dataset and are allowed to act on that data without any supervision.
Applications of Clustering
• In Search Engines: Search engines also work on the clustering technique. The search result
appears based on the closest object to the search query. It does it by grouping similar data
objects in one group that is far from the other dissimilar objects. The accurate result of a
query depends on the quality of the clustering algorithm used.
• Customer Segmentation: It is used in market research to segment the customers based on
their choice and preferences.
An Ideal Clustering Situation
[Scatter plot of Variable 1 against Variable 2 showing well-separated, compact clusters]
Assumptions
• The sample is representative of the population.
• It is assumed that the variables are not correlated.
• There are no significant outliers.
• The data collected are assumed to be standardized.
Customer Segmentation
• Customer segmentation is the process of dividing customers into groups based on common
characteristics. The most common ways in which businesses segment their customer base are:
1.Demographic information, such as gender, age, familial and marital status, income, education, and
occupation.
2.Geographical information, Examples of segmentation by geography include country, state, city, and
town.
3.Psychographics, such as social class, lifestyle, and personality traits.
4.Behavioral data, such as spending and consumption habits, product/service usage, and desired
benefits.
Clustering with Mall Customer Segmentation Data
• The dataset (Mall Customer Segmentation Data) can be downloaded from the Kaggle website.
• The data includes the following features:
Customer ID Gender Age Annual Income (k$) Spending Score (1-100)
1 Male 19 15 39
2 Male 21 15 81
Business Applications
• When Procter & Gamble test markets a new cosmetic, it may want to group U.S. cities into
groups that are similar on demographic attributes such as percentage of Asians, percentage
of Blacks, percentage of Hispanics, median age, unemployment rate, and median income
level.
• A marketing analyst at Coca-Cola wants to segment the soft drink market based on
consumer preferences for price sensitivity, preference of diet versus regular soda, and
preference of Coke versus Pepsi.
• Microsoft might cluster its corporate customers based on the price a given customer is
willing to pay for a product. For example, there might be a cluster of construction
companies that are willing to pay a lot for Microsoft Project but not so much for
PowerPoint.
• Samsung segmentation is one of the key elements of the marketing strategy of this
electronics company. The Samsung market segmentation consists of four segmentation
types: Geographic, Demographic, Behavioral, and Psychographic segmentation. Each form
of segmentation is further divided based on certain criteria.
Steps in Conducting Cluster Analysis
Formulate the Problem
Select a Distance Measure
• The Chebychev distance between two objects is the maximum absolute difference in values
for any variable.
• If the variables are measured in vastly different units, the clustering solution will be
influenced by the units of measurement. In these cases, before clustering respondents, we
must standardize the data by rescaling each variable to have a mean of zero and a standard
deviation of unity. It is also desirable to eliminate outliers (cases with atypical values).
• Use of different distance measures may lead to different clustering results. Hence, it is
advisable to use different measures and compare the results.
A Classification of Clustering Procedures
(Diagram: clustering procedures classified into hierarchical and nonhierarchical methods, including Ward's method)
• Of the hierarchical methods, average linkage and Ward's method have been shown to
perform better than the other procedures.
Dendrogram in Hierarchical clustering (Single Linkage)
A Dendrogram is a tree-like diagram that records the sequences of merges or splits.
Single Linkage (distance matrix after P2, P3 and P4 have been merged)
              P1      P2, P3 & P4
P1            0
P2, P3 & P4   2.8     0
Dendrogram in Hierarchical clustering
• A Dendrogram is a tree-like diagram that records the sequences of merges or splits.
• The dendrogram is a tree-like structure that records each step the hierarchical clustering
algorithm performs. In the dendrogram plot, the y-axis shows the Euclidean distances
between the data points (or clusters), and the x-axis shows all the data points of the given
dataset.
• The term hierarchical refers to the fact that clusters obtained by cutting the dendrogram at
a given height are necessarily nested within the clusters obtained by cutting the
dendrogram at any greater height.
• The number of clusters is the number of vertical lines intersected by the horizontal line
drawn at the chosen threshold (generally, the threshold is set so that it cuts the tallest
vertical line).
• The greater the height at which two branches merge in the dendrogram, the greater the
distance between those clusters.
How the Agglomerative Hierarchical clustering Work?
The primary objective of cluster analysis is to define the structure of the data by placing the most
similar observations in a group. To accomplish this task, we must address three basic questions.
1. How do we measure similarity?
2. How do we form clusters?
3. How many clusters do we form?
• Step-1: Create each data point as a single cluster. Let's say there are N data points, so the
number of clusters will also be N.
• Step-2: Take two closest data points or clusters and merge them to form one cluster. So,
there will now be N-1 clusters. (Using Euclidean distance and linkage methods)
• Step-3: Again, take the two closest clusters and merge them together to form one cluster.
There will be N-2 clusters.
• Step-4: Repeat Step 3 until only one cluster is left.
• Step-5: Once all the clusters are combined into one big cluster, develop the dendrogram to
divide the clusters as per the problem.
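The same merge sequence can be reproduced in R with hclust(); a minimal sketch (the point coordinates below are hypothetical, for illustration only):

# Small illustrative dataset (hypothetical coordinates)
pts <- data.frame(x = c(1, 2, 2.2, 3, 8, 8.5),
                  y = c(1, 1.5, 1.2, 2, 8, 8.3))
rownames(pts) <- paste0("P", 1:6)
d <- dist(pts, method = "euclidean")   # pairwise Euclidean distances
hc <- hclust(d, method = "single")     # single-linkage agglomerative clustering (Steps 2-4)
hc$merge                               # the sequence of merges
plot(hc)                               # Step 5: the dendrogram
cutree(hc, k = 2)                      # cut the dendrogram into 2 clusters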
Step-1: Create each data point as a single
cluster. Let's say there are N data points, so
the number of clusters will also be N.
Step-4: Repeat Step 3 until only one cluster left. So, we will get the following clusters.
Consider the below images:
Step-5: Develop the
dendrogram to divide the
clusters as per the problem.
• In the above diagram, the left part is showing how clusters are created in agglomerative
clustering, and the right part is showing the corresponding dendrogram.
• Firstly, the data points P2 and P3 combine to form a cluster, and correspondingly a
dendrogram is created which connects P2 and P3 with a rectangular shape. The height is
decided according to the Euclidean distance between the data points.
• In the next step, P5 and P6 form a cluster, and the corresponding dendrogram is created. It is
higher than the previous one, as the Euclidean distance between P5 and P6 is a little greater than
that between P2 and P3.
• Again, two new dendrograms are created that combine P1, P2, and P3 in one dendrogram, and
P4, P5, and P6, in another dendrogram.
• At last, the final dendrogram is created that combines all the data points together.
Decide on the Number of Clusters
• Theoretical, conceptual, or practical considerations may suggest a certain number of clusters.
• In hierarchical clustering, the distances at which clusters are combined can be used as criteria.
This information can be obtained from the agglomeration schedule or from the dendrogram.
• In nonhierarchical clustering, the ratio of total within-group variance to between-group variance
can be plotted against the number of clusters. The point at which an elbow or a sharp bend
occurs indicates an appropriate number of clusters.
• The relative sizes of the clusters should be meaningful.
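For the elbow criterion mentioned above, a minimal sketch using base R's kmeans() is shown below (it assumes the mall data have already been reduced to the two numeric columns and scaled, as in the R code later in this section):

wss <- sapply(1:10, function(k) {
  kmeans(Mall_Customers, centers = k, nstart = 25)$tot.withinss
})
plot(1:10, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
# The 'elbow' (sharp bend) in this plot suggests an appropriate number of clusters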
• # Fitting Hierarchical Clustering to the dataset
• HC= hclust(d = dist(Mall_Customers, method = 'euclidean'),
method = 'ward.D')
• HC
• Cluster method : ward.D
• Distance : euclidean
• Number of objects: 200
• # Using the dendrogram to find the optimal number of clusters
• plot(HC)
Interpreting and Profiling the Clusters
• Interpreting and profiling clusters involves examining the cluster centroids. The centroids
enable us to describe each cluster by assigning it a name or label.
Reference:
Malhotra Naresh, K. and Dash, S. (2015) Marketing Research, An Applied Orientation. 7th Edition, Pearson, India.
Hierarchical Clustering with R
• # Importing the dataset
• library(readxl)
• Mall_Customers <- read_excel("Mall_Customers.xlsx")
• attach(Mall_Customers)
• View(Mall_Customers)
• head(Mall_Customers)
• library(dplyr)
• Mall_Customers=select(Mall_Customers,`Annual Income (k$)`,`Spending Score (1-100)`)
• head(Mall_Customers)
• Mall_Customers=scale(Mall_Customers, center = TRUE, scale = TRUE)
• # Fitting Hierarchical Clustering to the dataset
• HC= hclust(d = dist(Mall_Customers, method = 'euclidean'), method = 'ward.D')
• HC
• # Using the dendrogram to find the optimal number of clusters
• plot(HC)
• # Fitting Hierarchical Clustering to the dataset using 5 clusters
• HC=cutree(HC, 5)
• HC
• table(HC)
• # Visualising the clusters
• library(cluster)
• clusplot(Mall_Customers, HC, lines = 0, shade = TRUE, color = TRUE, labels = 2,
  plotchar = FALSE, span = TRUE, main = paste('Clusters'),
  xlab = 'Annual Income', ylab = 'Spending Score')
Thank You
Linear Programming
Ramesh Kandela
[email protected]
Optimization Models
Whenever there is a hard job to be done I assign it to a lazy man; he is sure to
find an easy way of doing it. – Walter Chrysler
• Prescriptive analytics is the highest level of analytics capability which is used for choosing
optimal actions once an organization gains insights through descriptive and predictive
analytics.
• Optimization is a precise procedure using design constraints and criteria to enable the
planner to find the optimal solution.
• An optimization model is a translation of the key characteristics of the business problem you
are trying to solve. The model consists of three elements: the objective function, decision
variables and business constraints.
▪Linear Programming
▪Transportation & Assignment Problems
▪Decision Analysis
▪Markov Models
▪Simulation
Linear Programming
Linear Programming is a mathematical technique useful for allocation of scarce or limited
resources (such as labour, material, machine, time, warehouse space, capital, energy, etc.), to
several competing activities (such as products, services, jobs, new equipment, projects, etc.)
on the basis of given criterion of optimality.
2. Additivity:
The value of the objective function and the total amount of each resource used (or supplied), must be equal to the sum of
the respective individual contribution (profit or cost) of the decision variables. For example, the total profit earned from the
sale of two products A and B must be equal to the sum of the profits earned separately from A and B. Similarly, the amount
of a resource consumed for producing A and B must be equal to the total sum of resources used for A and B individually.
• The cj's are coefficients representing the per unit profit (or cost) contribution of decision variable xj to the value of the
objective function.
• The aij's are referred to as technological coefficients (or input-output coefficients). These represent the
amount of resource i consumed per unit of variable (activity) xj. These coefficients can be
positive, negative or zero.
• The bi represents the total availability of the ith resource.
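Written out, the general LP model that these coefficients belong to is:
Optimize (Maximize or Minimize) Z = c1x1 + c2x2 + ... + cnxn
subject to
ai1x1 + ai2x2 + ... + ainxn (≤, =, ≥) bi,  i = 1, 2, ..., m
and x1, x2, ..., xn ≥ 0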
Mathematical Formulation of LP Model
The term formulation refers to the process of converting the verbal description and
numerical data into mathematical expressions, which represents the relationship among
relevant decision variables (or factors), objective and restrictions (constraints) on the use of
scarce resources to several competing activities on the basis of a given criterion of optimality.
• Step1: Study the given situation, find the key decision to be made. Hence, identify the
decision variables of the problem.
• Step2: Formulate the objective function to be optimized.
• Step3: Formulate the constraints of the problem.
• Step4: Add non-negativity restrictions.
The objective function, the set of constraints, and, the non-negativity restrictions together
form an LP model.
Examples of LP Model Formulation
A company manufactures two types of products, A and B, and sells them at a profit of
Rs. 4 on type A and Rs. 5 on type B. Each product is processed on two machines,
X and Y. Type A requires 2 minutes of processing time on X and 3 minutes on Y;
type B requires 2 minutes of processing time on X and 2 minutes on Y. Machine
X is available for 5 hours 30 minutes, while Y is available for 8 hours
during any working day. Formulate the problem as an LP problem.
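A formulation consistent with the data above (a worked sketch; the notes do not show the solution to this example):
Let x1 = number of units of product A and x2 = number of units of product B produced per day.
Maximize Z = 4x1 + 5x2
subject to
2x1 + 2x2 ≤ 330 (machine X: 5 hours 30 minutes = 330 minutes)
3x1 + 2x2 ≤ 480 (machine Y: 8 hours = 480 minutes)
and x1, x2 ≥ 0.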
Suppose an industry is manufacturing two types of products P1 and P2. The
profits per Kg of the two products are Rs.30 and Rs.40 respectively. These two
products require processing in three types of machines. The following table
shows the available machine hours per day and the time required on each
machine to produce one Kg of P1 and P2. Formulate the problem in the form
of linear programming model.
Machine    Product P1    Product P2    Total Available Machine hours/day
1          3             2             600
2          3             5             800
3          5             6             1100
Solution:
The procedure for linear programming problem formulation is as follows:
• Introduce the decision variable as follows:
• Let x1 = amount of P1 and x2 = amount of P2
• In order to maximize profits, we establish the objective function as 30x1 + 40x2
• Since one Kg of P1 requires 3 hours of processing time in machine 1 while the corresponding requirement of P2
is 2 hours. So, the first constraint can be expressed as 3x1 + 2x2 ≤ 600
• Similarly, corresponding to machine 2 and 3 the constraints are
• 3x1 + 5x2 ≤ 800
• 5x1 + 6x2 ≤ 1100
• In addition to the above there is no negative production, represented algebraically as x1 ≥ 0 ; x2 ≥ 0
Thus, the product mix problem in the linear programming model is as follows:
• Maximize 30x1 + 40x2
• Subject to:
• 3x1 + 2x2 ≤ 600
• 3x1 + 5x2 ≤ 800
• 5x1 + 6x2 ≤ 1100
• x1≥ 0, x2 ≥ 0
4.Reddy Mikks produces both interior and exterior paints from two raw materials, M1and M2.
The following table provides the basic data of the problem:
                          Tons of raw material per ton of
                          Exterior paint    Interior paint    Maximum daily availability (tons)
Raw material, M1          6                 4                 24
Raw material, M2          1                 2                 6
Profit per ton ($1000)    5                 4
A market survey indicates that the daily demand for interior paint cannot exceed that for
exterior paint by more than 1 ton. Also, the maximum daily demand for interior paint is 2 tons.
Reddy Mikks wants to determine the optimum (best) product mix of interior and exterior
paints that maximizes the total daily profit.
Maximize Z = 5x1 + 4x2
subject to the constraints
(i) 6x1 + 4x2≤24
(ii) x1+2x2 ≤ 6
(iii) -x1+x2 ≤ 1
(iv) x2 ≤2
and x1, x2 ≥ 0.
             Product 1    Product 2    Resource Availability
Material 1   1            1            5
Material 2   3            2            12
Profit/Unit  6            5
• LP formulation
• x1 : the amount of product type 1 produced (units)
• x2 : the amount of product type 2 produced (units)
• Total profit: Maximize Z = 6x1 + 5x2
• Resource A (Material 1) limit: x1 + x2 ≤ 5
• Resource B (Material 2) limit: 3x1 + 2x2 ≤ 12
• Non-negativity: x1, x2 ≥ 0
Linear Programming: Solution
Two basic solution approaches of linear programming exist.
1.The graphical Method
Simple, but limited to two decision variables.
2.The simplex method
More complex, but solves multiple decision variable problems.
Graphical Method
For LP problems that have only two variables, the entire set of feasible
solutions can be displayed graphically by plotting the linear constraints on graph paper in order
to locate the best (optimal) solution. The technique used to identify the optimal solution is
called the graphical solution method for an LP problem with two variables.
The optimal solution to an LP problem is obtained by
(i) the extreme (corner) point method, and
(ii) the iso-profit (cost) function line method.
Extreme (corner) point method
Step1: Plot constraints on graph paper and decide the feasible region
• Replace the inequality sign in each constraint by an equality sign
• Draw the straight line on the graph paper and decide each time the area of feasible solutions
according to inequality sign of the constraint.
• ≤ then shaded area is below the line
• ≥ then shaded area is above the line
• Shade the common portion of the graph paper that satisfies all the constraints simultaneously drawn
so far. The final shaded area is called the feasible region of the given LP problem. Any point inside this
region is called a feasible solution and provides values of x1 and x2 that satisfy all the
constraints.
Step2: Examine extreme points of the feasible region to find an optimal solution
Use the graphical method to solve the following LP problem.
Maximize Z = 5x1 + 4x2
subject to the constraints
(i) 6x1 + 4x2≤24
(ii) x1+2x2 ≤ 6
(iii)-x1+x2 ≤ 1
(iv)x2 ≤2
and x1, x2 ≥ 0.
The optimal solution is x1 = 3, x2 = 1.5 and Max Z = 21.
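The same solution can be cross-checked in R with the lpSolve package (lpSolve is not used in the notes; this is only a sketch):

library(lpSolve)
# Maximize Z = 5x1 + 4x2 subject to the four constraints above (x1, x2 >= 0 is implied)
obj  <- c(5, 4)
cons <- matrix(c( 6, 4,
                  1, 2,
                 -1, 1,
                  0, 1), nrow = 4, byrow = TRUE)
dirs <- rep("<=", 4)
rhs  <- c(24, 6, 1, 2)
sol <- lp("max", obj, cons, dirs, rhs)
sol$solution   # 3.0 1.5
sol$objval     # 21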
Use the graphical method to solve the
following LP problem.
Maximize Z = 15x1 + 10x2
subject to the constraints
(i) 4x1 + 6x2≤360, (ii) 3x1 ≤ 180, (iii) 5x2 ≤ 200
and x1, x2 ≥ 0.
• Solution: Treat x1 as the horizontal axis and x2 as the vertical axis.
Plot each constraint on the graph by treating it as a linear equation.
• Consider the first constraint 4x1 + 6x2 ≤ 360. Treat this as the equation 4x1 + 6x2 = 360. For this find any two
points that satisfy the equation and then draw a straight line through them. The two points are generally the
points at which the line intersects the x1 and x2 axes. For example, when x1 = 0 we get 6x2 = 360 or x2 = 60.
Similarly, when x2 = 0, 4x1 = 360, x1 = 90. These two points are then connected by a straight line as shown in
Fig.
• Similarly, the constraints 3x1 ≤ 180 and 5x2 ≤ 200 are also plotted on the graph and are indicated by the shaded
area as shown in Fig.
• Since all constraints have been graphed, the area which is bounded by all the constraint lines including all the
boundary points is called the feasible region (or solution space). The feasible region is shown in Fig. by the
shaded area OABCD.
• Since the optimal value of the objective function occurs at
one of the extreme points of the feasible region, it is
necessary to determine their coordinates. The coordinates
of extreme points of the feasible region are: O = (0, 0), A =
(60, 0), B = (60, 20), C = (30, 40), D = (0, 40).
• Evaluate objective function value at each extreme point
of the feasible region as shown in the Table
Since objective function Z is to be maximized, from Table we conclude that maximum value of Z = 1,100 is
achieved at the point extreme B (60, 20). Hence the optimal solution to the given LP problem is: x1 = 60, x2 =
20 and Max Z = 1,100.
Minimization LP Problem
• Use the graphical method to solve the following LP problem.
Minimize Z = 3x1 + 2x2
subject to the constraints
(i) 5x1 + x2 ≥ 10, (ii) x1 + x2 ≥ 6, (iii) x1 + 4x2 ≥ 12
and x1, x2 ≥ 0.
The minimum (optimal) value of the objective function Z = 13 occurs at the extreme point C
(1, 5). Hence, the optimal solution to the given LP problem is: x1 = 1, x2 = 5, and Min Z = 13.
Slack and Surplus Variables
• Slack variable represents an unused quantity of resource; it is added to less-than or equal-
to type constraints in order to get an equality constraint.
• Surplus variable represents the amount of resource usage above the minimum required
and is subtracted to greater-than or equal-to constraints in order to get equality
constraint.
• A linear program in which all the variables are non-negative and all the constraints are
equalities is said to be in standard form.
• Slack and surplus variables represent the difference between the left and right sides of the
constraints.
• Slack and surplus variables have objective function coefficients equal to 0.
Slack Variables (for ≤ constraints)
Max 15x1 + 10x2 + 0s1 + 0s2 + 0s3
s.t. 4x1 + 6x2 + s1 = 360
     3x1 + s2 = 180
     5x2 + s3 = 200
     x1, x2, s1, s2, s3 ≥ 0
s1, s2 and s3 are slack variables.
The optimal solution to the given LP problem is: x1 = 60, x2 = 20 and Max Z = 1,100.
Constraint         Value of slack variable
4x1 + 6x2 ≤ 360    0
3x1 ≤ 180          0
5x2 ≤ 200          100

Surplus Variables (for ≥ constraints)
Min 3x1 + 2x2 + 0s1 + 0s2 + 0s3
s.t. 5x1 + x2 - s1 = 10
     x1 + x2 - s2 = 6
     x1 + 4x2 - s3 = 12
     x1, x2, s1, s2, s3 ≥ 0
s1, s2 and s3 are surplus variables.
The optimal solution to the given LP problem is: x1 = 1, x2 = 5, and Min Z = 13.
Constraint         Value of surplus variable
5x1 + x2 ≥ 10      0
x1 + x2 ≥ 6        0
x1 + 4x2 ≥ 12      9
Iso-profit (Cost) Function Line Method
Iso-profit (or cost)function line is a straight line that represents all nonnegative combinations
of x1 and x2 variable values for a particular profit (or cost) level
Step 1: Identify the feasible region and extreme points of the feasible region.
Step 2: Draw an iso-profit (iso-cost) line for an arbitrary but small value of the objective function
without violating any of the constraints of the given LP problem. However, it is simple to pick a value
that gives an integer value to x1 when we set x2 = 0 and vice-versa. A good choice is to use a number
that is divisible by the coefficients of both variables.
Step 3: Move iso-profit (iso-cost) lines parallel in the direction of increasing (decreasing) objective
function values. The farthest iso-profit line may intersect only at one corner point of feasible region
providing a single optimal solution. Also, this line may coincide with one of the boundary lines of the
feasible area. Then at least two optimal solutions must lie on two adjoining corners and others will lie
on the boundary connecting them. However, if the iso-profit line goes on without limit from the
constraints, then an unbounded solution would exist. This usually indicates that an error has been
made in formulating the LP model.
Step 4: An extreme (corner) point touched by an iso-profit (or cost) line is considered as the optimal
solution point. The coordinates of this extreme point give the value of the objective function.
Iso-profit (Cost) Function Line Method
The coordinates x1 = 60 and x2 = 20 of corner point B satisfy the given constraints and the
total profit obtained is Z = 1,100
Graphical Method -Special Cases
• Alternative (or Multiple) Optimal Solutions
• Infeasible Solution
• Unbounded Solution
• Since the (maximum) value of the objective function, Z = 60, is the same at two different extreme points B and
C, two alternative optimal solutions exist: x1 = 6/7, x2 = 60/7 and x1 = 6, x2 = 0.
Infeasible Solution
• An infeasible solution to an LP problem arises when there is no solution that satisfies all the
constraints simultaneously. This happens when there is no unique (single) feasible region.
This situation arises when a LP model that has conflicting constraints.
The constraints are plotted on graph as usual as shown in above Figure. Since there is no
unique feasible solution space, therefore a unique set of values of variables x1 and x2 that
satisfy all the constraints cannot be determined. Hence, there is no feasible solution to this LP
problem because of the conflicting constraints.
Unbounded Solution
Sometimes an LP problem may have an infinite solution. Such a solution is referred as an
unbounded solution. It happens when value of certain decision variables and the value of the
objective function (maximization case) are permitted to increase infinitely, without violating
the feasibility condition.
Maximize Z = 40x1 + 60x2
subject to
2x1 + x2 ≥ 70
x1 + x2 ≥ 40
x1 + 3x2 ≥ 90
x1, x2 ≥ 0
• The point (x1, x2) must be somewhere in the solution space as shown in the figure by shaded portion.
• The three extreme points (corner points) in the finite plane are:
P = (90, 0); Q = (24, 22) and R = (0, 70) The values of the objective function at these extreme points are: Z(P)
= 3600, Z(Q) = 2280 and Z(R) = 4200.
• In this case, no maximum of the objective function exists because the region has no boundary for increasing
values of x1 and x2. Thus, it is not possible to maximize the objective function in this case and the solution is
unbounded.
Find the optimal solution using graphical method
                     Tons required per ton of product
                     Fuel Additive    Solvent Base    Amount Available for Production
Material 1 (tons)    0.4              0.5             20 tons
Material 2 (tons)    0                0.2             5 tons
Material 3 (tons)    0.6              0.3             21 tons
Profit/ton           40               30
Max 40F+30S
S.t
0.4F+0.5S ≤ 20
0.2S ≤ 5
0.6F+0.3S ≤ 21
F,S>0
• The objective function 40F+30S takes on its
maximum value at the extreme point F = 25 and S =
20. Thus, F =25 and S = 20 is the optimal solution
and z=1600 is the value of the optimal solution.
LPP with Excel Solver (click the Solver button on the Excel Data tab)
Max 40F + 30S
s.t. 0.4F + 0.5S ≤ 20
     0.2S ≤ 5
     0.6F + 0.3S ≤ 21
     F, S ≥ 0
Worksheet layout:
Row 7: optimal solution (decision variable cells B7:C7 for F and S); the objective cell contains =SUMPRODUCT(B8:C8,B7:C7)
Row 8: objective function coefficients 40 and 30
Row 11 onwards: constraint coefficients (e.g. 0.4 and 0.5 for constraint 1), with LHS =SUMPRODUCT(B11:C11,$B$7:$C$7) <= RHS 20
• Now suppose RMC learns that a price reduction in the fuel additive has reduced its profit
contribution to $30 per ton.
Max 30F+30S
S.t
.4F+.5S ≤ 20
.2S ≤ 5
.6F+.3S ≤ 21
F,S>0
The total profit contribution decreased to 30(25)
+30(20)=1350, the decrease in the profit contribution for
the fuel additive from $40 per ton to $30 per ton does
not change the optimal solution F=25 and S =20.
Changes in the Objective Function Coefficients
• Decreasing the profit contribution for the fuel additive to
$20 per ton changes the optimal solution. The solution F
=25 tons and S =20 tons is no longer optimal.
Sensitivity Report (Range of Optimality for c1 and c2)
Simultaneous Changes
100% Rule for objective function coefficients
The 100% rule states that simultaneous changes in objective function coefficients will not
change the optimal solution as long as the sum of the percentages of the change divided by
the corresponding maximum allowable change in the range of optimality for each coefficient
does not exceed 100%.
Change in the Right-Hand Sides
• Let us consider how a change in the right-hand side for a constraint might affect the feasible
region and perhaps cause a change in the optimal solution.
• The improvement in the value of the optimal solution per unit increase in the right-hand
side is called the shadow price.
Max 40F+30S
S.t 0.4F+0.5S ≤ 20
0.2S ≤ 5
0.6F+0.3S ≤ 25.5
F,S>0
• The additional 4.5 tons of material 3 in the
revised problem provides a new optimal solution
and increases the value of the optimal solution from
$1,600 to $1,800, an increase of $200. On a per-ton basis, the
additional 4.5 tons of material 3 increases the
value of the optimal solution at the rate of
$200/4.5 = $44.44 per ton.
Shadow Price (Dual Price)
• The shadow price is the change in the optimal objective function value per unit increase in
the right-hand side of a constraint.
• Hence, the shadow price for the material 3 constraint is $44.44 per ton.
• The shadow price for a nonbinding constraint is 0.
• A negative shadow price indicates that the objective function will not improve if the RHS is
increased
Simultaneous Changes
100% Rule for right hand side
For all right-hand sides that are changed, sum the percentages of the allowable increases and
allowable decreases represented by the changes. If the sum of the percentages is less than or equal to 100%, the
shadow prices do not change.
Sensitivity Analysis with Excel Solver
Before you click OK, select Sensitivity from the Reports section.
Thank You
Transportation & Assignment
Problems
Ramesh Kandela
[email protected]
Network Models
• A network model is one which can be represented by a set
of nodes, a set of arcs, and functions (e.g. costs, supplies,
demands, etc.) associated with the arcs and/or nodes.
• Transportation, Assignment, transshipment, shortest-
route, maximal flow and PERT/CPM problems are all
examples of network problems.
• For each of the problems, if the right-hand side of the
linear programming formulations are all integers, the
optimal solution will be in terms of integer values for the
decision variables.
Transportation Problem
Transportation Problem
(Diagram: goods are shipped from supply locations (origins) to demand locations (destinations) so as to minimize the cost)
• The transportation problem arises frequently in planning for the distribution of goods
and services from several supply locations to several demand locations. The usual
objective in a transportation problem is to minimize the cost of shipping goods from the
origins to the destinations.
Transportation Problem
Linear Programming Formulation
Min Σi Σj cij xij
s.t. Σj xij ≤ si for each origin i (supply)
     Σi xij = dj for each destination j (demand)
     xij ≥ 0 for all i and j
• Network Representation: origins (supply nodes) connected to destinations (demand nodes) by arcs with per-unit shipping costs cij.
4. Unacceptable routes
Remove the corresponding decision variable.
Assignment Problem
Assignment Problem
• An assignment problem seeks to minimize the total cost assignment of m workers to m
jobs, given that the cost of worker i performing job j is cij.
• It assumes all workers are assigned and each job is performed.
• An assignment problem is a special case of a transportation problem in which all supplies
and all demands are equal to 1; hence assignment problems may be solved as linear
program.
(Network representation: agent nodes 1, 2, 3 connected to task nodes 1, 2, 3 by arcs with costs c11, c12, ..., c33)
Assignment Problem: Linear Programming Formulation
Min Σi Σj cij xij
s.t. Σj xij ≤ 1 for each agent i
     Σi xij = 1 for each task j
     xij = 0 or 1 for all i and j
The sum of the completion times for the three project leaders will provide the total days required to complete the three
assignments. Thus, the objective function is Min 10x11 +15x12 +9x13 +9x21 +18x22 +5x23 +6x31 +14x32 +3x33
The constraints for the assignment problem reflect the conditions that each project leader can be assigned to at
most one client and that each client must have one assigned project leader. These constraints are written as
follows:
x11 +x12 +x13 ≤1 Terry’s assignment
x21 +x22 +x23 ≤1 Carle’s assignment
x31 +x32 +x33 ≤1 McClymonds’s assignment
x11 +x21 +x31 =1 Client 1
x12 +x22 +x32 =1 Client 2
x13 + x23 +x33 =1 Client 3
Combining the objective function and constraints into one model provides the following nine-variable,
six-constraint linear programming model of the Fowle Marketing Research assignment problem
Min 10x11 +15x12 +9x13 +9x21 +18x22 +5x23 +6x31 +14x32 +3x33
x11 +x12 +x13 ≤1
x21 +x22 +x23 ≤1
x31 +x32 +x33 ≤1
x11 +x21 +x31 =1
x12 +x22 +x32 =1
x13 + x23 +x33 =1
Network Representation
50
West. A
36
16
Subcontractors Projects
28
30
Fed. B
18
35 32
Gol.
20
C
25
25
Univ. 14
Assignment Problem: Example
Linear Programming Formulation
Min 50x11+36x12+16x13+28x21+30x22+18x23+35x31+32x32+20x33+25x41+25x42+14x43
s.t. x11+x12+x13 ≤ 1
     x21+x22+x23 ≤ 1
     x31+x32+x33 ≤ 1
     x41+x42+x43 ≤ 1
     x11+x21+x31+x41 = 1
     x12+x22+x32+x42 = 1
     x13+x23+x33+x43 = 1
     xij = 0 or 1 for all i and j
The optimal assignment is:
Subcontractor    Project         Distance
Westside         C               16
Federated        A               28
Goliath          (unassigned)
Universal        B               25
Total Distance = 69 miles
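As a cross-check, the same assignment can be solved in R with the lpSolve package (not used in the notes); the zero-cost dummy project is an assumption added only because lp.assign() expects a square cost matrix:

library(lpSolve)
# Distances (miles): rows = subcontractors, columns = projects
dist_mat <- matrix(c(50, 36, 16,
                     28, 30, 18,
                     35, 32, 20,
                     25, 25, 14), nrow = 4, byrow = TRUE,
                   dimnames = list(c("Westside", "Federated", "Goliath", "Universal"),
                                   c("A", "B", "C")))
# Add a zero-cost dummy project so that one subcontractor can remain unassigned
cost <- cbind(dist_mat, Dummy = 0)
fit <- lp.assign(cost)
fit$objval              # total distance: 69
round(fit$solution)     # 0/1 assignment matrix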
Thank You
Decision Analysis
Ramesh Kandela
[email protected]
Decision Analysis
• Decision analysis is an analytical approach of comparing decision alternatives in terms of
expected outcomes.
• Decision analysis can be used to develop an optimal strategy when a decision maker is
faced with several decision alternatives and an uncertain or risk-filled pattern of future
events.
• Even when a careful decision analysis has been conducted, the uncertain future events
make the final consequence uncertain.
• The risk associated with any decision alternative is a direct result of the uncertainty
associated with the final consequence.
• Good decision analysis includes risk analysis that provides probability information about
the favorable as well as the unfavorable consequences that may occur.
A decision problem is characterized by
• Decision alternatives
• States of nature
• Payoffs
Problem Formulation
• A decision problem is characterized by decision alternatives, states of nature, and resulting
payoffs.
• The decision alternatives are the different possible strategies the decision maker can employ.
• The states of nature refer to future events, not under the control of the decision maker, which
may occur. States of nature should be defined so that they are mutually exclusive and
collectively exhaustive.
• Payoff It is a numerical value (outcome) obtained due to the application of each possible
combination of decision alternatives and states of nature.
• The consequence resulting from a specific combination of a decision alternative and a state of
nature is a payoff.
• A table showing payoffs for all combinations of decision alternatives and states of nature is a
payoff table.
• Payoffs can be expressed in terms of profit, cost, time, distance or any other appropriate
measure.
Example: Pittsburgh Development Corp.
• Pittsburgh Development Corporation (PDC) purchased land that will be the site of a new
luxury condominium complex. PDC commissioned preliminary architectural drawings for
three different projects: one with 30, one with 60, and one with 90 condominiums.
• The financial success of the project depends upon the size of the condominium complex and
the chance event concerning the demand for the condominiums. The statement of the PDC
decision problem is to select the size of the new complex that will lead to the largest profit
given the uncertainty concerning the demand for the condominiums.
Consider the following problem with three decision alternatives and two states of nature
with the following payoff table representing profits:
Payoff table
States of Nature
Decision Alternative Strong Demand s1 Weak Demand s2
Small complex, d1 8 7
Medium complex, d2 14 5
Large complex, d3 20 -9
Influence Diagrams
• An influence diagram is a graphical device showing the relationships among
the decisions, the chance events, and the consequences.
• Squares or rectangles depict decision nodes.
• Circles or ovals depict chance nodes.
• Lines or arcs connecting the nodes show the direction of influence.
• Diamonds depict consequence nodes.
Decision Trees
• A decision tree is a chronological
representation of the decision problem.
• Decision tree is the graphical display of the
progression of decision and random events.
• Each decision tree has two types of nodes;
round nodes correspond to the states of
nature while square nodes correspond to the
decision alternatives.
• The branches leaving each round node
represent the different states of nature while
the branches leaving each square node
represent the different decision alternatives.
• At the end of each limb of a tree are the
payoffs attained from the series of branches
making up that limb.
Types of Decision-making Environments
Decision-Making under Certainty
In this decision-making environment, decision-maker has complete knowledge (perfect
information) of outcome due to each decision-alternative (course of action). In such a case he
would select a decision alternative that yields the maximum return (payoff) under known state of
nature. For example, the decision to invest in National Saving Certificate, Indira Vikas Patra, Public
Provident Fund, etc., is where complete information about the future return due and the principal
at maturity is known.
Decision-Making under Risk
In this decision environment, the decision-maker does not have perfect knowledge about the possible
outcome of every decision alternative, because more than one state of nature may occur.
Decision-Making under Uncertainty
In this decision environment, decision-maker is unable to specify the probability for occurrence of
particular state of nature. However, this is not the case of decision-making under ignorance,
because the possible states of nature are known. Thus, decisions under uncertainty are taken even
with less information than decisions under risk. For example, the probability that Mr X will be the
prime minister of the country 15 years from now is not known.
Decision Making without Probabilities (Decision-making Under Uncertainty)
In this decision environment, decision-maker is unable to specify the probability for occurrence
of particular state of nature.
• Three commonly used criteria for decision making when probability information
regarding the likelihood of the states of nature is unavailable are:
• Optimistic approach (maximax or minimin)
• Conservative approach (maximin or minimax)
• Minimax regret approach.
Optimistic Approach
In this criterion the decision-maker ensures that he should not miss the opportunity to
achieve the largest possible profit (maximax) or the lowest possible cost (minimin). Thus, he
selects the decision alternative that represents the maximum of the maxima (or minimum of
the minima) payoffs (consequences or outcomes).
• The optimistic approach would be used by an optimistic decision maker.
• The decision with the largest possible payoff is chosen.
• If the payoff table was in terms of costs, the decision with the lowest cost would be
chosen.
Conservative Approach
• The conservative approach would be used by a conservative decision maker.
• For each decision the minimum payoff is listed and then the decision corresponding to the
maximum of these minimum payoffs is selected. (Hence, the minimum possible payoff is
maximized.)
• If the payoff was in terms of costs, the maximum costs would be determined for each
decision and then the decision corresponding to the minimum of these maximum costs is
selected. (Hence, the maximum possible cost is minimized.)
In this criterion the decision-maker is conservative about the future and always anticipates
the worst possible outcome (minimum for profit and maximum for cost or loss), so it is called the
pessimistic decision criterion. This criterion is also known as Wald's criterion.
Minimax Regret Approach
• The minimax regret approach requires the construction of a regret table or an opportunity
loss table. This is done by calculating for each state of nature the difference between each
payoff and the largest payoff for that state of nature.
• Then, using this regret table, the maximum regret for each possible decision is listed.
• The decision chosen is the one corresponding to the minimum of the maximum regrets.
Example: Minimax Regret Approach
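Applied to the PDC payoff table above (a worked sketch): the best payoff is 20 for strong demand (s1) and 7 for weak demand (s2), so the regret (opportunity loss) table is
Decision Alternative     Strong Demand s1    Weak Demand s2    Maximum Regret
Small complex, d1        20 - 8 = 12         7 - 7 = 0         12
Medium complex, d2       20 - 14 = 6         7 - 5 = 2         6
Large complex, d3        20 - 20 = 0         7 - (-9) = 16     16
The minimum of the maximum regrets is 6, so the minimax regret approach selects d2, the medium complex.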
Decision Making with Probabilities (Expected Value Approach)
• The expected value (EV) of decision alternative di is defined as EV(di) = Σj P(sj) Vij,
where the sum is over all states of nature sj and Vij is the payoff for decision di under state sj.
Payoffs
(Decision tree: decision node 1 branches to d1, d2 and d3; chance nodes 2, 3 and 4 branch to s1 with P(s1) = .8 and s2 with P(s2) = .2, with payoffs of $8/$7 million, $14/$5 million and $20/-$9 million respectively)
Expected Value for Each Decision
Small complex, d1 (node 2): EV = .8(8) + .2(7) = $7.8 million
Medium complex, d2 (node 3): EV = .8(14) + .2(5) = $12.2 million
Large complex, d3 (node 4): EV = .8(20) + .2(-9) = $14.2 million
Choose the decision alternative with the largest EV. Build the large complex.
• Suppose that the decision maker obtained the probability assessments P(s1) = 0.65,
P(s2) = 0.15, and P(s3) = 0.20. Use the expected value approach to determine
the optimal decision.
Expected Value of Perfect Information
• Frequently information is available which can improve the probability estimates for the
states of nature.
• The expected value of perfect information (EVPI) is the increase in the expected profit that
would result if one knew with certainty which state of nature would occur.
• The EVPI provides an upper bound on the expected value of any sample or survey
information.
EVPI = (Expected profit with perfect information) – (Expected profit without perfect information)
• EVPI Calculation
• Step 1:
Determine the optimal return corresponding to each state of nature.
• Step 2:
Compute the expected value of these optimal returns.
• Step 3:
Subtract the EMV of the optimal decision from the amount determined in step (2).
Expected Value of Perfect Information
PDC's optimal decision strategy when the perfect information becomes available is as follows:
If s1, select d3 and receive a payoff of $20 million.
If s2, select d1 and receive a payoff of $7 million.
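Using the PDC payoffs and prior probabilities, the calculation is (a worked sketch):
Expected profit with perfect information = 0.8(20) + 0.2(7) = $17.4 million
Expected profit without perfect information (the best EV, for d3) = $14.2 million
EVPI = 17.4 - 14.2 = $3.2 million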
Since each joint probability can be expressed as the product of a known marginal (prior) and conditional
probability, i.e., P( Ai ∩ B) = P( Ai ) x P( B | Ai )
Posterior Probabilities
Favorable
State of Prior Conditional Joint Posterior
Nature Probability Probability Probability Probability
sj P(sj) P(F|sj) P(F ∩ sj) P(sj|F)
s1 0.8 0.90 0.72
s2 0.2 0.25 0.05
P(favorable) = P(F) =
Unfavorable
State of Prior Conditional Joint Posterior
Nature Probability Probability Probability Probability
sj P(sj) P(U|sj) P(U ∩ sj) P(sj|U)
s1 0.8 0.10 0.08
s2 0.2 0.75 0.15
P(unfavorable) = P(U) =
Posterior Probabilities
Favorable
State of Prior Conditional Joint Posterior
Nature Probability Probability Probability Probability
sj P(sj) P(F|sj) P(F ∩ sj) P(sj|F)
s1 0.8 0.90 0.72 0.94
s2 0.2 0.25 0.05 0.06
P(favorable) = P(F) = 0.77 1.00
Unfavorable
State of Prior Conditional Joint Posterior
Nature Probability Probability Probability Probability
sj P(sj) P(U|sj) P(U ∩ sj) P(sj|U)
s1 0.8 0.10 0.08 0.35
s2 0.2 0.75 0.15 0.65
P(unfavorable) = P(U) = 0.23 1.00
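Each posterior probability is the joint probability divided by the marginal probability of the report; for example, P(s1|F) = 0.72/0.77 ≈ 0.94 and P(s1|U) = 0.08/0.23 ≈ 0.35.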
Sample Information
PDC has developed the following branch probabilities.
If the market research study is undertaken:
P(Favorable report) = P(F) = .77
P(Unfavorable report) = P(U) = .23
If the market research report is favorable:
P(Strong demand | favorable report) = P(s1|F) = .94
P(Weak demand | favorable report) = P(s2|F) = .06
If the market research report is unfavorable:
P(Strong demand | unfavorable report) = P(s1|U) = .35
P(Weak demand | unfavorable report) = P(s2|U) = .65
If the market research study is not undertaken, the prior
probabilities are applicable:
P(Strong demand) = P(s1) = .80
P(Weak demand) = P(s2) = .20
Decision Tree
(Decision tree: at node 1, PDC decides whether to conduct the market research study.
If the study is conducted, chance node 2 leads to a favorable report F with probability .77 (decision node 3) or an unfavorable report U with probability .23 (decision node 4).
At node 3, branches d1, d2 and d3 lead to chance nodes 6, 7 and 8, where P(s1) = .94 and P(s2) = .06.
At node 4, branches d1, d2 and d3 lead to chance nodes 9, 10 and 11, where P(s1) = .35 and P(s2) = .65.
If the study is not conducted, decision node 5 branches to d1, d2 and d3, leading to chance nodes 12, 13 and 14, where the prior probabilities P(s1) = .80 and P(s2) = .20 apply.
In every case the payoffs are $8/$7 million for d1, $14/$5 million for d2 and $20/-$9 million for d3.)
Decision Strategy
• A decision strategy is a sequence of decisions and chance outcomes where the decisions
chosen depend on the yet-to-be-determined outcomes of chance events.
• The approach used to determine the optimal decision strategy is based on a backward
pass through the decision tree using the following steps:
• At chance nodes, compute the expected value by multiplying the payoff at the end
of each branch by the corresponding branch probabilities.
• At decision nodes, select the decision branch that leads to the best expected value.
This expected value becomes the expected value at the decision node.
Decision Tree (expected values)
EV(node 6) = .94(8) + .06(7) = $7.94 mil
EV(node 7) = .94(14) + .06(5) = $13.46 mil
EV(node 8) = .94(20) + .06(-9) = $18.26 mil
EV(node 3) = $18.26 mil (choose d3 if the report is favorable)
EV(node 9) = .35(8) + .65(7) = $7.35 mil
EV(node 10) = .35(14) + .65(5) = $8.15 mil
EV(node 11) = .35(20) + .65(-9) = $1.15 mil
EV(node 4) = $8.15 mil (choose d2 if the report is unfavorable)
EV(node 2) = .77(18.26) + .23(8.15) = $15.93 mil
EV(node 12) = .8(8) + .2(7) = $7.80 mil
EV(node 13) = .8(14) + .2(5) = $12.20 mil
EV(node 14) = .8(20) + .2(-9) = $14.20 mil
EV(node 5) = $14.20 mil (choose d3 if no study is conducted)
EV(node 1) = $15.93 mil (conduct the market research study)
Decision Strategy
• PDC’s optimal decision strategy is:
• Conduct the market research study.
• If the market research report is favorable, construct the large condominium complex.
• If the market research report is unfavorable, construct the medium condominium
complex.
EVSI = EVwSI - EVwoSI
where
EVSI = expected value of sample information
EVwSI= expected value with sample information about the
states of nature
EVwoSI = expected value without sample information about
the states of nature
EVwSI
EV(Node 2)= 0.77EV(Node 3) + 0.23EV(Node 4)
=0.77(18.26) + 0.23(8.15)
= 15.93
Expected Value of Sample Information
• The expected value of sample information (EVSI) is the additional expected profit possible
through knowledge of the sample or survey information.
• The expected value associated with the market research study is $15.93.
• The best expected value if the market research study is not undertaken is $14.20.
• We can conclude that the difference, $15.93 − $14.20 = $1.73, is the expected value of
sample information.
• Conducting the market research study adds $1.73 million to the PDC expected value.
Efficiency of Sample Information
• Efficiency of sample information is the ratio of EVSI to EVPI.
• As the EVPI provides an upper bound for the EVSI, efficiency is always a number between 0 and 1.
The efficiency of the survey:
E = (EVSI/EVPI) X 100
= [($1.73 mil)/($3.20 mil)] X 100
= 54.1%
The information from the market research study is 54.1% as efficient as perfect information.
Thank You
Markov Models
Ramesh Kandela
[email protected]
Markov Processes
• Markov chain models (a class of stochastic processes) are useful for studying systems in
which the next state depends only on the current state, not on the full history of earlier states.
• Markov process models are useful in studying the evolution of systems over repeated trials
or sequential time periods or stages.
• the promotion of managers to various positions within an organization
• the migration of people into and out of various regions of the country
• the progression of students through the years of college, including eventually dropping
out or graduating
Markov processes have been used to describe the probability that:
• a machine that is functioning in one period will function or break down in the next period.
• a consumer purchasing brand A in one period will purchase brand B in the next period.
Examples
For example, consider the following few systems:
I. Market share of a product and its competitive brands.
II. Cash collection procedures involved in converting accounts receivable from the product’s
sales into cash.
III. Machines used to manufacture a product.
IV. Area of specialization by a management student at one time.
In all these examples, each process (or system) may be in one of several possible states. These
states describe all possible conditions of the given system. For example,
I. the brand of the product that a customer is presently using is termed as a state.
II. the accounts receivable can be in one of the two states: cash sale or credit sale.
III. the machine condition can be in one of the two possible states: working or not working
IV. the few areas in which a student can specialize at one time represent states.
Transition Probabilities
• State : This is the position at a specific time-step in the environment.
• State probability of an event is the probability of its occurrence at a point in time.
• Transition : Moving from one state to another is called Transition.
• Transition probabilities govern the manner in which the state of the system changes from
one stage to the next. These are often represented in a transition matrix.
• A system has a finite Markov chain with stationary transition probabilities if:
• there are a finite number of states,
• the transition probabilities remain constant from stage to stage, and
• the probability of the process being in a particular state at stage n+1 is completely
determined by the state of the process at stage n (and not the state at stage n-1). This
is referred to as the memory-less property.
Student Markov Process
Example: Market Share Analysis
• Suppose we are interested in analyzing the market share and customer loyalty for
Murphy’s Foodliner and Ashley’s Supermarket, the only two grocery stores in a small
town. We focus on the sequence of shopping trips of one customer and assume that the
customer makes one shopping trip each week to either Murphy’s Foodliner or Ashley’s
Supermarket, but not both.
• We refer to the weekly periods or shopping trips as the trials of the process. Thus, at each
trial, the customer will shop at either Murphy’s Foodliner or Ashley’s Supermarket. The
particular store selected in a given week is referred to as the state of the system in that
period. Because the customer has two shopping alternatives at each trial, we say the
system has two states.
State 1. The customer shops at Murphy’s Foodliner.
State 2. The customer shops at Ashley’s Supermarket.
Example: Market Share Analysis
• Suppose that, as part of a market research study, we collect data from 100 shoppers over a
10-week period. In reviewing the data, suppose that we find that of all customers who
shopped at Murphy’s in a given week, 90% shopped at Murphy’s the following week while
10% switched to Ashley’s.
• Suppose that similar data for the customers who shopped at Ashley’s in a given week show
that 80% shopped at Ashley’s the following week while 20% switched to Murphy’s.
Transition Probabilities
pij = probability of making a transition from state i in a given period to state j in the next period
P = [ p11  p12 ]  =  [ 0.9  0.1 ]
    [ p21  p22 ]     [ 0.2  0.8 ]
• The terms π1(n) and π2(n) will denote the probability of the system being in state 1 or state 2
in period n; π1(0) and π2(0) refer to the initial or starting period.
• Week 0 represents the most recent period, when we are beginning the analysis of the
Markov process.
• If we set π1(0) = 1 and π2(0) = 0, we are saying that as an initial condition the customer
shopped last week at Murphy's.
• Alternatively, if we set π1(0) = 0 and π2(0) = 1, we would be starting the system with a
customer who shopped last week at Ashley's.
Π(n) = [π1(n)  π2(n)]
denotes the vector of state probabilities for the system in period n, and
Π(next period) = Π(current period)P
or
Π(n+1) = Π(n)P
State Probabilities for Future Periods
Beginning with the system in state 1 at period 0, we have Π(0) = [1 0]. We can compute the
state probabilities for period 1 as follows:
Π(1) = Π(0)P = [0.9  0.1]
The state probabilities π1(1) = 0.9 and π2(1) = 0.1 are the probabilities that a customer who
shopped at Murphy's during week 0 will shop at Murphy's or at Ashley's during week 1.
We see that the probability of shopping at Murphy's during the second week is 0.83, while the
probability of shopping at Ashley's during the second week is 0.17.
We can compute the state probabilities for any future period in the same way; that is, Π(n+1) = Π(n)P.
Example: Market Share Analysis
State Probabilities (tree of the first two weeks, starting with a Murphy's customer)
Murphy's → Murphy's → Murphy's:  P = .9(.9) = .81
Murphy's → Murphy's → Ashley's:  P = .9(.1) = .09
Murphy's → Ashley's → Murphy's:  P = .1(.2) = .02
Murphy's → Ashley's → Ashley's:  P = .1(.8) = .08
Example: Market Share Analysis
State Probabilities for Future Periods Beginning Initially with a Murphy’s Customer
State Probabilities for Future Periods Beginning Initially with an Ashley’s Customer
In the market share analysis, suppose that we are considering the Markov process
associated with the shopping trips of one customer, but we do not know where the
customer shopped during the last week. Thus, we might assume a 0.5 probability that the
customer shopped at Murphy’s and a 0.5 probability that the customer shopped at Ashley’s
at period 0; that is, (0) =0.5 and (0) =0.5. Given these initial state probabilities, Find the
probability of each state in 3 future periods.
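Carrying out the recursion Π(n+1) = Π(n)P from Π(0) = [0.5 0.5] gives (a worked sketch):
Π(1) = [0.5(0.9) + 0.5(0.2)  0.5(0.1) + 0.5(0.8)] = [0.55  0.45]
Π(2) = [0.55(0.9) + 0.45(0.2)  0.55(0.1) + 0.45(0.8)] = [0.585  0.415]
Π(3) = [0.585(0.9) + 0.415(0.2)  0.585(0.1) + 0.415(0.8)] = [0.6095  0.3905]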
Steady-State (Equilibrium) Probabilities
• The state probabilities at any stage of the process can be calculated recursively by
multiplying the state probabilities at the previous stage by the transition matrix P.
• The probability of the system being in a particular state after a large number of stages is
called a steady-state probability.
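The recursion and the steady-state probabilities for the Murphy's/Ashley's example can be computed in base R; a minimal sketch (not part of the notes):

# Transition matrix for the grocery-store example
P <- matrix(c(0.9, 0.1,
              0.2, 0.8), nrow = 2, byrow = TRUE,
            dimnames = list(c("Murphy's", "Ashley's"), c("Murphy's", "Ashley's")))
pi_n <- c(1, 0)                       # week 0: the customer shopped at Murphy's
for (n in 1:10) {
  pi_n <- pi_n %*% P                  # Pi(n) = Pi(n-1) %*% P
  cat("Week", n, ":", round(pi_n, 4), "\n")
}
# Steady state: solve pi %*% P = pi together with sum(pi) = 1
A <- rbind(t(P) - diag(2), c(1, 1))
pi_ss <- qr.solve(A, c(0, 0, 1))
pi_ss                                 # approximately 0.667 (Murphy's) and 0.333 (Ashley's)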
Machine Learning
Ramesh Kandela
[email protected]
Why Machine Learning
• Consider how you would write a spam filter using traditional programming techniques: you would
write a detection rule for each pattern of words you notice in spam and keep adding rules.
• Since the problem is not trivial, your program will likely become a long list of complex rules,
which is pretty hard to maintain.
Why Machine Learning
• In contrast, a spam filter based on Machine Learning
techniques automatically learns which words and
phrases are good predictors of spam by detecting
unusually frequent patterns of words in the spam
examples compared to the ham examples). The
program is much shorter, easier to maintain, and most
likely more accurate.
• Moreover, if spammers notice that all their emails
containing "4U" are blocked, they might start writing
"For U" instead. A spam filter using traditional
programming techniques would need to be updated to
flag “For U” emails. If spammers keep working around
your spam filter, you will need to keep writing new rules
forever.
• In contrast, a spam filter based on Machine Learning
techniques automatically notices that “For U” has
become unusually frequent in spam flagged by users,
and it starts flagging them without your intervention
Why Machine Learning
To summarize, Machine Learning is great for:
• Problems for which existing solutions require a lot of hand-tuning or long lists of rules: one
Machine Learning algorithm can often simplify code and perform better.
• Complex problems for which there is no good solution at all using a traditional approach: the
best Machine Learning techniques can find a solution.
• Fluctuating environments: a Machine Learning system can adapt to new data.
• Getting insights about complex problems and large amounts of data.
• Although machine learning is continuously evolving with so many new technologies, it is already
widely used across various industries.
• Machine learning is important because it gives enterprises a view of trends in customer
behavior and operational business patterns, as well as supports the development of new
products. Many of today's leading companies, such as Amazon, Flipkart, Facebook,
Google, Netflix and Uber, make machine learning a central part of their operations. Machine
learning has become a significant competitive differentiator for many companies.
Scope of Machine Learning in the future
• The scope of Machine Learning (ML) is vast, and in the near future, it will deepen its reach
into various fields like medical, finance, social media, facial and voice recognition, online
fraud detection, and biometrics. As the scope of Machine Learning is very wide, there are some
areas where researchers are working toward revolutionizing the world for the future.
• Medical
• Cybersecurity
• Digital voice assistants
• Education
• Search engines
• Automotive Industry
• Quantum Computing
Types of Machine Learning Systems
Machine Learning systems can be classified
according to the amount and type of
supervision they get during training.
• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
Supervised Learning
• These algorithms require the knowledge of both the outcome variable (dependent variable)
and the independent variable (input variables).
• In supervised learning, the training data you feed to the algorithm includes the desired
solutions, called labels.
• Regression and Classification algorithms are Supervised Learning algorithms. Both the
algorithms are used for prediction in machine learning and work with the labeled datasets.
• The main difference between Regression and Classification algorithms is that Regression
algorithms are used to predict continuous values such as price, salary, age, etc. Predicting
a car price is an example of regression: we predict a target numeric value, such as the price
of a car, given a set of features (mileage, age, brand, etc.) called predictors.
• Classification algorithms are used to predict/Classify the discrete values such as Positive or
Negative, Spam or Not Spam, etc. The spam filter is a good example of this: it is trained
with many example emails along with their class (spam or ham), and it must learn how to
classify new emails.
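As a minimal illustration of the two supervised tasks in R (using a built-in dataset rather than the examples in the notes):

# Regression: predict a continuous target (here, mpg) from labeled examples
data(mtcars)
reg_fit <- lm(mpg ~ wt + hp, data = mtcars)
predict(reg_fit, newdata = data.frame(wt = 3, hp = 110))

# Classification: predict a discrete class (here, whether a car has a manual transmission)
clf_fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)
predict(clf_fit, newdata = data.frame(wt = 3, hp = 110), type = "response")  # class probability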
Most important supervised learning algorithms
• Linear Regression
• Logistic Regression
• k-Nearest Neighbors
• Support Vector Machines (SVMs)
• Naïve Bayes
• Decision Trees and Random Forests
• Neural Networks
Unsupervised Learning
• Unsupervised learning: In unsupervised learning, the training data is unlabeled.
The system tries to learn without a teacher.
• These algorithms are set of algorithms which do not have the knowledge of the
outcome variable in the dataset.
• These algorithms discover hidden patterns or data groupings without the need
for human intervention. Its ability to discover similarities and differences in
information make it the ideal solution for exploratory data analysis, cross-selling
strategies, customer segmentation, and image recognition.
Most Important Unsupervised Algorithms
Here, are some of the most important unsupervised learning algorithms
Clustering
• — K-Means
• — DBSCAN
• — Hierarchical Cluster Analysis (HCA)
Dimensionality Reduction/ Feature Selection
• — Principal Component Analysis (PCA)
Association Rule Learning
• — Apriori
• — Eclat
Supervised vs. Unsupervised Learning
Supervised Learning:
• Algorithms are trained using labeled data.
• The model takes direct feedback to check if it is predicting the correct output or not.
• The model predicts the output.
• Input data is provided to the model along with the output.
• The goal is to train the model so that it can predict the output when it is given new data.
• Needs supervision to train the model.
• Can be categorized into Classification and Regression problems.
• Includes algorithms such as Linear Regression, Logistic Regression, Support Vector Machine, Decision Tree, etc.
Unsupervised Learning:
• Algorithms are trained using unlabeled data.
• The model does not take any feedback.
• The model finds the hidden patterns in data.
• Only input data is provided to the model.
• The goal is to find hidden patterns and useful insights from the unknown dataset.
• Does not need any supervision to train the model.
• Can be classified into Clustering and Association problems.
• Includes algorithms such as Clustering, Principal Component Analysis, and the Apriori algorithm.
Applications and Techniques
• Regression: forecasting your company's revenue next year, based on many performance metrics
• Classification: classifying emails as spam or not spam
• Neural Networks (CNNs): analyzing images of products on a production line to automatically classify them; detecting tumours in brain scans
• Natural Language Processing (NLP): automatically classifying news articles; automatically flagging offensive comments on discussion forums; summarizing long documents automatically; creating a chatbot or a personal assistant
• Anomaly Detection: detecting credit card fraud
• Clustering: segmenting clients based on their purchases so that you can design a different marketing strategy for each segment
• PCA: representing a complex, high-dimensional dataset in a clear and insightful diagram
• Recommender System: recommending a product that a client may be interested in, based on past purchases
• Reinforcement Learning: building an intelligent bot for a game
Machine Learning Applications
Image Recognition
• Image recognition is one of the most common applications of machine learning. It is used to identify
objects, persons, places, digital images, etc. The popular use case of image recognition and face
detection is, Automatic friend tagging suggestion:
• Facebook provides us a feature of auto friend tagging suggestion. Whenever we upload a photo with
our Facebook friends, then we automatically get a tagging suggestion with name, and the technology
behind this is machine learning's face detection and recognition algorithm.
• It is based on the Facebook project named "Deep Face," which is responsible for face recognition and
person identification in the picture.
Speech Recognition
• When we use Google, we get a "Search by voice" option; this comes under speech recognition, and it is a popular application of machine learning.
• Speech recognition is the process of converting voice instructions into text; it is also known as "speech to text" or "computer speech recognition." At present, machine learning algorithms are widely used in speech recognition applications. Google Assistant, Siri, Cortana, and Alexa use speech recognition technology to follow voice instructions.
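A minimal speech-to-text sketch, assuming the third-party SpeechRecognition package is installed and a short recording is available (the file name is hypothetical), could look like this:

import speech_recognition as sr   # pip install SpeechRecognition

recognizer = sr.Recognizer()
with sr.AudioFile("voice_instruction.wav") as source:   # hypothetical recording
    audio = recognizer.record(source)                   # read the whole file

# Sends the audio to Google's free web speech API (requires internet access)
print(recognizer.recognize_google(audio))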
Traffic prediction
• If we want to visit a new place, we take the help of Google Maps, which shows us the correct path with the shortest route and predicts the traffic conditions.
• It predicts traffic conditions, such as whether traffic is clear, slow-moving, or heavily congested, in two ways: the real-time location of vehicles from the Google Maps app and sensors, and the average time taken on past days at the same time.
Self-driving cars
• One of the most exciting applications of machine learning is self-driving cars, where machine learning plays a significant role. Tesla, one of the most popular car manufacturers, is working on self-driving cars and uses machine learning to train its models to detect people and objects while driving.
Product recommendations
• Machine learning is widely used by e-commerce and entertainment companies such as Amazon and Netflix for product recommendations. Whenever we search for a product on Amazon, we start seeing advertisements for that product while surfing the internet in the same browser; this is machine learning at work.
• Google understands the user's interests using various machine learning algorithms and suggests products according to those interests.
• Similarly, when we use Netflix, we get recommendations for series, movies, and so on, and this is also done with the help of machine learning.
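A toy item-based recommendation sketch (purely illustrative; the purchase matrix, item names, and similarity measure are assumptions, using only NumPy): items whose purchase patterns are most similar to what a user has already bought are recommended.

import numpy as np

# Rows = users, columns = items; 1 means the user bought the item (made-up data)
purchases = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 1],
    [1, 1, 0, 1],
])
items = ["phone", "case", "charger", "earbuds"]

# Cosine similarity between item columns
norms = np.linalg.norm(purchases, axis=0)
similarity = (purchases.T @ purchases) / np.outer(norms, norms)

# Score items for user 0 by similarity to what they already bought
user = purchases[0]
scores = similarity @ user
scores[user == 1] = -1            # do not re-recommend items they already own
print("Recommend:", items[int(np.argmax(scores))])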
Virtual Personal Assistant:
• We have various virtual personal assistants such as Google Assistant, Alexa, Cortana, and Siri. As the name suggests, they help us find information using our voice instructions. These assistants can help us in many ways just through voice instructions, such as playing music, calling someone, opening an email, or scheduling an appointment.
• Machine learning algorithms are an important part of these virtual assistants.
• The assistants record our voice instructions, send them to a server in the cloud, decode them using ML algorithms, and act accordingly.
Online Fraud Detection:
• Machine learning makes our online transactions safe and secure by detecting fraudulent transactions. Whenever we perform an online transaction, fraud can occur in various ways, such as fake accounts, fake IDs, or money being stolen in the middle of a transaction. To detect this, a feed-forward neural network can check whether a transaction is genuine or fraudulent.
• For each genuine transaction, the output is converted into hash values, and these values become the input for the next round. Genuine transactions follow a specific pattern, which changes for a fraudulent transaction; the network detects this change and makes our online transactions more secure.
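As a rough sketch of the idea, a small feed-forward network can be trained to score transactions (scikit-learn's MLPClassifier on synthetic data; the features, class balance, and names here are assumptions, not a production fraud system):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic "transactions": 2000 rows, 10 numeric features, about 5% labelled fraud
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A small feed-forward network scoring each transaction as genuine (0) or fraud (1)
model = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=500, random_state=0)
model.fit(X_train, y_train)

print("Test accuracy:", model.score(X_test, y_test))
print("Predictions for 10 transactions:", model.predict(X_test[:10]))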
Automatic Language Translation:
• Nowadays, if we visit a new place and do not know the language, it is not a problem at all, because machine learning helps us by converting the text into a language we know. Google's GNMT (Google Neural Machine Translation) provides this feature; it is a neural machine translation system that translates text into our familiar language, and this is called automatic translation.
• The technology behind automatic translation is a sequence-to-sequence learning algorithm, which is used together with image recognition to translate text from one language to another.
Applications of ML
• Self-driving cars on the roads
• Movie recommendations
• Amazon product recommendations
• Speech recognition in your smartphone
Steps of Machine Learning
• Get the data
• Make predictions
• Model deployment
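These steps can be tied together in a minimal end-to-end sketch (assuming scikit-learn and joblib are installed; the dataset, model choice, and file name are only illustrative): get the data, train a model, make predictions, and save the model for deployment.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
import joblib

# 1. Get the data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 2. Train a model
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# 3. Make predictions and check performance
print("Test accuracy:", model.score(X_test, y_test))

# 4. Model deployment: persist the model so an application can load and reuse it
joblib.dump(model, "model.joblib")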
Happy Analyzing