MBA 643 - Final Exam Notes
Classification Accuracy: read from the confusion matrix (rows: Actual Class 1/0, columns: Predicted Class 1/0).
Best Subsets extraction methods to identify the best two logistic regression models:
Best subset is defined by:
1. First sort the Best subset output in increasing order of Mallow's Cp value
2. Look for the lowest positive Mallow's Cp value
3. Mallow's Cp value should be less than or equal to #Coefficients (# of independent variables + intercept)
Note: When Cp is less than or equal to p, it suggests the model is unbiased
Determine the best logistic regression model. Justify. What is the classification rule?
Best logistic regression model is defined by max sensitivity.
Classification: If the estimated probability p̂ > 0.5, then the observation is classified as Class 1; otherwise, it is classified as Class 0.
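A minimal sketch of this cutoff rule; the probabilities below are made-up illustration values, not output from any actual model.

```python
# Sketch: applying the 0.5 cutoff rule to estimated probabilities.
# The probabilities in `probs` are hypothetical illustration values.
def classify(p_hat, cutoff=0.5):
    """Return 1 if the estimated probability exceeds the cutoff, else 0."""
    return 1 if p_hat > cutoff else 0

probs = [0.82, 0.35, 0.50, 0.61]
classes = [classify(p) for p in probs]
print(classes)  # [1, 0, 0, 1] -- note 0.50 is NOT > 0.5, so it goes to Class 0
```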
Most significant predictor to identify churners? Justify: Based on the regression output, the predictor with the lowest p-value is the most significant.
Interpret the meaning of the decile-lift: Our classifier is XX.XX times better at identifying Class 1 observations than random selection (no model).
To achieve a sensitivity of at least 0.70, how much Class 0 error can be tolerated?
Check the test data.
PPR: Predictive Positive Rate (Support)
TPR: True Positive Rate (Sensitivity, Recall) = 1 − Class 1 error rate
FPR: False Positive Rate = 1 − Specificity = Class 0 error rate
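A quick sketch of how these rates come out of confusion-matrix counts; the counts (tp, fn, fp, tn) are hypothetical illustration values.

```python
# Sketch: computing the rates above from hypothetical confusion-matrix counts.
tp, fn = 70, 30   # actual Class 1: 70 predicted as 1, 30 predicted as 0
fp, tn = 20, 80   # actual Class 0: 20 predicted as 1, 80 predicted as 0

tpr = tp / (tp + fn)                        # sensitivity = 1 - Class 1 error rate
fpr = fp / (fp + tn)                        # 1 - specificity = Class 0 error rate
accuracy = (tp + tn) / (tp + fn + fp + tn)  # overall classification accuracy

print(tpr, fpr, accuracy)  # 0.7 0.2 0.75
```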
CART: Series of questions that successively narrow down observations into smaller and smaller groups of
decreasing impurity.
Overfitting: A fully grown tree built from the training data completely overfits that data.
How to avoid overfitting: Two ways.
1. Stop the tree growth: by controlling the number of splits, the minimum number of records in a terminal node, and the minimum reduction in impurity
2. Prune a full-grown tree: remove its weakest branches.
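The "decreasing impurity" idea above can be made concrete with the Gini impurity measure CART commonly uses; the class counts below are hypothetical, and the impurity reduction is the quantity a stopping rule can set a minimum threshold on.

```python
# Sketch: Gini impurity and the impurity reduction achieved by a split.
# Class counts are hypothetical illustration values.
def gini(counts):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

parent = [50, 50]                 # node with 50 Class 1, 50 Class 0 records
left, right = [40, 10], [10, 40]  # candidate split of that node

n = sum(parent)
weighted_child = (sum(left) / n) * gini(left) + (sum(right) / n) * gini(right)
reduction = gini(parent) - weighted_child  # what CART tries to maximize
print(round(gini(parent), 3), round(reduction, 3))  # 0.5 0.18
```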
Chapter 4:
Table 4.2 shows that Cluster 3 is the smallest, most homogeneous cluster, whereas Cluster 2 is the largest,
most heterogeneous cluster.
In Table 4.3, we compare the average distances between clusters to the average distance within clusters
in Table 4.2.
Hierarchical Clustering - Very sensitive to outliers; suitable for small datasets (< 500 records)
k-Means Clustering - Generally not appropriate for binary or ordinal data because the average is not meaningful; suitable for larger datasets
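A minimal one-dimensional k-means (Lloyd's algorithm) sketch showing why the method relies on averages; the data values, k = 2, and starting centers are all hypothetical.

```python
# Sketch: 1-D k-means with the standard library only; data and starting
# centers are hypothetical illustration values.
def kmeans_1d(data, centers, iters=20):
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for x in data:
            i = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            clusters[i].append(x)
        # Update step: each center moves to its cluster's mean -- this is
        # why averages must be meaningful for the data.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

data = [1.0, 1.2, 0.8, 8.0, 8.5, 7.5]
centers, clusters = kmeans_1d(data, centers=[0.0, 10.0])
print(centers)  # [1.0, 8.0]
```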
Support(apple) = (# of transactions containing apple) / (total # of transactions) = 4/8
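A short sketch of the support calculation; the eight transactions below are hypothetical, constructed so that support(apple) = 4/8 matches the note above.

```python
# Sketch: support of an item = fraction of transactions containing it.
# The transactions are hypothetical illustration data.
transactions = [
    {"apple", "milk"}, {"apple"}, {"bread", "milk"}, {"apple", "bread"},
    {"milk"}, {"apple", "bread", "milk"}, {"bread"}, {"milk", "bread"},
]
support_apple = sum("apple" in t for t in transactions) / len(transactions)
print(support_apple)  # 0.5, i.e. 4/8
```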
Text mining:
Chapter 5: Probability: Introduction to modeling uncertainty
Probability rules:
𝑃(𝐴 ∪ 𝐵) = 𝑃(𝐴) + 𝑃(𝐵) − 𝑃(𝐴 ∩ 𝐵)
If A & B are two mutually exclusive events: P(A ∩ B) = 0
P(A|B) = The probability of event A given the condition that event B has already occurred
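A quick numeric check of the addition rule and conditional probability; the probabilities P(A) = 0.5, P(B) = 0.4, P(A ∩ B) = 0.2 are made-up illustration values.

```python
# Sketch: addition rule and conditional probability with hypothetical values.
p_a, p_b, p_a_and_b = 0.5, 0.4, 0.2

p_a_or_b = p_a + p_b - p_a_and_b  # P(A or B) = P(A) + P(B) - P(A and B)
p_a_given_b = p_a_and_b / p_b     # P(A|B) = P(A and B) / P(B)
print(round(p_a_or_b, 2), p_a_given_b)  # 0.7 0.5
```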
Standard deviation = σ
Excel: =BINOM.DIST(x, n, π, FALSE) = P(X = x | n, π). Ex: exactly 4 → =BINOM.DIST(4, 20, 0.2, FALSE)
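The same exact-probability calculation as =BINOM.DIST(4, 20, 0.2, FALSE), computed from the binomial pmf formula with the standard library:

```python
# Sketch: binomial pmf, P(X = x) = C(n, x) * p^x * (1 - p)^(n - x).
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

print(round(binom_pmf(4, 20, 0.2), 4))  # 0.2182
```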
Uniform distribution: E(x) = (a + b)/2, Var(x) = (b − a)² / 12
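A one-line check of the uniform mean and variance formulas; the interval a = 10, b = 20 is a hypothetical illustration.

```python
# Sketch: mean and variance of a continuous uniform distribution on [a, b].
# a and b are hypothetical illustration values.
a, b = 10, 20
mean = (a + b) / 2        # E(x)
var = (b - a) ** 2 / 12   # Var(x)
print(mean, round(var, 4))  # 15.0 8.3333
```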
Normal Probability Distribution: Z = (X − μ) / σ
The standardized normal distribution (Z) has a
mean of 0 and a standard deviation of 1
To find the probability of flight time more than 40,000 hours:
P(X > 40,000) = P(Z > (40,000 − 36,500)/5,000) = P(Z > 0.7)
Excel: =NORM.DIST(X, µ, σ, TRUE) gives P(X ≤ x), so use =1-NORM.DIST(40000, 36500, 5000, TRUE) for P(X > 40,000)
Prob. for greater than 30,000 but less than 40,000 hours:
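Using the flight-time parameters from the example above (µ = 36,500, σ = 5,000), this between-two-values probability can be sketched with the standard library's NormalDist, mirroring =NORM.DIST(40000,…) − NORM.DIST(30000,…):

```python
# Sketch: P(30,000 < X < 40,000) for X ~ N(36,500, 5,000), using the
# example's own parameters; CDF difference = area between the two values.
from statistics import NormalDist

flight_time = NormalDist(mu=36_500, sigma=5_000)
prob = flight_time.cdf(40_000) - flight_time.cdf(30_000)
print(round(prob, 4))  # 0.6612
```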
The sampling distribution of 𝑋̅ is the probability distribution of all possible values of the sample mean 𝑋̅.
Expected value of X̄: E(X̄) = μ. Standard error of X̄: σ_X̄ = σ/√n
Note: When the expected value of a point estimator equals the population parameter, we say the point
estimator is unbiased
Z = (P − π) / √(π(1 − π)/n), where P = claimed probability, π = population given probability
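A sketch of this one-proportion z statistic; the sample proportion P = 0.26, hypothesized π = 0.20, and n = 400 are hypothetical illustration values.

```python
# Sketch: one-proportion z statistic with hypothetical numbers.
from math import sqrt

p_sample, pi, n = 0.26, 0.20, 400
z = (p_sample - pi) / sqrt(pi * (1 - pi) / n)
print(round(z, 2))  # 3.0
```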
Assumptions:
Chapter 6B
Test statistics:
1. Understand H0 & H1. H0 always contains the equality sign.
2. If "less than" or "greater than" is mentioned in the statement, treat that as H1.
7. Conclusion
8. P-value
Hypothesis Testing: σ Known
p-Value Approach to Testing:
• The p-value is also called the observed level
of significance
• It is the smallest value of α for which H0 can
be rejected
o If p-value < α , reject H0
o If p-value ≥ α , do not reject H0
To calculate the p-value (z test):
=NORM.S.DIST(z-score, TRUE) for H1: < (use 1 minus this for H1: >)
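The z-test p-value formulas can be mirrored with the standard library; z = −2.0 below is a hypothetical test statistic, not from any example in these notes.

```python
# Sketch: lower-tail, upper-tail, and two-tailed p-values for a z test.
# z = -2.0 is a hypothetical test statistic.
from statistics import NormalDist

z = -2.0
std_normal = NormalDist()  # mean 0, standard deviation 1

p_lower = std_normal.cdf(z)             # H1: <   (=NORM.S.DIST(z, TRUE))
p_upper = 1 - std_normal.cdf(z)         # H1: >
p_two_tail = 2 * min(p_lower, p_upper)  # H1: not equal
print(round(p_lower, 4), round(p_two_tail, 4))  # 0.0228 0.0455
```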
• To calculate the p-value (t test):
• =T.DIST(t, n-1, TRUE) for H1: <
• =1-T.DIST(t, n-1, TRUE) for H1: >
• =2*MIN(T.DIST(t, n-1, TRUE), 1-T.DIST(t, n-1, TRUE)) for H1: ≠
If 0.01 < p-value < 0.05, we have strong evidence to conclude that 𝐻1 is true
If 0.05 < p-value < 0.10, we have weak evidence to conclude that 𝐻1 is true