
Business Analytics Question Paper Solution

Uploaded by Abhinav Vijay

Copyright © All Rights Reserved

Answers

Q1(A): What do you mean by the multiple linear regression model? Write a stochastic
equation of a multiple regression model where the dependent variable is 'debt
payments' and the independent variables are (a) income, and (b) unemployment.

A **multiple linear regression** model predicts the value of a dependent variable based on two or more independent variables. It assumes that the relationship between the dependent variable and the independent variables is linear (in the parameters). The stochastic equation adds a random error term to account for the variation in the dependent variable that the independent variables do not explain.

The stochastic equation would be:

Debt Payments = beta_0 + beta_1(Income) + beta_2(Unemployment) + epsilon

Where:

- beta_0: Intercept (constant term)

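- beta_1, beta_2: Slope coefficients for Income and Unemployment; each measures the change in expected debt payments for a one-unit change in that variable, holding the other constant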

- epsilon: Random error term (stochastic component)
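To make the stochastic equation concrete, here is a minimal R sketch that simulates hypothetical data from the model and estimates the betas by ordinary least squares with `lm()`. The sample size, the value ranges, and the "true" coefficients (200, 10, 0.5) are assumptions chosen for illustration only, not values from the question paper.

```R
# Hypothetical illustration: simulate data that follow
# Debt = beta_0 + beta_1*Income + beta_2*Unemployment + epsilon
set.seed(42)
n <- 200
income <- runif(n, min = 50, max = 120)        # hypothetical income values
unemployment <- runif(n, min = 3, max = 12)    # hypothetical unemployment rate (%)
epsilon <- rnorm(n, mean = 0, sd = 20)         # random (stochastic) error term
debt <- 200 + 10 * income + 0.5 * unemployment + epsilon  # assumed true betas

sim <- data.frame(debt, income, unemployment)
fit <- lm(debt ~ income + unemployment, data = sim)
coef(fit)  # OLS estimates of beta_0, beta_1, beta_2
```

The estimates will be close to, but not exactly equal to, the assumed values; the gap is the effect of the random error term.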

---

Q1(B): Interpret the output from the R-produced table.

The table provides the output for the linear regression model where "Debt Payments" is
the dependent variable, and "Income" and "Unemployment" are the independent
variables.

- **Intercept**: The predicted debt payments when income and unemployment are both zero is approximately 198.9956, but the intercept is not statistically significant (p = 0.215).

- **Income**: The coefficient for income is 10.5122: holding unemployment constant, each one-unit increase in income raises predicted debt payments by approximately 10.51 units. This variable is highly significant (p < 0.001).

- **Unemployment**: The coefficient for unemployment is 0.6186, indicating a positive but statistically insignificant relationship with debt payments, holding income constant (p = 0.929).

- **R-squared (Multiple R-squared: 0.7527)**: About 75.27% of the variance in debt payments is explained by the model.

- **Adjusted R-squared (0.7312)**: After adjusting for the number of predictors, 73.12%
of the variability in debt payments is explained by the model.
- **F-statistic (35 on 2 and 23 DF, p-value: 1.054e-07)**: The model is statistically significant overall: we reject the null hypothesis that both slope coefficients are zero, so income and unemployment are jointly significant in explaining debt payments.
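The adjusted R-squared and the F-statistic can both be recovered from the multiple R-squared, which is a useful consistency check on the output. The sketch below assumes the standard formulas; the sample size n = 26 is inferred from the 23 residual degrees of freedom (n - k - 1 = 23 with k = 2), not stated in the document.

```R
# Values reported in the R output discussed above
r2       <- 0.7527  # Multiple R-squared
k        <- 2       # number of predictors (Income, Unemployment)
df_resid <- 23      # residual degrees of freedom from the F-statistic line
n        <- df_resid + k + 1   # implied number of observations (inferred: 26)

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Overall F-statistic and its upper-tail p-value
f_stat <- (r2 / k) / ((1 - r2) / (n - k - 1))
p_val  <- pf(f_stat, df1 = k, df2 = n - k - 1, lower.tail = FALSE)

round(c(adj_r2 = adj_r2, F = f_stat), 4)
```

This gives an adjusted R-squared of about 0.7312 and an F-statistic of about 35, matching the reported values.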

---

Q1(C): *If income is Rs. 80,000 and the unemployment rate is 7.5%, how much will be
the debt payments?*

Using the regression equation:

Debt Payments = 198.9956 + 10.5122 * Income + 0.6186 * Unemployment

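For income = 80,000 and unemployment = 7.5 (per cent),

Debt Payments = 198.9956 + 10.5122 * 80,000 + 0.6186 * 7.5 = 198.9956 + 840,976 + 4.6395 ≈ 841,179.64

This assumes income enters the model in the same units as the question (rupees). If the income variable in the underlying data was recorded in thousands, substitute 80 instead, which gives 198.9956 + 840.976 + 4.6395 ≈ 1,044.61.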
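The prediction is a plug-in calculation with the reported coefficients. A small R helper makes the arithmetic explicit; the function name `predict_debt` is hypothetical, and the second call assumes (hypothetically) that income was recorded in thousands.

```R
# Coefficients as reported in the regression output above
b <- c(intercept = 198.9956, income = 10.5122, unemployment = 0.6186)

# Hypothetical helper: predicted debt payments for given predictor values
predict_debt <- function(income, unemployment) {
  unname(b["intercept"] + b["income"] * income + b["unemployment"] * unemployment)
}

predict_debt(80000, 7.5)  # income entered in the units used in the question
predict_debt(80, 7.5)     # if income had been recorded in thousands
```

With a fitted model object, `predict(model, newdata = data.frame(...))` performs the same calculation.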

---

Q2(A): Discuss the various assumptions of a linear multiple regression model.

The key assumptions of a multiple regression model are:

1. Linearity: The relationship between dependent and independent variables is linear.

2. Independence: Observations should be independent of each other.

3. Homoscedasticity: The variance of residuals (errors) should be constant across all levels of the independent variables.

4. Normality of residuals: The residuals should be approximately normally distributed.

5. No multicollinearity: Independent variables should not be highly correlated with each other.

---
Q2(B): How will you test for these assumptions using graphical methods?

1. **Linearity**: Plot residuals against predicted (fitted) values. The plot should show no systematic pattern, such as a curve.

2. **Independence**: Plot residuals in observation (time) order; visible runs or cycles suggest autocorrelation. The Durbin-Watson test provides a formal check.

3. **Homoscedasticity**: In the residuals vs. fitted values plot, the spread of the residuals should stay roughly constant; a funnel shape indicates heteroscedasticity.

4. **Normality**: In a Q-Q plot of the residuals, the points should lie close to a straight line.

5. **Multicollinearity**: A scatterplot matrix of the independent variables shows strongly correlated pairs. Numerically, the Variance Inflation Factor (VIF) detects multicollinearity; a VIF > 10 is a common rule of thumb for a potential problem.
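A minimal base-R sketch of these checks on hypothetical simulated data in which x1 and x2 are deliberately correlated. `plot()` on an `lm` object draws the standard diagnostic panels (residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage); it is wrapped in `interactive()` so the script also runs without a graphics device. The VIF is computed by hand from auxiliary regressions; the helper name `vif_manual` is hypothetical (packages such as `car` provide a `vif()` function).

```R
# Hypothetical data with two correlated predictors (x1, x2)
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- 0.9 * x1 + rnorm(n, sd = 0.3)   # strongly correlated with x1
x3 <- rnorm(n)                        # unrelated predictor
y  <- 1 + 2 * x1 - x2 + 0.5 * x3 + rnorm(n)
dat <- data.frame(y, x1, x2, x3)

model <- lm(y ~ x1 + x2 + x3, data = dat)

# Graphical checks: residual plots, Q-Q plot, scatterplot matrix of predictors
if (interactive()) {
  par(mfrow = c(2, 2))
  plot(model)
  pairs(dat[, c("x1", "x2", "x3")])
}

# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing
# predictor j on all the other predictors
vif_manual <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    aux <- lm(reformulate(others, response = p), data = data)
    1 / (1 - summary(aux)$r.squared)
  })
}

vif_manual(dat, c("x1", "x2", "x3"))
```

Here x1 and x2 should show large VIFs (around 10), while x3 stays close to 1.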

---

### **Q2(C)**: *Suggest some remedies to get rid of 'multicollinearity' in a linear multiple regression model.*

Some remedies include:

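1. **Remove highly correlated predictors**: Identify pairs of highly correlated variables and drop one variable from each pair.

2. **Combine predictors**: Use principal component analysis (PCA) or factor analysis to replace the correlated variables with a smaller set of uncorrelated components.

3. **Regularization**: Ridge regression shrinks the coefficients and stabilizes their estimates when predictors are correlated; Lasso regression can also help by shrinking some coefficients to exactly zero.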
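A minimal base-R sketch of remedies 2 and 3 on hypothetical simulated data with two nearly collinear predictors. The closed-form ridge estimator (X'X + lambda*I)^(-1) X'y is computed directly on standardized predictors and a centred response, rather than through a package such as glmnet; the helper name `ridge_fit` and the penalty lambda = 10 are illustrative assumptions.

```R
# Ridge regression on standardized predictors; centring y means the
# intercept is not penalized. lambda = 0 reproduces OLS slopes.
ridge_fit <- function(X, y, lambda) {
  Xs <- scale(X)          # standardize each predictor
  yc <- y - mean(y)       # centre the response
  p  <- ncol(Xs)
  beta <- solve(crossprod(Xs) + lambda * diag(p), crossprod(Xs, yc))
  drop(beta)
}

set.seed(7)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.05)   # nearly collinear with x1 (hypothetical)
y  <- 3 + x1 + x2 + rnorm(n)
X  <- cbind(x1 = x1, x2 = x2)

ridge_fit(X, y, lambda = 0)    # OLS: high-variance estimates under collinearity
ridge_fit(X, y, lambda = 10)   # ridge: shrunk, more stable estimates

# PCA remedy: replace the correlated predictors with principal components
pc <- prcomp(X, scale. = TRUE)
summary(pc)   # the first component carries almost all of the shared variation
```

The first principal component could then be used as a single predictor in place of x1 and x2.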

---

### **Q3(A)**: *Discuss the use of (a) Quadratic Regression Model, (b) Log-Log Model,
(c) Semi-Log Model, and (d) Exponential Model.*

1. **Quadratic Regression Model**: Used when the relationship between the dependent and independent variable is non-linear, specifically when there is a curved relationship. The model includes a squared term for one or more independent variables.

Y = beta_0 + beta_1X + beta_2(X)^2 + epsilon

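2. **Log-Log Model**: Both the dependent and independent variables are transformed using logarithms. It is used when the relationship between the variables follows a power law. The slope coefficient beta_1 is the elasticity of Y with respect to X: a 1% increase in X changes Y by approximately beta_1%.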

log(Y) = beta_0 + beta_1log(X) + epsilon

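3. **Semi-Log Model**: Either the dependent variable or the independent variable (but not both) is transformed into logarithmic form.

- For the dependent variable (log-linear model), used when Y grows at a constant percentage rate; 100 * beta_1 is approximately the percentage change in Y for a one-unit increase in X:

log(Y) = beta_0 + beta_1(X) + epsilon

- For the independent variable (linear-log model), used when the effect of X diminishes as X grows; beta_1/100 is approximately the change in Y for a 1% increase in X:

Y = beta_0 + beta_1(log(X)) + epsilon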

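4. **Exponential Model**: This model is used when the dependent variable grows (or decays) at a constant percentage rate, that is, when its rate of change is proportional to its current value. It is usually written with a multiplicative error term:

Y = beta_0 e^(beta_1(X) + epsilon)

Taking natural logarithms gives ln(Y) = ln(beta_0) + beta_1(X) + epsilon, which is the log-linear semi-log model and can be estimated by ordinary least squares.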
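A minimal R sketch showing how each functional form can be specified with `lm()`, using hypothetical simulated data (none of these numbers come from the question paper). The exponential model is estimated through its log-linear form, assuming a multiplicative error; back-transforming the intercept with `exp()` ignores a small retransformation bias.

```R
# Hypothetical simulated data (not from the question paper)
set.seed(123)
n <- 200
x <- runif(n, min = 1, max = 10)

# (a) Quadratic: Y = b0 + b1*X + b2*X^2 + e; I() makes ^ arithmetic in a formula
y_quad <- 5 + 4 * x - 0.3 * x^2 + rnorm(n, sd = 0.5)
quad <- lm(y_quad ~ x + I(x^2))
# X at which the fitted curve peaks (or bottoms out): -b1 / (2 * b2)
turning_point <- -coef(quad)[["x"]] / (2 * coef(quad)[["I(x^2)"]])

# (b) Log-log: log(Y) = b0 + b1*log(X) + e; b1 is the elasticity
y_pow <- 2 * x^0.7 * exp(rnorm(n, sd = 0.05))
loglog <- lm(log(y_pow) ~ log(x))

# (c) Linear-log semi-log: Y = b0 + b1*log(X) + e
y_linlog <- 1 + 5 * log(x) + rnorm(n, sd = 0.2)
linlog <- lm(y_linlog ~ log(x))

# (c)/(d) Exponential Y = b0 * exp(b1*X) * exp(e), estimated as the
# log-linear semi-log model log(Y) = log(b0) + b1*X + e
y_exp <- 3 * exp(0.2 * x) * exp(rnorm(n, sd = 0.05))
loglin <- lm(log(y_exp) ~ x)
b0_hat <- exp(coef(loglin)[["(Intercept)"]])        # back-transformed intercept
growth_pct <- 100 * (exp(coef(loglin)[["x"]]) - 1)  # % change in Y per unit of X

round(c(turning_point = turning_point,
        elasticity    = coef(loglog)[["log(x)"]],
        linlog_slope  = coef(linlog)[["log(x)"]],
        b0_exp        = b0_hat,
        growth_pct    = growth_pct), 3)
```

Each estimate should land near its simulated value: a turning point near 6.67, an elasticity near 0.7, a linear-log slope near 5, b0 near 3, and growth of about 22% per unit of X.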


Q3(B)

Question:

A business analyst is interested in examining how an individual's cigarette consumption C may be influenced by the price P of a pack of cigarettes and the individual's annual income I. Using data from 500 individuals, the analyst estimates a log-log model and obtains the following regression results:

ln(C) = 3.90 - 1.25ln(P) + 0.18ln(I)

p-values:

- Intercept: 0.000

- Price (P): 0.005

- Income (I): 0.400

(a) Interpret the value of the elasticity of demand for cigarettes with respect to price.

- The price elasticity of demand is given by the coefficient of ln(P), which is -1.25.

- This implies that a 1% increase in the price of cigarettes leads to a 1.25% decrease in
cigarette consumption, holding income constant.

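- Since the absolute value of the elasticity (1.25) is greater than 1, demand for cigarettes is price-elastic: consumption falls proportionally more than the price rises. The negative sign is consistent with the law of demand.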
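The "1% change" reading is a linear approximation. For larger changes, the log-log form implies that consumption scales by (1 + %change)^beta. A short R check for a hypothetical 10% price rise, holding income constant:

```R
# Price elasticity reported in the log-log model above
beta_price <- -1.25

# Linear approximation: % change in C is about beta * (% change in P)
approx_pct <- beta_price * 10   # hypothetical 10% price rise

# Exact implication of the log-log form: C is multiplied by (1.10)^beta
exact_pct <- 100 * ((1 + 0.10)^beta_price - 1)

round(c(approx = approx_pct, exact = exact_pct), 2)
```

The approximation gives -12.5%, and the exact log-log implication is about -11.23%.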

(b) Interpret the value of the income elasticity of demand for cigarettes.

- The income elasticity of demand is given by the coefficient of ln(I), which is 0.18.

- This implies that a 1% increase in an individual’s income increases cigarette consumption by 0.18%, holding the price constant.

- Since the coefficient is positive but less than 1, cigarettes would be a normal good that is not very sensitive to changes in income (income-inelastic).

(c) At the 5% significance level, is the price elasticity of demand statistically significant?

- The p-value for the price elasticity is 0.005, which is less than 0.05.

- Therefore, the price elasticity is statistically significant at the 5% level.

(d) At the 5% significance level, is the income elasticity of demand statistically significant? Is this result surprising?

- The p-value for the income elasticity is 0.400, which is greater than 0.05.

- Therefore, the income elasticity is *not* statistically significant at the 5% level.

- This may seem surprising, since consumption of most goods rises with income. However, it is plausible for cigarettes: because smoking is habit-forming, many consumers treat cigarettes as a necessity and buy roughly the same amount regardless of changes in income.

(e) Write the R-code for estimating the above regression model:

```R
# Assumes the data are in a data frame `data` with columns
# C (consumption), P (price) and I (income)
model <- lm(log(C) ~ log(P) + log(I), data = data)

summary(model)  # coefficients on log(P) and log(I) are the elasticities
```

Q4(A)

(a) What is a dummy variable and what is a dummy variable trap?

- A *dummy variable* is a variable that takes only the values 0 and 1 and is used in regression analysis to represent subgroups or categories of the data. For instance, a variable could be 1 for males and 0 for females to represent gender.

- The *dummy variable trap* occurs when a full set of dummy variables, one for every category, is included in a model that also has an intercept. The dummies then always sum to 1, duplicating the intercept column and causing perfect multicollinearity. For example, if both "male" and "female" dummies are used, one can be predicted exactly from the other (female = 1 - male), so the coefficients cannot be uniquely estimated. The remedy is to include m - 1 dummies for a variable with m categories.
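A small R sketch on hypothetical simulated wage data showing both cases. When a factor is used in a formula, R automatically creates m - 1 dummies. When both dummies are included by hand alongside the intercept, `lm()` detects the perfect collinearity and reports `NA` for the aliased coefficient.

```R
# Hypothetical data: wage and a two-category gender variable
set.seed(10)
gender <- factor(sample(c("female", "male"), 40, replace = TRUE))
wage   <- 40 + 12 * (gender == "male") + rnorm(40, sd = 3)

# Correct coding: one dummy ("gendermale"); "female" is the reference
# category, absorbed by the intercept
ok <- lm(wage ~ gender)
coef(ok)

# Dummy variable trap: both dummies plus the intercept
male   <- as.numeric(gender == "male")
female <- as.numeric(gender == "female")   # male + female = 1 = intercept column
trap   <- lm(wage ~ male + female)
coef(trap)   # the aliased dummy ("female") is reported as NA
```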

Q4(B)

An analyst interviews 50 employees to determine whether males are paid more than
females, on average, holding experience constant. The regression equation is:

Wage = 40.6060 + 1.1279 * EXPER + 13.9240 * Male


p-values:

- Intercept: 0.000

- Experience: 0.000

- Male: 0.000

(a) Write the estimated model:

Wage = 40.6060 + 1.1279 * EXPER + 13.9240 * Male

Where:

- Wage is the hourly wage in dollars.

- EXPER is years of experience.

- Male is a dummy variable (1 if male, 0 if female).

(b) Do you find the difference in the salary of male and female significant, holding years
of experience constant?

- The coefficient for the dummy variable Male is 13.9240 with a p-value of 0.000, which
is less than 0.05.

- This means that the difference in salary between males and females is statistically
significant, holding years of experience constant.

- Specifically, males earn approximately $13.92 more per hour than females with the
same level of experience.

(c) Predict the salary of a male with 10 years of experience:

Wage = 40.6060 + 1.1279 * 10 + 13.9240 * 1 = 40.6060 + 11.279 + 13.9240 = 65.809

So, the estimated wage is $65.81.


(d) Predict the salary of a female with 10 years of experience:

Wage = 40.6060 + 1.1279 *10 + 13.9240 * 0 = 40.6060 + 11.279 = 51.885

So, the estimated wage is $51.89.
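Both predictions follow from plugging into the estimated equation. The difference between them is always the Male coefficient, whatever the experience level. A short R check follows; the helper name `wage_hat` is hypothetical.

```R
# Estimated wage equation from the regression above
wage_hat <- function(exper, male) {
  40.6060 + 1.1279 * exper + 13.9240 * male
}

wage_hat(exper = 10, male = 1)      # male with 10 years of experience
wage_hat(exper = 10, male = 0)      # female with 10 years of experience
wage_hat(10, 1) - wage_hat(10, 0)   # gap equals the Male coefficient
```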

Q4(C)

(a) What is a logistic regression model?

- A *logistic regression model* is a statistical model used for binary classification problems, where the dependent variable is binary, typically coded as 0 or 1. Unlike linear regression, logistic regression predicts the probability of the occurrence of an event (e.g., success/failure, yes/no) using the logistic function:

P(Y = 1) = 1 / (1 + e^(-(beta_0 + beta_1X)))

(b) State the properties of a logistic regression model graphically.

- The graph of a logistic regression model is S-shaped (a sigmoid curve). It represents how the probability of the dependent variable being 1 changes with the independent variables.

- The model is bounded between 0 and 1, meaning the predicted probabilities never go
below 0 or above 1.
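Both graphical properties can be checked numerically. The R sketch below evaluates the logistic function over a grid using hypothetical coefficients (b0 = -4, b1 = 0.8). With these values the curve crosses 0.5 at x = -b0/b1 = 5. Base R's `plogis()` computes the same function.

```R
# Logistic (sigmoid) function: P(Y = 1) = 1 / (1 + exp(-(b0 + b1 * x)))
b0 <- -4    # hypothetical intercept
b1 <- 0.8   # hypothetical slope
x  <- seq(-5, 15, by = 0.5)
p  <- 1 / (1 + exp(-(b0 + b1 * x)))   # equivalent to plogis(b0 + b1 * x)

range(p)           # stays strictly inside (0, 1)
all(diff(p) > 0)   # increases monotonically when b1 > 0

if (interactive()) plot(x, p, type = "l", ylab = "P(Y = 1)")  # the S-curve
```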

(c) Discuss the use of the regression model in financial decision-making.

- Logistic regression models are widely used in financial decision-making, especially in risk analysis, credit scoring, and fraud detection. For instance, a logistic model can be used to estimate the probability that a borrower will default on a loan based on their financial history and other characteristics. This helps banks make informed decisions regarding loan approvals and risk management.
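As a sketch of the credit-scoring use case, the R code below fits a logistic regression with `glm(..., family = binomial)` to simulated loan data and estimates the default probability for a new applicant. Everything here is hypothetical and chosen for illustration: the variable names, the "true" coefficients, and the 0.2 approval cut-off.

```R
# Hypothetical simulated credit data
set.seed(2024)
n <- 1000
debt_to_income <- runif(n, 0, 1)          # hypothetical ratio
late_payments  <- rpois(n, lambda = 1)    # hypothetical count
eta <- -3 + 4 * debt_to_income + 0.5 * late_payments   # assumed true log-odds
default <- rbinom(n, size = 1, prob = plogis(eta))

loans <- data.frame(default, debt_to_income, late_payments)
credit_model <- glm(default ~ debt_to_income + late_payments,
                    family = binomial, data = loans)

# Estimated default probability for a new (hypothetical) applicant
applicant <- data.frame(debt_to_income = 0.6, late_payments = 2)
p_default <- predict(credit_model, newdata = applicant, type = "response")

# Illustrative decision rule: the 0.2 cut-off is an assumption
decision <- ifelse(p_default < 0.2, "approve", "review/decline")
```

In practice, the cut-off would be set from the lender's own costs of false approvals and false rejections.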
