Big Data Science Assignment
Regression Statistics
Multiple R 0.807175
R Square 0.651531
Adjusted R Square 0.646691
Standard Error 3.43889
Observations 74
ANOVA
df SS MS F Significance F
Regression 1 1591.99 1591.99 134.6182422 3.79849E-18
Residual 72 851.4693 11.82596
Total 73 2443.459
The independent variable is statistically significant: the Significance F value is 3.8E-18, far below
the 0.05 significance level, which means that there is a relationship between the independent variable
and the dependent variable. Any p-value below 0.05 is treated as significant at the 95% confidence level.
Significance of variables is important since it shows whether a variable has an impact on the outcome
of the dependent variable, and it is vital for the interpretation of results.
2. Create the scatterplot of the two variables and add the regression line and the regression
equation to it. Copy the graph here. Do you see any relationship with OLS results? If yes,
comment on it. (2 points) NOTE: Make sure to have correct variables on X and Y axis on the
graph
[Scatterplot of the dependent variable against X Variable 1 (x-axis from 1,500 to 5,500), with the
fitted regression line "Linear (Predicted Y)"]
Yes, the graph is consistent with the OLS results: the regression line and equation on the plot are
the ones that OLS estimates. OLS is employed to find the best-fitting line through the data points,
that is, the line that minimizes the sum of squared vertical distances between the observed data
points and the values predicted by the regression equation.
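To illustrate how OLS picks the best-fitting line, here is a minimal Python sketch of the closed-form least-squares fit for one independent variable. The toy data and the function names are hypothetical and are not the assignment's dataset; the sketch only shows that shifting the fitted slope makes the sum of squared vertical distances larger.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of squared vertical distances between observed and predicted y."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data, used only for illustration.
toy_x = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(toy_x, toy_y)
best = sum_squared_residuals(toy_x, toy_y, slope, intercept)
# Any other line, for example one with a slightly different slope, fits worse.
worse = sum_squared_residuals(toy_x, toy_y, slope + 0.1, intercept)

print(f"slope={slope:.4f}, intercept={intercept:.4f}")
print(f"SSR of fitted line={best:.4f}, SSR of shifted line={worse:.4f}")
```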
3. Run the regression with mpg as dependent variable (Y) and all other variables as independent
ones (X). Copy the regression output here.
Regression Statistics
Multiple R 0.8373
R Square 0.701071
Adjusted R Square 0.669366
Standard Error 3.326708
Observations 74
The predictive power of the model is measured by the Adjusted R Square, which is 0.669366. It
means that the model explains about 67% of the variation in mpg after adjusting for the
number of independent variables. This is moderate predictive power: the model fits the data
reasonably well and is fairly effective in explaining and predicting the variation in the
dependent variable.
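As a check on the figure above, the Adjusted R Square can be recomputed from the R Square with the standard formula 1 - (1 - R²)(n - 1)/(n - k - 1). The sketch below is only an illustration: the function name is hypothetical, and k = 7 is taken from the regression degrees of freedom in the ANOVA output.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjust R Square for the number of independent variables in the model."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Values from the regression output: R Square 0.701071, 74 observations,
# 7 independent variables (regression df in the ANOVA table).
adj = adjusted_r_squared(0.701071, 74, 7)
print(round(adj, 6))
```

The result should agree with the reported 0.669366 up to rounding.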
5. Is the model significant overall, or not? Why? Justify your answer. (2 points) HINT: Check the
“significance F” in the output and refer to the slides for its meaning.
The Significance F value in the ANOVA table below measures the overall significance of the model.
ANOVA
df SS MS F Significance F
Regression 7 1713.038 244.7198 22.11259 4.49E-15
Residual 66 730.4211 11.06699
Total 73 2443.459
The model is significant overall: the Significance F value is 4.49E-15, which is far below the 0.05
threshold used at the 95% confidence level. Therefore the independent variables jointly have a
statistically significant relationship with the dependent variable.
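The F value behind this conclusion can be reproduced from the ANOVA sums of squares and degrees of freedom. The following is a minimal sketch with a hypothetical function name. It recomputes F as the ratio of the two mean squares and applies the 0.05 rule to the Significance F reported by the output, rather than recomputing the p-value itself.

```python
def f_statistic(ss_regression, df_regression, ss_residual, df_residual):
    """F = (regression mean square) / (residual mean square)."""
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    return ms_regression / ms_residual


# Values copied from the ANOVA table of the multiple regression.
f_value = f_statistic(1713.038, 7, 730.4211, 66)

significance_f = 4.49e-15  # reported by the regression output
alpha = 0.05               # 95% confidence level
model_is_significant = significance_f < alpha

print(f"F = {f_value:.4f}, significant overall: {model_is_significant}")
```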
6. Comment on the significance of the price, rep78 and length variables. Are they significant, or
not? Justify your answer
The p-value of price is 0.7632 and the p-value of length is 0.165129. Both are above
0.05, so there is insufficient evidence that price or length has any impact on the
dependent variable, and both variables are insignificant. For a variable with a p-value
above 0.05, the null hypothesis that its coefficient is zero cannot be rejected. On the
other hand, the p-value of rep78 is 0.016406, which is below 0.05, so rep78 is significant
and has a relationship with the dependent variable.
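The decision rule used here can be written as a short Python sketch. The function name is hypothetical; the p-values are the ones quoted in the answer. Running the same rule with a 0.10 threshold shows how the classification would be checked at a 90% confidence level.

```python
def classify_significance(p_values, alpha=0.05):
    """Map each variable to True if its p-value is below the threshold alpha."""
    return {name: p < alpha for name, p in p_values.items()}


# p-values quoted in the answer above.
reported_p_values = {"price": 0.7632, "rep78": 0.016406, "length": 0.165129}

at_95 = classify_significance(reported_p_values, alpha=0.05)
at_90 = classify_significance(reported_p_values, alpha=0.10)

for name in reported_p_values:
    print(f"{name}: significant at 95%: {at_95[name]}, at 90%: {at_90[name]}")
```

For these three variables the classification is the same at both thresholds, because none of the p-values lies between 0.05 and 0.10.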
7. Imagine you have a car which is characterized with the following parameters:
• Price – 5,000
• Rep78 – 4
• Headroom – 3
• Trunk – 12
• Weight – 3,000
• Foreign – 1
• Length – 150
a. What will be the fuel consumption of this car based on the model you obtained?
b. Explain what would be different if the confidence level was 90% instead of 95% (in terms of
model and the predicted value of fuel consumption)?
AT 95% confidence level
Variable Forecast Coefficient Value
Intercept 1 43.9424 43.9424
price 5,000 -6.1E-05 -0.30445
rep78 4 1.219646 4.878585
headroom 3 0.00145 0.004349
trunk 12 -0.06994 -0.83925
weight 3000 -0.00376 -11.2657
foreigner 1 2.912733 2.912733
Length 150 -0.08571 -12.8563
FUEL 26.4724
Variable Forecast Coefficient Value
Intercept 1 43.9424 43.9424
price 5,000 -6.1E-05 -0.30445
rep78 4 1.219646 4.878585
headroom 3 0.00145 0.004349
trunk 12 -0.06994 -0.83925
weight 3000 -0.00376 -11.2657
foreigner 1 2.912733 2.912733
Length 150 -0.08571 -12.8563
FUEL 26.4724
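The forecast in the table is the intercept plus the sum of each coefficient multiplied by the car's value for that variable. A minimal Python sketch of this calculation follows. The dictionary keys and the function name are hypothetical, and the coefficients are the rounded values shown in the table, so the result can differ from the table's 26.4724 in the second decimal place.

```python
# Rounded coefficients as displayed in the forecast table.
coefficients = {
    "intercept": 43.9424,
    "price": -6.1e-05,
    "rep78": 1.219646,
    "headroom": 0.00145,
    "trunk": -0.06994,
    "weight": -0.00376,
    "foreign": 2.912733,
    "length": -0.08571,
}

# The car described in the question.
car = {
    "price": 5000,
    "rep78": 4,
    "headroom": 3,
    "trunk": 12,
    "weight": 3000,
    "foreign": 1,
    "length": 150,
}


def predict_mpg(coefs, features):
    """Intercept plus the sum of coefficient * value over all variables."""
    return coefs["intercept"] + sum(coefs[name] * value for name, value in features.items())


predicted_mpg = predict_mpg(coefficients, car)
print(f"Predicted mpg: {predicted_mpg:.2f}")
```

The point forecast is computed from the coefficients alone, so the confidence level does not enter this calculation.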
8. Comment on the coefficient of the rep78 variable based on the results you obtained in the last
model. What does it show?
The coefficient of rep78 is 1.219646 with a p-value of 0.016406, which is below 0.05, so the variable
is significant and has an impact on the dependent variable. The regression model indicates that, with
all other factors held constant, a one-unit increase in the rep78 variable corresponds to a 1.22 unit
increase in the car's estimated mpg.
9. Comment on the coefficient of the foreign variable based on the results you obtained in the last
model. What does it show?
In the regression model, the coefficient of the foreign variable is 2.91 and it is significant, with a
p-value of 0.03, which means the foreign variable has an impact on the dependent variable. The
relationship is positive: with all other factors held constant, a car with foreign = 1 has an
estimated mpg about 2.91 higher than a car with foreign = 0.