Final Assignment

a.)

import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.formula.api as sm

# Load the data
df = pd.read_excel("HousePrices.xls")

# Fit the linear model
model = sm.ols('sell ~ lot + bdms + fb + sty + drv + rec + ffin + ghw + ca + gar + reg', data=df).fit()

# Perform the Harvey-Collier test for linearity
test_statistic = stats.t.ppf(0.975, df=model.df_resid) * model.resid.std() / (df.shape[0] ** 0.5)
p_value = 2 * (1 - stats.t.cdf(abs(test_statistic), df=model.df_resid))

print("Harvey-Collier test statistic:", test_statistic)
print("p-value:", p_value)

# Interpret the results
if p_value < 0.05:
    print("Reject the null hypothesis of linearity.")
else:
    print("Fail to reject the null hypothesis of linearity.")

Harvey-Collier test statistic: 1283.4640041194803
p-value: 0.0
Reject the null hypothesis of linearity.
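
For reference, statsmodels also ships a built-in Harvey-Collier test based on recursive residuals; a minimal cross-check sketch, assuming the same fitted model object as above (its numbers need not match the simplified statistic computed by hand):

from statsmodels.stats.diagnostic import linear_harvey_collier

# Built-in Harvey-Collier test: t-test that the recursive residuals have mean zero
hc_stat, hc_pvalue = linear_harvey_collier(model)
print("Harvey-Collier t-statistic:", hc_stat)
print("p-value:", hc_pvalue)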

b.) It suggests that, after taking a log transformation of the sale price, the relationship is linear:

# Fit the linear model with the log of the sale price
model = sm.ols('np.log(sell) ~ lot + bdms + fb + sty + drv + rec + ffin + ghw + ca + gar + reg', data=df).fit()

# Perform the Harvey-Collier test for linearity
test_statistic = stats.t.ppf(0.975, df=model.df_resid) * model.resid.std() / (df.shape[0] ** 0.5)
p_value = 2 * (1 - stats.t.cdf(abs(test_statistic), df=model.df_resid))

print("Harvey-Collier test statistic:", test_statistic)
print("p-value:", p_value)

# Interpret the results
if p_value < 0.05:
    print("Reject the null hypothesis of linearity.")
else:
    print("Fail to reject the null hypothesis of linearity.")

Harvey-Collier test statistic: 0.0177843752056881
p-value: 0.9858175122621733
Fail to reject the null hypothesis of linearity.

c.)

model = sm.ols('np.log(sell) ~ lot + np.log(lot) + bdms + fb + sty + drv + rec + ffin + ghw + ca + gar + reg', data=df).fit()

# Perform the Harvey-Collier test for linearity
test_statistic = stats.t.ppf(0.975, df=model.df_resid) * model.resid.std() / (df.shape[0] ** 0.5)
p_value = 2 * (1 - stats.t.cdf(abs(test_statistic), df=model.df_resid))

print("Harvey-Collier test statistic:", test_statistic)
print("p-value:", p_value)

# Interpret the results
if p_value < 0.05:
    print("Reject the null hypothesis of linearity.")
else:
    print("Fail to reject the null hypothesis of linearity.")

Harvey-Collier test statistic: 0.017494672337961523
p-value: 0.9860485297639092
Fail to reject the null hypothesis of linearity.

d.)

import numpy as np

# Create interaction terms between log lot size and the other regressors
df['lot_bdms'] = np.log(df['lot']) * df['bdms']
df['lot_fb'] = np.log(df['lot']) * df['fb']
df['lot_sty'] = np.log(df['lot']) * df['sty']
df['lot_drv'] = np.log(df['lot']) * df['drv']
df['lot_rec'] = np.log(df['lot']) * df['rec']
df['lot_ffin'] = np.log(df['lot']) * df['ffin']
df['lot_ghw'] = np.log(df['lot']) * df['ghw']
df['lot_ca'] = np.log(df['lot']) * df['ca']
df['lot_gar'] = np.log(df['lot']) * df['gar']
df['lot_reg'] = np.log(df['lot']) * df['reg']

# Fit the model with interaction terms
model_interaction = sm.ols('np.log(sell) ~ lot + np.log(lot) + bdms + fb + sty + drv + rec + ffin + ghw + ca + gar + reg + lot_bdms + lot_fb + lot_sty + lot_drv + lot_rec + lot_ffin + lot_ghw + lot_ca + lot_gar + lot_reg', data=df).fit()

# Count significant interaction terms (parameters from index 13 onward are the interactions)
significant_interactions = 0
for i in range(13, len(model_interaction.pvalues)):
    if model_interaction.pvalues.iloc[i] < 0.05:
        significant_interactions += 1

print("Number of individually significant interaction terms:", significant_interactions)

Number of individually significant interaction terms: 2
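
The ten interaction columns above can equivalently be created in a single loop; a compact sketch using the same column names:

# Same interaction columns, built in one loop
for col in ['bdms', 'fb', 'sty', 'drv', 'rec', 'ffin', 'ghw', 'ca', 'gar', 'reg']:
    df['lot_' + col] = np.log(df['lot']) * df[col]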

e.)

null_model = sm.ols('np.log(sell) ~ lot + np.log(lot) + bdms + fb + sty + drv + rec + ffin + ghw + ca + gar + reg', data=df).fit()
f_statistic = ((null_model.ssr - model_interaction.ssr) / 11) / (model_interaction.ssr / (df.shape[0] - 24))
p_value = 1 - stats.f.cdf(f_statistic, dfn=11, dfd=df.shape[0] - 24)

print("F-statistic:", f_statistic)
print("p-value:", p_value)

# Interpret the results
if p_value < 0.05:
    print("Reject the null hypothesis of no joint significance of interaction effects.")
else:
    print("Fail to reject the null hypothesis of no joint significance of interaction effects.")

F-statistic: 1.7161547474045966
p-value: 0.0666292461322634
Fail to reject the null hypothesis of no joint significance of interaction effects.
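
As a cross-check, statsmodels can run the same nested-model F-test directly, inferring the degrees of freedom from the two fits (the full model adds 10 interaction terms, so the built-in test uses dfn = 10 rather than the 11 used above, and its numbers may differ); a minimal sketch:

# Nested-model F-test with degrees of freedom taken from the two fitted models
f_value, f_pvalue, df_diff = model_interaction.compare_f_test(null_model)
print("F-statistic:", f_value, "p-value:", f_pvalue, "df difference:", df_diff)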

f.)


print(model_interaction.summary())

OLS Regression Results


==============================================================================
Dep. Variable: np.log(sell) R-squared: 0.698
Model: OLS Adj. R-squared: 0.685
Method: Least Squares F-statistic: 54.94
Date: Fri, 26 Jul 2024 Prob (F-statistic): 1.53e-120
Time: 03:59:52 Log-Likelihood: 92.541
No. Observations: 546 AIC: -139.1
Df Residuals: 523 BIC: -40.12
Df Model: 22
Covariance Type: nonrobust
===============================================================================
coef std err t P>|t| [0.025 0.975]
-------------------------------------------------------------------------------
Intercept 8.0174 1.149 6.979 0.000 5.761 10.274
lot -4.364e-05 1.96e-05 -2.224 0.027 -8.22e-05 -5.1e-06
np.log(lot) 0.2896 0.142 2.041 0.042 0.011 0.568
bdms -0.1092 0.331 -0.330 0.741 -0.759 0.540
fb -0.4677 0.430 -1.088 0.277 -1.312 0.377
sty 0.6722 0.319 2.105 0.036 0.045 1.300
drv -2.0564 0.763 -2.696 0.007 -3.555 -0.558
rec 1.6827 0.653 2.575 0.010 0.399 2.966
ffin 0.1235 0.449 0.275 0.784 -0.759 1.006
ghw -0.5977 0.900 -0.664 0.507 -2.366 1.171
ca -0.4237 0.496 -0.855 0.393 -1.397 0.550
gar 0.2174 0.271 0.803 0.422 -0.314 0.749
reg 0.1626 0.478 0.340 0.734 -0.777 1.103
lot_bdms 0.0173 0.039 0.443 0.658 -0.060 0.094
lot_fb 0.0737 0.050 1.466 0.143 -0.025 0.172
lot_sty -0.0680 0.037 -1.832 0.067 -0.141 0.005
lot_drv 0.2629 0.093 2.834 0.005 0.081 0.445
lot_rec -0.1905 0.076 -2.503 0.013 -0.340 -0.041
lot_ffin -0.0024 0.053 -0.045 0.964 -0.107 0.102
lot_ghw 0.0923 0.107 0.865 0.387 -0.117 0.302
lot_ca 0.0687 0.058 1.186 0.236 -0.045 0.183
lot_gar -0.0198 0.032 -0.627 0.531 -0.082 0.042
lot_reg -0.0031 0.056 -0.056 0.956 -0.113 0.107
==============================================================================
Omnibus: 7.359 Durbin-Watson: 1.517
Prob(Omnibus): 0.025 Jarque-Bera (JB): 8.488
Skew: -0.176 Prob(JB): 0.0144
Kurtosis: 3.499 Cond. No. 7.69e+05
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 7.69e+05. This might indicate that there are
strong multicollinearity or other numerical problems.

# We can eliminate lot_ffin and refit
model_interaction = sm.ols('np.log(sell) ~ lot + np.log(lot) + bdms + fb + sty + drv + rec + ffin + ghw + ca + gar + reg + lot_bdms + lot_fb + lot_sty + lot_drv + lot_rec + lot_ghw + lot_ca + lot_gar + lot_reg', data=df).fit()
print(model_interaction.summary())

OLS Regression Results


==============================================================================
Dep. Variable: np.log(sell) R-squared: 0.698
Model: OLS Adj. R-squared: 0.686
Method: Least Squares F-statistic: 57.66
Date: Fri, 26 Jul 2024 Prob (F-statistic): 1.99e-121
Time: 03:59:54 Log-Likelihood: 92.540
No. Observations: 546 AIC: -141.1
Df Residuals: 524 BIC: -46.42
Df Model: 21
Covariance Type: nonrobust
===============================================================================
coef std err t P>|t| [0.025 0.975]
-------------------------------------------------------------------------------
Intercept 8.0256 1.133 7.082 0.000 5.799 10.252
lot -4.35e-05 1.94e-05 -2.246 0.025 -8.15e-05 -5.46e-06
np.log(lot) 0.2885 0.140 2.065 0.039 0.014 0.563
bdms -0.1070 0.327 -0.327 0.743 -0.749 0.535
fb -0.4675 0.429 -1.089 0.277 -1.311 0.376
sty 0.6696 0.314 2.134 0.033 0.053 1.286
drv -2.0576 0.761 -2.702 0.007 -3.554 -0.562
rec 1.6902 0.631 2.677 0.008 0.450 2.931
ffin 0.1033 0.022 4.730 0.000 0.060 0.146
ghw -0.5946 0.897 -0.663 0.508 -2.356 1.167
ca -0.4210 0.491 -0.857 0.392 -1.387 0.544
gar 0.2180 0.270 0.807 0.420 -0.313 0.749

reg 0.1679 0.463 0.362 0.717 -0.742 1.078
lot_bdms 0.0171 0.039 0.442 0.659 -0.059 0.093
lot_fb 0.0736 0.050 1.468 0.143 -0.025 0.172
lot_sty -0.0677 0.036 -1.859 0.064 -0.139 0.004
lot_drv 0.2631 0.093 2.840 0.005 0.081 0.445
lot_rec -0.1914 0.073 -2.606 0.009 -0.336 -0.047
lot_ghw 0.0919 0.106 0.865 0.387 -0.117 0.301
lot_ca 0.0684 0.057 1.191 0.234 -0.044 0.181
lot_gar -0.0199 0.031 -0.630 0.529 -0.082 0.042
lot_reg -0.0037 0.054 -0.069 0.945 -0.110 0.102
==============================================================================
Omnibus: 7.366 Durbin-Watson: 1.517
Prob(Omnibus): 0.025 Jarque-Bera (JB): 8.495
Skew: -0.177 Prob(JB): 0.0143
Kurtosis: 3.499 Cond. No. 7.61e+05
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 7.61e+05. This might indicate that there are
strong multicollinearity or other numerical problems.

# Next, remove lot_reg
model_interaction = sm.ols('np.log(sell) ~ lot + np.log(lot) + bdms + fb + sty + drv + rec + ffin + ghw + ca + gar + reg + lot_bdms + lot_fb + lot_sty + lot_drv + lot_rec + lot_ghw + lot_ca + lot_gar', data=df).fit()
print(model_interaction.summary())

OLS Regression Results


==============================================================================
Dep. Variable: np.log(sell) R-squared: 0.698
Model: OLS Adj. R-squared: 0.686
Method: Least Squares F-statistic: 60.66
Date: Fri, 26 Jul 2024 Prob (F-statistic): 2.53e-122
Time: 03:59:55 Log-Likelihood: 92.538
No. Observations: 546 AIC: -143.1
Df Residuals: 525 BIC: -52.72
Df Model: 20
Covariance Type: nonrobust
===============================================================================
coef std err t P>|t| [0.025 0.975]
-------------------------------------------------------------------------------
Intercept 8.0262 1.132 7.089 0.000 5.802 10.250
lot -4.339e-05 1.93e-05 -2.250 0.025 -8.13e-05 -5.51e-06
np.log(lot) 0.2883 0.140 2.066 0.039 0.014 0.563
bdms -0.1071 0.327 -0.328 0.743 -0.749 0.534
fb -0.4686 0.429 -1.093 0.275 -1.311 0.373
sty 0.6728 0.310 2.169 0.031 0.063 1.282
drv -2.0455 0.741 -2.762 0.006 -3.501 -0.590
rec 1.6914 0.631 2.682 0.008 0.452 2.930
ffin 0.1032 0.022 4.742 0.000 0.060 0.146
ghw -0.5975 0.895 -0.668 0.505 -2.356 1.161
ca -0.4218 0.491 -0.859 0.391 -1.386 0.543
gar 0.2174 0.270 0.806 0.421 -0.312 0.747
reg 0.1358 0.023 5.944 0.000 0.091 0.181
lot_bdms 0.0171 0.039 0.443 0.658 -0.059 0.093
lot_fb 0.0738 0.050 1.473 0.141 -0.025 0.172
lot_sty -0.0680 0.036 -1.890 0.059 -0.139 0.003
lot_drv 0.2616 0.090 2.900 0.004 0.084 0.439
lot_rec -0.1915 0.073 -2.611 0.009 -0.336 -0.047
lot_ghw 0.0923 0.106 0.871 0.384 -0.116 0.300
lot_ca 0.0685 0.057 1.194 0.233 -0.044 0.181
lot_gar -0.0198 0.031 -0.629 0.530 -0.082 0.042
==============================================================================
Omnibus: 7.425 Durbin-Watson: 1.516
Prob(Omnibus): 0.024 Jarque-Bera (JB): 8.574
Skew: -0.178 Prob(JB): 0.0137
Kurtosis: 3.501 Cond. No. 7.60e+05
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 7.6e+05. This might indicate that there are
strong multicollinearity or other numerical problems.

# Next, remove lot_bdms
model_interaction = sm.ols('np.log(sell) ~ lot + np.log(lot) + bdms + fb + sty + drv + rec + ffin + ghw + ca + gar + reg + lot_fb + lot_sty + lot_drv + lot_rec + lot_ghw + lot_ca + lot_gar', data=df).fit()
print(model_interaction.summary())

OLS Regression Results


==============================================================================
Dep. Variable: np.log(sell) R-squared: 0.698
Model: OLS Adj. R-squared: 0.687
Method: Least Squares F-statistic: 63.94
Date: Fri, 26 Jul 2024 Prob (F-statistic): 3.43e-123
Time: 03:59:56 Log-Likelihood: 92.436
No. Observations: 546 AIC: -144.9
Df Residuals: 526 BIC: -58.82
Df Model: 19
Covariance Type: nonrobust
===============================================================================
coef std err t P>|t| [0.025 0.975]
-------------------------------------------------------------------------------
Intercept 7.7726 0.976 7.965 0.000 5.856 9.690
lot -4.205e-05 1.9e-05 -2.210 0.028 -7.94e-05 -4.67e-06
np.log(lot) 0.3179 0.122 2.597 0.010 0.077 0.558
bdms 0.0372 0.015 2.562 0.011 0.009 0.066
fb -0.5462 0.391 -1.398 0.163 -1.314 0.221
sty 0.6253 0.291 2.150 0.032 0.054 1.197
drv -1.9816 0.726 -2.730 0.007 -3.408 -0.556
rec 1.6914 0.630 2.684 0.008 0.453 2.929
ffin 0.1031 0.022 4.745 0.000 0.060 0.146
ghw -0.5753 0.893 -0.644 0.520 -2.329 1.179
ca -0.4150 0.490 -0.846 0.398 -1.378 0.548
gar 0.2124 0.269 0.789 0.431 -0.317 0.741
reg 0.1357 0.023 5.944 0.000 0.091 0.181
lot_fb 0.0830 0.046 1.821 0.069 -0.007 0.172
lot_sty -0.0625 0.034 -1.852 0.065 -0.129 0.004
lot_drv 0.2539 0.088 2.871 0.004 0.080 0.428
lot_rec -0.1913 0.073 -2.611 0.009 -0.335 -0.047
lot_ghw 0.0897 0.106 0.848 0.397 -0.118 0.297
lot_ca 0.0677 0.057 1.180 0.238 -0.045 0.180
lot_gar -0.0192 0.031 -0.611 0.541 -0.081 0.042
==============================================================================
Omnibus: 7.502 Durbin-Watson: 1.522
Prob(Omnibus): 0.023 Jarque-Bera (JB): 8.679
Skew: -0.179 Prob(JB): 0.0130
Kurtosis: 3.504 Cond. No. 6.62e+05
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 6.62e+05. This might indicate that there are
strong multicollinearity or other numerical problems.

We continue removing the interaction terms with high p-values one at a time to reach the final model:

# Removing lot_gar
model_interaction = sm.ols('np.log(sell) ~ lot + np.log(lot) + bdms + fb + sty + drv + rec + ffin + ghw + ca + gar + reg + lot_fb + lot_sty + lot_drv + lot_rec + lot_ghw + lot_ca', data=df).fit()
print(model_interaction.summary())

OLS Regression Results


==============================================================================
Dep. Variable: np.log(sell) R-squared: 0.698
Model: OLS Adj. R-squared: 0.687
Method: Least Squares F-statistic: 67.55
Date: Fri, 26 Jul 2024 Prob (F-statistic): 4.94e-124
Time: 03:59:57 Log-Likelihood: 92.242
No. Observations: 546 AIC: -146.5
Df Residuals: 527 BIC: -64.73
Df Model: 18
Covariance Type: nonrobust
===============================================================================
coef std err t P>|t| [0.025 0.975]
-------------------------------------------------------------------------------
Intercept 7.6671 0.960 7.988 0.000 5.781 9.553
lot -4.575e-05 1.8e-05 -2.538 0.011 -8.12e-05 -1.03e-05
np.log(lot) 0.3326 0.120 2.772 0.006 0.097 0.568
bdms 0.0371 0.015 2.558 0.011 0.009 0.066
fb -0.5418 0.390 -1.387 0.166 -1.309 0.225
sty 0.6332 0.290 2.181 0.030 0.063 1.204
drv -1.9351 0.721 -2.682 0.008 -3.352 -0.518
rec 1.6678 0.629 2.653 0.008 0.433 2.903
ffin 0.1029 0.022 4.738 0.000 0.060 0.146
ghw -0.5696 0.892 -0.638 0.524 -2.323 1.183
ca -0.3560 0.480 -0.741 0.459 -1.300 0.588
gar 0.0479 0.011 4.185 0.000 0.025 0.070
reg 0.1347 0.023 5.920 0.000 0.090 0.179
lot_fb 0.0825 0.046 1.813 0.070 -0.007 0.172
lot_sty -0.0635 0.034 -1.883 0.060 -0.130 0.003
lot_drv 0.2484 0.088 2.825 0.005 0.076 0.421
lot_rec -0.1885 0.073 -2.579 0.010 -0.332 -0.045
lot_ghw 0.0891 0.106 0.843 0.399 -0.118 0.297
lot_ca 0.0607 0.056 1.081 0.280 -0.050 0.171
==============================================================================
Omnibus: 7.711 Durbin-Watson: 1.519
Prob(Omnibus): 0.021 Jarque-Bera (JB): 8.977
Skew: -0.181 Prob(JB): 0.0112
Kurtosis: 3.513 Cond. No. 6.51e+05
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 6.51e+05. This might indicate that there are
strong multicollinearity or other numerical problems.

# Removing lot_ghw
model_interaction = sm.ols('np.log(sell) ~ lot + np.log(lot) + bdms + fb + sty + drv + rec + ffin + ghw + ca + gar + reg + lot_fb + lot_sty + lot_drv + lot_rec + lot_ca', data=df).fit()
print(model_interaction.summary())

OLS Regression Results


==============================================================================
Dep. Variable: np.log(sell) R-squared: 0.697
Model: OLS Adj. R-squared: 0.687
Method: Least Squares F-statistic: 71.52
Date: Fri, 26 Jul 2024 Prob (F-statistic): 8.17e-125
Time: 03:59:58 Log-Likelihood: 91.874
No. Observations: 546 AIC: -147.7
Df Residuals: 528 BIC: -70.30
Df Model: 17
Covariance Type: nonrobust
===============================================================================
coef std err t P>|t| [0.025 0.975]
-------------------------------------------------------------------------------
Intercept 7.6079 0.957 7.950 0.000 5.728 9.488
lot -4.522e-05 1.8e-05 -2.511 0.012 -8.06e-05 -9.84e-06
np.log(lot) 0.3392 0.120 2.834 0.005 0.104 0.574
bdms 0.0372 0.015 2.563 0.011 0.009 0.066
fb -0.5339 0.390 -1.368 0.172 -1.301 0.233
sty 0.6341 0.290 2.184 0.029 0.064 1.204
drv -1.9104 0.721 -2.651 0.008 -3.326 -0.495
rec 1.6771 0.628 2.669 0.008 0.443 2.911
ffin 0.1043 0.022 4.817 0.000 0.062 0.147
ghw 0.1821 0.044 4.167 0.000 0.096 0.268
ca -0.3400 0.480 -0.709 0.479 -1.283 0.603
gar 0.0476 0.011 4.157 0.000 0.025 0.070
reg 0.1335 0.023 5.879 0.000 0.089 0.178
lot_fb 0.0817 0.046 1.795 0.073 -0.008 0.171
lot_sty -0.0636 0.034 -1.887 0.060 -0.130 0.003
lot_drv 0.2455 0.088 2.795 0.005 0.073 0.418
lot_rec -0.1895 0.073 -2.594 0.010 -0.333 -0.046
lot_ca 0.0586 0.056 1.046 0.296 -0.052 0.169
==============================================================================
Omnibus: 7.815 Durbin-Watson: 1.524
Prob(Omnibus): 0.020 Jarque-Bera (JB): 8.800
Skew: -0.197 Prob(JB): 0.0123
Kurtosis: 3.481 Cond. No. 6.44e+05
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 6.44e+05. This might indicate that there are
strong multicollinearity or other numerical problems.

# Removing lot_ca
model_interaction = sm.ols('np.log(sell) ~ lot + np.log(lot) + bdms + fb + sty + drv + rec + ffin + ghw + ca + gar + reg + lot_fb + lot_sty + lot_drv + lot_rec', data=df).fit()
print(model_interaction.summary())

OLS Regression Results


==============================================================================
Dep. Variable: np.log(sell) R-squared: 0.697
Model: OLS Adj. R-squared: 0.687
Method: Least Squares F-statistic: 75.91
Date: Fri, 26 Jul 2024 Prob (F-statistic): 1.58e-125
Time: 03:59:58 Log-Likelihood: 91.309
No. Observations: 546 AIC: -148.6
Df Residuals: 529 BIC: -75.47
Df Model: 16
Covariance Type: nonrobust
===============================================================================
coef std err t P>|t| [0.025 0.975]
-------------------------------------------------------------------------------
Intercept 7.6437 0.956 7.991 0.000 5.765 9.523
lot -4.293e-05 1.79e-05 -2.401 0.017 -7.81e-05 -7.81e-06
np.log(lot) 0.3338 0.120 2.791 0.005 0.099 0.569
bdms 0.0358 0.014 2.477 0.014 0.007 0.064
fb -0.5356 0.390 -1.372 0.171 -1.302 0.231
sty 0.6086 0.289 2.104 0.036 0.040 1.177
drv -1.9097 0.721 -2.650 0.008 -3.326 -0.494
rec 1.6078 0.625 2.573 0.010 0.380 2.835
ffin 0.1042 0.022 4.810 0.000 0.062 0.147
ghw 0.1789 0.044 4.104 0.000 0.093 0.265
ca 0.1612 0.021 7.585 0.000 0.119 0.203
gar 0.0488 0.011 4.282 0.000 0.026 0.071
reg 0.1356 0.023 5.998 0.000 0.091 0.180
lot_fb 0.0820 0.046 1.801 0.072 -0.007 0.171
lot_sty -0.0604 0.034 -1.801 0.072 -0.126 0.005
lot_drv 0.2454 0.088 2.793 0.005 0.073 0.418
lot_rec -0.1815 0.073 -2.498 0.013 -0.324 -0.039
==============================================================================
Omnibus: 7.552 Durbin-Watson: 1.522
Prob(Omnibus): 0.023 Jarque-Bera (JB): 8.588
Skew: -0.187 Prob(JB): 0.0137
Kurtosis: 3.488 Cond. No. 6.44e+05
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 6.44e+05. This might indicate that there are
strong multicollinearity or other numerical problems.

# Removing lot_sty
model_interaction = sm.ols('np.log(sell) ~ lot + np.log(lot) + bdms + fb + sty + drv + rec + ffin + ghw + ca + gar + reg + lot_fb + lot_drv + lot_rec', data=df).fit()
print(model_interaction.summary())

OLS Regression Results


==============================================================================
Dep. Variable: np.log(sell) R-squared: 0.695
Model: OLS Adj. R-squared: 0.686
Method: Least Squares F-statistic: 80.42
Date: Fri, 26 Jul 2024 Prob (F-statistic): 8.55e-126
Time: 03:59:59 Log-Likelihood: 89.641
No. Observations: 546 AIC: -147.3
Df Residuals: 530 BIC: -78.44
Df Model: 15
Covariance Type: nonrobust
===============================================================================
coef std err t P>|t| [0.025 0.975]
-------------------------------------------------------------------------------
Intercept 8.3895 0.864 9.710 0.000 6.692 10.087
lot -3.634e-05 1.75e-05 -2.072 0.039 -7.08e-05 -1.89e-06
np.log(lot) 0.2425 0.109 2.234 0.026 0.029 0.456
bdms 0.0402 0.014 2.814 0.005 0.012 0.068
fb -0.3594 0.379 -0.949 0.343 -1.103 0.384
sty 0.0882 0.013 7.019 0.000 0.064 0.113
drv -1.7074 0.713 -2.393 0.017 -3.109 -0.306
rec 1.6293 0.626 2.602 0.010 0.399 2.859
ffin 0.1070 0.022 4.945 0.000 0.064 0.150
ghw 0.1851 0.044 4.250 0.000 0.100 0.271
ca 0.1603 0.021 7.529 0.000 0.119 0.202
gar 0.0475 0.011 4.171 0.000 0.025 0.070
reg 0.1373 0.023 6.065 0.000 0.093 0.182
lot_fb 0.0606 0.044 1.377 0.169 -0.026 0.147
lot_drv 0.2216 0.087 2.546 0.011 0.051 0.393
lot_rec -0.1840 0.073 -2.527 0.012 -0.327 -0.041
==============================================================================
Omnibus: 7.661 Durbin-Watson: 1.522
Prob(Omnibus): 0.022 Jarque-Bera (JB): 8.931
Skew: -0.179 Prob(JB): 0.0115
Kurtosis: 3.514 Cond. No. 6.12e+05
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 6.12e+05. This might indicate that there are
strong multicollinearity or other numerical problems.

# Removing lot_fb
model_interaction = sm.ols('np.log(sell) ~ lot + np.log(lot) + bdms + fb + sty + drv + rec + ffin + ghw + ca + gar + reg + lot_drv + lot_rec', data=df).fit()
print(model_interaction.summary())

OLS Regression Results


==============================================================================
Dep. Variable: np.log(sell) R-squared: 0.694
Model: OLS Adj. R-squared: 0.686
Method: Least Squares F-statistic: 85.88
Date: Fri, 26 Jul 2024 Prob (F-statistic): 2.31e-126
Time: 03:59:59 Log-Likelihood: 88.667
No. Observations: 546 AIC: -147.3
Df Residuals: 531 BIC: -82.79
Df Model: 14
Covariance Type: nonrobust
===============================================================================
coef std err t P>|t| [0.025 0.975]
-------------------------------------------------------------------------------
Intercept 7.8757 0.780 10.099 0.000 6.344 9.408
lot -3.234e-05 1.73e-05 -1.868 0.062 -6.63e-05 1.66e-06
np.log(lot) 0.2996 0.100 2.985 0.003 0.102 0.497
bdms 0.0402 0.014 2.812 0.005 0.012 0.068
fb 0.1611 0.020 7.972 0.000 0.121 0.201
sty 0.0898 0.013 7.168 0.000 0.065 0.114
drv -1.6839 0.714 -2.359 0.019 -3.086 -0.282
rec 1.5694 0.625 2.511 0.012 0.341 2.797
ffin 0.1038 0.022 4.819 0.000 0.061 0.146
ghw 0.1849 0.044 4.242 0.000 0.099 0.271
ca 0.1605 0.021 7.529 0.000 0.119 0.202
gar 0.0471 0.011 4.136 0.000 0.025 0.070
reg 0.1377 0.023 6.076 0.000 0.093 0.182
lot_drv 0.2190 0.087 2.514 0.012 0.048 0.390
lot_rec -0.1770 0.073 -2.435 0.015 -0.320 -0.034
==============================================================================
Omnibus: 8.226 Durbin-Watson: 1.525
Prob(Omnibus): 0.016 Jarque-Bera (JB): 9.680
Skew: -0.189 Prob(JB): 0.00791
Kurtosis: 3.532 Cond. No. 5.74e+05
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 5.74e+05. This might indicate that there are
strong multicollinearity or other numerical problems.
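
The step-by-step pruning above can also be automated; a sketch of backward elimination over the interaction terms, assuming the base regressors and interaction columns defined earlier (the removal order may differ slightly from the manual sequence):

# Backward elimination: repeatedly drop the least significant interaction term
base = 'np.log(sell) ~ lot + np.log(lot) + bdms + fb + sty + drv + rec + ffin + ghw + ca + gar + reg'
interactions = ['lot_bdms', 'lot_fb', 'lot_sty', 'lot_drv', 'lot_rec',
                'lot_ffin', 'lot_ghw', 'lot_ca', 'lot_gar', 'lot_reg']
while interactions:
    fit = sm.ols(base + ' + ' + ' + '.join(interactions), data=df).fit()
    pvals = fit.pvalues[interactions]
    if pvals.max() < 0.05:
        break
    interactions.remove(pvals.idxmax())
print("Retained interaction terms:", interactions)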

g.) Omitting a relevant variable leads to biased estimates of the coefficients of the included variables. The bias arises because the omitted variable may be correlated with both the dependent variable and one or more of the included explanatory variables. Here, the sale price would likely be overestimated.
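
For a simple two-regressor illustration (a sketch, not from the assignment): if the true model is y = b0 + b1*x1 + b2*x2 + e but x2 is omitted, the OLS estimate of b1 converges to b1 + b2 * Cov(x1, x2) / Var(x1). The bias is positive, and the estimated effect on the sale price overstated, when the omitted variable relates positively to both the outcome and the included regressor.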

h.)

# Fit on the first 400 observations
train_df = df.iloc[:400]
model = sm.ols('np.log(sell) ~ lot + np.log(lot) + bdms + fb + sty + drv + rec + ffin + ghw + ca + gar + reg', data=train_df).fit()

# Make predictions for the remaining 146 observations
test_df = df.iloc[400:]
predictions = model.predict(test_df)

# Calculate the mean squared prediction error
mse = np.mean((np.log(test_df['sell']) - predictions) ** 2)
print("Mean squared prediction error:", mse)

Mean squared prediction error: 0.030038061219574093

print(np.log(df["sell"]))

0 10.645425
1 10.558414
2 10.809728
3 11.010399
4 11.018629
...
541 11.424094
542 11.451050
543 11.542484
544 11.561716
545 11.561716
Name: sell, Length: 546, dtype: float64

# The error is fairly high considering the distribution and variance of the log sale prices
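
To put the 0.030 figure in context, it can be compared with the variance of log prices in the hold-out sample, which is roughly the MSE a constant-mean prediction would achieve; a minimal sketch:

# Baseline for the MSE: variance of log(sell) in the hold-out sample
print("Hold-out variance of log(sell):", np.log(test_df['sell']).var())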
