07bRegresionLinealBostonVerdConEstandarizacion - Jupyter Notebook
07bRegresionLinealBostonVerdConEstandarizacion
localhost:8888/notebooks/Desktop/Ultimo-RegresionLineal/02RegresionLineal/07bRegresionLinealBostonVerdConEstandarizacion.ipynb# 1/17
24/10/22, 13:10 07bRegresionLinealBostonVerdConEstandarizacion - Jupyter Notebook
In [2]:
C:\Users\Jimmy\anaconda3\lib\site-packages\sklearn\utils\deprecation.py:87: FutureWarning: Function load_boston is deprecated; `load_boston` is deprecated in 1.0 and will be removed in 1.2.
    The Boston housing prices dataset has an ethical problem. You can refer to
    ...
    dataset unless the purpose of the code is to study and educate about
    ...
    In this special case, you can fetch the dataset from the original
    source::
        import pandas as pd
        import numpy as np
        data_url = "http://lib.stat.cmu.edu/datasets/boston"
        ...
        target = raw_df.values[1::2, 2]
    ...
        housing = fetch_california_housing()
  warnings.warn(msg, category=FutureWarning)
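The warning's snippet for reading the original StatLib file is only partly preserved in this export. The sketch below shows the reshaping step that snippet performs. It assumes the raw file has already been parsed into a 2-D array in which each house occupies two rows:

- the first row holds 11 features;
- the second row holds the last 2 features and then the target, with the remaining cells empty (NaN).

The helper name is hypothetical. The synthetic array stands in for the downloaded file.

```python
import numpy as np


def reassemble_statlib_records(raw_values):
    """Hypothetical helper: rebuild (data, target) from a two-rows-per-record array.

    Assumes even rows hold the first 11 features and odd rows hold the
    last 2 features followed by the target, with any extra cells set to NaN.
    """
    raw = np.asarray(raw_values, dtype=float)
    # First 11 features come from the even rows, the last 2 from the odd rows.
    data = np.hstack([raw[::2, :], raw[1::2, :2]])
    # The target is the third value on each odd row.
    target = raw[1::2, 2]
    return data, target


# Synthetic stand-in for the parsed file: two records, 11 columns per row.
raw = np.full((4, 11), np.nan)
raw[0, :] = np.arange(11)
raw[1, :3] = [11, 12, 24.0]
raw[2, :] = np.arange(100, 111)
raw[3, :3] = [111, 112, 21.6]

data, target = reassemble_statlib_records(raw)
print(data.shape, target)
```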
In [3]:
Out[3]:
In [4]:
print(boston['DESCR'])
.. _boston_dataset:
---------------------------
...
https://archive.ics.uci.edu/ml/machine-learning-databases/housing/
This dataset was taken from the StatLib library which is maintained at Carnegie Mellon University.
... prices and the demand for clean air', J. Environ. Economics & Management,
vol.5, 81-102, 1978. Used in Belsley, Kuh & Welsch, 'Regression diagnostics
...', Wiley, 1980. N.B. Various transformations are used in the table on
...
The Boston house-price data has been used in many machine learning papers that address regression
problems.
.. topic:: References
In [5]:
In [6]:
df_entrada.head()
Out[6]:
  CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT
0 0.00632 18.0 2.31 0.0 0.538 6.575 65.2 4.0900 1.0 296.0 15.3 396.90 4
1 0.02731 0.0 7.07 0.0 0.469 6.421 78.9 4.9671 2.0 242.0 17.8 396.90 9
2 0.02729 0.0 7.07 0.0 0.469 7.185 61.1 4.9671 2.0 242.0 17.8 392.83 4
3 0.03237 0.0 2.18 0.0 0.458 6.998 45.8 6.0622 3.0 222.0 18.7 394.63 2
4 0.06905 0.0 2.18 0.0 0.458 7.147 54.2 6.0622 3.0 222.0 18.7 396.90 5
In [7]:
df_salida=pd.DataFrame(data=boston['target'],columns=['valor m'])
In [8]:
df_salida.head()
Out[8]:
valor m
0 24.0
1 21.6
2 34.7
3 33.4
4 36.2
In [9]:
df = pd.concat([df_entrada,df_salida],axis=1)
In [10]:
df.head()
Out[10]:
  CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT
0 0.00632 18.0 2.31 0.0 0.538 6.575 65.2 4.0900 1.0 296.0 15.3 396.90 4
1 0.02731 0.0 7.07 0.0 0.469 6.421 78.9 4.9671 2.0 242.0 17.8 396.90 9
2 0.02729 0.0 7.07 0.0 0.469 7.185 61.1 4.9671 2.0 242.0 17.8 392.83 4
3 0.03237 0.0 2.18 0.0 0.458 6.998 45.8 6.0622 3.0 222.0 18.7 394.63 2
4 0.06905 0.0 2.18 0.0 0.458 7.147 54.2 6.0622 3.0 222.0 18.7 396.90 5
Data exploration
In [11]:
sns.displot(df['valor m'],bins=30)
Out[11]:
<seaborn.axisgrid.FacetGrid at 0x1bed6189c10>
In [12]:
sns.pairplot(df)
Out[12]:
<seaborn.axisgrid.PairGrid at 0x1bed61f4910>
In [13]:
sns.heatmap(df.corr(),annot=True)
Out[13]:
<AxesSubplot:>
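The heatmap shows every pairwise Pearson correlation (the default method of `df.corr()`). For choosing predictors, the row for 'valor m' is the most informative one. The NumPy sketch below ranks features by the absolute value of their correlation with the target, which is that heatmap row in sorted form. The function name and the synthetic data are illustrative assumptions.

```python
import numpy as np


def rank_by_target_correlation(X, y, names):
    """Return (name, r) pairs sorted by |Pearson r| with the target, largest first."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.corrcoef on two 1-D arrays gives a 2x2 matrix; entry [0, 1] is r.
    r = [np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])]
    order = np.argsort(np.abs(r))[::-1]
    return [(names[j], r[j]) for j in order]


# Synthetic data: the target depends strongly (negatively) on one feature only.
rng = np.random.default_rng(0)
x_strong = rng.normal(size=200)
x_weak = rng.normal(size=200)
y = -3.0 * x_strong + 0.3 * x_weak + rng.normal(scale=0.5, size=200)
X = np.column_stack([x_weak, x_strong])

for name, r in rank_by_target_correlation(X, y, ["weak", "strong"]):
    print(f"{name}: r = {r:+.2f}")
```

A strongly negative r (such as the one LSTAT would be expected to show) is as useful to a linear model as a strongly positive one, which is why the ranking uses |r|.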
X = df.drop('valor m',axis=1)
#X
In [15]:
y = df['valor m']
#y
In [17]:
In [18]:
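Cells In [17] and In [18] are empty in this export. They presumably create `X_train`, `X_test`, `y_train` and `y_test`. scikit-learn's `train_test_split` is the usual tool for this step.

The NumPy sketch below shows what such a split does: it shuffles the row indices and holds out a fraction of them for testing. The function name, the 30% test fraction and the seed are illustrative assumptions, not values recovered from the notebook. The row count 506 is the size of the Boston data.

```python
import numpy as np


def shuffle_split(n_rows, test_size=0.3, seed=42):
    """Hypothetical helper: return (train_idx, test_idx) for a random hold-out split."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_rows)
    # Round the test size up, as scikit-learn does for a float test_size.
    n_test = int(np.ceil(test_size * n_rows))
    return idx[n_test:], idx[:n_test]


train_idx, test_idx = shuffle_split(506)
print(len(train_idx), len(test_idx))
```

The two index arrays are disjoint and together cover every row. With pandas they would select rows through `.iloc`.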
Standardization
In [19]:
In [20]:
escala = StandardScaler()
In [21]:
escala.fit(X_train)
Out[21]:
StandardScaler()
In [22]:
X_train_escala = escala.transform(X_train)
In [23]:
X_test_escala = escala.transform(X_test)
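`StandardScaler` learns each column's mean and standard deviation in `fit` and applies z = (x - mean) / std in `transform`. The cells above fit it on `X_train` only and reuse those statistics for `X_test`, so no information from the test set leaks into training.

Here is a NumPy sketch of the same computation. The helper names and the synthetic data are assumptions for illustration. It uses the population standard deviation (ddof=0) and leaves constant columns unscaled, which matches StandardScaler's conventions.

```python
import numpy as np


def fit_standardizer(X_train):
    """Learn per-column mean and population std (ddof=0) from training data only."""
    X_train = np.asarray(X_train, dtype=float)
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)  # ddof=0, the same convention StandardScaler uses
    std[std == 0] = 1.0  # leave constant columns unscaled instead of dividing by 0
    return mean, std


def standardize(X, mean, std):
    """Apply z = (x - mean) / std with statistics learned on the training set."""
    return (np.asarray(X, dtype=float) - mean) / std


rng = np.random.default_rng(1)
X_train = rng.normal(loc=[5.0, 300.0], scale=[2.0, 50.0], size=(100, 2))
X_test = rng.normal(loc=[5.0, 300.0], scale=[2.0, 50.0], size=(30, 2))

mean, std = fit_standardizer(X_train)
X_train_scaled = standardize(X_train, mean, std)
X_test_scaled = standardize(X_test, mean, std)
print(X_train_scaled.mean(axis=0).round(6), X_train_scaled.std(axis=0).round(6))
```

The scaled training columns have mean 0 and standard deviation 1. The scaled test columns only come close to those values, which is expected because they use the training statistics.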
In [24]:
Out[24]:
    CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B
251 0.21409 22.0 5.86 0.0 0.431 6.438 8.9 7.3967 7.0 330.0 19.1 377.07
3 0.03237 0.0 2.18 0.0 0.458 6.998 45.8 6.0622 3.0 222.0 18.7 394.63
257 0.61154 20.0 3.97 0.0 0.647 8.704 86.9 1.8010 5.0 264.0 13.0 389.70
35 0.06417 0.0 5.96 0.0 0.499 5.933 68.2 3.3603 5.0 279.0 19.2 396.90
339 0.05497 0.0 5.19 0.0 0.515 5.985 45.4 4.8122 5.0 224.0 20.2 396.90
In [25]:
Out[25]:
Train the model
In [26]:
In [27]:
In [28]:
Out[28]:
LinearRegression()
In [29]:
# Intercept (Beta_0)
rl.intercept_
Out[29]:
22.358757062146893
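Every standardized training column has mean 0, so the least-squares intercept reduces to the mean of `y_train`. The 22.36 above is therefore simply the average 'valor m' in the training set, assuming `rl` was fitted on `X_train_escala` and `y_train` as the later cells suggest.

The sketch below checks this property on synthetic data. `np.linalg.lstsq` with a column of ones stands in for `LinearRegression` here.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(loc=10.0, scale=3.0, size=(200, 3))
y = 4.0 + X @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=200)

# Standardize with the sample's own mean and population std, as StandardScaler does.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Ordinary least squares with an explicit column of ones for the intercept.
design = np.column_stack([np.ones(len(Z)), Z])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
intercept, slopes = coef[0], coef[1:]
print(intercept, y.mean())
```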
In [30]:
Out[30]:
In [31]:
X.columns
Out[31]:
Index(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX',
       'PTRATIO', 'B', 'LSTAT'],
      dtype='object')
In [32]:
Out[32]:
Coeficiente (Beta)
CRIM -0.901143
ZN 0.751365
INDUS 0.098643
CHAS 0.374034
NOX -1.837714
RM 3.346638
AGE -0.020302
DIS -2.806479
RAD 2.193295
TAX -2.054967
PTRATIO -1.928815
B 0.869300
LSTAT -2.952897
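The model was fitted on standardized inputs, so each coefficient above is the change in predicted 'valor m' for a one-standard-deviation increase in that feature, with the other features held fixed. Their magnitudes can therefore be compared directly; RM (3.35), LSTAT (-2.95) and DIS (-2.81) are the largest.

To express a coefficient per original unit, divide it by the feature's training standard deviation; the intercept shifts accordingly. The sketch below does this conversion and checks it against a fit on the raw features. It uses synthetic data and hypothetical helper names.

```python
import numpy as np


def to_original_units(intercept_std, coef_std, mean, std):
    """Convert coefficients fitted on z = (x - mean) / std back to the raw-feature scale."""
    coef_raw = coef_std / std
    intercept_raw = intercept_std - np.sum(coef_std * mean / std)
    return intercept_raw, coef_raw


def ols(X, y):
    """Least-squares fit with an intercept; returns (intercept, slopes)."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]


rng = np.random.default_rng(3)
X = rng.normal(loc=[6.0, 12.0], scale=[0.7, 7.0], size=(300, 2))
y = 2.0 + 9.0 * X[:, 0] - 0.6 * X[:, 1] + rng.normal(size=300)

mean, std = X.mean(axis=0), X.std(axis=0)
b0_std, b_std = ols((X - mean) / std, y)
b0_raw, b_raw = to_original_units(b0_std, b_std, mean, std)

# Fitting directly on the raw features should give the same model.
b0_direct, b_direct = ols(X, y)
print(b_raw, b_direct)
```

With the notebook's objects, the equivalent call would presumably be `to_original_units(rl.intercept_, rl.coef_, escala.mean_, escala.scale_)`. `mean_` and `scale_` are the attributes in which StandardScaler stores the training mean and standard deviation.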
Evaluate the model
In [33]:
Out[33]:
In [34]:
predicciones_test = rl.predict(X_test_escala)
In [ ]:
In [35]:
Out[35]:
In [36]:
Out[36]:
365 27.5
313 21.6
461 17.7
158 24.3
333 22.2
In [37]:
plt.scatter(y_test,predicciones_test)
plt.xlabel('Valores verdaderos')
plt.ylabel('Valores predichos')
Out[37]:
In [38]:
In [39]:
# residuals
res_test = y_test - predicciones_test
In [40]:
sns.scatterplot(x=y_test,y=res_test)
plt.axhline(y=0, color='r', linestyle='--')
Out[40]:
<matplotlib.lines.Line2D at 0x1bed20ad280>
In [41]:
import scipy as sp
import scipy.stats  # load the stats submodule explicitly so sp.stats.probplot is available
In [42]:
fig, ax = plt.subplots(figsize=(6,8),dpi=100)
_ = sp.stats.probplot(res_test,plot=ax)
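`sp.stats.probplot` sorts the residuals and plots them against theoretical normal quantiles. Points lying close to the fitted straight line suggest approximately normal residuals. It also returns the correlation r of that line, which the cell discards by assigning the result to `_`.

The sketch below reproduces the comparison with NumPy and the standard library. It uses the simple plotting positions (i - 0.5)/n, which differ from the order-statistic medians `probplot` uses, so its r will differ slightly from `probplot`'s. The helper name and the synthetic residuals are assumptions.

```python
from statistics import NormalDist

import numpy as np


def normal_qq(residuals):
    """Return (theoretical_quantiles, ordered_residuals, r) for a normal Q-Q check."""
    ordered = np.sort(np.asarray(residuals, dtype=float))
    n = len(ordered)
    # Simple plotting positions (i - 0.5) / n mapped through the inverse normal CDF.
    probs = (np.arange(1, n + 1) - 0.5) / n
    theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])
    # Correlation of the two sequences: close to 1 means the points lie near a line.
    r = np.corrcoef(theoretical, ordered)[0, 1]
    return theoretical, ordered, r


rng = np.random.default_rng(4)
_, _, r_normal = normal_qq(rng.normal(scale=5.0, size=150))
_, _, r_skewed = normal_qq(rng.exponential(scale=5.0, size=150))
print(round(r_normal, 3), round(r_skewed, 3))
```

Skewed residuals, like the exponential sample here, bend away from the line at one end and give a visibly lower r.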
Mean Absolute Error (MAE) is the mean of the absolute values of the errors:
$$\frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
Mean Squared Error (MSE) is the mean of the squared errors:
$$\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
Root Mean Squared Error (RMSE) is the square root of the mean of the squared errors:
$$\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
Comparing these metrics:
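The sketch below computes the three formulas on four made-up values, not the notebook's data. The errors are 1, -1, -4 and 0. The single error of 4 dominates the squared metrics: MAE is 1.5, while MSE is 4.5 and RMSE is about 2.12, because squaring weights large errors more heavily. The function name is hypothetical. In scikit-learn, `sklearn.metrics.mean_absolute_error` and `sklearn.metrics.mean_squared_error` compute the same quantities.

```python
import numpy as np


def regression_errors(y_true, y_pred):
    """Return (MAE, MSE, RMSE) following the formulas above."""
    errors = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    mae = np.mean(np.abs(errors))
    mse = np.mean(errors ** 2)
    rmse = np.sqrt(mse)
    return mae, mse, rmse


# Made-up values, not the notebook's predictions.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 6.0, 6.0, 7.0]
mae, mse, rmse = regression_errors(y_true, y_pred)
print(mae, mse, rmse)
```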
In [43]:
In [44]:
MAE: 3.6789775344994413
MSE: 33.86803399667011
RMSE: 5.819624901715755
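As a quick arithmetic check on the printed values, RMSE should equal the square root of MSE, which puts it back in the units of 'valor m'. RMSE can never be smaller than MAE. The further the ratio RMSE/MAE rises above 1, the more the error is concentrated in a few large residuals.

```python
import math

# Values printed by the notebook (In [44]).
mae = 3.6789775344994413
mse = 33.86803399667011
rmse = 5.819624901715755

# RMSE is the square root of MSE.
print(math.isclose(math.sqrt(mse), rmse, rel_tol=1e-9))
# Ratio of RMSE to MAE: 1 would mean all absolute errors are equal.
print(round(rmse / mae, 2))
```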
In [45]:
# R^2 is at most 1: 1 means a perfect fit, 0 means the model does no better than always predicting the mean of y, and on test data it can even be negative
r_squared = rl.score(X_test_escala, y_test)
r_squared
Out[45]:
0.6685538790447977
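`rl.score` returns the coefficient of determination on the data it is given:

R^2 = 1 - SS_res / SS_tot

Here SS_tot is measured around the mean of those same target values. Read this way, 0.6686 means the model's squared test error is about 33% of what always predicting the mean of `y_test` would give.

The sketch below shows the three reference cases on made-up values: a perfect fit, predicting the mean, and predictions worse than the mean. The function name is hypothetical.

```python
import numpy as np


def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


y_true = np.array([3.0, 5.0, 2.0, 7.0])
print(r2_score_manual(y_true, y_true))                          # perfect fit
print(r2_score_manual(y_true, np.full(4, y_true.mean())))       # always predicting the mean
print(r2_score_manual(y_true, np.array([7.0, 2.0, 5.0, 3.0])))  # worse than the mean
```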