The document analyzes iris flower data using machine learning models in Python. It loads iris data, explores correlations, splits the data into training and test sets, builds a linear regression model to predict petal width using other features, calculates prediction errors, and adds species dummy variables to improve the model's accuracy.


Untitled1.ipynb - Colaboratory

import pandas as pd
import numpy as np

datos = pd.read_excel('/content/iris-1.xlsx')

datos.head()

Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species

0 1 5.1 3.5 1.4 0.2 Iris_setosa

1 2 4.9 3.0 1.4 0.2 Iris_setosa

2 3 4.7 3.2 1.3 0.2 Iris_setosa

3 4 4.6 3.1 1.5 0.2 Iris_setosa

4 5 5.0 3.6 1.4 0.2 Iris_setosa

datos.corr()

<ipython-input-4-055e81c3bfab>:2: FutureWarning: The default value of numeric_only in DataFrame.corr is deprecated. In a future version, it will default to False.
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm

Id 1.000000 0.716676 -0.397729 0.882747 0.899759

SepalLengthCm 0.716676 1.000000 -0.109369 0.871754 0.817954

SepalWidthCm -0.397729 -0.109369 1.000000 -0.420516 -0.356544

PetalLengthCm 0.882747 0.871754 -0.420516 1.000000 0.962757

PetalWidthCm 0.899759 0.817954 -0.356544 0.962757 1.000000
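The FutureWarning above is pandas announcing that corr() will stop including non-numeric columns by default. Passing numeric_only explicitly (available from pandas 1.5) computes the same matrix without the warning; a minimal sketch:

# Restrict the correlation to numeric columns explicitly,
# which is what the old default did implicitly.
datos.corr(numeric_only=True)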

from sklearn.model_selection import train_test_split
train, test = train_test_split(datos, test_size=0.20, random_state=42)
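As a quick sanity check (not in the original notebook): with 150 rows and test_size=0.20, the split should leave 120 rows for training and 30 for testing, which matches the "No. Observations: 120" in the regression summary below.

# 80/20 split of the 150 rows of the iris data
print(train.shape, test.shape)  # expected: (120, 6) and (30, 6)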

import statsmodels.formula.api as smf

modelo = smf.ols(formula='PetalWidthCm ~ SepalLengthCm + SepalWidthCm + PetalLengthCm', data=train)
modelo = modelo.fit()
print(modelo.summary())

OLS Regression Results


==============================================================================
Dep. Variable: PetalWidthCm R-squared: 0.941
Model: OLS Adj. R-squared: 0.939
Method: Least Squares F-statistic: 613.0
Date: Wed, 15 Nov 2023 Prob (F-statistic): 5.97e-71
Time: 03:45:38 Log-Likelihood: 33.675
No. Observations: 120 AIC: -59.35
Df Residuals: 116 BIC: -48.20
Df Model: 3
Covariance Type: nonrobust
=================================================================================
coef std err t P>|t| [0.025 0.975]
---------------------------------------------------------------------------------
Intercept -0.1791 0.188 -0.954 0.342 -0.551 0.193
SepalLengthCm -0.2379 0.050 -4.739 0.000 -0.337 -0.138
SepalWidthCm 0.2430 0.052 4.693 0.000 0.140 0.345
PetalLengthCm 0.5367 0.026 20.702 0.000 0.485 0.588
==============================================================================
Omnibus: 2.295 Durbin-Watson: 2.077
Prob(Omnibus): 0.317 Jarque-Bera (JB): 2.050
Skew: 0.029 Prob(JB): 0.359
Kurtosis: 3.638 Cond. No. 87.1
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

# Test-set predictions, re-typing the rounded coefficients from the summary above
y_aprox = -0.1791 - 0.2379*test['SepalLengthCm'] + 0.2430*test['SepalWidthCm'] + 0.5367*test['PetalLengthCm']
errores = test['PetalWidthCm'] - y_aprox
errores


73 -0.37260
18 -0.00066
118 -0.02410
78 -0.01335
76 -0.05974
31 0.23251
64 0.17452
141 0.63014
68 0.20433
82 0.00969
110 0.21068
12 -0.05936
36 0.13934
9 -0.11354
19 -0.03606
56 -0.04652
104 0.08359
69 -0.08929
55 -0.26042
132 0.21574
29 -0.13909
127 0.07146
26 0.08368
128 0.11574
131 -0.29977
145 0.55319
108 -0.14733
143 0.15269
45 0.14064
30 -0.09100
dtype: float64
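Re-typing the rounded coefficients by hand works, but the fitted model can generate the same predictions directly, without copying (and rounding) the coefficients; a minimal sketch:

# statsmodels re-applies the formula to the new data, so no manual
# coefficient copying is needed; differences from y_aprox are rounding only.
y_pred = modelo.predict(test)
errores_pred = test['PetalWidthCm'] - y_pred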

tabla = pd.DataFrame({'Real': test['PetalWidthCm'], 'Predicción': y_aprox, 'Errores': errores})
tabla


     Real  Predicción  Errores
73    1.2     1.57260 -0.37260
18    0.3     0.30066 -0.00066
118   2.3     2.32410 -0.02410
78    1.5     1.51335 -0.01335
76    1.4     1.45974 -0.05974
31    0.4     0.16749  0.23251
64    1.3     1.12548  0.17452
141   2.3     1.66986  0.63014
68    1.5     1.29567  0.20433
82    1.2     1.19031  0.00969
110   2.0     1.78932  0.21068
12    0.1     0.15936 -0.05936
36    0.2     0.06066  0.13934
9     0.1     0.21354 -0.11354
19    0.3     0.33606 -0.03606
56    1.6     1.64652 -0.04652
104   2.2     2.11641  0.08359
69    1.1     1.18929 -0.08929
55    1.3     1.56042 -0.26042
132   2.2     1.98426  0.21574
29    0.2     0.33909 -0.13909
127   1.8     1.72854  0.07146
26    0.4     0.31632  0.08368
128   2.1     1.98426  0.11574
131   2.0     2.29977 -0.29977
145   2.3     1.74681  0.55319
108   1.8     1.94733 -0.14733
143   2.3     2.14731  0.15269
45    0.3     0.15936  0.14064
30    0.2     0.29100 -0.09100

dummies = pd.get_dummies(datos['Species'])
dummies

     Iris_setosa  Iris_versicolor  Iris_virginica
0              1                0               0
1              1                0               0
2              1                0               0
3              1                0               0
4              1                0               0
...          ...              ...             ...
145            0                0               1
146            0                0               1
147            0                0               1
148            0                0               1
149            0                0               1

150 rows × 3 columns

datos = pd.concat([datos, dummies], axis=1)
datos

      Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm         Species  ...
0      1            5.1           3.5            1.4           0.2     Iris_setosa  ...
1      2            4.9           3.0            1.4           0.2     Iris_setosa  ...
2      3            4.7           3.2            1.3           0.2     Iris_setosa  ...
3      4            4.6           3.1            1.5           0.2     Iris_setosa  ...
4      5            5.0           3.6            1.4           0.2     Iris_setosa  ...
..   ...            ...           ...            ...           ...             ...  ...
145  146            6.7           3.0            5.2           2.3  Iris_virginica  ...
146  147            6.3           2.5            5.0           1.9  Iris_virginica  ...
147  148            6.5           3.0            5.2           2.0  Iris_virginica  ...
148  149            6.2           3.4            5.4           2.3  Iris_virginica  ...
149  150            5.9           3.0            5.1           1.8  Iris_virginica  ...

150 rows × 9 columns
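A note on these dummies: the formula interface used in the next cell encodes the string column Species automatically (treatment coding, with Iris_setosa as the baseline, as the Species[T.…] rows in the summary confirm), so the model below does not actually consume the manually created columns. If the manual dummies were regressed on directly, one level would need to be dropped to avoid perfect collinearity with the intercept; a minimal sketch using pandas' drop_first option:

# drop_first=True omits the first level (Iris_setosa), which the intercept
# then absorbs as the baseline category, avoiding the dummy-variable trap.
dummies_base = pd.get_dummies(datos['Species'], drop_first=True)
dummies_base.head()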

train_m, test_m = train_test_split(datos, test_size=0.20, random_state=42)

modelo = smf.ols(formula='PetalWidthCm ~ SepalLengthCm + SepalWidthCm + PetalLengthCm + Species', data=train_m)
modelo = modelo.fit()
print(modelo.summary())

OLS Regression Results


==============================================================================
Dep. Variable: PetalWidthCm R-squared: 0.953
Model: OLS Adj. R-squared: 0.951
Method: Least Squares F-statistic: 465.3
Date: Wed, 15 Nov 2023 Prob (F-statistic): 4.49e-74
Time: 03:49:13 Log-Likelihood: 48.024
No. Observations: 120 AIC: -84.05
Df Residuals: 114 BIC: -67.32
Df Model: 5
Covariance Type: nonrobust
==============================================================================================
coef std err t P>|t| [0.025 0.975]
----------------------------------------------------------------------------------------------
Intercept -0.4711 0.195 -2.418 0.017 -0.857 -0.085
Species[T.Iris_versicolor] 0.6301 0.144 4.387 0.000 0.346 0.915
Species[T.Iris_virginica] 0.9831 0.193 5.104 0.000 0.602 1.365
SepalLengthCm -0.1346 0.049 -2.763 0.007 -0.231 -0.038
SepalWidthCm 0.2867 0.053 5.361 0.000 0.181 0.393
PetalLengthCm 0.2748 0.056 4.941 0.000 0.165 0.385
==============================================================================
Omnibus: 5.687 Durbin-Watson: 2.007
Prob(Omnibus): 0.058 Jarque-Bera (JB): 8.108
Skew: -0.143 Prob(JB): 0.0174
Kurtosis: 4.241 Cond. No. 139.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Looking at the two models, I noticed they are quite similar. In the first, the R² is not as high, but the F-statistic is better than in the second; even so, the higher R² makes me think the second model has the advantage. I also checked the p-values: in the second model they are all below 0.05, while in the first there is one (the intercept, p = 0.342) that goes over. So, at the end of the day, the second model seems a little better.
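To put an out-of-sample number behind that judgment, one could compare the two models' RMSE on the held-out test rows. A minimal sketch, assuming the first fit had been kept under a separate name (modelo1, with modelo2 for the second) instead of reusing modelo for both:

# Hypothetical names: modelo1 = first fit (no Species), modelo2 = second fit.
rmse1 = np.sqrt(((test['PetalWidthCm'] - modelo1.predict(test))**2).mean())
rmse2 = np.sqrt(((test_m['PetalWidthCm'] - modelo2.predict(test_m))**2).mean())
print(rmse1, rmse2)  # lower RMSE = more accurate test-set predictions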

