T2_summary_VHA
June 4, 2024
2 For data cleaning, you can refer to the pandas (Unit 1) notes
[1]: import pandas as pd

# defining features
room_length = [18, 20, 10, 12, 18, 11]
room_breadth = [20, 20, 10, 11, 19, 10]
room_type = ['Big', 'Big', 'Normal', 'Normal', 'Big', 'Normal']
age = [18, 20, 23, 19, 18, 22]
city = ['City A', 'City B', 'City B', 'City A', 'City C', 'City B']

# collect the features into a frame
data1 = pd.DataFrame({'room_length': room_length, 'room_breadth': room_breadth,
                      'room_type': room_type, 'age': age, 'city': city})

# one-hot encode the categorical columns, dropping the first level of each
df = pd.get_dummies(data=data1, drop_first=True)
[5]: df

data4 (column headers not recovered):
2  4368  Medium
3  3969     Low
4  6142    High
5  7912    High
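To see concretely what `pd.get_dummies(drop_first=True)` does, here is a minimal sketch on a hypothetical three-row frame (the column values are made up for illustration): the numeric column passes through unchanged, each categorical column becomes indicator columns, and the first category level is dropped to avoid redundancy.

```python
import pandas as pd

# Hypothetical toy frame: one numeric and one categorical column
data1 = pd.DataFrame({
    'age': [18, 20, 23],
    'city': ['City A', 'City B', 'City B'],
})

# One-hot encode 'city'; drop_first removes the 'City A' indicator,
# since it is implied when all other city indicators are 0
df = pd.get_dummies(data=data1, drop_first=True)
print(df.columns.tolist())  # → ['age', 'city_City B']
```

Dropping the first level keeps the encoded features linearly independent, which matters for the regression models used later.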
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

df = pd.read_csv("HousingData.csv")
df = df.dropna()
X = df[['RM']]
y = df[['MEDV']]

# Split the dataset and fit a simple linear regression
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
linear_regressor = LinearRegression().fit(X_train, y_train)

# Make predictions
y_pred = linear_regressor.predict(X_test)
[2]: df.columns
[2]: Index(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX',
'PTRATIO', 'B', 'LSTAT', 'MEDV'],
dtype='object')
[3]: df.head()
[3]:       CRIM    ZN  INDUS  CHAS    NOX     RM   AGE     DIS  RAD  TAX  PTRATIO  \
     0  0.00632  18.0   2.31   0.0  0.538  6.575  65.2  4.0900    1  296     15.3
     1  0.02731   0.0   7.07   0.0  0.469  6.421  78.9  4.9671    2  242     17.8
     2  0.02729   0.0   7.07   0.0  0.469  7.185  61.1  4.9671    2  242     17.8
     3  0.03237   0.0   2.18   0.0  0.458  6.998  45.8  6.0622    3  222     18.7
     5  0.02985   0.0   2.18   0.0  0.458  6.430  58.7  6.0622    3  222     18.7

             B  LSTAT  MEDV
     0  396.90   4.98  24.0
     1  396.90   9.14  21.6
     2  392.83   4.03  34.7
     3  394.63   2.94  33.4
     5  394.12   5.21  28.7
# Make predictions
y_pred = linear_regressor.predict(X_test)

# Fitted model parameters
print("Intercept =", linear_regressor.intercept_[0])
from sklearn.preprocessing import PolynomialFeatures

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Expand the features into polynomial terms and fit a linear model on them
poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)
poly_regressor = LinearRegression().fit(X_train_poly, y_train)

# Make predictions
y_train_pred = poly_regressor.predict(X_train_poly)
y_test_pred = poly_regressor.predict(X_test_poly)
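As a quick illustration of what `PolynomialFeatures` actually produces (a sketch on made-up input values, assuming `degree=2` with the default bias term): a single feature x is expanded into the columns [1, x, x²].

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two toy samples with one feature each: x = 2 and x = 3
poly = PolynomialFeatures(degree=2)
expanded = poly.fit_transform(np.array([[2.0], [3.0]]))
print(expanded)  # → [[1. 2. 4.], [1. 3. 9.]]
```

The downstream `LinearRegression` then fits ordinary linear coefficients on these expanded columns, which is what makes the overall model polynomial in the original feature.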
Training Mean Squared Error: 55.2252324506448
Testing Mean Squared Error: 95.57171547225752
Training R²: 0.33036607859294875
Testing R²: -0.13308554792106753
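The MSE and R² values above come straight from their definitions; a minimal sketch on made-up numbers (not the housing data) shows the arithmetic. Note that R² can be negative, as in the testing value above: that simply means the model fits worse than always predicting the mean of y.

```python
import numpy as np

# Hypothetical true values and predictions
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.0, 8.0])

mse = np.mean((y_true - y_pred) ** 2)           # mean squared error
ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2 = 1 - ss_res / ss_tot                        # negative when ss_res > ss_tot
```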
9 K-Nearest Neighbors (KNN)
K-Nearest Neighbors (KNN) is a classification algorithm that assigns the class of a data point based
on the classes of its k nearest neighbors.
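The idea can be sketched from scratch before turning to scikit-learn: compute the distance from the query point to every training point, take the k closest, and let their labels vote. This is a minimal illustration on hypothetical toy points (Euclidean distance, majority vote), not the library implementation.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # Euclidean distance from x to every training point
    dists = np.linalg.norm(X_train - x, axis=1)
    # Labels of the k closest points; the majority label wins
    nearest = y_train[np.argsort(dists)[:k]]
    return Counter(nearest.tolist()).most_common(1)[0][0]

# Two well-separated toy clusters
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [7.5, 8.2]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))  # → 0
```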
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             confusion_matrix, classification_report)

# (Assumes iris was loaded, split into X_train/X_test/y_train/y_test,
# and knn fit in earlier cells)
# Make predictions
y_pred = knn.predict(X_test)

conf_matrix = confusion_matrix(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)
error_rate = 1 - accuracy
precision = precision_score(y_test, y_pred, average='macro')
recall = recall_score(y_test, y_pred, average='macro')

# Specificity calculation (first four entries of the flattened 3x3 matrix;
# only an approximation, since tn/fp/fn/tp are not well defined for multiclass)
tn, fp, fn, tp = conf_matrix.ravel()[:4]
specificity = tn / (tn + fp)

print(f"Confusion Matrix:\n{conf_matrix}")
print(f"Accuracy: {accuracy}")
print(f"Error Rate: {error_rate}")
print(f"Precision: {precision}")
print(f"Recall: {recall}")
print(f"Specificity: {specificity}")
print("\nClassification Report:\n", classification_report(
    y_test, y_pred, target_names=iris.target_names))
Confusion Matrix:
[[10 0 0]
[ 0 9 0]
[ 0 0 11]]
Accuracy: 1.0
Error Rate: 0.0
Precision: 1.0
Recall: 1.0
Specificity: 1.0
Classification Report:
               precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       1.00      1.00      1.00         9
   virginica       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
# (data was loaded in an earlier cell: a binary-class sklearn dataset)
X = data.data
y = data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
knn = KNeighborsClassifier().fit(X_train, y_train)

# Make predictions
y_pred = knn.predict(X_test)

conf_matrix = confusion_matrix(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)
error_rate = 1 - accuracy
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)

# Specificity calculation (binary case: conf_matrix is [[tn, fp], [fn, tp]])
tn, fp, fn, tp = conf_matrix.ravel()
specificity = tn / (tn + fp)

print(f"Confusion Matrix:\n{conf_matrix}")
print(f"Accuracy: {accuracy}")
print(f"Error Rate: {error_rate}")
print(f"Precision: {precision}")
print(f"Recall: {recall}")
print(f"Specificity: {specificity}")
print("\nClassification Report:\n", classification_report(
    y_test, y_pred, target_names=data.target_names))
Confusion Matrix:
[[38 5]
[ 0 71]]
Accuracy: 0.956140350877193
Error Rate: 0.04385964912280704
Precision: 0.9342105263157895
Recall: 1.0
Specificity: 0.8837209302325582
Classification Report:
               precision    recall  f1-score   support

   malignant       1.00      0.88      0.94        43
      benign       0.93      1.00      0.97        71

    accuracy                           0.96       114
   macro avg       0.97      0.94      0.95       114
weighted avg       0.96      0.96      0.96       114
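The specificity printed above can be checked by hand from the confusion matrix: `ravel()` flattens the binary matrix `[[tn, fp], [fn, tp]]` in row-major order, and specificity is the fraction of true negatives among all actual negatives.

```python
import numpy as np

# The binary confusion matrix shown above
conf_matrix = np.array([[38, 5],
                        [0, 71]])

# Row-major flatten: tn=38, fp=5, fn=0, tp=71
tn, fp, fn, tp = conf_matrix.ravel()
specificity = tn / (tn + fp)  # 38 / 43 ≈ 0.8837
```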
from sklearn.tree import DecisionTreeClassifier, plot_tree
import matplotlib.pyplot as plt

# (Assumes the iris train/test split from earlier cells)
decision_tree = DecisionTreeClassifier(criterion='entropy', random_state=42)
decision_tree.fit(X_train, y_train)

# Make predictions
y_pred = decision_tree.predict(X_test)

conf_matrix = confusion_matrix(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)
error_rate = 1 - accuracy
precision = precision_score(y_test, y_pred, average='macro')
recall = recall_score(y_test, y_pred, average='macro')

# Specificity calculation (first four entries of the flattened 3x3 matrix;
# only an approximation for a multiclass confusion matrix)
tn, fp, fn, tp = conf_matrix.ravel()[:4]
specificity = tn / (tn + fp)

print(f"Confusion Matrix:\n{conf_matrix}")
print(f"Accuracy: {accuracy}")
print(f"Error Rate: {error_rate}")
print(f"Precision: {precision}")
print(f"Recall: {recall}")
print(f"Specificity: {specificity}")
print("\nClassification Report:\n", classification_report(
    y_test, y_pred, target_names=iris.target_names))

# Visualize the fitted tree
plot_tree(decision_tree, feature_names=iris.feature_names,
          class_names=list(iris.target_names), filled=True)
plt.show()
Confusion Matrix:
[[10 0 0]
[ 0 9 0]
[ 0 0 11]]
Accuracy: 1.0
Error Rate: 0.0
Precision: 1.0
Recall: 1.0
Specificity: 1.0
Classification Report:
               precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       1.00      1.00      1.00         9
   virginica       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
12 Summary
• Simple Linear Regression: Models the linear relationship between one independent variable and one dependent variable.
• Multiple Linear Regression: Models the relationship between one dependent variable and
multiple independent variables.
• K-Nearest Neighbors (KNN): Classifies a data point based on the classes of its k nearest neighbors.
• Decision Tree with Entropy: Splits data into subsets to minimize impurity or randomness.
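The entropy criterion in the last bullet can be made concrete: a split is good when the children are purer than the parent, and the improvement is the information gain. A minimal sketch on made-up labels (Shannon entropy in bits):

```python
import numpy as np

def entropy(labels):
    # Shannon entropy in bits of an array of class labels
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

parent = np.array([0, 0, 1, 1])                    # maximally mixed: 1 bit
left, right = np.array([0, 0]), np.array([1, 1])   # pure children: 0 bits each

# Information gain = parent entropy minus size-weighted child entropy
gain = entropy(parent) \
    - (len(left) / len(parent)) * entropy(left) \
    - (len(right) / len(parent)) * entropy(right)  # 1.0 for this perfect split
```

A decision tree grown with `criterion='entropy'` repeatedly chooses the split with the largest such gain.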