Classification ML Project Report
The goal of this project is to predict whether a person involved in a road accident
survives, based on age, gender, speed, helmet use, and seatbelt use. This prediction could
help stakeholders improve safety regulations.
The data set is a CSV file of 200 accident cases that I downloaded from Kaggle. I cleaned the
data by converting categorical fields to binary values and filling in missing values, then
split it 80% for training and 20% for testing.
I trained three models: logistic regression, a decision tree, and a random forest. The
accuracy, precision, recall, and F1 score for each model are as follows:
Logistic Regression
● Accuracy: 57.5%
● Precision: 60%
● Recall: 45%
● F1 Score: 51.4%
Decision Tree
● Accuracy: 47.5%
● Precision: 47.4%
● Recall: 45%
● F1 Score: 46.1%
Random Forest
● Accuracy: 55%
● Precision: 55%
● Recall: 55%
● F1 Score: 55%
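As a quick consistency check, the F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short sketch below recomputes it from the three precision/recall pairs listed above. The function name `f1_from` is only illustrative. The recomputed values agree with the reported F1 scores to within about 0.1 percentage point.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as percentages)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %) for the three result sets, in the order listed
reported = [
    (60.0, 45.0, 51.4),
    (47.4, 45.0, 46.1),
    (55.0, 55.0, 55.0),
]

for precision, recall, f1_reported in reported:
    f1 = f1_from(precision, recall)
    print(f"P={precision}%  R={recall}%  F1={f1:.2f}%  (reported {f1_reported}%)")
```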
The Random Forest model performed the best overall by F1 score (55%), with balanced precision
and recall (both 55%), making it the most reliable predictor of the three. Logistic Regression
had the highest accuracy (57.5%) and precision (60%) but lower recall (45%), meaning it missed
55% of the actual survival cases. The Decision Tree performed the worst on every metric except
recall, where it tied Logistic Regression at 45%, likely due to overfitting.
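One way to test the overfitting explanation is to compare accuracy on the training set with accuracy on the test set: a large gap suggests the tree memorized the training data. The sketch below is a minimal illustration on synthetic data from scikit-learn's `make_classification`. The accident CSV is not used here, and the sample size, feature count, noise level, and `max_depth=3` are assumptions chosen only to mirror a 200-case, 80/20 setup.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the accident data: 200 rows, 5 features (assumed sizes)
X, y = make_classification(n_samples=200, n_features=5, n_informative=3,
                           n_redundant=0, flip_y=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)


def train_test_gap(max_depth):
    """Fit a tree and return (train accuracy, test accuracy)."""
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=42)
    tree.fit(X_train, y_train)
    return tree.score(X_train, y_train), tree.score(X_test, y_test)


full_train_acc, full_test_acc = train_test_gap(None)   # unrestricted tree
pruned_train_acc, pruned_test_acc = train_test_gap(3)  # depth-limited tree

print(f"Unrestricted: train={full_train_acc:.2f}, test={full_test_acc:.2f}")
print(f"max_depth=3:  train={pruned_train_acc:.2f}, test={pruned_test_acc:.2f}")
```

If the unrestricted tree scores near 100% on the training data but much lower on the test data, limiting the depth (or raising `min_samples_leaf`) is a reasonable next step.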
The key findings were that helmet and seatbelt use had a significant impact on survival
rates; higher speed was correlated with lower survival; age played a moderate role, with
younger people surviving more often; and gender had little impact.
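Findings like these can be checked directly by comparing survival rates across groups. The sketch below uses a small made-up table purely to show the mechanics. The rows are not from the accident data set, and the column name `Helmet_Used` is an assumption (only `Gender`, `Speed_of_Impact`, and `Survived` appear in the code listing).

```python
import pandas as pd

# Made-up illustrative rows, NOT taken from the accident data set
toy = pd.DataFrame({
    "Helmet_Used": [1, 1, 1, 0, 0, 0],            # hypothetical column name
    "Speed_of_Impact": [40, 60, 90, 45, 70, 100],
    "Survived": [1, 1, 0, 1, 0, 0],
})

# Survival rate within each helmet group (the mean of a 0/1 column is a proportion)
survival_by_helmet = toy.groupby("Helmet_Used")["Survived"].mean()

# Pearson correlation between impact speed and survival
speed_corr = toy["Speed_of_Impact"].corr(toy["Survived"])

print(survival_by_helmet)
print(f"Speed vs. survival correlation: {speed_corr:.2f}")
```

The same two calls can be run on the real data frame once the categorical columns have been converted to 0/1 values.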
Recommendations for future work include adding more features, such as weather conditions or
time of day; tuning the models' hyperparameters; trying other models, such as gradient
boosting or neural networks; and using SMOTE to address class imbalance. Eventually, the model
could be deployed to help people.
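As one possible starting point for the hyperparameter tuning mentioned above, the sketch below runs a small cross-validated grid search over a Random Forest. It uses synthetic data from `make_classification` rather than the accident CSV, and the parameter grid, the 5-fold setting, and the F1 scoring choice are assumptions, not the settings used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumed: 200 rows, 5 features)
X, y = make_classification(n_samples=200, n_features=5, n_informative=3,
                           n_redundant=0, random_state=42)

# Small illustrative grid; a real search would likely cover more values
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, 5, None],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,          # 5-fold cross-validation
    scoring="f1",  # optimize F1 rather than accuracy
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated F1: {search.best_score_:.3f}")
```

Cross-validation also tends to give a steadier performance estimate than a single 40-case test split, since every case is used for testing once.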
This analysis provides valuable insights into factors affecting road accident survival. The
Random Forest model proved to be the best predictor, but further improvements can be made
with additional data and tuning. These insights can be used to inform public safety policies,
vehicle safety features, and accident response strategies.
My Python code:
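import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv(r"C:\Users\sigle\OneDrive\Desktop\accident.csv")  # Replace with your file path
# Fill missing values: mode for Gender, median for Speed_of_Impact
df["Gender"] = df["Gender"].fillna(df["Gender"].mode()[0])
df["Speed_of_Impact"] = df["Speed_of_Impact"].fillna(df["Speed_of_Impact"].median())
# The encoding and split steps were missing from this listing; get_dummies and random_state=42 are assumptions
X = pd.get_dummies(df.drop(columns=["Survived"]), drop_first=True)
y = df["Survived"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize models
log_reg = LogisticRegression()
decision_tree = DecisionTreeClassifier(random_state=42)
random_forest = RandomForestClassifier(random_state=42)
# Train models
log_reg.fit(X_train, y_train)
decision_tree.fit(X_train, y_train)
random_forest.fit(X_train, y_train)
# Make predictions
log_reg_preds = log_reg.predict(X_test)
decision_tree_preds = decision_tree.predict(X_test)
random_forest_preds = random_forest.predict(X_test)

def evaluate(y_true, y_pred):
    # Return the four metrics used in this report (the function name is reconstructed)
    return {
        "Accuracy": accuracy_score(y_true, y_pred),
        "Precision": precision_score(y_true, y_pred),
        "Recall": recall_score(y_true, y_pred),
        "F1 Score": f1_score(y_true, y_pred),
    }

# Evaluate models
results = {
    "Logistic Regression": evaluate(y_test, log_reg_preds),
    "Decision Tree": evaluate(y_test, decision_tree_preds),
    "Random Forest": evaluate(y_test, random_forest_preds),
}
# Print results
for name, metrics in results.items():
    print(name, metrics)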