Decision Tree

Uploaded by

Hải my

Date: 14/10/2024

To: Mitch Cochran


From: Nguyen An Quynh
Subject: Predicting Titanic Passenger Survival Using Decision Trees
Background: Using the well-known Titanic dataset, this study applies machine learning to
predict passenger survival from characteristics such as socio-economic status (Pclass), age,
sex, and fare. In particular, we explore decision tree models with two splitting criteria:
the Gini Index and Gain Ratio. Both criteria are widely used in classification tasks, and
comparing their performance offers insight into their relative effectiveness for survival
prediction in this context.
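
The two criteria can be illustrated with a minimal sketch. It uses the standard textbook definitions (Gini impurity, and information gain divided by split information), which this memo does not spell out, so the function names and the toy labels below are assumptions made for illustration rather than the code used in the study.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy (in bits) of the class distribution."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def weighted_gini(groups):
    """Size-weighted Gini impurity of the child nodes produced by a split."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * gini(g) for g in groups)


def gain_ratio(parent, groups):
    """Information gain divided by split information."""
    total = len(parent)
    info_gain = entropy(parent) - sum(len(g) / total * entropy(g) for g in groups)
    split_info = -sum(len(g) / total * log2(len(g) / total) for g in groups if g)
    return info_gain / split_info if split_info > 0 else 0.0


# Toy labels (1 = survived), invented for illustration only.
parent = [1, 1, 1, 0, 0, 0, 0, 0]
female = [1, 1, 1, 0]  # hypothetical split on Sex
male = [0, 0, 0, 0]

print(gini(parent))                        # impurity before the split
print(weighted_gini([female, male]))       # impurity after the split (lower is better)
print(gain_ratio(parent, [female, male]))  # higher is better
```

A tree built with the Gini Index picks the split with the lowest weighted impurity, while one built with Gain Ratio picks the split with the highest ratio. Dividing by split information penalizes splits that produce many small branches.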

Preprocessing began with handling missing values, particularly in the Age and Embarked
columns, where mean imputation and mode imputation were applied, respectively. To prepare
categorical variables, we used one-hot encoding for the "Sex" and "Embarked" columns,
creating separate binary columns and avoiding arbitrary numerical assignments that might
mislead the model. Non-predictive columns, such as PassengerId, Name, Ticket, and Cabin,
were removed from the feature set, as they do not directly influence survival outcomes.
Thus, the selected feature set consisted of Pclass, Sex, Age, Parch, Fare, and Embarked.
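
The preprocessing steps above can be sketched as follows. This is a dependency-free illustration, not the code used in the study: the `preprocess` function, the dictionary-per-row layout, and the four sample rows are all assumptions, and in practice a data-frame library would typically handle these steps.

```python
from collections import Counter
from statistics import mean


def preprocess(rows):
    """Impute Age (mean) and Embarked (mode), one-hot encode Sex and Embarked,
    and keep only the selected features (identifier-like columns are dropped)."""
    known_ages = [r["Age"] for r in rows if r["Age"] is not None]
    mean_age = mean(known_ages)
    known_ports = [r["Embarked"] for r in rows if r["Embarked"] is not None]
    mode_port = Counter(known_ports).most_common(1)[0][0]
    sexes = sorted({r["Sex"] for r in rows})
    ports = sorted(set(known_ports))

    features = []
    for r in rows:
        age = r["Age"] if r["Age"] is not None else mean_age
        port = r["Embarked"] if r["Embarked"] is not None else mode_port
        row = {"Pclass": r["Pclass"], "Age": age, "Parch": r["Parch"], "Fare": r["Fare"]}
        # One binary column per category, instead of an arbitrary integer code.
        for s in sexes:
            row[f"Sex_{s}"] = int(r["Sex"] == s)
        for p in ports:
            row[f"Embarked_{p}"] = int(port == p)
        features.append(row)
    return features


# Sample rows invented for illustration; None marks a missing value.
rows = [
    {"PassengerId": 1, "Name": "A", "Ticket": "T1", "Cabin": None, "Pclass": 3,
     "Sex": "male", "Age": 22.0, "Parch": 0, "Fare": 7.25, "Embarked": "S"},
    {"PassengerId": 2, "Name": "B", "Ticket": "T2", "Cabin": "C85", "Pclass": 1,
     "Sex": "female", "Age": 38.0, "Parch": 0, "Fare": 71.28, "Embarked": "C"},
    {"PassengerId": 3, "Name": "C", "Ticket": "T3", "Cabin": None, "Pclass": 3,
     "Sex": "female", "Age": None, "Parch": 0, "Fare": 7.92, "Embarked": None},
    {"PassengerId": 4, "Name": "D", "Ticket": "T4", "Cabin": None, "Pclass": 1,
     "Sex": "male", "Age": 30.0, "Parch": 1, "Fare": 53.1, "Embarked": "S"},
]

features = preprocess(rows)
print(features[2])  # the row whose Age and Embarked were imputed
```

One caveat with this sketch: it computes the imputation values from all rows it receives. To avoid leaking information from the test set, the mean and mode would normally be computed on the training portion only.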

Two decision tree models were implemented: one based on the Gini Index and the other on Gain
Ratio. The models were trained on 80% of the data and evaluated on the remaining 20%. Key
performance metrics (accuracy, precision, recall, and F1-score) were used to assess each
model's effectiveness.
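
As a rough illustration of the evaluation setup, the sketch below shows an 80/20 hold-out split and the four metrics computed from predictions. The memo does not say how the split was drawn or whether precision and recall were computed for the "survived" class or averaged over both classes, so the random shuffle, the fixed seed, the positive-class convention, and the toy labels are all assumptions.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and hold out test_fraction of them for evaluation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, plus precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# 100 placeholder rows stand in for the passenger records.
train, test = train_test_split(range(100))
print(len(train), len(test))

# Toy labels and predictions (1 = survived), invented for illustration.
y_true = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Because the F1-score is the harmonic mean of precision and recall, a precision of 0.760 and a recall of 0.770 correspond to an F1-score of about 0.765.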

In conclusion, both the Gini Index and Gain Ratio are effective splitting criteria for decision
tree classification on this dataset, and their performance differs only minimally. The Gini
Index model achieved an accuracy of 0.810, with precision, recall, and F1-score all at 0.770.
In comparison, the Gain Ratio model reached an accuracy of 0.804, with a precision of 0.760, a
recall of 0.770, and an F1-score of 0.765. The Gini Index model therefore slightly outperformed
the Gain Ratio model (by 0.006 in accuracy), making it a marginally better choice for survival
prediction in this case. Future work could explore ensemble methods such as random forests to
potentially improve accuracy and generalizability. This study highlights the utility of decision
trees for survival prediction and their adaptability to different splitting criteria, offering a
foundational approach for further predictive analytics on similar datasets.
