Machine Learning - Mini Project


K. K. Wagh Institute of Engineering Education and Research, Nashik
Department of Computer Engineering
Academic Year 2023-24
Course: LP-3 Lab Course Code: 410246

Mini Project

Title: Mini Project :- Machine Learning


Objective :
1. Clean the data, handle missing values, encode categorical variables, and engineer new
features.
2. Train a machine learning model (e.g., Random Forest) and evaluate its
performance.
3. Make predictions on unseen data, using the same features as in training.
4. Create a submission file with PassengerId and the model's survival predictions.
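
Objective 3 (consistent features on unseen data) can be illustrated with a minimal sketch. It assumes pandas and one-hot encoding via get_dummies; the two tiny frames and their column names are hypothetical and are not taken from the project code.

```python
import pandas as pd

# Hypothetical training and unseen rows; "Q" never appears in training.
train = pd.DataFrame({"Sex": ["male", "female"], "Embarked": ["S", "C"]})
unseen = pd.DataFrame({"Sex": ["female", "male"], "Embarked": ["Q", "S"]})

# One-hot encode the training data; these columns define the feature set.
X_train = pd.get_dummies(train, dtype=int)

# Encode the unseen data, then align it to the training columns:
# columns missing from the unseen data are filled with 0, and columns
# the model never saw (here Embarked_Q) are dropped.
X_unseen = pd.get_dummies(unseen, dtype=int).reindex(
    columns=X_train.columns, fill_value=0
)

print(X_unseen)
```

Aligning to the training columns keeps the column order and count identical at prediction time, which is what a fitted scikit-learn model expects.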
Problem Statement:
Build a machine learning model that predicts which kinds of passengers survived the Titanic
shipwreck using passenger data (e.g., name, age, gender, socio-economic class).

Requirements: Programming language, libraries/packages, dataset, data, model, environment

Theory :
1. Supervised Learning : The Titanic Survival Prediction project is a classic example of
supervised learning where the model is trained using labeled data. The training set contains
both the features (passenger data) and the corresponding target labels (whether the passenger
survived or not).

2. Feature Engineering: Machine learning models rely heavily on the quality and relevance of
the features provided. In this project, features like age, sex, and passenger class are used to predict the
target. Proper feature selection, transformation, and encoding are essential for better model
performance.
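
The preprocessing described above can be sketched as follows. This is a minimal illustration, assuming pandas and the Kaggle Titanic column names (Pclass, Sex, Age, SibSp, Parch, Embarked); the report does not list its exact columns, and the FamilySize feature and the prepare_features helper are hypothetical choices, not the project's confirmed method.

```python
import pandas as pd

# Toy rows with assumed Kaggle-style column names (not the real dataset).
df = pd.DataFrame({
    "Pclass": [3, 1, 3, 2],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, None, 35.0],
    "SibSp": [1, 1, 0, 0],
    "Parch": [0, 0, 0, 2],
    "Embarked": ["S", "C", None, "S"],
})


def prepare_features(frame: pd.DataFrame) -> pd.DataFrame:
    """Fill missing values, encode categories, and add one engineered feature."""
    out = frame.copy()
    # Fill missing ages with the median and missing ports with the mode.
    out["Age"] = out["Age"].fillna(out["Age"].median())
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])
    # Encode the binary category as 0/1.
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})
    # Engineered feature: the passenger plus siblings/spouses and parents/children.
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1
    # One-hot encode the embarkation port.
    return pd.get_dummies(out, columns=["Embarked"], dtype=int)


features = prepare_features(df)
print(features)
```

Median and mode imputation are simple defaults; other strategies may suit the real data better.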

3. Model Selection and Training: Different machine learning algorithms (such as Logistic
Regression, Random Forests, and Decision Trees) are trained to learn from the data. The process of
selecting the best model and tuning hyperparameters is critical for achieving the highest possible
accuracy.
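
One common way to compare candidate models is cross-validation. The sketch below assumes scikit-learn and uses synthetic data from make_classification in place of the Titanic features; the candidate settings are illustrative, not the values used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed feature matrix and labels.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean 5-fold cross-validated accuracy for each candidate.
scores = {}
for name, model in candidates.items():
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()

best_name = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("best:", best_name)
```

Cross-validated scores are less sensitive to one lucky split than a single train/test comparison, which makes them a reasonable basis for picking a model before tuning it.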

4. Overfitting and Underfitting: It's important to avoid overfitting (where the model performs well
on training data but poorly on new data) and underfitting (where the model is too simple and cannot
capture the underlying patterns). Finding the right balance is key to ensuring that the model works
well in real-world scenarios.
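
A rough way to see both effects is to compare training and test accuracy while varying model complexity. The sketch below assumes scikit-learn, a decision tree whose max_depth controls complexity, and noisy synthetic data; none of these choices come from the project itself.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 10% label noise so that memorising it does not generalise.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=4, flip_y=0.1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# depth 1 is a very simple model; None lets the tree grow until it fits
# every training point.
results = {}
for depth in (1, 4, None):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    results[depth] = (tree.score(X_train, y_train), tree.score(X_test, y_test))
    print(f"max_depth={depth}: train={results[depth][0]:.3f}, "
          f"test={results[depth][1]:.3f}")
```

A large gap between training and test accuracy suggests overfitting, while low accuracy on both suggests underfitting.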

Evaluation Metrics:

● Evaluating a machine learning model involves using metrics like accuracy, precision, recall,
and F1 score to gauge how well the model performs on unseen data. These metrics help
assess the model's ability to generalize and predict accurately.
● Recall measures the ability of the model to identify all actual survivors (true positives). High
recall ensures that most survivors are predicted correctly, even if some non-survivors are
incorrectly classified as survivors.
● The F1-score is the harmonic mean of precision and recall, balancing both metrics. It's useful
when there's an uneven class distribution or when both false positives and false negatives are
critical.
● A confusion matrix provides a detailed breakdown of true positives, true negatives, false
positives, and false negatives. It helps visualize model performance by showing how many of
each class (survived or not) were correctly or incorrectly predicted.
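
The metrics above follow directly from the four confusion-matrix counts. The sketch below uses plain Python with made-up labels (1 = survived, 0 = did not survive); the helper names are hypothetical and the numbers are not the project's results.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 from the confusion counts."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(confusion_counts(y_true, y_pred))       # (3, 3, 1, 1)
print(classification_metrics(y_true, y_pred))  # every metric is 0.75 here
```

Precision divides the true positives by everything predicted as a survivor, while recall divides them by everyone who actually survived, so the two can differ widely when the model over- or under-predicts survival.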

Approach :

● Collect and preprocess the Titanic dataset by handling missing values and encoding
categorical features.
● Perform feature engineering and drop irrelevant columns.
● Split the dataset into training and testing sets.
● Train machine learning models such as Logistic Regression or Random Forest.
● Apply hyperparameter tuning to improve model accuracy.
● Use the best-performing model to predict survival on the test dataset.
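
The steps above can be sketched end to end. This is a minimal illustration, assuming scikit-learn, pandas, and NumPy; the data is randomly generated, the Survived label follows an arbitrary rule so the model has something to learn, and the parameter grid is illustrative rather than the one used in the project.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the preprocessed Titanic data (not real records).
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "PassengerId": np.arange(1, n + 1),
    "Pclass": rng.integers(1, 4, size=n),
    "Sex": rng.integers(0, 2, size=n),
    "Age": rng.uniform(1, 70, size=n).round(),
})
# Arbitrary labelling rule, used only so the example is learnable.
data["Survived"] = ((data["Sex"] == 1) | (data["Pclass"] == 1)).astype(int)

feature_cols = ["Pclass", "Sex", "Age"]

# Split into training and held-out test rows.
train_df, test_df = train_test_split(
    data, test_size=0.2, random_state=0, stratify=data["Survived"]
)

# Hyperparameter tuning with a small grid and 3-fold cross-validation.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=3,
    scoring="accuracy",
)
search.fit(train_df[feature_cols], train_df["Survived"])

# Predict with the best model, using the same feature columns as in training.
submission = pd.DataFrame({
    "PassengerId": test_df["PassengerId"].to_numpy(),
    "Survived": search.predict(test_df[feature_cols]),
})

# With no path, to_csv returns the CSV text; pass a file name to write a file.
csv_text = submission.to_csv(index=False)
print(search.best_params_)
print(csv_text.splitlines()[:3])
```

In the Kaggle setting the test file has no Survived column, so the held-out split here only stands in for it.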
Results :

Accuracy :- 81.56%

Precision :- 78.93%

Recall :- 80.85%

Conclusion/analysis :
The Titanic Survival Prediction Project demonstrates how machine learning techniques can
be applied to real-world datasets to predict outcomes. By following proper data
preprocessing, feature engineering, and model selection methods, the project achieves
satisfactory prediction accuracy on the Titanic dataset.
Screenshots :
Program codes with sample output
