LOAN APPROVAL MODEL PREDICTION

Submitted by

Moumita Paul (24IM0002)

Under the Guidance of Assistant Prof. Manisha Verma

DATA ANALYTICS
Department of Mathematics and Computing

INDIAN INSTITUTE OF TECHNOLOGY

(Indian School of Mines), Dhanbad

Aim: To predict loan approval using machine learning algorithms.

Synopsis: Loans are a major requirement of the modern world, and banks earn a major part of their total profit from lending. Loans help students manage their education and living expenses, and help people buy houses, cars and other major purchases. Before deciding whether an applicant's profile qualifies for a loan, however, a bank has to assess many aspects of the application.

We develop a model that predicts whether an applicant's loan will be approved, using background information about the applicant such as gender, marital status and income.

Life Cycle of ML Project

• Data Collection: The data is the Loan Prediction Problem Dataset from Kaggle (kaggle.com).
• The dataset contains information about loan applications: attributes of each applicant and whether the application was approved or denied. It is designed for predictive modeling, specifically for predicting whether a loan application will be approved based on the provided features.
• The dataset contains 13 columns: a unique loan ID, 11 applicant and loan attributes, and the target variable Loan_Status.

No.  Column              Description
1    Loan_ID             Unique loan ID
2    Gender              Gender of the applicant (Male/Female)
3    Married             Marital status of the applicant (Yes/No)
4    Dependents          Number of dependents of the applicant (0, 1, 2, 3+)
5    Education           Whether the applicant is a graduate (Graduate/Not Graduate)
6    Self_Employed       Whether the applicant is self-employed (Yes/No)
7    ApplicantIncome     Applicant income
8    CoapplicantIncome   Co-applicant income
9    LoanAmount          Loan amount (in thousands)
10   Loan_Amount_Term    Term of the loan (in months)
11   Credit_History      Credit history of the applicant's debt repayment (1/0)
12   Property_Area       Area of the property (Rural/Urban/Semi-urban)
13   Loan_Status         Loan approved or not (Y = Yes, N = No); the target variable

Importing Libraries
• Pandas: to load the data into a DataFrame
• Matplotlib: to visualize the data features, e.g. with bar plots
• Seaborn: to visualize the correlation between features with a heat map
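A minimal sketch of the loading step with pandas, followed by a first check for missing values. The rows below are hypothetical placeholders that only reuse the dataset's column names; they are read from an in-memory string so the example runs on its own. With the real data, the path of the downloaded Kaggle CSV would be passed to pd.read_csv instead.

```python
import io

import pandas as pd

# Hypothetical placeholder rows with the dataset's 13 column names (not real records).
# With the real data: df = pd.read_csv("<path to the downloaded CSV>")
CSV_TEXT = """Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
LP_A,Male,No,0,Graduate,No,5000,0,,360,1,Urban,Y
LP_B,Male,Yes,1,Graduate,No,4000,1500,120,360,1,Rural,N
LP_C,Female,No,0,Not Graduate,,2500,0,70,360,,Urban,Y
LP_D,,Yes,3+,Graduate,Yes,6000,2000,150,,1,Semiurban,Y
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

print(df.shape)           # (rows, columns)
print(df.dtypes)          # numeric vs. text columns; "3+" keeps Dependents as text
print(df.isnull().sum())  # number of missing values per column
```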

• Data Cleaning: Clean the data to handle missing values, outliers and inconsistencies. This step is crucial for the model's accuracy and generalization: missing values may need to be imputed, features standardized or normalized, and other data anomalies dealt with.
• Outliers: An outlier is a data point that deviates significantly from the other data points in a dataset. Outliers can be unusually high or low values, and they can distort statistical analyses and model training.
• Data Visualization: Exploratory data analysis is performed with visualizations such as count plots and box plots to gain insight into the distribution of the features and the relationships between them.
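The cleaning steps above can be sketched as follows. The report does not say which imputation method it used, so filling categorical gaps with the most frequent value and numeric gaps with the median is an assumption, as is the toy data. Outliers are flagged with the 1.5 × IQR rule, which is the same rule a box plot uses to draw its whiskers; the sketch only flags them and does not decide whether to drop them.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps and one extreme income value.
df = pd.DataFrame({
    "Gender": ["Male", "Male", None, "Female", "Male", "Male"],
    "LoanAmount": [120.0, np.nan, 100.0, 130.0, 110.0, 600.0],
    "ApplicantIncome": [4000, 4200, 3900, 4100, 4300, 40000],
})

# 1) Impute: median for numeric columns (robust to the 600 outlier),
#    most frequent value (mode) for categorical columns.
for col in df.columns:
    if pd.api.types.is_numeric_dtype(df[col]):
        df[col] = df[col].fillna(df[col].median())
    else:
        df[col] = df[col].fillna(df[col].mode()[0])

# 2) Flag outliers: points more than 1.5 * IQR below Q1 or above Q3.
def iqr_outliers(s: pd.Series) -> pd.Series:
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)

outlier_mask = iqr_outliers(df["ApplicantIncome"])
print(df[outlier_mask])  # rows whose ApplicantIncome is flagged
```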

The code analyzes loan applications by gender, providing the frequency of each gender and
visualizing the distribution with a count plot. This analysis reveals that there are significantly more
male applicants than female applicants seeking loans.
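A minimal sketch of the gender analysis: value_counts gives the frequency of each gender and seaborn's countplot draws one bar per category. The column here is a hypothetical toy sample, not the project's data.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy column; in the project this is the Gender column of the loaded data.
df = pd.DataFrame({"Gender": ["Male", "Male", "Female", "Male", None, "Male", "Female"]})

# Frequency of each gender (missing values are excluded by default).
gender_counts = df["Gender"].value_counts()
print(gender_counts)

# Count plot: bar height = number of applications per gender.
fig, ax = plt.subplots(figsize=(5, 4))
sns.countplot(data=df, x="Gender", ax=ax)
ax.set_title("Loan applications by gender")
plt.show()
```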

Bar plots of the unique values in each categorical column show which value dominates each feature in the dataset.
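One way to produce these bar plots, sketched on hypothetical toy data: select the non-numeric columns, draw one bar chart of value counts per column side by side, and record the most frequent value of each.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data; numeric columns are skipped below.
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Male"],
    "Married": ["Yes", "No", "Yes", "Yes"],
    "Education": ["Graduate", "Graduate", "Not Graduate", "Graduate"],
    "Property_Area": ["Urban", "Rural", "Semiurban", "Urban"],
    "ApplicantIncome": [5000, 3000, 2500, 6000],
})

cat_cols = df.select_dtypes(exclude="number").columns

# One bar chart per categorical column, laid out in a single row.
fig, axes = plt.subplots(1, len(cat_cols), figsize=(4 * len(cat_cols), 3), squeeze=False)
for ax, col in zip(axes[0], cat_cols):
    df[col].value_counts().plot.bar(ax=ax)
    ax.set_title(col)
fig.tight_layout()
plt.show()

# The dominating (most frequent) value of each categorical column.
dominant = {col: df[col].value_counts().idxmax() for col in cat_cols}
print(dominant)
```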
• The next step, creating a heat map to visualize correlations, belongs to the Exploratory Data Analysis (EDA) phase of a machine learning project.

• EDA focuses on understanding the data and identifying patterns, relationships and potential issues before proceeding with model building.

• The code calculates the correlation between the numerical features of the loan application dataset and visualizes it as a heat map, which shows the direction and strength of the relationships between features. In the blue color scheme used, darker shades indicate higher (more positive) correlations and lighter shades indicate weaker or negative ones, which gives insight into feature dependencies.
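A minimal sketch of the correlation heat map on synthetic numeric data (generated here, not taken from the dataset; LoanAmount is deliberately tied to income so the map shows some structure). It uses a diverging colormap ("coolwarm") fixed to the range -1 to 1 and centered on 0, so positive and negative correlations get different hues; this colormap is an assumption, not necessarily the one used in the report.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the numeric loan columns.
rng = np.random.default_rng(0)
applicant_income = rng.lognormal(8.3, 0.5, 200)
df = pd.DataFrame({
    "ApplicantIncome": applicant_income,
    "CoapplicantIncome": rng.lognormal(7.5, 0.8, 200),
    "LoanAmount": applicant_income / 40 + rng.normal(0, 20, 200),
    "Loan_Amount_Term": rng.choice([180, 360], size=200),
})

# Pearson correlation between every pair of numeric columns.
corr = df.select_dtypes("number").corr()

fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, center=0, ax=ax)
ax.set_title("Correlation between numeric features")
plt.show()
```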

• Data Preprocessing/Feature Engineering: This phase prepares the data for model training by transforming it into a format suitable for the algorithms. Log transformation is a common preprocessing technique that helps address skewness, outliers and non-linear relationships in the data.
• Applying a log transformation to Loan Amount, Loan Amount Term, Total Income and Applicant Income noticeably changed their distributions.
• Loan Amount and Total Income showed a clear reduction in positive skew, giving a more balanced distribution with less influence from high-value outliers.
• The Loan Amount Term distribution also changed; how its central tendency and skewness shift depends on its original distribution.
• Applicant Income shifted as well, likely reducing its positive skew and making it better suited to models that are sensitive to skewed inputs.
• Overall, these transformations make the data more suitable for statistical modeling by reducing the impact of extreme values and bringing the features closer to the normality that some models assume.
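A minimal sketch of the log transformation on hypothetical right-skewed toy values. Two assumptions: Total Income is taken to be ApplicantIncome + CoapplicantIncome (the report does not define it), and np.log1p, i.e. log(1 + x), is used instead of a plain log because it stays defined when a value is 0. The new column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed toy values: a few very large incomes and loans.
df = pd.DataFrame({
    "ApplicantIncome": [2500, 3000, 3500, 4000, 5000, 9000, 40000],
    "CoapplicantIncome": [0, 1500, 0, 2000, 1000, 0, 5000],
    "LoanAmount": [100, 120, 150, 200, 400, 1000, 5000],
})

# Assumption: Total Income = applicant income + co-applicant income.
df["TotalIncome"] = df["ApplicantIncome"] + df["CoapplicantIncome"]

# log1p compresses large values far more than small ones, shrinking the right tail.
for col in ["ApplicantIncome", "LoanAmount", "TotalIncome"]:
    df[col + "_log"] = np.log1p(df[col])

# Compare sample skewness before and after the transformation.
print(df[["LoanAmount", "LoanAmount_log", "ApplicantIncome", "ApplicantIncome_log"]].skew())
```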

Categorical Feature Encoding: Categorical features are converted into numerical form using Label Encoding or One-Hot Encoding. The code uses Label Encoding to convert the categorical columns into numerical representations for machine learning.
Label Encoding assigns each category an integer, which can impose an unintended ordinal relationship on multi-category columns such as Property_Area, so One-Hot Encoding is often preferred. One-Hot Encoding creates a binary column for each category that marks its presence (1) or absence (0). For two-valued columns such as Gender or Married, the two approaches carry the same information.
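The difference between the two encodings can be seen on a small hypothetical Property_Area column. scikit-learn's LabelEncoder assigns integers in sorted order of the category names, which implies an order the data does not have; pandas' get_dummies produces one 0/1 column per category instead.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy column.
df = pd.DataFrame({"Property_Area": ["Urban", "Rural", "Semiurban", "Urban"]})

# Label Encoding: each category becomes an integer, assigned in sorted order.
le = LabelEncoder()
df["Property_Area_label"] = le.fit_transform(df["Property_Area"])
mapping = {cls: code for code, cls in enumerate(le.classes_)}
print(mapping)  # Rural -> 0, Semiurban -> 1, Urban -> 2 (an order the data does not imply)

# One-Hot Encoding: one binary column per category, exactly one 1 per row.
one_hot = pd.get_dummies(df["Property_Area"], prefix="Property_Area", dtype=int)
print(one_hot)
```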

Data Splitting: This step prepares the data for both model training and evaluation. The project aims to predict loan approval status using machine learning models trained on a dataset split into 75% for training and 25% for testing; test_size=0.25 allocates 25% of the data to the test set.
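A minimal sketch of the 75/25 split with scikit-learn's train_test_split on a hypothetical 20-row toy dataset. random_state=42 and stratify=y are assumptions (the report does not say whether its split was stratified); stratifying keeps the approved/rejected ratio similar in both parts, which matters with imbalanced classes.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy features and target: 20 rows, 15 approved (1) and 5 rejected (0).
X = pd.DataFrame({"Credit_History": [1, 0] * 10, "LoanAmount": range(100, 120)})
y = pd.Series([1, 0, 1, 1] * 5, name="Loan_Status")

# test_size=0.25 puts 25% of the rows in the test set; the rest is used for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)
print(len(X_train), len(X_test))
```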

• Feature Scaling: Standardization is a common preprocessing step that aims to improve the performance of machine learning models by scaling numerical features to zero mean and unit variance.
• The code standardizes the features in the training and testing data using StandardScaler for better model performance.
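A minimal sketch of standardization with scikit-learn's StandardScaler on hypothetical toy arrays. Each column is transformed as z = (x - mean) / std. The scaler is fitted on the training data only and the same mean and standard deviation are reused for the test data, so no information from the test set leaks into training.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy features: [ApplicantIncome, LoanAmount].
X_train = np.array([[5000, 120], [3000, 100], [4000, 150], [8000, 200]], dtype=float)
X_test = np.array([[4500, 130]], dtype=float)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns per-column mean and std from training data
X_test_scaled = scaler.transform(X_test)        # applies the training mean and std

print(X_train_scaled.mean(axis=0))  # approximately 0 for each column
print(X_train_scaled.std(axis=0))   # approximately 1 for each column
```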
MODEL TRAINING: XGBoost (Extreme Gradient Boosting)

The code trains an XGBoost classifier to predict loan approval status and evaluates its performance with accuracy, a confusion matrix and a classification report, with the aim of a robust and accurate prediction model. XGBoost is a powerful gradient boosting algorithm known for its speed and performance. The output provides the model's accuracy, its prediction errors (confusion matrix), and detailed performance metrics such as precision, recall and F1-score (classification report).

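MODEL TRAINING: Random Forest Classifier

Random Forest Classifier is an ensemble learning method for classification tasks. It trains many decision trees, each on a bootstrap sample of the training data and considering a random subset of features at each split, and combines their predictions by voting (or by averaging their class probabilities). Combining many de-correlated trees typically reduces the overfitting of a single decision tree. In the given code, a Random Forest Classifier is used to build a model for predicting loan approval status, with the aim of a robust and accurate prediction tool.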
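A minimal sketch with scikit-learn's RandomForestClassifier on synthetic data (the report's data and settings are not shown; the generic feature names are hypothetical because the synthetic columns are not real loan features). The bootstrap sampling and per-split feature subsets described above correspond to the default bootstrap=True and max_features="sqrt"; the fitted forest also exposes feature importances averaged over its trees.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with generic, hypothetical feature names.
X, y = make_classification(n_samples=600, n_features=11, weights=[0.3, 0.7], random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# 200 trees, each on a bootstrap sample, trying sqrt(n_features) features per split.
rf = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=42)
rf.fit(X_train, y_train)

acc = rf.score(X_test, y_test)  # mean accuracy on the test set
print(f"Test accuracy: {acc:.3f}")

# Impurity-based importances averaged over all trees (they sum to 1).
ranked = sorted(zip(feature_names, rf.feature_importances_), key=lambda p: p[1], reverse=True)
for name, importance in ranked[:5]:
    print(f"{name:<12} {importance:.3f}")
```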

MODEL TRAINING: Logistic Regression

Logistic Regression is a statistical method for binary classification, where the outcome variable is categorical (e.g. loan approval: yes/no).
The Logistic Regression model achieved an overall accuracy of 81.82% in predicting loan approvals. It identifies approved loans well (recall of 0.99 for class 1), but it struggles with rejected loans: precision for class 0 is high (0.93), yet recall is only 0.34, so most rejected applications are predicted as approved. Identifying rejection cases is therefore the main area for improvement. The weighted average F1-score of 0.79 indicates decent overall performance, considering the class imbalance.
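A minimal sketch of training logistic regression and reading the per-class recall and weighted F1-score discussed above, on synthetic data from make_classification (an assumption; the report's data and settings are not shown). It also tries class_weight="balanced", one standard way to address class imbalance by weighting errors on the rarer class more heavily; whether it raises class-0 recall on the real loan data would need to be checked.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in data: about 30% class 0 (rejected), 70% class 1 (approved).
X, y = make_classification(n_samples=600, n_features=11, weights=[0.3, 0.7], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

results = {}
for weights in (None, "balanced"):
    clf = make_pipeline(StandardScaler(), LogisticRegression(class_weight=weights, max_iter=1000))
    clf.fit(X_train, y_train)
    # output_dict=True returns the report as a dict keyed by class label strings.
    report = classification_report(y_test, clf.predict(X_test), output_dict=True)
    results[weights] = report
    print(
        f"class_weight={weights}: recall(0)={report['0']['recall']:.2f}, "
        f"recall(1)={report['1']['recall']:.2f}, "
        f"weighted F1={report['weighted avg']['f1-score']:.2f}"
    )
```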
MODEL TRAINING: Decision Tree Classifier

The Decision Tree model provides a visual representation of the loan approval prediction process,
highlighting key features and decision rules. By analyzing the tree structure, feature importance,
and decision paths, valuable insights can be gained into the factors influencing loan approval
decisions. This interpretability is a significant advantage of Decision Trees, allowing for better
understanding and transparency in the prediction process.
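The decision rules and feature importances mentioned above can be extracted as text with scikit-learn's export_text (sklearn.tree.plot_tree draws the graphical version). This is a sketch on synthetic data with hypothetical feature names; max_depth=3 is an assumed setting chosen to keep the printed tree readable.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data with generic, hypothetical feature names.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

# A shallow tree so the rules stay readable.
tree = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X, y)

# Each indented line is a split condition; leaf lines show the predicted class.
rules = export_text(tree, feature_names=feature_names)
print(rules)

# How much each feature's splits reduced impurity overall.
for name, importance in zip(feature_names, tree.feature_importances_):
    print(f"{name}: {importance:.3f}")
```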

MODEL TRAINING: K Neighbors Classifier

• KNN is a simple but powerful machine learning algorithm used for both classification and regression tasks; for classification, it assigns a new applicant the majority label among the k most similar training examples.
• The K Neighbors Classifier is used to predict loan approvals based on the features of the loan applicants.

• The KNN model achieved an accuracy of 79.87% in predicting loan approvals. While its overall performance is good, recall for rejected loans remains low (0.41 for class 0). Tuning the number of neighbors k (n_neighbors) may improve performance. The weighted average F1-score of 0.78 suggests decent overall performance, making KNN a potential candidate for loan approval prediction.
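A minimal sketch of tuning k with 5-fold cross-validation via scikit-learn's GridSearchCV, on synthetic data (an assumption; the report does not show its code). Scaling sits inside the pipeline because KNN compares raw distances, so unscaled income columns would dominate. Only odd values of k are searched, which avoids tied votes between two classes; the search range and the weighted-F1 scoring are hypothetical choices.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in data.
X, y = make_classification(n_samples=600, n_features=11, weights=[0.3, 0.7], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])
param_grid = {"knn__n_neighbors": list(range(1, 32, 2))}  # k = 1, 3, ..., 31

# 5-fold cross-validation on the training set only, scored by weighted F1.
search = GridSearchCV(pipe, param_grid, cv=5, scoring="f1_weighted")
search.fit(X_train, y_train)

best_k = search.best_params_["knn__n_neighbors"]
test_acc = search.best_estimator_.score(X_test, y_test)  # accuracy of the refitted best pipeline
print("Best k:", best_k)
print(f"Test accuracy: {test_acc:.3f}")
```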
Conclusion
This project explored various machine learning models to predict loan approval status
using a provided dataset. I investigated algorithms like Logistic Regression, Decision Tree,
Random Forest, K-Nearest Neighbors, and XGBoost. Each model was trained, evaluated
using metrics such as accuracy, confusion matrix, and classification report, and compared
with others.
While all models demonstrated reasonable performance, XGBoost emerged as a strong
contender with high accuracy and robust predictions. The Decision Tree provided valuable
insights into feature importance and decision rules through its interpretable visualization.
Further improvements could be explored through hyperparameter tuning, feature
engineering, and addressing class imbalance. This project provides a solid foundation for
developing a reliable loan approval prediction system, enabling faster and more informed
decision-making in the loan application process.
