Loan Approval Model Prediction

Machine Learning Project about Loan Approval Prediction

LOAN APPROVAL MODEL PREDICTION

Submitted by

Moumita Paul (24IM0002)

Under the Guidance of Assistant Prof. Manisha Verma

DATA ANALYTICS
Department of Mathematics and Computing

INDIAN INSTITUTE OF TECHNOLOGY
(Indian School of Mines), Dhanbad

Aim: To build a loan approval prediction system using Machine Learning algorithms.

Synopsis: Loans are a major requirement of the modern world, and they account for a major part of
a bank's total profit. They help students manage their education and living expenses, and they help
people buy houses, cars, and other high-value items. However, deciding whether an applicant's
profile qualifies for a loan requires banks to weigh many factors.

We develop a model that predicts whether an applicant's loan will be approved, using background
information such as the applicant's gender, marital status, and income.

Life Cycle of ML Project


 Data Collection: The data comes from the Loan Prediction Problem Dataset (kaggle.com).
 This dataset contains information about loan applications, including various attributes
related to applicants and whether their loan applications were approved or denied. The
dataset is designed for predictive modeling tasks, specifically for predicting whether a loan
application will be approved or not based on the provided features.
 The dataset contains 13 features.

No.  Feature              Description
1    Loan                 A unique id
2    Gender               Gender of the applicant (Male/Female)
3    Married              Marital status of the applicant (Yes/No)
4    Dependents           Whether the applicant has any dependents or not
5    Education            Whether the applicant is a graduate or not
6    Self-Employed        Whether the applicant is self-employed (Yes/No)
7    Applicant Income     Applicant income
8    Coapplicant Income   Co-applicant income
9    Loan Amount          Loan amount (in thousands)
10   Loan_Amount_Term     Term of the loan (in months)
11   Credit_History       Credit history of the individual's repayment of their debts
12   Property_Area        Area of the property (Rural/Urban/Semi-urban)
13   Loan_Status          Whether the loan was approved (Y = Yes, N = No)

Importing Libraries
 Pandas: To load the Data frame
 Matplotlib: To visualize the data features i.e. bar plot
 Seaborn: To see the correlation between features using heat map

 Data Cleaning: Clean the data to handle missing values, outliers, and inconsistencies.
This step is crucial for the model's accuracy and generalization. We may need to impute
missing values, standardize or normalize features, and deal with any data anomalies.
 Outlier: An outlier is a data point that significantly deviates from the other data points in
a dataset. Outliers can be unusually high or low values and can distort statistical analyses
and model training.
 Data Visualization: Exploratory data analysis is performed using visualizations like
count plots and box plots to gain insights into the distribution and relationships between
features.
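The cleaning and outlier steps above can be sketched as follows. This is a minimal illustration on a made-up frame, not the project's own code: the column names (`Gender`, `LoanAmount`, `ApplicantIncome`) are assumed to follow the Kaggle file, and the mode/median imputation and the 1.5 x IQR rule are common choices rather than something the report specifies.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the loan dataset (all values are made up).
df = pd.DataFrame({
    "Gender": ["Male", "Female", None, "Male", "Male", "Female"],
    "LoanAmount": [120.0, np.nan, 95.0, 130.0, 110.0, 700.0],
    "ApplicantIncome": [4000, 3500, 5000, 4200, 3900, 4100],
})

# Impute: most frequent value for a categorical column, median for a numeric one.
df["Gender"] = df["Gender"].fillna(df["Gender"].mode()[0])
df["LoanAmount"] = df["LoanAmount"].fillna(df["LoanAmount"].median())

# Flag outliers with the 1.5 * IQR rule (assumed rule, see lead-in).
q1, q3 = df["LoanAmount"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["LoanAmount"] < q1 - 1.5 * iqr) | (df["LoanAmount"] > q3 + 1.5 * iqr)

print(df.isna().sum().sum())   # number of missing values left
print(df[is_outlier])          # rows flagged as outliers
```

Whether flagged rows are dropped, capped, or kept (for example, after a log transformation) is a modeling decision that the flag itself does not make.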

The code analyzes loan applications by gender, providing the frequency of each gender and
visualizing the distribution with a count plot. This analysis reveals that there are significantly more
male applicants than female applicants seeking loans.
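A frequency count of this kind might look like the sketch below. The sample values are made up and the column name `Gender` is assumed from the feature table; only the pandas calls are shown, with the plotting call left as a comment.

```python
import pandas as pd

# Made-up sample; the real column is assumed to be called "Gender".
df = pd.DataFrame({"Gender": ["Male", "Male", "Female", "Male", "Female", "Male"]})

counts = df["Gender"].value_counts()                 # absolute frequency
shares = df["Gender"].value_counts(normalize=True)   # relative frequency

print(counts)
print(shares.round(2))

# With seaborn installed, sns.countplot(x="Gender", data=df) draws the same counts.
```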

Visualize the unique values of each categorical column using bar plots. This shows which value
dominates each column in our dataset.
 The next step, creating a heat map to visualize correlation, belongs to the Exploratory
Data Analysis (EDA) phase of a machine learning project.

 Exploratory Data Analysis (EDA) focuses on understanding the data, identifying patterns,
relationships, and potential issues before proceeding with model building.

 The code calculates the correlation between numerical features in the loan application
dataset and visualizes it using a heat map. The heat map reveals the strength and direction
of relationships, with darker shades indicating stronger correlations. Positive correlations
are shown in blue shades, negative correlations in lighter shades, allowing for insights into
feature dependencies.
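The correlation step described above could be sketched like this. The frame is synthetic and the column names are assumed to match the Kaggle file; the heat map call is left as a comment so the example depends only on pandas.

```python
import pandas as pd

# Toy columns; names are assumed, values are made up.
df = pd.DataFrame({
    "ApplicantIncome": [2500, 4000, 6000, 8000, 12000],
    "LoanAmount": [80, 110, 150, 190, 260],
    "Loan_Amount_Term": [360, 360, 180, 360, 120],
    "Property_Area": ["Urban", "Rural", "Urban", "Semiurban", "Rural"],
})

# Keep numeric columns only, then compute pairwise Pearson correlations.
corr = df.select_dtypes(include="number").corr()
print(corr.round(2))

# With seaborn installed, sns.heatmap(corr, annot=True, cmap="BuPu") renders it.
```

Each cell lies between -1 and 1, and the diagonal is always 1 because every feature is perfectly correlated with itself.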

 Data Preprocessing/Feature Engineering: This phase focuses on preparing the data for model
training by transforming it into a suitable format for the algorithms. Log transformation is a
common preprocessing technique that helps address issues like skewness, outliers, and
non-linear relationships in the data.
 The log transformation applied to Loan Amount, Loan Amount Term, Total Income, and
Applicant Income resulted in noticeable changes in their distributions.
 Loan Amount and Total Income showed clear reductions in positive skew, suggesting a
more balanced distribution with reduced influence of high-value outliers.
 Loan Amount Term exhibited a modified distribution, potentially impacting its central
tendency and skewness depending on the original data's characteristics.
 Applicant Income displayed a shift in its distribution, potentially addressing positive
skewness and enhancing its suitability for certain modeling techniques.
 These transformations generally enhance the data's suitability for statistical modeling by
mitigating the negative impact of extreme values and improving the data's alignment with
normality assumptions in some models.
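The effect of a log transformation on skewness can be checked numerically, as in the sketch below. The incomes are made up, and `np.log1p` (log of 1 + x) is an assumption chosen so that zero values stay valid; the report does not say which log variant was used.

```python
import numpy as np
import pandas as pd

# Made-up incomes with one very large value (strong positive skew).
income = pd.Series(
    [1500, 2000, 2500, 3000, 4000, 5000, 8000, 20000, 81000],
    name="ApplicantIncome",
)

# log1p computes log(1 + x), so a zero income would not produce -inf.
income_log = np.log1p(income)

# Sample skewness before and after the transformation.
print(round(income.skew(), 2), round(income_log.skew(), 2))
```

The transformation is monotonic, so the ordering of applicants by income is unchanged; only the spacing between large and small values is compressed.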
Categorical Feature Encoding: Convert categorical features into numerical form using Label
Encoding or One-Hot Encoding. The code uses Label Encoding to convert categorical columns into
numerical representations for machine learning.
Label Encoding can impose unwanted ordinal relationships, so One-Hot Encoding is often
preferred to avoid this. One-Hot Encoding creates binary columns for each category, representing
presence (1) or absence (0).
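The difference between the two encodings can be seen on a single column. The sketch below uses made-up rows and assumes the category spelling `Semiurban`; `LabelEncoder` and `pd.get_dummies` are one possible pair of tools for this, not necessarily the exact calls in the project.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({"Property_Area": ["Urban", "Rural", "Semiurban", "Urban"]})

# Label Encoding: one integer per category (classes are sorted alphabetically),
# which implies an order Rural < Semiurban < Urban that may not be meaningful.
encoder = LabelEncoder()
labels = encoder.fit_transform(df["Property_Area"])
print(list(encoder.classes_), list(labels))

# One-Hot Encoding: one binary column per category, with no implied order.
one_hot = pd.get_dummies(df["Property_Area"], prefix="Property_Area", dtype=int)
print(one_hot)
```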

Data Splitting: Splitting prepares the data for both model training and evaluation. The models
are trained on 75% of the dataset and tested on the remaining 25%; test_size=0.25 allocates 25%
of the data for testing.
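A 75/25 split with `test_size=0.25` might be written as below. The arrays are synthetic placeholders, and `random_state` and `stratify` are assumptions added for reproducibility and to keep the class ratio similar in both parts; the report only states the `test_size` value.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder features and labels (20 rows, 3 features); values are synthetic.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = np.array([1] * 14 + [0] * 6)   # imbalanced, like approved vs. rejected

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)
```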

Feature Scaling:
 This is a common preprocessing step that aims to improve the performance of machine learning
models by scaling numerical features to have zero mean and unit variance.
 The code standardizes the features in the training and testing data using Standard Scaler for
better model performance.
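Standardization with scikit-learn's `StandardScaler` can be sketched as follows. The matrices are synthetic; fitting the scaler on the training data only and reusing it for the test data is the usual practice and is assumed here.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic train/test matrices standing in for the split loan features.
X_train = np.array([[2500.0, 80.0], [4000.0, 110.0], [6000.0, 150.0], [12000.0, 260.0]])
X_test = np.array([[5000.0, 120.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)   # learn mean and std on training data only
X_test_scaled = scaler.transform(X_test)         # reuse the training statistics

print(X_train_scaled.mean(axis=0).round(6), X_train_scaled.std(axis=0).round(6))
```

Each value is transformed as (x - mean) / std, with the mean and standard deviation taken per column from the training set.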
MODEL TRAINING: XGBoost (Extreme Gradient Boosting)

The code trains an XGBoost classifier to predict loan approval status and evaluates it with
accuracy, a confusion matrix, and a classification report. XGBoost is a powerful gradient
boosting algorithm known for its speed and performance. The output provides insights into model
accuracy, prediction errors (confusion matrix), and detailed performance metrics such as
precision, recall, and F1-score (classification report).
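The three evaluation outputs named above come from scikit-learn's metrics module. The sketch below applies them to hypothetical labels and predictions rather than to the output of the XGBoost model, so the numbers are illustrative only.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Hypothetical labels and predictions (1 = approved, 0 = rejected).
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 0, 1, 1]

acc = accuracy_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)   # rows: true class, columns: predicted class (0, 1)

print(acc)
print(cm)
print(classification_report(y_true, y_pred))
```

With the xgboost package installed, `xgboost.XGBClassifier` exposes the same `fit`/`predict` interface as scikit-learn estimators, so its predictions can be passed to these functions unchanged.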

MODEL TRAINING: Random Forest Classifier

Random Forest Classifier is an ensemble learning method used for classification tasks. It trains
many decision trees and combines their votes. In the given code, it is used to build a model for
predicting loan approval status, with the aim of creating a robust and accurate prediction tool.

MODEL TRAINING: Logistic Regression


Logistic Regression is a statistical method used for binary classification, where the outcome
variable is categorical (e.g., loan approval: yes/no).
The Logistic Regression model achieved an overall accuracy of 81.82% in predicting loan
approvals. It performs well at identifying approved loans (high recall of 0.99 for class 1) but
struggles to identify rejected loans: precision for class 0 is high (0.93), but recall is only
0.34, so most rejected applications are predicted as approved. This suggests room for
improvement in identifying rejection cases. The weighted average F1-score of 0.79 indicates
decent overall performance, considering the class imbalance.

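One way to probe the weak recall on the rejected class is to compare a default Logistic Regression with one that reweights classes. The sketch below uses synthetic data, not the loan dataset, and `class_weight="balanced"` is an assumption offered as a possible remedy, not something the project reports having tried.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic data with roughly 30% rejections (class 0); not the loan dataset.
X, y = make_classification(n_samples=600, n_features=8, weights=[0.3, 0.7],
                           random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y
)

recall_rejected = {}
for label, weights in [("default", None), ("balanced", "balanced")]:
    clf = LogisticRegression(max_iter=1000, class_weight=weights)
    clf.fit(X_train, y_train)
    # pos_label=0 reports recall for the rejected class.
    recall_rejected[label] = recall_score(y_test, clf.predict(X_test), pos_label=0)

print(recall_rejected)
```

Reweighting usually trades some precision on the minority class for recall, so both numbers should be checked before adopting it.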
MODEL TRAINING: Decision Tree Classifier

The Decision Tree model provides a visual representation of the loan approval prediction process,
highlighting key features and decision rules. By analyzing the tree structure, feature importance,
and decision paths, valuable insights can be gained into the factors influencing loan approval
decisions. This interpretability is a significant advantage of Decision Trees, allowing for better
understanding and transparency in the prediction process.
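The decision rules and feature importances mentioned above can be read from a fitted tree. The sketch below uses a tiny synthetic set in which approval depends only on `Credit_History`; that dependence is built into the toy data for illustration and is not a finding of the report.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic rows: [Credit_History, ApplicantIncome]; labels follow Credit_History.
feature_names = ["Credit_History", "ApplicantIncome"]
X = np.array([[1, 3000], [1, 5200], [0, 4100], [0, 6000], [1, 2500], [0, 3500]])
y = np.array([1, 1, 0, 0, 1, 0])

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

rules = export_text(tree, feature_names=feature_names)   # human-readable rules
importances = dict(zip(feature_names, tree.feature_importances_))

print(rules)
print(importances)
```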

MODEL TRAINING: K Neighbors Classifier


 KNN is a simple but powerful machine learning algorithm used for both classification and
regression tasks.
 The K Neighbors Classifier is used to predict loan approvals based on the features of the
loan applicants.

 The KNN model achieved an accuracy of 79.87% in predicting loan approvals. Overall
performance is good, but recall for rejected loans remains low (0.41 for class 0). Tuning the
number of neighbors (k) may improve performance. The weighted average F1-score of 0.78
suggests decent overall performance, making KNN a potential candidate for loan approval
prediction.
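Tuning k could be done with a cross-validated grid search, as in this sketch. The data is synthetic, and the candidate values of k, the 5-fold setting, and the weighted F1 scoring are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary data standing in for the prepared loan features.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# Scaling sits inside the pipeline so each CV fold is standardized
# using only its own training portion.
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())
grid = GridSearchCV(
    pipe,
    param_grid={"kneighborsclassifier__n_neighbors": [3, 5, 7, 9, 11]},
    cv=5,
    scoring="f1_weighted",
)
grid.fit(X, y)

print(grid.best_params_, round(grid.best_score_, 3))
```

KNN is distance-based, so features on larger scales dominate unless they are standardized first; that is why the scaler is part of the pipeline.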
Conclusion
This project explored various machine learning models to predict loan approval status
using a provided dataset. I investigated algorithms like Logistic Regression, Decision Tree,
Random Forest, K-Nearest Neighbors, and XGBoost. Each model was trained, evaluated
using metrics such as accuracy, confusion matrix, and classification report, and compared
with others.
While all models demonstrated reasonable performance, XGBoost emerged as a strong
contender with high accuracy and robust predictions. The Decision Tree provided valuable
insights into feature importance and decision rules through its interpretable visualization.
Further improvements could be explored through hyperparameter tuning, feature
engineering, and addressing class imbalance. This project provides a solid foundation for
developing a reliable loan approval prediction system, enabling faster and more informed
decision-making in the loan application process.
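The comparison summarized in this conclusion can be organized as a loop over models. The sketch below runs on synthetic data, so its scores say nothing about the loan dataset; the hyperparameters shown are library defaults or arbitrary choices, not the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced binary data; NOT the loan dataset.
X, y = make_classification(n_samples=400, n_features=8, weights=[0.3, 0.7],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
}

# Fit every model on the same split and record its test accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

With the xgboost package installed, an `xgboost.XGBClassifier` instance can be added to the `models` dictionary and is handled by the same loop.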
