Final Project Report - Kelompok 4

Credit Risk Prediction With Deployment

1. Andoni Fikri Oktaviano


2. Narendra Seta Roesdyoko

1. Problem & Idea


1.1. Problem Statement
Our banking company needs to review each customer who uses our credit system to make sure that
customers who take out loans against the credit our company provides can eventually pay them back.
Doing this review manually takes a long time and many people, increasing the company's working
capital requirements. The rate at which customers can be reviewed manually is also relatively slow,
which cuts into the profit produced.

1.2. Solution Idea


When requesting a loan against their credit, customers are asked a set of questions regarding their
personal data. This data is then processed and reviewed by a model that analyses each parameter of
the received data to determine whether the customer is eligible to receive a loan. Applications the
model accepts are still reviewed manually, but because the bulk of the review work has already been
done by the model, the manual step only verifies that the submitted data is factually correct,
thus reducing the time needed to review each customer.

2. Results and Discussion


2.1. Solution Architecture

The solution our team proposes consists of two major processes. The first is creating a robust
machine learning model with high accuracy, and the second is deploying the model as a web
application. The detailed solution workflow is shown in Figure 1.

Our solution starts with dataset research. We found a customer credit dataset on Kaggle that is
quite complex and contains many data points [1]. Once the dataset is loaded, we go through each
column to make sure we understand what it represents. The dataset consists of 12 columns and
32,581 rows.
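
As a rough illustration, this loading and inspection step could look like the sketch below (the CSV file name follows the Kaggle dataset's default and may differ locally):

    import pandas as pd

    # Load the Kaggle credit risk dataset (file name assumed from the dataset page)
    df = pd.read_csv("credit_risk_dataset.csv")

    # Inspect each column's name, dtype, and non-null count (12 columns, 32,581 rows)
    df.info()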

During this process, we found that the dataset has missing values in two columns: customer
employment length and loan interest rate, with 2.74% and 9.5% of values missing respectively. For
the column with less than 5% missing values, we dropped the affected rows, as is common practice in
the data science community [2]. For the column with more than 5% missing values, we imputed the
missing entries with the column mean, since the column is not evenly distributed.
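
A sketch of how this rule might be applied with pandas; the column identifiers person_emp_length and loan_int_rate are our assumption based on the Kaggle dataset, since the report gives only the column descriptions:

    # Percentage of missing values per column
    print(df.isna().mean() * 100)

    # Employment length (~2.74% missing, below 5%): drop the affected rows
    df = df.dropna(subset=["person_emp_length"])

    # Loan interest rate (~9.5% missing, above 5%): impute with the column mean
    df["loan_int_rate"] = df["loan_int_rate"].fillna(df["loan_int_rate"].mean())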
Figure 1. Solution Architecture Flow Diagram

After the data cleaning process was completed, we preprocessed the categorical values in the
dataset. In total there are four categorical columns, which we processed using label encoding (for
columns with more than two unique values) and binary encoding (for columns with exactly two unique
values). Immediately after preprocessing the categorical values, Exploratory Data Analysis (EDA)
was performed to detect outliers in the dataset. We found that some columns contain extreme
outliers, which we handled with the capping method [3].
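
A sketch of these two preprocessing steps; the report names the techniques but not the code, so the scikit-learn encoder, the 1.5×IQR capping rule, and the capped column names below are our assumptions:

    from sklearn.preprocessing import LabelEncoder

    for col in df.select_dtypes(include="object").columns:
        if df[col].nunique() == 2:
            # Binary encoding: map the two categories to 0 and 1
            df[col] = (df[col] == df[col].unique()[0]).astype(int)
        else:
            # Label encoding for columns with more than two unique values
            df[col] = LabelEncoder().fit_transform(df[col])

    # Cap extreme outliers at the 1.5*IQR whiskers (one common capping rule)
    def cap_outliers(s):
        q1, q3 = s.quantile([0.25, 0.75])
        iqr = q3 - q1
        return s.clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

    for col in ["person_age", "person_income"]:  # illustrative column names
        df[col] = cap_outliers(df[col])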

Feature selection is also crucial to developing the ML model. Several articles warn to check the
correlation between columns and make sure that none of them are highly correlated (a
multicollinearity check) [4], since highly correlated features can hurt the model's accuracy if not
handled correctly. Luckily, no feature in our dataset has a high correlation with another.
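
One way such a multicollinearity check could be written; the 0.8 threshold is our assumption, not a value stated in the report:

    import numpy as np

    # Absolute pairwise correlations between numeric features
    corr = df.corr(numeric_only=True).abs()

    # Keep only the upper triangle so each pair is checked once
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack()

    # Any pair above the threshold would need handling; per the report, none exist
    print(pairs[pairs > 0.8])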

After selecting the features and cleaning the dataset, the next step is to fit various machine
learning algorithms and evaluate which gives the most accurate results. This step is discussed in
detail in Section 2.2, and the deployment in Section 2.3. For tooling, we used Google Colaboratory
for the ML model build-up (red dashed line in Figure 1), utilizing the pandas, NumPy, matplotlib,
and scikit-learn libraries. HTML, CSS, and Visual Studio Code with the Python Flask library were
used for the deployment (blue dashed line).
2.2. Machine Learning Model Evaluation

To choose the right machine learning algorithm, model selection is performed by fitting several
algorithms to the dataset and evaluating each with the AUC score, since this is a classification
problem. The algorithms we fit to the dataset are Random Forest Classifier, Decision Tree
Classifier, KNN Classifier, and Logistic Regression.
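
A sketch of this model-selection loop; the target column loan_status and the train/test split parameters are our assumptions:

    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    X = df.drop(columns=["loan_status"])  # loan_status as the target is assumed
    y = df["loan_status"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y)

    models = {
        "Random Forest": RandomForestClassifier(random_state=42),
        "Decision Tree": DecisionTreeClassifier(random_state=42),
        "KNN": KNeighborsClassifier(),
        "Logistic Regression": LogisticRegression(max_iter=1000),
    }

    # Fit each candidate and compare AUC scores on the held-out set
    for name, model in models.items():
        model.fit(X_train, y_train)
        proba = model.predict_proba(X_test)[:, 1]
        print(f"{name}: AUC = {roc_auc_score(y_test, proba):.3f}")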

Figure 2. ML Algorithm and AUC Score

From Figure 2, it can be concluded that Random Forest Classifier is the algorithm that produces the
best score. We therefore evaluate this model with a confusion matrix (Figure 3) to see how well it
predicts customer creditworthiness. Overall, the model gives good accuracy, with a precision of
0.97 for class 1 (bad loan customer) and 0.92 for class 0 (good loan customer). This model is
therefore used for the deployment.

Figure 3. Confusion Matrix for Random Forest Classifier
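
Continuing the sketch above, the confusion matrix and the per-class precision quoted here could be produced as follows:

    from sklearn.metrics import confusion_matrix, classification_report

    best = models["Random Forest"]
    y_pred = best.predict(X_test)

    print(confusion_matrix(y_test, y_pred))
    # classification_report gives the per-class precision (0.97 / 0.92 above)
    print(classification_report(y_test, y_pred, digits=2))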


2.3. Model Deployment

For deploying the model, we followed a reference we found on YouTube, given the limited time
available for the project [5]. The deployment flow is quite simple: the model is saved as a pickle
file, which is then loaded into the web page using Python's Flask library. Figure 4 shows the user
interface of the web app, while Figures 5 and 6 show the interface after the predictor values have
been entered.
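
A minimal sketch of this deployment flow; the file name model.pkl, the index.html template, and the form handling are illustrative, not taken from the project's actual code:

    import pickle

    from flask import Flask, render_template, request

    app = Flask(__name__)

    # Load the Random Forest model saved from the notebook
    with open("model.pkl", "rb") as f:
        model = pickle.load(f)

    @app.route("/")
    def home():
        return render_template("index.html")

    @app.route("/predict", methods=["POST"])
    def predict():
        # Read the predictor values submitted through the HTML form
        features = [float(v) for v in request.form.values()]
        label = model.predict([features])[0]
        verdict = "Bad loan customer" if label == 1 else "Good loan customer"
        return render_template("index.html", prediction_text=verdict)

    if __name__ == "__main__":
        app.run(debug=True)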

Figure 4. Web Home Page

Figure 5. Web Appearance when Predicting Bad Customer


Figure 6. Web Appearance when Predicting Good Customer

2.4. Business Impact

The expected benefit of changing the loan review system is that the review time needed for each
customer can be reduced significantly, while at the same time reducing the workforce needed for
manual review. Lower expenses for manual labor, and higher income as the review time shortens, are
expected after adopting the model.

Besides more efficient operations, selecting the right customers will also bring more revenue to
the company. From the dataset, we know that the average bad-customer loan amount is $10,760. Thus,
with the model's precision of 0.97, the company could save more than $1 million per 100 bad-loan
customers (0.97 × 100 × $10,760 ≈ $1,043,700), and could instead gain more revenue by lending that
amount to more promising customers.

2.5. Github Project Link

GitHub: https://github.com/AndoniFikri/Credit-Risk-Prediction-with-Deployment
3. References

[1] L. Tse, "Credit Risk Dataset," Kaggle, 2020. [Online]. Available:
https://www.kaggle.com/datasets/laotse/credit-risk-dataset?resource=download. [Accessed August 2022].
[2] S. Kumar, "7 Ways to Handle Missing Values in Machine Learning," Towards Data Science, 24
July 2020. [Online]. Available:
https://towardsdatascience.com/7-ways-to-handle-missing-values-in-machine-learning-1a6326adf79e.
[Accessed August 2022].
[3] C. Goyal, "Feature Engineering – How to Detect and Remove Outliers (with Python Code),"
Analytics Vidhya, 19 May 2021. [Online]. Available:
https://www.analyticsvidhya.com/blog/2021/05/feature-engineering-how-to-detect-and-remove-outliers-with-python-code/.
[Accessed August 2022].
[4] W. Badr, "Why Feature Correlation Matters …. A Lot!," Towards Data Science, 18 January 2019.
[Online]. Available: https://towardsdatascience.com/why-feature-correlation-matters-a-lot-847e8ba439c4.
[Accessed August 2022].
[5] K. Naik, "Deploy Machine Learning Model using Flask," YouTube, 16 June 2019. [Online].
Available: https://www.youtube.com/watch?v=UbCWoMf80PY. [Accessed August 2022].
