FRA Project Report
(Milestone I)
Table of Contents
Introduction
1.4 Univariate (4 marks) & Bivariate (6 marks) analysis with proper interpretation (you may choose to include only those variables which were significant in the model building)
1.6 Build Logistic Regression Model (using statsmodels library) on most important variables on Train Dataset and choose the optimum cutoff. Also showcase your model building approach
1.7 Validate the Model on Test Dataset and state the performance metrics. Also state interpretations from the model
Predicting Credit Risk
Problem – I
Executive Summary
Businesses or companies can fall prey to default if they are not able to keep up with their debt obligations. A default leads to a lower credit rating for the company, which in turn reduces its chances of getting credit in the future, and the company may have to pay higher interest on existing debts as well as on any new obligations. From an investor's point of view, an investor would want to invest in a company only if it is capable of handling its financial obligations, can grow quickly, and is able to manage the scale of that growth.
A balance sheet is a financial statement of a company that provides a snapshot of what a company owns, owes,
and the amount invested by the shareholders. Thus, it is an important tool that helps evaluate the performance
of a business.
The available data includes information from the financial statements of the companies for the previous year (2015). Information about the net worth of each company in the following year (2016) is also provided, which can be used to derive the labelled field.
An explanation of the data fields is available in the data dictionary, 'Credit Default Data Dictionary.xlsx'.
We need to create a default variable that takes the value 1 when net worth next year is negative and 0 when net worth next year is positive.
Introduction
This assignment requires us to perform outlier treatment, missing value treatment, transformation of the target variable into 0 and 1, univariate and bivariate analysis, and a train/test split, then to build a logistic regression model on the most important variables of the train dataset, choose the optimum cutoff, and validate the model on the test dataset.
We have 3586 entries and 67 columns. The outcome of this assignment will be to suggest to investors companies with good credit ratings in which to invest their money.
Data Description
S.No | Field Name | Description
28 | Revenue expenses in forex | Expenses due to foreign currency transactions
29 | Capital expenses in forex | Long term investment in forex
30 | Book Value (Unit Curr) | Net asset value
31 | Book Value (Adj.) (Unit Curr) | Book value adjusted to reflect the asset's true fair market value
32 | Market Capitalisation | Product of the total number of a company's outstanding shares and the current market price of one share
33 | CEPS (annualised) (Unit Curr) | Cash Earnings per Share, a profitability ratio that measures the financial performance of a company by calculating cash flows on a per share basis
34 | Cash Flow From Operating Activities | Use of cash from ongoing regular business activities
35 | Cash Flow From Investing Activities | Cash used in the purchase of non-current (long-term) assets that will deliver value in the future
36 | Cash Flow From Financing Activities | Net flows of cash used to fund the company (transactions involving debt, equity, and dividends)
37 | ROG-Net Worth (%) | Rate of Growth - Networth
38 | ROG-Capital Employed (%) | Rate of Growth - Capital Employed
39 | ROG-Gross Block (%) | Rate of Growth - Gross Block
40 | ROG-Gross Sales (%) | Rate of Growth - Gross Sales
41 | ROG-Net Sales (%) | Rate of Growth - Net Sales
42 | ROG-Cost of Production (%) | Rate of Growth - Cost of Production
43 | ROG-Total Assets (%) | Rate of Growth - Total Assets
44 | ROG-PBIDT (%) | Rate of Growth - PBIDT
45 | ROG-PBDT (%) | Rate of Growth - PBDT
46 | ROG-PBIT (%) | Rate of Growth - PBIT
47 | ROG-PBT (%) | Rate of Growth - PBT
48 | ROG-PAT (%) | Rate of Growth - PAT
49 | ROG-CP (%) | Rate of Growth - CP
50 | ROG-Revenue earnings in forex (%) | Rate of Growth - Revenue earnings in forex
51 | ROG-Revenue expenses in forex (%) | Rate of Growth - Revenue expenses in forex
52 | ROG-Market Capitalisation (%) | Rate of Growth - Market Capitalisation
53 | Current Ratio[Latest] | Liquidity ratio: a company's ability to pay short-term obligations or those due within one year
54 | Fixed Assets Ratio[Latest] | Solvency ratio indicating the capacity of a company to discharge its obligations towards long-term lenders
55 | Inventory Ratio[Latest] | Activity ratio: specifies the number of times the stock or inventory has been replaced and sold by the company
56 | Debtors Ratio[Latest] | Measures how quickly debtors are paying cash back to the company
57 | Total Asset Turnover Ratio[Latest] | The value of a company's revenues relative to the value of its assets
58 | Interest Cover Ratio[Latest] | Determines how easily a company can pay interest on its outstanding debt
59 | PBIDTM (%)[Latest] | Profit Before Interest Depreciation and Tax Margin
60 | PBITM (%)[Latest] | Profit Before Interest Tax Margin
61 | PBDTM (%)[Latest] | Profit Before Depreciation Tax Margin
62 | CPM (%)[Latest] | Cash Profit Margin
63 | APATM (%)[Latest] | After Tax Profit Margin
64 | Debtors Velocity (Days) | Average days required for receiving the payments
65 | Creditors Velocity (Days) | Average number of days the company takes to pay suppliers
66 | Inventory Velocity (Days) | Average number of days the company needs to turn its inventory into sales
67 | Value of Output/Total Assets | Ratio of Value of Output (market value) to Total Assets
68 | Value of Output/Gross Block | Ratio of Value of Output (market value) to Gross Block
Checking the top 5 rows again after fixing the messy column names for ease of use:
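A minimal sketch of this clean-up step; the file name and the exact cleaning rules are assumptions, since the report only states that messy column names were fixed:

```python
import pandas as pd

# Load the raw data; the file name here is an assumption for illustration.
df = pd.read_excel("Company_Data_2015.xlsx")

# Replace spaces and special characters in the headers with underscores so the
# columns are easy to reference later (e.g. in statsmodels formulas).
df.columns = (
    df.columns.str.strip()
              .str.replace(r"[^0-9a-zA-Z]+", "_", regex=True)
              .str.strip("_")
)

print(df.head())  # check the top 5 rows with the cleaned column names
```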
Inference:
• The number of rows (observations) is 3586 and the number of columns (variables) is 67.
• All the variables are numeric except one (Co_Name), which is of object type.
• There are missing values in 13 of the variables. Missing values will be treated with either the mean or the median of the corresponding variable.
• There are outliers in the dataset. They will be treated for our analysis.
• The problem statement requires us to predict the "default" status of a company, where the company's "Networth Next Year" is used to derive the "default" field. "default" is 1 when "Networth Next Year" is negative and 0 when "Networth Next Year" is positive.
• The "default" field is created and added to the dataset based on the condition above. Subsequently, "Networth Next Year" is not considered further, as it becomes redundant.
• Outliers are present in all of the independent variables. Outlier treatment is necessary for any regression model; in regression, outliers pull the regression line towards themselves, thereby affecting its slope. This distorts reality and leads to faulty predictions.
• Outliers are treated using the interquartile range (IQR) for each numerical column:
Q1 = 25th percentile, Q3 = 75th percentile, IQR = Q3 - Q1.
An outlier is any value lying more than 1.5 × IQR below Q1 or above Q3.
• Values above the upper bound (Q3 + 1.5 × IQR) are capped at the upper bound.
• Values below the lower bound (Q1 - 1.5 × IQR) are capped at the lower bound.
• The boxplots of the numerical variables before and after outlier treatment are shown below.
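A minimal sketch of this capping step, assuming the cleaned DataFrame is called df and the label column is named default (both names are illustrative):

```python
import numpy as np

# Cap every numeric column at the 1.5 * IQR whiskers, leaving the label alone.
num_cols = df.select_dtypes(include=np.number).columns.drop("default", errors="ignore")

for col in num_cols:
    q1, q3 = df[col].quantile(0.25), df[col].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    df[col] = df[col].clip(lower=lower, upper=upper)  # values beyond the bounds are capped
```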
Before Treating Outliers
Inference:
Given that this is financial data, the outliers may well reflect genuine information, since data is captured for small, medium, and large companies.
There are some missing values in the dataset, which are treated in the subsequent steps. Given the size of the dataset (3586 rows), there were not many missing values to start with.
Before imputing NULL values:
A total of 118 missing values (around 0.05% of all cells) were observed in the entire dataset.
Null values were present in many columns; however, a significant number were in the "Inventory_Vel_Days" column. This is the one we treated.
There are a large number of zeros; these zeros do not add any predictive value, but they can cause a "linear algebra error" when using statsmodels.
Records with a missing value in the "Inventory_Vel_Days" column were imputed with the average (mean) value.
Visual inspection of the missing values in our data: a visual of all these missing values (after dropping) is given below.
We use SimpleImputer from sklearn.impute (from sklearn.impute import SimpleImputer) to treat the null values.
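A minimal sketch of the imputation step, using the column name as written in this report; the mean strategy is assumed to match the "average value" mentioned above:

```python
from sklearn.impute import SimpleImputer

# Impute the missing Inventory_Vel_Days entries with the column mean.
imputer = SimpleImputer(strategy="mean")
df[["Inventory_Vel_Days"]] = imputer.fit_transform(df[["Inventory_Vel_Days"]])

print(df["Inventory_Vel_Days"].isna().sum())  # expect 0 missing values afterwards
```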
Q1.3 Transform Target variable into 0 and 1
• There is no target variable defined in the raw data, but since the objective is to build a model that helps the investor identify companies likely to default, we derive the target from Networth_Next_Year.
• If the company's Networth_Next_Year is greater than 0, the company would continue to return a good investment for the investor and is therefore transformed to 0 (NON-DEFAULT).
• If the company's Networth_Next_Year is equal to or less than 0, the company is transformed to 1 (DEFAULT).
default value counts:
0    3198
1     388
That is, about 11% of the companies in the dataset are likely to default, and these are the ones the investor could avoid investing in.
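A minimal sketch of this transformation, assuming the cleaned column name Networth_Next_Year:

```python
# 1 = DEFAULT (net worth next year <= 0), 0 = NON-DEFAULT (net worth next year > 0).
df["default"] = (df["Networth_Next_Year"] <= 0).astype(int)

# Drop the source column, which is redundant once the label exists.
df = df.drop(columns=["Networth_Next_Year"])

print(df["default"].value_counts())  # expected: 0 -> 3198, 1 -> 388
```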
Q1.4 Univariate & Bivariate analysis with proper interpretation. (We choose to
include only those variables which were significant in the model building).
Univariate Analysis:
We performed a descriptive summary of the company data. Since most of the columns are continuous, we can see the mean, standard deviation, and percentile details for all the columns.
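The summary itself can be produced with a short snippet such as the following (a sketch, assuming the DataFrame is named df):

```python
# Transposed descriptive statistics: count, mean, std, min, quartiles, max per column.
summary = df.describe().T
print(summary)
```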
Box plots of some of the important features
15 significant scaled feature variables: distribution of each column with a displot and box plot:
Fig.6 Scaled feature variables: distribution of each column with a displot and box plot
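A minimal sketch of how these per-variable plots can be generated; the feature list below is illustrative, not the actual set of 15 significant variables:

```python
import matplotlib.pyplot as plt
import seaborn as sns

features = ["Book_Value_Unit_Curr", "PBIDT", "Current_Ratio_Latest"]  # illustrative names

for col in features:
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 3))
    sns.histplot(df[col], kde=True, ax=ax1)  # distribution view (displot-style)
    sns.boxplot(x=df[col], ax=ax2)           # box plot to show skew and outliers
    fig.suptitle(col)
    plt.tight_layout()
    plt.show()
```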
Inference:
• 'Selling Cost' has most companies concentrated around its mean, with a right skew and outliers on the higher side.
• 'Cash Flow from Operating Activities' shows a roughly normal distribution, with most companies lying around the mean.
• 'PBIDT' (Profit Before Interest, Depreciation and Tax): most companies lie around the mean, with a prominent right skew. This indicates that there are still many companies with high PBIDT.
• 'ROG Networth', 'ROG Capital Employed', 'ROG Total Assets', 'ROG PBIDT', 'ROG PBT (Profit Before Tax)', 'ROG CP', 'Current Ratio Latest', 'Interest Cover Ratio Latest', 'Value of Output to Total Assets', 'Net Sales', 'Book Value Adjusted': these variables have the maximum density of companies around their mean with a right skew, indicating outliers on the higher side.
• 'APATM (After Tax Profit Margin)' has its maximum density around the mean and a prominent left skew. This indicates that many companies have their net profit on the lower side of the distribution, a possible indication of default.
• Mostly, it is observed that many companies have good margins and financials before tax and other costs. But after all costs are considered, they slide to the lower half, which shows they need to work on their costs and bottom line.
Bivariate Analysis:
Fig.7 - Gross Sales vs Net Sales
• There exists a linear relationship between these two important variables.
Fig.8 - Net Worth vs Capital Employed
• As capital increases, net worth also increases, but in some cases capital appears to be deployed even for lower net worth.
Correlation Heatmap: using Recursive Feature Elimination (RFE), we obtained the top important variables plus one target variable. These were significant in the model building.
• A correlation heat-map of the top 19 predictors and the target, used to examine the correlations, is given above.
• Fixed_Asset_Ratio(Latest) and Value_of_output_to_gross_block, and Total_Asset_Turnover_Ration (Latest) and Value_of_output_to_Total_Assets: these pairs of features show high correlation, which looks obvious as they appear to be derived from, or direct functions of, each other.
• The target variable 'default' has a high negative correlation with Book_Value_unit_curr. This indicates that as book value rises, the probability of default falls.
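A minimal sketch of how such a correlation heatmap can be produced; the column names in rfe_cols are illustrative stand-ins for the actual 19 RFE-selected predictors:

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Illustrative subset of predictor names; replace with the RFE-selected columns.
rfe_cols = ["Book_Value_unit_curr", "Current_Ratio_Latest", "PBIDT", "Networth"]
corr = df[rfe_cols + ["default"]].corr()

plt.figure(figsize=(12, 9))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Correlation heatmap of RFE-selected predictors and the target")
plt.show()
```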
Inferences from Univariate and Bi-variate analysis:
• Most of the variables have skewed distributions, but we do not treat those distributions with any transformation, since logistic regression does not assume normally distributed predictors.
• All the variables have outliers. These outliers are treated, as we are going to apply logistic regression.
• We also performed multivariate analysis on the data to see if there are any correlations that are significant.
• We observed that net worth and net worth next year were highly correlated. Apart from this, we observed high correlations among several other pairs of variables.
• Overall, high positive and negative correlations between variables can be seen above. This analysis helps in selecting variables for the model building.
Q 1.5 Split the data into Train and Test dataset in a ratio of 67:33
• Split the data into train and test datasets in a ratio of 67:33 and use random_state = 42.
• We split the dataset into X (the independent variables) and y (the dependent/target variable), as sketched below.
• Model building is done on the train dataset and model validation is done on the test dataset.
• Out of the total 3586 observations, after the split the train set has 2402 observations and the test set has 1184 observations.
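A minimal sketch of the split; whether stratification was used is not stated in the report, so it is flagged as an assumption:

```python
from sklearn.model_selection import train_test_split

# Drop the label and the non-numeric company identifier from the predictors.
X = df.drop(columns=["default", "Co_Name"], errors="ignore")
y = df["default"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42, stratify=y  # stratify=y is an assumption
)

print(X_train.shape, X_test.shape)  # expected: (2402, ...) and (1184, ...)
```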
Training data columns:
Q 1.6 Build Logistic Regression Model (using statsmodels library) on most important variables on Train Dataset and choose the optimum cut-off. Also showcase your model building approach.
For model building, we use Recursive Feature Elimination (RFE) and select the top 21 features (about one third of the total feature variables) that contribute well to the model. RFE assigns a weight to each variable and, based on these weights, rankings are provided.
For modelling we use logistic regression with recursive feature elimination.
Below are the highest contributing independent variables to the model building.
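A minimal sketch of this model-building approach; the RFE estimator settings and the F1-based cut-off search are assumptions about details the report does not spell out:

```python
import numpy as np
import statsmodels.api as sm
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# Recursive Feature Elimination to keep the top 21 predictors.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=21)
rfe.fit(X_train, y_train)
selected = X_train.columns[rfe.support_]

# Logistic regression on the selected variables using the statsmodels library.
logit = sm.Logit(y_train, sm.add_constant(X_train[selected])).fit()
print(logit.summary())

# Choose the optimum cut-off by scanning thresholds on the training predictions
# (maximising F1 here; another criterion could equally be used).
train_probs = logit.predict(sm.add_constant(X_train[selected]))
cutoffs = np.arange(0.10, 0.91, 0.01)
best_cutoff = max(cutoffs, key=lambda c: f1_score(y_train, (train_probs >= c).astype(int)))
print("optimum cutoff:", round(best_cutoff, 2))
```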
Q 1.7 Validate the Model on Test Dataset and state the performance metrics. Also state interpretations from the model.
➢ We train the model and then validate it on both the training and testing sets. We plot the confusion matrix and classification report for both sets.
➢ We could see high precision and accuracy, but the recall seems to be low on the training data.
➢ We need to improve the recall value, as that gives us more true positives (TP), which in turn means that we correctly identify the defaulters; if we miss a defaulter, the bank ends up bearing higher costs on its existing debts and its cash flow will not be regularised.
Confusion matrix and Classification Report for the training set:
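Continuing the earlier sketch (it reuses logit, selected, train_probs and best_cutoff from the model-building step), these reports could be produced as follows:

```python
from sklearn.metrics import classification_report, confusion_matrix

# Probabilities on the test set from the fitted statsmodels logit.
test_probs = logit.predict(sm.add_constant(X_test[selected]))

y_train_pred = (train_probs >= best_cutoff).astype(int)
y_test_pred = (test_probs >= best_cutoff).astype(int)

print("Train confusion matrix:\n", confusion_matrix(y_train, y_train_pred))
print(classification_report(y_train, y_train_pred))

print("Test confusion matrix:\n", confusion_matrix(y_test, y_test_pred))
print(classification_report(y_test, y_test_pred))
```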
Confusion matrix and Classification Report for the test set :
We could see high precision and accuracy, but the recall seems to be low on the testing set as well.
An accuracy of over 95% was achieved, while precision, recall and F1 score were also very high at 96%, 98% and 97% respectively.
For both sets, accuracy and precision (the ratio of true positives to all predicted positives) are on the higher side, but recall is the weak point in both.
This seems to be the case because we had an imbalanced dataset for our model.
➢ So, we balance our default values (the proportion of 1's relative to 0's is increased). In our dataset we only had 11% defaults, so we balance the dataset using the SMOTE technique before fitting it in our model.
➢ After applying the SMOTE technique, we fit the model and predict values on both the training and testing sets.
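A minimal sketch of the SMOTE step using the imbalanced-learn library; it reuses X_train, y_train and selected from the earlier sketches, only the training data is resampled, and the random_state is an assumption:

```python
import pandas as pd
from imblearn.over_sampling import SMOTE

smote = SMOTE(random_state=42)  # random_state is an assumption
X_train_sm, y_train_sm = smote.fit_resample(X_train[selected], y_train)

# Refit the logistic regression on the balanced training data.
logit_sm = sm.Logit(y_train_sm, sm.add_constant(X_train_sm)).fit()

print(pd.Series(y_train_sm).value_counts())  # the two classes are now balanced
```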
Classification Report for the Training Set (SMOTE):
We could see that the recall has improved greatly, so the chance of identifying our defaulters has significantly improved and there is less chance of the model missing out on any potential defaulters.
Classification Report for the Testing Set (SMOTE):
An accuracy of 92% was observed on the test set, with precision, recall and F1 score of 99%, 92% and 95% respectively.
Finally, we are able to achieve a decent recall value without overfitting. Considering challenges such as outliers, missing values and correlated features, this is a fairly good model. It can be improved if we get better quality data in which the features explaining default are not missing to this extent. Of course, we can also try other techniques that are less sensitive to missing values and outliers.
Interpretation:
This clearly indicates that the model which has been built is highly efficient and has been able to capture the correct variables for prediction. It has been shown to work on the train as well as the test data.
The recall values for the testing and training sets are close, and this model is well suited to identifying defaulters correctly because of the high recall value in both sets.
Precision seems to be on the lower side for both sets because of the SMOTE technique, as we create additional minority-class samples to balance the defaulter ratio.
But in this model, recall is the more important factor, as we stress identifying the defaulters accurately.
From the multivariate analysis, we observed that many companies had good profit margins before considering taxes, interest and other costs. But once all costs are considered along with taxes and depreciation, the majority of these companies slide to the bottom half in profitability. These companies should focus on optimising their bottom line.
THE END