FRA Project Business Report

The document describes analyzing financial data for a company. It includes: - Importing train and test data sets with 3541 observations and 52 variables for the train set. - Removing an unnecessary variable, checking for missing values (10007 in train, 2593 in test), and imputing missing values with medians. - Converting character variables to numeric and further treating outliers. - Removing unnecessary variables, leaving 50 variables and 3541 observations for analysis.

RAVEENDRA BABU GADDAM

FRA PROJECT AS ON 15TH MARCH-2020

Importing of data sets

> library(readxl)  # provides read_excel()

> traindata<-read_excel("raw-data.xlsx")

> testdata<-read_excel("validation_data.xlsx")

> dim(traindata)

[1] 3541 52

The train data contains 3541 observations and 52 variables.

Names of the 52 variables are displayed below.

> names(traindata)

[1] "Num" "Networth Next Year"


[3] "Total assets" "Net worth"
[5] "Total income" "Change in stock"
[7] "Total expenses" "Profit after tax"
[9] "PBDITA" "PBT"
[11] "Cash profit" "PBDITA as % of total income"
[13] "PBT as % of total income" "PAT as % of total income"
[15] "Cash profit as % of total income" "PAT as % of net worth"
[17] "Sales" "Income from financial services"
[19] "Other income" "Total capital"
[21] "Reserves and funds" "Deposits (accepted by commercial banks)"
[23] "Borrowings" "Current liabilities & provisions"
[25] "Deferred tax liability" "Shareholders funds"
[27] "Cumulative retained profits" "Capital employed"
[29] "TOL/TNW" "Total term liabilities / tangible net worth"
[31] "Contingent liabilities / Net worth (%)" "Contingent liabilities"
[33] "Net fixed assets" "Investments"
[35] "Current assets" "Net working capital"
[37] "Quick ratio (times)" "Current ratio (times)"
[39] "Debt to equity ratio (times)" "Cash to current liabilities (times)"
[41] "Cash to average cost of sales per day" "Creditors turnover"
[43] "Debtors turnover" "Finished goods turnover"
[45] "WIP turnover" "Raw material turnover"
[47] "Shares outstanding" "Equity face value"
[49] "EPS" "Adjusted EPS"
[51] "Total liabilities" "PE on BSE"

# Let us remove the column "Deposits (accepted by commercial banks)" (column 22),
# which does not contain any data

> traindata<-traindata[,-22]

> View(traindata)

> Newtraindata<-traindata

# Creation of default variable for train data set

> Newtraindata$Default<-ifelse(Newtraindata$`Networth Next Year`>0,0,1)
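The rule above flags a company as a defaulter (1) when its net worth in the following year is not positive, so a value of exactly zero also counts as a default. A minimal sketch of the same rule on a made-up vector (the values and the names `networth_next_year` and `default_flag` are illustrative assumptions, not taken from the dataset):

```r
# Hypothetical next-year net worth values for five companies (illustrative only)
networth_next_year <- c(120.5, -3.2, 0, 45.0, -0.1)

# Same rule as in the report: positive net worth -> 0 (no default), otherwise -> 1
default_flag <- ifelse(networth_next_year > 0, 0, 1)

print(default_flag)

# Share of flagged companies in this toy sample
print(mean(default_flag))
```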


# Checking for missing values and treating of the same

> sum(is.na(Newtraindata))
[1] 10007
> sum(is.na(testdata))

[1] 2593

There are 10007 missing cells in the train data and 2593 missing cells in the test data.

Looking for percentage of missing values

> library(DataExplorer)  # provides plot_intro()
> plot_intro(Newtraindata)

[Figure: plot_intro(Newtraindata). Memory usage: 2.1 Mb. Discrete columns 15.4%, continuous columns 84.6%, all-missing columns 0.0%, complete rows 26.4%, missing observations 5.4%.]

> Newtestdata<-testdata[,-22]

> plot_intro(Newtestdata)

[Figure: plot_intro(Newtestdata). Memory usage: 498.9 Kb. Discrete columns 15.7%, continuous columns 84.3%, all-missing columns 0.0%, complete rows 25.6%, missing observations 5.2%.]

There are 5.4% & 5.2% missing observations in Train and Test data respectively.

# Conversion of character variables into numeric variables and treatment of missing values
# (missing values are imputed with the column median)

> for(i in seq_len(ncol(Newtraindata))){
+ # [[i]] extracts the column as a vector for both data frames and tibbles
+ Newtraindata[[i]] <- as.numeric(Newtraindata[[i]])
+ Newtraindata[[i]][is.na(Newtraindata[[i]])] <- median(Newtraindata[[i]], na.rm = TRUE)
+ }
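
The conversion and median imputation above can be tried on a small toy table first. The following is only a sketch under stated assumptions: the data frame `toy` and the helper `impute_median` are hypothetical names introduced for illustration and are not part of the original project.

```r
# Toy data frame with character-typed numbers and missing cells (illustrative only)
toy <- data.frame(
  a = c("1.5", "NA", "3.0", "10"),
  b = c(2, NA, 4, 8),
  stringsAsFactors = FALSE
)

# Hypothetical helper: convert every column to numeric, then fill NAs with the column median
impute_median <- function(df) {
  for (col in names(df)) {
    # Strings that are not numbers become NA; the coercion warning is suppressed
    x <- suppressWarnings(as.numeric(df[[col]]))
    x[is.na(x)] <- median(x, na.rm = TRUE)
    df[[col]] <- x
  }
  df
}

clean <- impute_median(toy)
print(clean)
print(sum(is.na(clean)))
```

One design point worth noting: the median is computed separately for each data set here, as in the report. Reusing the train medians for the test data is a common alternative, but that is not what the original code does.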

> plot_intro(Newtraindata)

[Figure: plot_intro(Newtraindata) after imputation. Memory usage: 1.4 Mb. Discrete columns 0%, continuous columns 100%, all-missing columns 0%, complete rows 100%, missing observations 0%.]

> for(i in seq_len(ncol(Newtestdata))){
+ # [[i]] extracts the column as a vector for both data frames and tibbles
+ Newtestdata[[i]] <- as.numeric(Newtestdata[[i]])
+ Newtestdata[[i]][is.na(Newtestdata[[i]])] <- median(Newtestdata[[i]], na.rm = TRUE)
+ }
> plot_intro(Newtestdata)

[Figure: plot_intro(Newtestdata) after imputation. Memory usage: 296.8 Kb. Discrete columns 0%, continuous columns 100%, all-missing columns 0%, complete rows 100%, missing observations 0%.]

# Treatment of Outliers
The outliers in the dataset are treated by replacing observations below the 1st percentile with the
value of the 1st percentile, and observations above the 99th percentile with the value of the
99th percentile. This treatment is applied to every column of the train data except the identifier column Num.

The quantile function computes the 1st and 99th percentile values of a column. The squish function
(from the scales package) then replaces the values outside this range with the nearest bound, that is,
the 1st or the 99th percentile.

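> library(scales)  # provides squish()
> for(i in 2:ncol(Newtraindata)){
+ # [[i]] extracts the column as a vector for both data frames and tibbles
+ qnt <- quantile(Newtraindata[[i]], c(0.01, 0.99))
+ Newtraindata[[i]] <- squish(Newtraindata[[i]], qnt)
+ }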
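
The percentile capping described above can also be written with base R only. This is a sketch, not the author's code: `winsorize` is a hypothetical helper name, and the simulated vector `x` stands in for one column of the data.

```r
# Hypothetical helper: cap values at the chosen lower and upper percentiles
winsorize <- function(x, lower = 0.01, upper = 0.99) {
  bounds <- quantile(x, probs = c(lower, upper), na.rm = TRUE)
  # pmax lifts values below the lower bound, pmin lowers values above the upper bound
  pmin(pmax(x, bounds[[1]]), bounds[[2]])
}

set.seed(1)
# Simulated column: 200 ordinary values plus two extreme outliers
x <- c(rnorm(200), 50, -50)
w <- winsorize(x)

print(range(x))  # range before capping
print(range(w))  # range after capping
```

Capping keeps every observation in the data set, which matters here because the default class is rare and dropping rows could remove some of the few defaulters.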

# removing of unnecessary variables in the Data Sets

> NTrain<-Newtraindata[,-c(1,2)]
> NTest<-Newtestdata[,-1]
Univariate & bivariate analysis
> dim(NTrain)

[1] 3541 50

Now we have 50 Variables and 3541 observations.

> head(NTrain)

Total assets Net worth Total income Change in stock Total expenses Profit after tax PBDITA
1 17512.3 7093.2 24965.2 235.8 23657.8 1543.2 2860.2
2 941.0 351.5 1527.4 42.7 1454.9 115.2 283.0
3 232.8 100.6 477.3 -5.2 478.7 -6.6 5.8
4 2.7 2.7 444.9 1.6 407.7 8.8 35.4
5 478.5 107.6 1580.5 -17.0 1558.0 5.5 31.0
6 2434.4 675.8 2648.6 62.3 2636.4 74.5 200.1
PBT Cash profit PBDITA as % of total income PBT as % of total income
1 2417.2 1872.80 11.46 9.68
2 188.4 158.60 18.53 12.33
3 -6.6 0.30 1.22 -1.38
4 12.4 18.85 0.00 0.00
5 6.3 11.90 1.96 0.40
6 74.5 146.90 7.55 2.81

PAT as % of total income Cash profit as % of total income PAT as % of net worth Sales
1 6.18 7.50 23.78 24458.0
2 7.54 10.38 38.08 1504.3
3 -1.38 0.06 -6.35 475.6
4 0.00 0.00 0.00 453.1
5 0.35 0.75 5.25 1575.1
6 2.81 5.55 21.78 2639.5
Income from financial services Other income Total capital Reserves and funds Borrowings
1 158.0 286.16 423.8 6822.8 14.9
2 4.0 15.90 115.5 257.8 272.5
3 1.5 0.20 81.4 19.2 35.4
4 1.8 1.40 0.5 2.2 99.2
5 3.9 0.90 6.2 161.8 193.1
6 6.4 0.20 33.8 972.0 717.1
Current liabilities & provisions Deferred tax liability Shareholders funds
1 9965.9 284.9 7093.2
2 210.0 85.2 351.5
3 96.8 13.4 100.6
4 69.4 13.4 2.7
5 112.8 4.6 107.6
6 555.9 54.4 698.2

Cumulative retained profits Capital employed TOL/TNW


1 6263.3 7108.1 1.33
2 247.4 624.0 1.23
3 32.4 136.0 1.44
4 2.2 2.7 0.00
5 82.7 300.7 2.83
6 317.7 1415.3 1.80
Total term liabilities / tangible net worth Contingent liabilities / Net worth (%)
1 0.00 14.80
2 0.34 19.23
3 0.29 45.83
4 0.00 0.00
5 1.59 34.94
6 0.37 36.28
Contingent liabilities Net fixed assets Investments Current assets Net working capital
1 1049.7 1900.2 1069.60 13277.5 3588.5
2 67.6 286.4 2.20 563.9 203.5
3 46.1 38.7 4.30 167.5 59.6
4 38.0 2.5 8.35 0.2 0.2
5 37.6 94.8 7.40 349.7 215.8
6 245.2 864.9 22.70 1296.2 278.5
Quick ratio (times) Current ratio (times) Debt to equity ratio (times)
1 1.18 1.37 0.00
2 0.95 1.56 0.78
3 1.11 1.55 0.35
4 0.67 1.23 0.00
5 1.41 2.54 1.79
6 0.48 1.27 1.09
Cash to current liabilities (times) Cash to average cost of sales per day Creditors turnover
1 0.43 68.210 3.62
2 0.06 5.960 9.80
3 0.21 17.070 5.28
4 0.07 8.025 0.00
5 0.00 0.000 13.00
6 0.11 15.780 6.50
Debtors turnover Finished goods turnover WIP turnover Raw material turnover
1 3.85 200.55 21.780 7.71
2 5.70 14.21 7.490 11.46
3 5.07 9.24 0.238 6.40
4 0.00 17.27 9.760 0.00
5 9.46 12.68 7.900 17.03
6 21.13 10.14 8.380 4.74

Shares outstanding Equity face value EPS Adjusted EPS Total liabilities PE on BSE Default
1 42381675 10 35.52 7.10 17512.3 27.31 0
2 11550000 10 9.97 9.97 941.0 8.17 0
3 8149090 10 -0.50 -0.50 232.8 -5.76 0
4 52404 10 0.00 0.00 2.7 9.10 0
5 619635 10 7.91 7.91 478.5 9.10 0
6 1141718 10 30.57 15.28 2434.4 9.10 0

> tail(NTrain)

Total assets Net worth Total income Change in stock Total expenses Profit after tax PBDITA
3536 17.8 1.2 15.5 -1.2 14.2 0.1 1.8
3537 450.5 172.3 565.0 30.5 581.1 14.4 76.7
3538 97.6 82.0 75.8 -4.0 66.5 5.3 11.1
3539 902.9 209.1 1005.1 5.6 966.5 44.2 120.3
3540 177.0 137.2 371.0 3.9 348.9 26.0 50.5
3541 1.7 0.3 444.9 1.6 17.4 -17.4 -17.4
PBT Cash profit PBDITA as % of total income PBT as % of total income
3536 0.2 0.5 11.61 1.29
3537 41.1 48.4 13.58 7.27
3538 6.2 9.2 14.64 8.18
3539 70.0 62.6 11.97 6.96
3540 40.8 33.6 13.61 11.00
3541 -17.4 -17.4 9.66 3.31
PAT as % of total income Cash profit as % of total income PAT as % of net worth Sales
3536 0.65 3.23 8.700 14.3
3537 2.55 8.57 8.710 564.5
3538 6.99 12.14 6.680 73.9
3539 4.40 6.23 22.770 995.9
3540 7.01 9.06 20.300 365.8
3541 2.34 5.64 -138.524 453.1
Income from financial services Other income Total capital Reserves and funds Borrowings
3536 1.8 1.2 1.0 0.2 14.5
3537 0.5 1.4 89.0 85.5 190.2
3538 1.7 1.4 38.6 48.4 3.0
3539 2.6 0.3 30.0 179.1 305.0
3540 3.3 1.6 50.9 86.3 1.3
3541 1.8 1.4 28.3 -28.0 99.2
Current liabilities & provisions Deferred tax liability Shareholders funds
3536 2.1 13.4 1.2
3537 42.5 36.8 172.3
3538 7.6 13.4 87.0
3539 363.4 25.4 209.1
3540 21.1 17.4 137.2
3541 0.3 13.4 0.3
Cumulative retained profits Capital employed TOL/TNW
3536 0.2 15.7 13.83
3537 76.8 362.5 1.30
3538 36.6 90.0 0.12
3539 179.1 514.1 2.45
3540 77.1 138.5 0.10
3541 -28.0 1.1 1.00

Total term liabilities / tangible net worth Contingent liabilities / Net worth (%)
3536 4.83 0.00
3537 0.72 0.00
3538 0.02 5.12
3539 0.68 93.45
3540 0.01 6.20
3541 0.00 0.00
Contingent liabilities Net fixed assets Investments Current assets Net working capital
3536 38.0 5.7 0.10 6.4 -4.4
3537 38.0 227.0 8.35 187.0 78.3
3538 4.2 21.9 6.80 55.8 47.2
3539 195.4 217.7 17.50 477.5 -49.5
3540 8.5 73.5 8.35 80.8 59.7
3541 38.0 93.5 8.35 0.6 0.3

Quick ratio (times) Current ratio (times) Debt to equity ratio (times)
3536 0.46 0.59 12.08
3537 0.41 1.71 1.10
3538 4.58 6.49 0.10
3539 0.59 0.91 1.46
3540 2.83 3.83 0.01
3541 2.00 2.00 0.00
Cash to current liabilities (times) Cash to average cost of sales per day
3536 0.07 20.71
3537 0.07 5.67
3538 3.88 177.71
3539 0.05 11.05
3540 1.35 29.93
3541 2.00 1277.50
Creditors turnover Debtors turnover Finished goods turnover WIP turnover
3536 5.81 3.67 8.33 7.52
3537 15.65 20.64 8.66 5.14
3538 10.07 14.21 5.13 4.17
3539 3.96 3.76 33.03 11.68
3540 25.00 13.75 49.00 47.03
3541 0.00 0.00 17.27 9.76
Raw material turnover Shares outstanding Equity face value EPS Adjusted EPS
3536 10.92 4672063 10 0.00 0.00
3537 19.47 14904213 10 0.97 0.97
3538 4.83 3362800 10 1.61 1.61
3539 4.63 3000000 10 13.10 13.10
3540 17.42 4422346 10 6.06 6.06
3541 0.00 5220000 10 -0.02 -0.02
Total liabilities PE on BSE Default
3536 17.8 9.10 0
3537 450.5 9.10 0
3538 97.6 2.49 0
3539 902.9 12.62 0
3540 177.0 4.07 0
3541 1.7 9.10 1

> str(NTrain)

'data.frame': 3541 obs. of 50 variables:


$ Total assets : num 17512.3 941 232.8 2.7 478.5 ...
$ Net worth : num 7093.2 351.5 100.6 2.7 107.6 ...
$ Total income : num 24965 1527 477 445 1580 ...
$ Change in stock : num 235.8 42.7 -5.2 1.6 -17 ...
$ Total expenses : num 23658 1455 479 408 1558 ...
$ Profit after tax : num 1543.2 115.2 -6.6 8.8 5.5 ...
$ PBDITA : num 2860.2 283 5.8 35.4 31 ...
$ PBT : num 2417.2 188.4 -6.6 12.4 6.3 ...
$ Cash profit : num 1872.8 158.6 0.3 18.9 11.9 ...
$ PBDITA as % of total income : num 11.46 18.53 1.22 0 1.96 ...
$ PBT as % of total income : num 9.68 12.33 -1.38 0 0.4 ...
$ PAT as % of total income : num 6.18 7.54 -1.38 0 0.35 2.81 0 0.72 8.29 -2.88 ...
$ Cash profit as % of total income : num 7.5 10.38 0.06 0 0.75 ...
$ PAT as % of net worth : num 23.78 38.08 -6.35 0 5.25 ...
$ Sales : num 24458 1504 476 453 1575 ...
$ Income from financial services : num 158 4 1.5 1.8 3.9 6.4 1.8 1.8 7.3 1.8 ...
$ Other income : num 286.2 15.9 0.2 1.4 0.9 ...
$ Total capital : num 423.8 115.5 81.4 0.5 6.2 ...
$ Reserves and funds : num 6822.8 257.8 19.2 2.2 161.8 ...
$ Borrowings : num 14.9 272.5 35.4 99.2 193.1 ...
$ Current liabilities & provisions : num 9965.9 210 96.8 69.4 112.8 ...
$ Deferred tax liability : num 284.9 85.2 13.4 13.4 4.6 ...
$ Shareholders funds : num 7093.2 351.5 100.6 2.7 107.6 ...
$ Cumulative retained profits : num 6263.3 247.4 32.4 2.2 82.7 ...
$ Capital employed : num 7108.1 624 136 2.7 300.7 ...
$ TOL/TNW : num 1.33 1.23 1.44 0 2.83 1.8 0.03 5.17 1.05 3.25 ...
$ Total term liabilities / tangible net worth: num 0 0.34 0.29 0 1.59 0.37 0.03 0.94 0.3 0.54 ...
$ Contingent liabilities / Net worth (%) : num 14.8 19.2 45.8 0 34.9 ...
$ Contingent liabilities : num 1049.7 67.6 46.1 38 37.6 ...
$ Net fixed assets : num 1900.2 286.4 38.7 2.5 94.8 ...
$ Investments : num 1069.6 2.2 4.3 8.35 7.4 ...
$ Current assets : num 13277.5 563.9 167.5 0.2 349.7 ...
$ Net working capital : num 3588.5 203.5 59.6 0.2 215.8 ...
$ Quick ratio (times) : num 1.18 0.95 1.11 0.67 1.41 0.48 0.67 0.54 0.59 0.39 ...
$ Current ratio (times) : num 1.37 1.56 1.55 1.23 2.54 1.27 1.23 1.15 1.58 0.5 ...
$ Debt to equity ratio (times) : num 0 0.78 0.35 0 1.79 1.09 0.32 2.31 0.94 3.13 ...
$ Cash to current liabilities (times) : num 0.43 0.06 0.21 0.07 0 0.11 0.07 0.04 0.19 0 ...
$ Cash to average cost of sales per day : num 68.21 5.96 17.07 8.02 0 ...
$ Creditors turnover : num 3.62 9.8 5.28 0 13 ...
$ Debtors turnover : num 3.85 5.7 5.07 0 9.46 ...
$ Finished goods turnover : num 200.55 14.21 9.24 17.27 12.68 ...
$ WIP turnover : num 21.78 7.49 0.238 9.76 7.9 ...
$ Raw material turnover : num 7.71 11.46 6.4 0 17.03 ...
$ Shares outstanding : num 42381675 11550000 8149090 52404 619635 ...
$ Equity face value : num 10 10 10 10 10 10 10 10 10 10 ...
$ EPS : num 35.52 9.97 -0.5 0 7.91 ...
$ Adjusted EPS : num 7.1 9.97 -0.5 0 7.91 ...
$ Total liabilities : num 17512.3 941 232.8 2.7 478.5 ...
$ PE on BSE : num 27.31 8.17 -5.76 9.1 9.1 ...
$ Default : num 0 0 0 0 0 0 0 0 0 1 ...

The data is a data frame with 3541 observations and 50 variables, and all variables are numeric.

> plot_str(NTrain)

> plot_intro(NTrain)

[Figure: plot_intro(NTrain). Memory usage: 1.4 Mb. Discrete columns 0%, continuous columns 100%, all-missing columns 0%, complete rows 100%, missing observations 0%.]

There are no missing observations and no all-missing columns in the data.

> plot_histogram(NTrain)

[Figure: plot_histogram(NTrain), four pages of histograms (x-axis: value, y-axis: Frequency).
Page 1: Cash.profit, Cash.profit.as...of.total.income, Change.in.stock, Income.from.financial.services, Net.worth, PAT.as...of.net.worth, PAT.as...of.total.income, PBDITA, PBDITA.as...of.total.income, PBT, PBT.as...of.total.income, Profit.after.tax, Sales, Total.assets, Total.expenses, Total.income.
Page 2: Borrowings, Capital.employed, Contingent.liabilities, Contingent.liabilities...Net.worth...., Cumulative.retained.profits, Current.assets, Current.liabilities...provisions, Deferred.tax.liability, Investments, Net.fixed.assets, Other.income, Reserves.and.funds, Shareholders.funds, TOL.TNW, Total.capital, Total.term.liabilities...tangible.net.worth.
Page 3: Adjusted.EPS, Cash.to.average.cost.of.sales.per.day, Cash.to.current.liabilities..times., Creditors.turnover, Current.ratio..times., Debt.to.equity.ratio..times., Debtors.turnover, EPS, Equity.face.value, Finished.goods.turnover, Net.working.capital, Quick.ratio..times., Raw.material.turnover, Shares.outstanding, Total.liabilities, WIP.turnover.
Page 4: PE.on.BSE.]

Based on the histograms, none of the variables appears to follow a normal distribution.

# Box plot by Default Variable.

> plot_boxplot(NTrain, by="Default",geom_boxplot_args = list("fill"="blue"))


[Figure: plot_boxplot(NTrain, by = "Default"), five pages of box plots (x-axis: value, y-axis: Default, binned as [0,0.2] and (0.8,1]).
Page 1: Cash.profit, Change.in.stock, Net.worth, PAT.as...of.total.income, PBDITA, PBDITA.as...of.total.income, PBT, PBT.as...of.total.income, Profit.after.tax, Total.assets, Total.expenses, Total.income.
Page 2: Borrowings, Cash.profit.as...of.total.income, Cumulative.retained.profits, Current.liabilities...provisions, Deferred.tax.liability, Income.from.financial.services, Other.income, PAT.as...of.net.worth, Reserves.and.funds, Sales, Shareholders.funds, Total.capital.
Page 3: Capital.employed, Contingent.liabilities, Contingent.liabilities...Net.worth...., Current.assets, Current.ratio..times., Debt.to.equity.ratio..times., Investments, Net.fixed.assets, Net.working.capital, Quick.ratio..times., TOL.TNW, Total.term.liabilities...tangible.net.worth.
Page 4: Adjusted.EPS, Cash.to.average.cost.of.sales.per.day, Cash.to.current.liabilities..times., Creditors.turnover, Debtors.turnover, EPS, Equity.face.value, Finished.goods.turnover, Raw.material.turnover, Shares.outstanding, Total.liabilities, WIP.turnover.
Page 5: PE.on.BSE.]

> plot_correlation(NTrain)

In the correlation matrix, dark red cells mark pairs of variables that are strongly positively correlated
and dark blue cells mark pairs that are strongly negatively correlated.
New Variables Creation (one ratio each for profitability, leverage, liquidity and company size)
Profitability Ratio and Share Price

> NTrain$PATonSales<-NTrain$`Profit after tax`/NTrain$Sales

> NTrain$SharePrice<-NTrain$EPS*NTrain$`PE on BSE`

Liquidity Ratio

> NTrain$NWCtoTotalAssets <- NTrain$`Net working capital`/NTrain$`Total assets`

> NTrain$TotalEquity <- NTrain$`Total liabilities`/NTrain$`Debt to equity ratio (times)`

Leverage Ratios

> NTrain$AssetstoEquity <- NTrain$`Total assets`/NTrain$TotalEquity

Company Size Ratios

> NTrain$TOINCtoTOEXP<-NTrain$`Total income`/NTrain$`Total expenses`

> NTrain$NWTOTA<-NTrain$`Net worth`/NTrain$`Total assets`

# Creation of variables for Validation Data

> NTest$PATonSales<-NTest$`Profit after tax`/NTest$Sales

> NTest$SharePrice<-NTest$EPS*NTest$`PE on BSE`

> NTest$NWCtoTotalAssets <- NTest$`Net working capital`/NTest$`Total assets`

> NTest$TotalEquity <- NTest$`Total liabilities`/NTest$`Debt to equity ratio (times)`

> NTest$AssetstoEquity <- NTest$`Total assets`/NTest$TotalEquity

> NTest$TOINCtoTOEXP<-NTest$`Total income`/NTest$`Total expenses`

> NTest$NWTOTA<-NTest$`Net worth`/NTest$`Total assets`

# Replacing infinite values (from division by zero) with 0 in columns 54 and 55,
# which are TotalEquity and AssetstoEquity in NTrain

> NTrain[is.infinite(NTrain[,54]), 54] <- 0

> NTrain[is.infinite(NTrain[,55]), 55] <- 0

> NTest[is.infinite(NTest[,54]), 54] <- 0

> NTest[is.infinite(NTest[,55]), 55] <- 0
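
Ratios with a zero denominator produce Inf, and a zero divided by zero produces NaN, which `is.infinite()` does not detect. The sketch below shows one way to catch both cases by working on a named vector rather than a column position. The input values are made up for illustration, and replacing non-finite values with 0 simply mirrors the choice made above; it is not the only reasonable treatment.

```r
# Toy inputs (illustrative only); zero denominators produce Inf or NaN
total_liabilities <- c(100, 250, 0, 80)
debt_to_equity    <- c(2, 0, 0, 0.5)

total_equity <- total_liabilities / debt_to_equity
print(total_equity)                 # 250/0 gives Inf, 0/0 gives NaN

# is.infinite() flags only Inf and -Inf; is.finite() is FALSE for Inf, -Inf, NaN and NA
print(is.infinite(total_equity))
print(!is.finite(total_equity))

# Replace every non-finite value with 0
total_equity[!is.finite(total_equity)] <- 0
print(total_equity)
```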
Build Logistic Regression Model on most important variables
> Model_1<- glm(Default~.,data = NTrain, family=binomial)

> summary(Model_1)

Call:
glm(formula = Default ~ ., family = binomial, data = NTrain)

Deviance Residuals:
Min 1Q Median 3Q Max
-2.8529 -0.2604 -0.1358 -0.0236 4.0197

Coefficients: (2 not defined because of singularities)


Estimate Std. Error z value Pr(>|z|)
(Intercept) -9.628e-01 2.699e-01 -3.567 0.000360 ***
`Total assets` 2.910e-03 9.751e-04 2.984 0.002843 **
`Net worth` 2.295e-03 1.067e-03 2.151 0.031465 *
`Total income` 2.443e-05 1.812e-03 0.013 0.989243
`Change in stock` 2.616e-03 3.289e-03 0.795 0.426461
`Total expenses` -7.019e-04 1.419e-03 -0.495 0.620858
`Profit after tax` 1.330e-02 1.043e-02 1.276 0.201963
PBDITA -5.011e-03 2.685e-03 -1.867 0.061933 .
PBT -8.159e-03 9.975e-03 -0.818 0.413359
`Cash profit` -4.950e-03 3.268e-03 -1.515 0.129780
`PBDITA as % of total income` -5.030e-03 7.282e-03 -0.691 0.489755
`PBT as % of total income` 4.030e-03 1.885e-02 0.214 0.830691
`PAT as % of total income` -5.342e-03 1.924e-02 -0.278 0.781231
`Cash profit as % of total income` -6.005e-03 7.607e-03 -0.789 0.429910
`PAT as % of net worth` -1.973e-02 2.780e-03 -7.097 1.28e-12 ***
Sales 8.475e-04 1.193e-03 0.710 0.477443
`Income from financial services` 4.721e-03 7.985e-03 0.591 0.554310
`Other income` 1.259e-02 8.137e-03 1.548 0.121732
`Total capital` -2.964e-04 9.574e-04 -0.310 0.756845
`Reserves and funds` -4.078e-03 1.215e-03 -3.356 0.000792 ***
Borrowings 4.237e-03 1.739e-03 2.436 0.014837 *
`Current liabilities & provisions` -4.480e-04 1.263e-03 -0.355 0.722857
`Deferred tax liability` 1.478e-03 1.698e-03 0.871 0.384012
`Shareholders funds` 3.540e-03 1.663e-03 2.129 0.033253 *
`Cumulative retained profits` -1.375e-03 1.305e-03 -1.054 0.291984
`Capital employed` -7.255e-03 2.059e-03 -3.524 0.000425 ***
`TOL/TNW` 1.733e-02 1.791e-02 0.968 0.333062
`Total term liabilities / tangible net worth` -6.849e-02 3.987e-02 -1.718 0.085795 .
`Contingent liabilities / Net worth (%)` 4.613e-04 7.726e-04 0.597 0.550478
`Contingent liabilities` -1.415e-03 7.367e-04 -1.921 0.054760 .
`Net fixed assets` 8.316e-04 3.650e-04 2.278 0.022704 *
Investments 1.283e-03 9.806e-04 1.309 0.190613
`Current assets` -2.693e-03 8.923e-04 -3.019 0.002540 **
`Net working capital` 1.352e-03 9.480e-04 1.426 0.153805
`Quick ratio (times)` -2.113e-01 1.483e-01 -1.425 0.154238
`Current ratio (times)` 3.422e-02 7.910e-02 0.433 0.665324
`Debt to equity ratio (times)` 8.350e-02 3.137e-02 2.662 0.007776 **
`Cash to current liabilities (times)` 4.780e-01 2.503e-01 1.910 0.056184 .
`Cash to average cost of sales per day` 8.810e-04 5.260e-04 1.675 0.093926 .
`Creditors turnover` -9.910e-03 8.348e-03 -1.187 0.235180
`Debtors turnover` -4.881e-04 3.748e-03 -0.130 0.896377
`Finished goods turnover` 5.734e-04 9.815e-04 0.584 0.559100
`WIP turnover` -7.856e-03 4.606e-03 -1.705 0.088108 .
`Raw material turnover` -8.469e-03 8.755e-03 -0.967 0.333363
`Shares outstanding` -7.805e-09 8.647e-09 -0.903 0.366709
`Equity face value` -3.050e-03 3.441e-03 -0.887 0.375300
EPS -6.004e-02 3.538e-02 -1.697 0.089665 .
`Adjusted EPS` 2.916e-02 2.193e-02 1.330 0.183616
`Total liabilities` NA NA NA NA
`PE on BSE` -1.348e-02 6.544e-03 -2.059 0.039460 *
PATonSales 6.093e-03 3.149e-03 1.935 0.053029 .
SharePrice 3.320e-03 1.647e-03 2.016 0.043808 *
NWCtoTotalAssets 2.069e-01 9.855e-02 2.099 0.035821 *
TotalEquity -2.277e-04 1.925e-04 -1.183 0.236793
AssetstoEquity NA NA NA NA
TOINCtoTOEXP -9.864e-06 3.844e-04 -0.026 0.979527
NWTOTA -4.144e+00 5.971e-01 -6.941 3.90e-12 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)

Null deviance: 1770.97 on 3540 degrees of freedom


Residual deviance: 924.26 on 3486 degrees of freedom
AIC: 1034.3

Number of Fisher Scoring iterations: 14

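Only a few variables are significant; the remaining variables are not. Two coefficients (`Total liabilities` and AssetstoEquity) could not be estimated because of singularities.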

# Creation of another model after removing non-significant variables

> Model_2<-glm(Default~`Total assets`+`Net worth`+`Profit after tax`+PBDITA+`Cash profit`+
+ `PAT as % of net worth`+`Other income`+`Reserves and funds`+Borrowings+`Shareholders funds`+
+ `Capital employed`+`Total term liabilities / tangible net worth`+
+ `Contingent liabilities`+`Net fixed assets`+`Current assets`+`Debt to equity ratio (times)`+
+ `Cash to current liabilities (times)`+`Cash to average cost of sales per day`+`WIP turnover`+
+ EPS+`PE on BSE`+PATonSales+SharePrice+NWCtoTotalAssets+NWTOTA,
+ data=NTrain,family = binomial)

> summary(Model_2)

Call:
glm(formula = Default ~ `Total assets` + `Net worth` + `Profit after tax` +
PBDITA + `Cash profit` + `PAT as % of net worth` + `Other income` +
`Reserves and funds` + Borrowings + `Shareholders funds` +
`Capital employed` + `Total term liabilities / tangible net worth` +
`Contingent liabilities` + `Net fixed assets` + `Current assets` +
`Debt to equity ratio (times)` + `Cash to current liabilities (times)` +
`Cash to average cost of sales per day` + `WIP turnover` +
EPS + `PE on BSE` + PATonSales + SharePrice + NWCtoTotalAssets +
NWTOTA, family = binomial, data = NTrain)

Deviance Residuals:
Min 1Q Median 3Q Max
-2.8988 -0.2745 -0.1524 -0.0477 3.8622

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.2342163 0.2175334 -5.674 1.40e-08 ***
`Total assets` 0.0019376 0.0005403 3.586 0.000336 ***
`Net worth` 0.0012991 0.0008780 1.480 0.138990
`Profit after tax` 0.0020926 0.0014730 1.421 0.155431
PBDITA -0.0017381 0.0011454 -1.517 0.129155
`Cash profit` -0.0065740 0.0019866 -3.309 0.000936 ***
`PAT as % of net worth` -0.0243717 0.0025940 -9.396 < 2e-16 ***
`Other income` 0.0107008 0.0069793 1.533 0.125220
`Reserves and funds` -0.0031307 0.0006937 -4.513 6.39e-06 ***
Borrowings 0.0030806 0.0007974 3.864 0.000112 ***
`Shareholders funds` 0.0024006 0.0009051 2.652 0.007994 **
`Capital employed` -0.0050348 0.0011137 -4.521 6.16e-06 ***
`Total term liabilities / tangible net worth` -0.0570652 0.0370392 -1.541 0.123397
`Contingent liabilities` -0.0015924 0.0006586 -2.418 0.015606 *
`Net fixed assets` 0.0006934 0.0002372 2.923 0.003468 **
`Current assets` -0.0019319 0.0005077 -3.805 0.000142 ***
`Debt to equity ratio (times)` 0.1022856 0.0305061 3.353 0.000800 ***
`Cash to current liabilities (times)` 0.1596880 0.1455585 1.097 0.272611
`Cash to average cost of sales per day` 0.0012400 0.0004379 2.831 0.004635 **
`WIP turnover` -0.0078436 0.0037295 -2.103 0.035454 *
EPS -0.0127152 0.0062313 -2.041 0.041296 *
`PE on BSE` -0.0148210 0.0061842 -2.397 0.016549 *
PATonSales 0.0054742 0.0027123 2.018 0.043563 *
SharePrice 0.0011132 0.0005237 2.126 0.033524 *
NWCtoTotalAssets 0.1954791 0.0998926 1.957 0.050360 .
NWTOTA -4.1417719 0.5749713 -7.203 5.87e-13 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)


Null deviance: 1770.97 on 3540 degrees of freedom
Residual deviance: 964.16 on 3515 degrees of freedom
AIC: 1016.2

Number of Fisher Scoring iterations: 11

Model_2 still contains some variables that are not significant, but its AIC improves (decreases) from 1034.3 to 1016.2.

# Building of another model after removing non significant variables.

> Model_3<-glm(Default~`Total assets`+`Cash profit`+`PAT as % of net worth`+`Reserves and funds`+
+ Borrowings+`Shareholders funds`+`Capital employed`+
+ `Contingent liabilities`+`Net fixed assets`+`Current assets`+`Debt to equity ratio (times)`+
+ `Cash to average cost of sales per day`+`WIP turnover`+EPS+`PE on BSE`+
+ PATonSales+SharePrice+NWCtoTotalAssets+NWTOTA,
+ data=NTrain,family = binomial)

> summary(Model_3)

Call:
glm(formula = Default ~ `Total assets` + `Cash profit` + `PAT as % of net worth` +
`Reserves and funds` + Borrowings + `Shareholders funds` +
`Capital employed` + +`Contingent liabilities` + `Net fixed assets` +
`Current assets` + `Debt to equity ratio (times)` + `Cash to average cost of sales per day` +
`WIP turnover` + EPS + `PE on BSE` + PATonSales + SharePrice +
NWCtoTotalAssets + NWTOTA, family = binomial, data = NTrain)

Deviance Residuals:
Min 1Q Median 3Q Max
-2.9297 -0.2788 -0.1537 -0.0458 3.9879

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.2089980 0.2142856 -5.642 1.68e-08 ***
`Total assets` 0.0021031 0.0005607 3.751 0.000176 ***
`Cash profit` -0.0059508 0.0019384 -3.070 0.002140 **
`PAT as % of net worth` -0.0239192 0.0025650 -9.325 < 2e-16 ***
`Reserves and funds` -0.0031037 0.0006860 -4.524 6.06e-06 ***
Borrowings 0.0026913 0.0007790 3.455 0.000551 ***
`Shareholders funds` 0.0030583 0.0006930 4.413 1.02e-05 ***
`Capital employed` -0.0048281 0.0011405 -4.233 2.30e-05 ***
`Contingent liabilities` -0.0016599 0.0006970 -2.382 0.017236 *
`Net fixed assets` 0.0005711 0.0002703 2.113 0.034599 *
`Current assets` -0.0021453 0.0005115 -4.194 2.73e-05 ***
`Debt to equity ratio (times)` 0.0596168 0.0132579 4.497 6.90e-06 ***
`Cash to average cost of sales per day` 0.0014287 0.0003903 3.660 0.000252 ***
`WIP turnover` -0.0077262 0.0037417 -2.065 0.038934 *
EPS -0.0132850 0.0064645 -2.055 0.039874 *
`PE on BSE` -0.0143776 0.0062189 -2.312 0.020783 *
PATonSales 0.0056276 0.0025449 2.211 0.027012 *
SharePrice 0.0011216 0.0005290 2.120 0.033997 *
NWCtoTotalAssets 0.1869028 0.1000439 1.868 0.061733 .
NWTOTA -3.9752489 0.5491187 -7.239 4.51e-13 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 1770.97 on 3540 degrees of freedom


Residual deviance: 972.26 on 3521 degrees of freedom
AIC: 1012.3

Number of Fisher Scoring iterations: 11
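
The AIC values reported for the three models can be reproduced from the residual deviances and degrees of freedom printed in the summaries. The sketch below relies on the fact that, for a binary 0/1 response, the residual deviance equals minus twice the log-likelihood, so AIC = residual deviance + 2k, where k is the number of estimated coefficients. The data frame `models` is a hypothetical container built from the printed figures.

```r
# Residual deviance and residual degrees of freedom, copied from the three summaries
models <- data.frame(
  model     = c("Model_1", "Model_2", "Model_3"),
  resid_dev = c(924.26, 964.16, 972.26),
  resid_df  = c(3486, 3515, 3521)
)

n_obs <- 3541  # observations in NTrain

# Number of estimated coefficients (including the intercept) = n - residual df
models$k <- n_obs - models$resid_df

# For a 0/1 response: AIC = residual deviance + 2 * k
models$aic <- models$resid_dev + 2 * models$k

print(models)
```

The recomputed values agree with the printed AICs (1034.3, 1016.2, 1012.3) up to rounding, which shows that each step traded a small increase in deviance for a larger reduction in the number of coefficients.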

Cash profit, PAT as % of net worth, Reserves and funds, Capital employed, Contingent liabilities,
Current assets, WIP turnover, EPS, PE on BSE and Net worth to total assets (NWTOTA) have negative
coefficients, which means these independent variables have a negative relationship with the dependent
variable: as they increase, the log-odds of default decrease. All other variables (Total assets,
Borrowings, Shareholders funds, Net fixed assets, Debt to equity ratio, Cash to average cost of sales
per day, PAT on sales, Share price and Net working capital to total assets) have a positive
relationship with the dependent variable. Net working capital to total assets is significant only at
the 10% level (p = 0.0617).
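
Because the model is a logistic regression, each coefficient is a change in the log-odds of default, so exponentiating it gives an odds ratio. The sketch below is only an illustration: it re-types three estimates from the summary above into a hypothetical vector `coefs`, so the fitted `Model_3` object is not needed. With the real model, `exp(coef(Model_3))` should give the same kind of result.

```r
# Three estimates copied from the Model_3 summary above.
# `coefs` is a hypothetical helper vector, not an object from the report.
coefs <- c(
  "Debt to equity ratio (times)" = 0.0596168,
  "PAT as % of net worth"        = -0.0239192,
  "NWTOTA"                       = -3.9752489
)

# exp(coefficient) is the multiplicative change in the odds of default
# for a one-unit increase in the predictor, holding the others fixed.
odds_ratio <- exp(coefs)

# The same information expressed as a percentage change in the odds.
pct_change <- (odds_ratio - 1) * 100

print(round(odds_ratio, 4))
print(round(pct_change, 2))
```

An odds ratio above 1 (Debt to equity ratio) raises the odds of default, and one below 1 lowers them. The predictors are measured in different units, so the magnitudes are not directly comparable across variables.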

# Looking for multicollinearity


> vif(Model_3)
`Total assets` `Cash profit`
154.525600 1.702564
`PAT as % of net worth` `Reserves and funds`
1.213821 3.596094
Borrowings `Shareholders funds`
75.637658 7.810375
`Capital employed` `Contingent liabilities`
310.403312 1.901567
`Net fixed assets` `Current assets`
7.692180 7.788648
`Debt to equity ratio (times)` `Cash to average cost of sales per day`
1.392113 1.115997
`WIP turnover` EPS
1.052742 5.558822
`PE on BSE` PATonSales
1.048170 1.012169
SharePrice NWCtoTotalAssets
5.472727 1.039845
NWTOTA
1.498815

The VIF values are very high for Capital employed (310.4), Total assets (154.5) and Borrowings (75.6).
Hence there is multicollinearity among the significant variables.
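
To make the VIF figures easier to read, here is a minimal sketch of the textbook definition, VIF_j = 1 / (1 - R²_j), on simulated data. The predictors `x1`, `x2`, `x3` and the vector `vif_manual` are hypothetical and are not part of the report's data. The `vif()` call on a `glm` may compute the figure differently, so this only illustrates why a near-duplicate predictor produces a very large value.

```r
# Simulated predictors (hypothetical data, not the report's NTrain).
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)  # nearly a copy of x1, so strongly collinear
x3 <- rnorm(n)                 # unrelated to x1 and x2
X  <- data.frame(x1, x2, x3)

# Textbook definition: VIF_j = 1 / (1 - R^2_j), where R^2_j comes from
# regressing predictor j on all the other predictors.
vif_manual <- sapply(names(X), function(v) {
  fit <- lm(reformulate(setdiff(names(X), v), response = v), data = X)
  1 / (1 - summary(fit)$r.squared)
})

print(round(vif_manual, 2))
```

In this toy setup `x1` and `x2` get very large values while `x3` stays close to 1, which mirrors the pattern seen for Total assets, Capital employed and Borrowings above.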

# Prediction by using Model_3 and checking of Model performance measures.

# Prediction on Training Data Set.

> Predict.Train<-predict.glm(Model_3,newdata = NTrain,type = "response")

> table.train<-table(NTrain$Default,Predict.Train>0.5)

> table.train

    FALSE TRUE
  0  3262   36
  1   130  113

> sum(diag(table.train))/nrow(NTrain)

[1] 0.9531206
> glm_pred_train <- as.factor(ifelse(Predict.Train>0.5,1,0))
> confusionMatrix(glm_pred_train, as.factor(NTrain$Default), positive = "1")
Confusion Matrix and Statistics

Reference
Prediction 0 1
0 3262 130
1 36 113

Accuracy : 0.9531
95% CI : (0.9456, 0.9598)
No Information Rate : 0.9314
P-Value [Acc > NIR] : 4.107e-08

Kappa : 0.5532

Mcnemar's Test P-Value : 5.268e-13

Sensitivity : 0.46502
Specificity : 0.98908
Pos Pred Value : 0.75839
Neg Pred Value : 0.96167
Prevalence : 0.06862
Detection Rate : 0.03191
Detection Prevalence : 0.04208
Balanced Accuracy : 0.72705

'Positive' Class : 1

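The accuracy of the model on the training data is 95.31%, against a no-information rate of 93.14%.
Sensitivity is only 46.50%, so at the 0.5 cut-off the model identifies fewer than half of the actual
defaulters.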
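
The headline measures can be checked by hand from the four cells of the confusion matrix. The sketch below copies the training counts from the output above into hypothetical variables (`TN`, `FN`, `FP`, `TP`) and recomputes them in base R.

```r
# Counts copied from the training confusion matrix above
# (rows = predicted class, columns = actual class).
TN <- 3262  # predicted 0, actual 0
FN <- 130   # predicted 0, actual 1
FP <- 36    # predicted 1, actual 0
TP <- 113   # predicted 1, actual 1

accuracy    <- (TP + TN) / (TP + TN + FP + FN)
sensitivity <- TP / (TP + FN)   # share of actual defaulters caught
specificity <- TN / (TN + FP)   # share of non-defaulters correctly cleared
precision   <- TP / (TP + FP)   # "Pos Pred Value" in the output above
no_info     <- (TN + FP) / (TP + TN + FP + FN)  # always predicting class 0

print(round(c(accuracy = accuracy, sensitivity = sensitivity,
              specificity = specificity, precision = precision,
              no_info_rate = no_info), 4))
```

Since only about 7% of the companies default, accuracy alone is dominated by the non-defaulters; sensitivity and precision say more about how well defaults are detected.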

ROC curve for the Training Data

> train_roc<-roc(NTrain$Default,Predict.Train)
Setting levels: control = 0, case = 1
Setting direction: controls < cases

> train_roc

Call:
roc.default(response = NTrain$Default, predictor = Predict.Train)

Data: Predict.Train in 3298 controls (NTrain$Default 0) < 243 cases (NTrain$Default 1).
Area under the curve: 0.9218
> plot(train_roc)

[Figure: ROC curve for the training data (x-axis: Specificity, y-axis: Sensitivity)]

# Prediction on Validation/Test data and checking of Model validation measurements.

> # Prediction for Test data

> Predict.test<-predict.glm(Model_3,newdata = NTest,type = "response")


> table.test<-table(NTest$Default,Predict.test>0.5)

> table.test

FALSE TRUE
0 648 13
1 20 34

> sum(diag(table.test))/nrow(NTest)

[1] 0.9538462

> glm_pred_test <- ifelse(Predict.test>0.5,1,0)
> glm_pred_test <- as.factor(glm_pred_test)

> confusionMatrix(glm_pred_test, as.factor(NTest$Default), positive = "1")
Confusion Matrix and Statistics

Reference
Prediction 0 1
0 648 20
1 13 34

Accuracy : 0.9538
95% CI : (0.9358, 0.968)
No Information Rate : 0.9245
P-Value [Acc > NIR] : 0.001024

Kappa : 0.6486

Mcnemar's Test P-Value : 0.296270

Sensitivity : 0.62963
Specificity : 0.98033
Pos Pred Value : 0.72340
Neg Pred Value : 0.97006
Prevalence : 0.07552
Detection Rate : 0.04755
Detection Prevalence : 0.06573
Balanced Accuracy : 0.80498

'Positive' Class : 1

The accuracy of the model on the test/validation data is 95.38%, with a sensitivity of 62.96% and a
specificity of 98.03%.

> test_roc<-roc(NTest$Default,Predict.test)
Setting levels: control = 0, case = 1
Setting direction: controls < cases
>
> test_roc

Call:
roc.default(response = NTest$Default, predictor = Predict.test)

Data: Predict.test in 661 controls (NTest$Default 0) < 54 cases (NTest$Default 1).
Area under the curve: 0.9514

The area under the curve is 95.14% for the validation data and 92.18% for the training data.

> plot(test_roc)
[Figure: ROC curve for the validation data (x-axis: Specificity, y-axis: Sensitivity)]
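
The area under the ROC curve can be read as the probability that a randomly chosen defaulter gets a higher predicted probability than a randomly chosen non-defaulter. A minimal base-R sketch of that rank-based (Mann-Whitney) calculation follows. The helper `auc_rank` and the toy vectors are hypothetical and are not part of the pROC package used above.

```r
# Rank-based (Mann-Whitney) AUC. `auc_rank` is a hypothetical helper,
# not a function from the pROC package used in the report.
auc_rank <- function(labels, scores) {
  r  <- rank(scores)            # average ranks handle tied scores
  n1 <- sum(labels == 1)        # number of cases (defaults)
  n0 <- sum(labels == 0)        # number of controls (non-defaults)
  (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Tiny made-up example: 3 of the 4 (case, control) pairs are ordered correctly.
labels  <- c(0, 0, 1, 1)
scores  <- c(0.1, 0.4, 0.35, 0.8)
auc_toy <- auc_rank(labels, scores)
print(auc_toy)
```

Applied to the report's objects, a call such as `auc_rank(NTest$Default, Predict.test)` would be expected to agree with the pROC figure, assuming `Default` is coded 0/1.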

Sort the data in descending order of predicted probability of default, divide it
into 10 deciles, and check how well the model has performed.

> NTrain$Pred<-predict(Model_3,NTrain,type="response")
> decile <- function(x)
+ {
+ deciles <- vector(length=10)
+ for (i in 1:10) # integer index avoids floating-point issues with i*10
+ {
+ deciles[i] <- quantile(x, i/10, na.rm=TRUE)
+ }
+ return (
+ ifelse(x<deciles[1], 1,
+ ifelse(x<deciles[2], 2,
+ ifelse(x<deciles[3], 3,
+ ifelse(x<deciles[4], 4,
+ ifelse(x<deciles[5], 5,
+ ifelse(x<deciles[6], 6,
+ ifelse(x<deciles[7], 7,
+ ifelse(x<deciles[8], 8,
+ ifelse(x<deciles[9], 9, 10
+ ))))))))))
+ }
> NTrain$deciles <- decile(NTrain$Pred)

> tmp_DT = data.table(NTrain)

> rank <- tmp_DT[, list(cnt=length(Default),
+ cnt_resp=sum(Default==1),
+ cnt_non_resp=sum(Default==0)
+ ), by=deciles][order(-deciles)]
> rank$rrate <- round(rank$cnt_resp / rank$cnt,4);
> rank$cum_resp <- cumsum(rank$cnt_resp)
> rank$cum_non_resp <- cumsum(rank$cnt_non_resp)
> rank$cum_rel_resp <- round(rank$cum_resp / sum(rank$cnt_resp),4);
> rank$cum_rel_non_resp <- round(rank$cum_non_resp / sum(rank$cnt_non_resp),4);
> rank$ks <- abs(rank$cum_rel_resp - rank$cum_rel_non_resp) * 100;
> rank$rrate <- percent(rank$rrate)
> rank$cum_rel_resp <- percent(rank$cum_rel_resp)
> rank$cum_rel_non_resp <- percent(rank$cum_rel_non_resp)
> NTrainRank <- rank
> View(rank)

   deciles cnt cnt_resp cnt_non_resp rrate cum_resp cum_non_resp cum_rel_resp cum_rel_non_resp    ks
 1      10 355      178          177 50.1%      178          177        73.2%             5.4% 67.88
 2       9 354       26          328  7.3%      204          505        84.0%            15.3% 68.64
 3       8 353       11          342  3.1%      215          847        88.5%            25.7% 62.80
 4       7 355       13          342  3.7%      228         1189        93.8%            36.0% 57.78
 5       6 354        5          349  1.4%      233         1538        95.9%            46.6% 49.25
 6       5 354        3          351  0.8%      236         1889        97.1%            57.3% 39.84
 7       4 353        2          351  0.6%      238         2240        97.9%            67.9% 30.02
 8       3 355        4          351  1.1%      242         2591        99.6%            78.6% 21.03
 9       2 354        1          353  0.3%      243         2944       100.0%            89.3% 10.73
10       1 354        0          354  0.0%      243         3298       100.0%           100.0%  0.00

The deciles above are sorted in descending order of predicted probability.
Decile 10 holds the highest predicted probabilities and contains the largest number
of defaults (cnt_resp = 178 of 243, i.e. 73.2%). The KS value peaks at 68.64 in decile 9.
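
The `ks` column is the gap between the cumulative share of defaulters and the cumulative share of non-defaulters captured up to each decile. As a minimal check, the sketch below copies the training counts from the table above into two hypothetical vectors and recomputes the column in base R, using the same rounding as the report's code.

```r
# Counts copied from the training decile table above, ordered from
# decile 10 (highest predicted probability) down to decile 1.
cnt_resp     <- c(178, 26, 11, 13, 5, 3, 2, 4, 1, 0)
cnt_non_resp <- c(177, 328, 342, 342, 349, 351, 351, 351, 353, 354)

# Cumulative share of defaulters and non-defaulters captured so far.
cum_rel_resp     <- round(cumsum(cnt_resp) / sum(cnt_resp), 4)
cum_rel_non_resp <- round(cumsum(cnt_non_resp) / sum(cnt_non_resp), 4)

# KS per row is the gap between the two cumulative shares, in percent;
# the KS statistic of the model is the largest gap.
ks     <- abs(cum_rel_resp - cum_rel_non_resp) * 100
ks_max <- max(ks)
ks_row <- which.max(ks)

print(round(ks, 2))
```

A larger maximum gap means the model separates defaulters from non-defaulters more sharply near the top of the ranking.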

# Deciling of Test data

> NTest$pred = predict(Model_3, NTest, type="response")

> decile <- function(x)
+ {
+ deciles <- vector(length=10)
+ for (i in 1:10) # integer index avoids floating-point issues with i*10
+ {
+ deciles[i] <- quantile(x, i/10, na.rm=TRUE)
+ }
+ return (
+ ifelse(x<deciles[1], 1,
+ ifelse(x<deciles[2], 2,
+ ifelse(x<deciles[3], 3,
+ ifelse(x<deciles[4], 4,
+ ifelse(x<deciles[5], 5,
+ ifelse(x<deciles[6], 6,
+ ifelse(x<deciles[7], 7,
+ ifelse(x<deciles[8], 8,
+ ifelse(x<deciles[9], 9, 10
+ ))))))))))
+ }

> NTest$deciles <- decile(NTest$pred)

> tmp_DT = data.table(NTest)

> rank <- tmp_DT[, list(cnt=length(`Default - 1`),
+ cnt_resp=sum(`Default - 1`==1),
+ cnt_non_resp=sum(`Default - 1`==0)
+ ), by=deciles][order(-deciles)]
> rank$rrate <- round(rank$cnt_resp / rank$cnt,4);
> rank$cum_resp <- cumsum(rank$cnt_resp)
> rank$cum_non_resp <- cumsum(rank$cnt_non_resp)
> rank$cum_rel_resp <- round(rank$cum_resp / sum(rank$cnt_resp),4);
> rank$cum_rel_non_resp <- round(rank$cum_non_resp / sum(rank$cnt_non_resp),4);
> rank$ks <- abs(rank$cum_rel_resp - rank$cum_rel_non_resp) * 100;
> rank$rrate <- percent(rank$rrate)
> rank$cum_rel_resp <- percent(rank$cum_rel_resp)
> rank$cum_rel_non_resp <- percent(rank$cum_rel_non_resp)

> NTestRank<-rank

> View(rank)

   deciles cnt cnt_resp cnt_non_resp rrate cum_resp cum_non_resp cum_rel_resp cum_rel_non_resp    ks
 1      10  72       45           27 62.5%       45           27        83.3%             4.1% 79.25
 2       9  71        4           67  5.6%       49           94        90.7%            14.2% 76.52
 3       8  72        0           72  0.0%       49          166        90.7%            25.1% 65.63
 4       7  71        3           68  4.2%       52          234        96.3%            35.4% 60.90
 5       6  72        2           70  2.8%       54          304       100.0%            46.0% 54.01
 6       5  71        0           71  0.0%       54          375       100.0%            56.7% 43.27
 7       4  71        0           71  0.0%       54          446       100.0%            67.5% 32.53
 8       3  72        0           72  0.0%       54          518       100.0%            78.4% 21.63
 9       2  71        0           71  0.0%       54          589       100.0%            89.1% 10.89
10       1  72        0           72  0.0%       54          661       100.0%           100.0%  0.00


> # cut_p returns the cut interval for each observation
> cut_ptrain = with(NTrain,
+ cut(Pred, breaks = quantile(Pred, prob=seq(0,1,0.1)), include.lowest = T))
> cut_ptest = with(NTest,
+ cut(pred, breaks = quantile(pred, prob=seq(0,1,0.1)), include.lowest = T))
> levels(cut_ptrain)
[1] "[2.22e-16,3.19e-05]" "(3.19e-05,0.00164]" "(0.00164,0.00548]" "(0.00548,0.00995]"
[5] "(0.00995,0.0169]" "(0.0169,0.0261]" "(0.0261,0.0394]" "(0.0394,0.0619]"
[9] "(0.0619,0.13]" "(0.13,0.999]"
> levels(cut_ptest)
[1] "[2.22e-16,5.31e-06]" "(5.31e-06,0.000934]" "(0.000934,0.00426]" "(0.00426,0.0088]"
[5] "(0.0088,0.0147]" "(0.0147,0.0247]" "(0.0247,0.0381]" "(0.0381,0.0607]"
[9] "(0.0607,0.21]" "(0.21,1]"
> NTrain$rank = factor(cut_ptrain, labels = 1:10)
> NTest$rank = factor(cut_ptest, labels = 1:10)

The mean is taken by decile for both the training and testing datasets to compare the
predicted and observed values.
> mean.obs.train = aggregate(Default ~ rank, data = NTrain, mean)
> mean.pred.train = aggregate(Pred ~ rank, data = NTrain, mean)
> mean.obs.val = aggregate( `Default - 1`~ rank, data = NTest, mean)
> mean.pred.val = aggregate(pred ~ rank, data = NTest, mean)
> par(mfrow=c(1,2))
> plot(mean.obs.train[,2], type="b", col="black", ylim=c(0,0.8), xlab="Decile", ylab="Prob")
> lines(mean.pred.train[,2], type="b", col="red", lty=2)
> title(main="Training Sample")
> plot(mean.obs.val[,2], type="b", col="black", ylim=c(0,0.8), xlab="Decile", ylab="Prob")
> lines(mean.pred.val[,2], type="b", col="red", lty=2)
> title(main="Validation Sample")
[Figure: mean observed (black) and mean predicted (red, dashed) default probability by decile,
in two panels titled "Training Sample" and "Validation Sample" (x-axis: Decile, y-axis: Prob)]

The plots show that the mean predicted probability tracks the observed default rate closely across
the deciles in both the training and testing datasets, which is consistent with the accuracy of
about 95% reported above.
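
The decile comparison behind these plots can be reproduced in miniature. The sketch below uses made-up scores and outcomes (the data frame `toy` and its columns are hypothetical, not the report's data) to show how quantile breaks, `cut()` and `aggregate()` combine into a table of mean observed and mean predicted values per decile.

```r
# Made-up scores and outcomes (hypothetical, not the report's data).
pred <- (1:100) / 101               # 100 distinct predicted probabilities
obs  <- as.integer(pred > 0.9)      # the 10 highest scores "default"
toy  <- data.frame(pred, obs)

# Quantile breaks give ten equally sized bins; include.lowest keeps
# the minimum score inside the first bin.
breaks   <- quantile(toy$pred, probs = seq(0, 1, 0.1))
toy$rank <- cut(toy$pred, breaks = breaks, include.lowest = TRUE, labels = 1:10)

# Mean observed outcome and mean predicted probability per decile.
mean_obs  <- aggregate(obs ~ rank, data = toy, mean)
mean_pred <- aggregate(pred ~ rank, data = toy, mean)

print(cbind(mean_obs, pred = mean_pred$pred))
```

When the two columns move together from decile 1 to decile 10, the predicted probabilities are well calibrated; a large gap in any decile points to over- or under-prediction in that range.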
