FRA Project Business Report
> traindata<-read_excel("raw-data.xlsx")
> testdata<-read_excel("validation_data.xlsx")
> dim(traindata)
[1] 3541 52
> names(traindata)
> traindata<-traindata[,-22]
> View(traindata)
> Newtraindata<-traindata
> sum(is.na(Newtraindata))
[1] 10007
> sum(is.na(testdata))
[1] 2593
There are 10007 missing cells in the training data and 2593 in the test data.
> plot_intro(Newtraindata)
[Figure: plot_intro output for the training data; recoverable metric: All Missing Columns 0.0%]
> plot_intro(Newtestdata)
[Figure: plot_intro output for the test data; recoverable metrics: Memory Usage 498.9 Kb, All Missing Columns 0.0%]
Missing observations account for 5.4% of the training data and 5.2% of the test data.
# Conversion of character variables into numeric variables and treatment of missing values (missing
values imputed with the median)
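The report does not show the conversion and imputation code. The following is a minimal base-R sketch of one way to do it, assuming every column is meant to be numeric. The toy data frame and its values are hypothetical stand-ins for Newtraindata.

```r
# Hypothetical toy data standing in for Newtraindata (values are made up).
toy <- data.frame(
  `Total assets` = c("100.5", "250", NA, "75.2"),  # numbers stored as text
  `Net worth`    = c(40, NA, 12.5, 30),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Convert character columns to numeric; entries that cannot be parsed become NA.
is_chr <- vapply(toy, is.character, logical(1))
toy[is_chr] <- lapply(toy[is_chr], function(col) suppressWarnings(as.numeric(col)))

# Replace the missing values in each column with that column's median.
toy[] <- lapply(toy, function(col) {
  col[is.na(col)] <- median(col, na.rm = TRUE)
  col
})

sum(is.na(toy))  # counts the missing cells that remain
```

The median is less sensitive to extreme values than the mean, which matters for skewed financial variables such as these.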
> plot_intro(Newtraindata)
[Figure: plot_intro output after imputation; recoverable metrics: Discrete Columns 0%, All Missing Columns 0%, Missing Observations 0%. The same metrics appear twice in the output with identical values.]
# Treatment of Outliers
Outliers are treated by capping: observations below the 1st percentile are replaced with the value of
the 1st percentile, and observations above the 99th percentile are replaced with the value of the
99th percentile. This treatment is applied to every column in the dataset.
The quantile function computes the 1st and 99th percentile values for a column. The squish function
then replaces any value outside that range with the nearer of the two limits.
> NTrain<-Newtraindata[,-c(1,2)]
> NTest<-Newtestdata[,-1]
Univariate & bivariate analysis
> dim(NTrain)
[1] 3541 50
> head(NTrain)
Total assets Net worth Total income Change in stock Total expenses Profit after tax PBDITA
1 17512.3 7093.2 24965.2 235.8 23657.8 1543.2 2860.2
2 941.0 351.5 1527.4 42.7 1454.9 115.2 283.0
3 232.8 100.6 477.3 -5.2 478.7 -6.6 5.8
4 2.7 2.7 444.9 1.6 407.7 8.8 35.4
5 478.5 107.6 1580.5 -17.0 1558.0 5.5 31.0
6 2434.4 675.8 2648.6 62.3 2636.4 74.5 200.1
PBT Cash profit PBDITA as % of total income PBT as % of total income
1 2417.2 1872.80 11.46 9.68
2 188.4 158.60 18.53 12.33
3 -6.6 0.30 1.22 -1.38
4 12.4 18.85 0.00 0.00
5 6.3 11.90 1.96 0.40
6 74.5 146.90 7.55 2.81
PAT as % of total income Cash profit as % of total income PAT as % of net worth Sales
1 6.18 7.50 23.78 24458.0
2 7.54 10.38 38.08 1504.3
3 -1.38 0.06 -6.35 475.6
4 0.00 0.00 0.00 453.1
5 0.35 0.75 5.25 1575.1
6 2.81 5.55 21.78 2639.5
Income from financial services Other income Total capital Reserves and funds Borrowings
1 158.0 286.16 423.8 6822.8 14.9
2 4.0 15.90 115.5 257.8 272.5
3 1.5 0.20 81.4 19.2 35.4
4 1.8 1.40 0.5 2.2 99.2
5 3.9 0.90 6.2 161.8 193.1
6 6.4 0.20 33.8 972.0 717.1
Current liabilities & provisions Deferred tax liability Shareholders funds
1 9965.9 284.9 7093.2
2 210.0 85.2 351.5
3 96.8 13.4 100.6
4 69.4 13.4 2.7
5 112.8 4.6 107.6
6 555.9 54.4 698.2
Shares outstanding Equity face value EPS Adjusted EPS Total liabilities PE on BSE Default
1 42381675 10 35.52 7.10 17512.3 27.31 0
2 11550000 10 9.97 9.97 941.0 8.17 0
3 8149090 10 -0.50 -0.50 232.8 -5.76 0
4 52404 10 0.00 0.00 2.7 9.10 0
5 619635 10 7.91 7.91 478.5 9.10 0
6 1141718 10 30.57 15.28 2434.4 9.10 0
> tail(NTrain)
Total assets Net worth Total income Change in stock Total expenses Profit after tax PBDITA
3536 17.8 1.2 15.5 -1.2 14.2 0.1 1.8
3537 450.5 172.3 565.0 30.5 581.1 14.4 76.7
3538 97.6 82.0 75.8 -4.0 66.5 5.3 11.1
3539 902.9 209.1 1005.1 5.6 966.5 44.2 120.3
3540 177.0 137.2 371.0 3.9 348.9 26.0 50.5
3541 1.7 0.3 444.9 1.6 17.4 -17.4 -17.4
PBT Cash profit PBDITA as % of total income PBT as % of total income
3536 0.2 0.5 11.61 1.29
3537 41.1 48.4 13.58 7.27
3538 6.2 9.2 14.64 8.18
3539 70.0 62.6 11.97 6.96
3540 40.8 33.6 13.61 11.00
3541 -17.4 -17.4 9.66 3.31
PAT as % of total income Cash profit as % of total income PAT as % of net worth Sales
3536 0.65 3.23 8.700 14.3
3537 2.55 8.57 8.710 564.5
3538 6.99 12.14 6.680 73.9
3539 4.40 6.23 22.770 995.9
3540 7.01 9.06 20.300 365.8
3541 2.34 5.64 -138.524 453.1
Income from financial services Other income Total capital Reserves and funds Borrowings
3536 1.8 1.2 1.0 0.2 14.5
3537 0.5 1.4 89.0 85.5 190.2
3538 1.7 1.4 38.6 48.4 3.0
3539 2.6 0.3 30.0 179.1 305.0
3540 3.3 1.6 50.9 86.3 1.3
3541 1.8 1.4 28.3 -28.0 99.2
Current liabilities & provisions Deferred tax liability Shareholders funds
3536 2.1 13.4 1.2
3537 42.5 36.8 172.3
3538 7.6 13.4 87.0
3539 363.4 25.4 209.1
3540 21.1 17.4 137.2
3541 0.3 13.4 0.3
Cumulative retained profits Capital employed TOL/TNW
3536 0.2 15.7 13.83
3537 76.8 362.5 1.30
3538 36.6 90.0 0.12
3539 179.1 514.1 2.45
3540 77.1 138.5 0.10
3541 -28.0 1.1 1.00
Total term liabilities / tangible net worth Contingent liabilities / Net worth (%)
3536 4.83 0.00
3537 0.72 0.00
3538 0.02 5.12
3539 0.68 93.45
3540 0.01 6.20
3541 0.00 0.00
Contingent liabilities Net fixed assets Investments Current assets Net working capital
3536 38.0 5.7 0.10 6.4 -4.4
3537 38.0 227.0 8.35 187.0 78.3
3538 4.2 21.9 6.80 55.8 47.2
3539 195.4 217.7 17.50 477.5 -49.5
3540 8.5 73.5 8.35 80.8 59.7
3541 38.0 93.5 8.35 0.6 0.3
Quick ratio (times) Current ratio (times) Debt to equity ratio (times)
3536 0.46 0.59 12.08
3537 0.41 1.71 1.10
3538 4.58 6.49 0.10
3539 0.59 0.91 1.46
3540 2.83 3.83 0.01
3541 2.00 2.00 0.00
Cash to current liabilities (times) Cash to average cost of sales per day
3536 0.07 20.71
3537 0.07 5.67
3538 3.88 177.71
3539 0.05 11.05
3540 1.35 29.93
3541 2.00 1277.50
Creditors turnover Debtors turnover Finished goods turnover WIP turnover
3536 5.81 3.67 8.33 7.52
3537 15.65 20.64 8.66 5.14
3538 10.07 14.21 5.13 4.17
3539 3.96 3.76 33.03 11.68
3540 25.00 13.75 49.00 47.03
3541 0.00 0.00 17.27 9.76
Raw material turnover Shares outstanding Equity face value EPS Adjusted EPS
3536 10.92 4672063 10 0.00 0.00
3537 19.47 14904213 10 0.97 0.97
3538 4.83 3362800 10 1.61 1.61
3539 4.63 3000000 10 13.10 13.10
3540 17.42 4422346 10 6.06 6.06
3541 0.00 5220000 10 -0.02 -0.02
Total liabilities PE on BSE Default
3536 17.8 9.10 0
3537 450.5 9.10 0
3538 97.6 2.49 0
3539 902.9 12.62 0
3540 177.0 4.07 0
3541 1.7 9.10 1
> str(NTrain)
The data is a data frame with 3541 observations and 50 variables, and all variables are numeric.
> plot_str(NTrain)
> plot_intro(NTrain)
[Figure: plot_intro output for NTrain; recoverable metrics: Discrete Columns 0%, All Missing Columns 0%, Missing Observations 0%]
> plot_histogram(NTrain)
[Figure: histograms of the NTrain variables (x-axis: value, y-axis: Frequency), followed by plots of each variable grouped by Default (groups [0,0.2] and (0.8,1]). Recoverable panel titles: PE.on.BSE, Capital.employed, Contingent.liabilities, Contingent.liabilities...Net.worth...., Current.assets, Adjusted.EPS, Cash.to.average.cost.of.sales.per.day, Cash.to.current.liabilities..times., Creditors.turnover.]
> plot_correlation(NTrain)
In the correlation matrix, dark red cells mark pairs of variables that are strongly positively
correlated, and dark blue cells mark pairs that are strongly negatively correlated.
New variable creation (one ratio each for profitability, leverage,
liquidity and company size)
Profitability Ratio and Share Price
Liquidity Ratio
Leverage Ratios
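The code that created the new variables is not shown in the report. The names PATonSales, SharePrice, NWCtoTotalAssets and NWTOTA appear in the Model_2 formula, and the interpretation of Model_3 describes NWTOTA as net worth to total assets. The formulas below are assumptions inferred from those names, not the author's confirmed definitions, and the toy values are made up.

```r
# Hypothetical toy rows; column names follow the NTrain output, values are made up.
toy <- data.frame(
  `Total assets`        = c(1000, 500),
  `Net worth`           = c(400, 100),
  `Profit after tax`    = c(50, -10),
  Sales                 = c(800, 200),
  `Net working capital` = c(150, -20),
  EPS                   = c(5, -1),
  `PE on BSE`           = c(10, 8),
  check.names = FALSE
)

# Profitability (assumed definition): profit after tax per unit of sales.
toy$PATonSales <- toy$`Profit after tax` / toy$Sales

# Share price (assumed definition): earnings per share times price/earnings ratio.
toy$SharePrice <- toy$EPS * toy$`PE on BSE`

# Liquidity (assumed definition): net working capital relative to total assets.
toy$NWCtoTotalAssets <- toy$`Net working capital` / toy$`Total assets`

# Net worth relative to total assets (assumed definition).
toy$NWTOTA <- toy$`Net worth` / toy$`Total assets`

toy[, c("PATonSales", "SharePrice", "NWCtoTotalAssets", "NWTOTA")]
```

Rows with zero Sales or zero Total assets would produce Inf or NaN in these ratios, so such rows need handling before modelling.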
> summary(Model_1)
Call:
glm(formula = Default ~ ., family = binomial, data = NTrain)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.8529 -0.2604 -0.1358 -0.0236 4.0197
Only a few variables are significant in Model 1; the rest are not.
> summary(Model_2)
Call:
glm(formula = Default ~ `Total assets` + `Net worth` + `Profit after tax` +
PBDITA + `Cash profit` + `PAT as % of net worth` + `Other income` +
`Reserves and funds` + Borrowings + `Shareholders funds` +
`Capital employed` + `Total term liabilities / tangible net worth` +
`Contingent liabilities` + `Net fixed assets` + `Current assets` +
`Debt to equity ratio (times)` + `Cash to current liabilities (times)` +
`Cash to average cost of sales per day` + `WIP turnover` +
EPS + `PE on BSE` + PATonSales + SharePrice + NWCtoTotalAssets +
NWTOTA, family = binomial, data = NTrain)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.8988 -0.2745 -0.1524 -0.0477 3.8622
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.2342163 0.2175334 -5.674 1.40e-08 ***
`Total assets` 0.0019376 0.0005403 3.586 0.000336 ***
`Net worth` 0.0012991 0.0008780 1.480 0.138990
`Profit after tax` 0.0020926 0.0014730 1.421 0.155431
PBDITA -0.0017381 0.0011454 -1.517 0.129155
`Cash profit` -0.0065740 0.0019866 -3.309 0.000936 ***
`PAT as % of net worth` -0.0243717 0.0025940 -9.396 < 2e-16 ***
`Other income` 0.0107008 0.0069793 1.533 0.125220
`Reserves and funds` -0.0031307 0.0006937 -4.513 6.39e-06 ***
Borrowings 0.0030806 0.0007974 3.864 0.000112 ***
`Shareholders funds` 0.0024006 0.0009051 2.652 0.007994 **
`Capital employed` -0.0050348 0.0011137 -4.521 6.16e-06 ***
`Total term liabilities / tangible net worth` -0.0570652 0.0370392 -1.541 0.123397
`Contingent liabilities` -0.0015924 0.0006586 -2.418 0.015606 *
`Net fixed assets` 0.0006934 0.0002372 2.923 0.003468 **
`Current assets` -0.0019319 0.0005077 -3.805 0.000142 ***
`Debt to equity ratio (times)` 0.1022856 0.0305061 3.353 0.000800 ***
`Cash to current liabilities (times)` 0.1596880 0.1455585 1.097 0.272611
`Cash to average cost of sales per day` 0.0012400 0.0004379 2.831 0.004635 **
`WIP turnover` -0.0078436 0.0037295 -2.103 0.035454 *
EPS -0.0127152 0.0062313 -2.041 0.041296 *
`PE on BSE` -0.0148210 0.0061842 -2.397 0.016549 *
PATonSales 0.0054742 0.0027123 2.018 0.043563 *
SharePrice 0.0011132 0.0005237 2.126 0.033524 *
NWCtoTotalAssets 0.1954791 0.0998926 1.957 0.050360 .
NWTOTA -4.1417719 0.5749713 -7.203 5.87e-13 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Model 2 still contains some non-significant variables, but the AIC improved (decreased) from 1034.3 to 1016.2.
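The step of dropping a variable and comparing AIC can be sketched on simulated data. Everything in this example (the data, the variable names x1, x2 and noise, and the coefficients used to simulate y) is hypothetical; it only illustrates the mechanics of refitting a binomial glm with fewer terms.

```r
# Simulated data: y depends on x1 and x2, while 'noise' is unrelated to y.
set.seed(7)
n <- 500
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), noise = rnorm(n))
toy$y <- rbinom(n, 1, plogis(-1 + 1.5 * toy$x1 - 1 * toy$x2))

# Full model with all predictors, as in glm(Default ~ ., ...).
full <- glm(y ~ ., family = binomial, data = toy)

# Reduced model: drop one term and refit.
reduced <- update(full, . ~ . - noise)

# Compare the two fits; the model with the lower AIC is preferred.
AIC(full, reduced)
```

AIC adds a penalty of 2 per estimated coefficient to the deviance, so removing a variable lowers the AIC unless that variable reduced the deviance by more than 2.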
> summary(Model_3)
Call:
glm(formula = Default ~ `Total assets` + `Cash profit` + `PAT as % of net worth` +
`Reserves and funds` + Borrowings + `Shareholders funds` +
`Capital employed` + +`Contingent liabilities` + `Net fixed assets` +
`Current assets` + `Debt to equity ratio (times)` + `Cash to average cost of sales per day` +
`WIP turnover` + EPS + `PE on BSE` + PATonSales + SharePrice +
NWCtoTotalAssets + NWTOTA, family = binomial, data = NTrain)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.9297 -0.2788 -0.1537 -0.0458 3.9879
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.2089980 0.2142856 -5.642 1.68e-08 ***
`Total assets` 0.0021031 0.0005607 3.751 0.000176 ***
`Cash profit` -0.0059508 0.0019384 -3.070 0.002140 **
`PAT as % of net worth` -0.0239192 0.0025650 -9.325 < 2e-16 ***
`Reserves and funds` -0.0031037 0.0006860 -4.524 6.06e-06 ***
Borrowings 0.0026913 0.0007790 3.455 0.000551 ***
`Shareholders funds` 0.0030583 0.0006930 4.413 1.02e-05 ***
`Capital employed` -0.0048281 0.0011405 -4.233 2.30e-05 ***
`Contingent liabilities` -0.0016599 0.0006970 -2.382 0.017236 *
`Net fixed assets` 0.0005711 0.0002703 2.113 0.034599 *
`Current assets` -0.0021453 0.0005115 -4.194 2.73e-05 ***
`Debt to equity ratio (times)` 0.0596168 0.0132579 4.497 6.90e-06 ***
`Cash to average cost of sales per day` 0.0014287 0.0003903 3.660 0.000252 ***
`WIP turnover` -0.0077262 0.0037417 -2.065 0.038934 *
EPS -0.0132850 0.0064645 -2.055 0.039874 *
`PE on BSE` -0.0143776 0.0062189 -2.312 0.020783 *
PATonSales 0.0056276 0.0025449 2.211 0.027012 *
SharePrice 0.0011216 0.0005290 2.120 0.033997 *
NWCtoTotalAssets 0.1869028 0.1000439 1.868 0.061733 .
NWTOTA -3.9752489 0.5491187 -7.239 4.51e-13 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
In Model 3, Cash profit, PAT as % of net worth, Reserves and funds, Capital employed, Contingent
liabilities, Current assets, WIP turnover, EPS, PE on BSE and Net worth to total assets (NWTOTA) have
negative coefficients. Higher values of these independent variables are therefore associated with a
lower probability of default. The remaining variables (Total assets, Borrowings, Shareholders funds,
Net fixed assets, Debt to equity ratio, Cash to average cost of sales per day, PAT on sales, Share price
and Net working capital to total assets) have positive coefficients and so have a positive relationship
with the dependent variable.
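A logistic regression coefficient is a change in the log-odds of default, so exponentiating it gives an odds ratio that is easier to read. The sketch below uses three estimates copied from the Model_3 summary; with the fitted model available, `exp(coef(Model_3))` would give all of them at once.

```r
# Estimates copied from the Model_3 coefficient table in this report.
coefs <- c(
  `Debt to equity ratio (times)` = 0.0596168,
  `PAT as % of net worth`        = -0.0239192,
  NWTOTA                         = -3.9752489
)

# Odds ratio: multiplicative change in the odds of default for a one-unit
# increase in the variable, holding the other variables fixed.
odds_ratios <- exp(coefs)
round(odds_ratios, 4)
```

An odds ratio above 1 (Debt to equity ratio) means the odds of default rise as the variable increases, and a ratio below 1 (PAT as % of net worth, NWTOTA) means they fall. The size of each effect depends on the units of the variable, so coefficients of ratio variables and of rupee-amount variables are not directly comparable.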
The VIF values are high for Total assets and Capital employed, which indicates multicollinearity among
the significant variables.
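The report does not show how the VIF values were computed; the `vif` function from the car package is one common choice. The sketch below computes the same quantity by hand in base R on simulated data, using the definition VIF = 1 / (1 - R²), where R² comes from regressing one predictor on all the others. The helper `vif_manual` and the toy predictors are hypothetical.

```r
# Simulated predictors: x2 is almost a copy of x1, x3 is independent of both.
set.seed(11)
n <- 500
x1 <- rnorm(n)
toy <- data.frame(x1 = x1, x2 = x1 + rnorm(n, sd = 0.1), x3 = rnorm(n))

# VIF for each column: regress it on the remaining columns and use 1 / (1 - R^2).
# This sketch assumes syntactic column names (no spaces or special characters).
vif_manual <- function(df) {
  vapply(names(df), function(v) {
    others <- setdiff(names(df), v)
    fit <- lm(reformulate(others, response = v), data = df)
    1 / (1 - summary(fit)$r.squared)
  }, numeric(1))
}

vif_manual(toy)  # x1 and x2 get large values, x3 stays close to 1
```

A frequently used rule of thumb treats VIF values above 5 or 10 as a sign of problematic collinearity, in which case one of the correlated variables is usually dropped.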
> table.train<-table(NTrain$Default,Predict.Train>0.5)
> table.train
FALSE TRUE
0 3262 36
1 130 113
> sum(diag(table.train))/nrow(NTrain)
[1] 0.9531206
> confusionMatrix(glm_pred_train, as.factor(NTrain$Default), positive = "1")
Confusion Matrix and Statistics
Reference
Prediction 0 1
0 3262 130
1 36 113
Accuracy : 0.9531
95% CI : (0.9456, 0.9598)
No Information Rate : 0.9314
P-Value [Acc > NIR] : 4.107e-08
Kappa : 0.5532
Sensitivity : 0.46502
Specificity : 0.98908
Pos Pred Value : 0.75839
Neg Pred Value : 0.96167
Prevalence : 0.06862
Detection Rate : 0.03191
Detection Prevalence : 0.04208
Balanced Accuracy : 0.72705
'Positive' Class : 1
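The statistics above follow directly from the four cell counts of the training confusion matrix. The sketch below recomputes them in base R as a check; the counts are copied from the output above and the variable names are illustrative.

```r
# Cell counts from the training confusion matrix (positive class = 1, default).
tn <- 3262  # actual 0, predicted 0
fn <- 130   # actual 1, predicted 0
fp <- 36    # actual 0, predicted 1
tp <- 113   # actual 1, predicted 1

accuracy    <- (tp + tn) / (tp + tn + fp + fn)
sensitivity <- tp / (tp + fn)   # share of actual defaulters that were caught
specificity <- tn / (tn + fp)   # share of non-defaulters correctly cleared
ppv         <- tp / (tp + fp)   # share of predicted defaults that were real
balanced    <- (sensitivity + specificity) / 2

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, ppv = ppv, balanced = balanced), 5)
```

Because only about 6.9% of the training companies defaulted, accuracy is dominated by the non-default class. The sensitivity of 0.465 shows that, at the 0.5 cutoff, more than half of the actual defaulters are classified as non-defaulters.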
> train_roc<-roc(NTrain$Default,Predict.Train)
Setting levels: control = 0, case = 1
Setting direction: controls < cases
> train_roc
Call:
roc.default(response = NTrain$Default, predictor = Predict.Train)
Data: Predict.Train in 3298 controls (NTrain$Default 0) < 243 cases (NTrain$Default 1).
Area under the curve: 0.9218
> plot(train_roc)
[Figure: ROC curve for the training data; y-axis: Sensitivity]
> table.test
FALSE TRUE
0 648 13
1 20 34
> sum(diag(table.test))/nrow(NTest)
[1] 0.9538462
Reference
Prediction 0 1
0 648 20
1 13 34
Accuracy : 0.9538
95% CI : (0.9358, 0.968)
No Information Rate : 0.9245
P-Value [Acc > NIR] : 0.001024
Kappa : 0.6486
Sensitivity : 0.62963
Specificity : 0.98033
Pos Pred Value : 0.72340
Neg Pred Value : 0.97006
Prevalence : 0.07552
Detection Rate : 0.04755
Detection Prevalence : 0.06573
Balanced Accuracy : 0.80498
'Positive' Class : 1
The model's accuracy on the test (validation) data is 95.38%.
> test_roc<-roc(NTest$Default,Predict.test)
Setting levels: control = 0, case = 1
Setting direction: controls < cases
>
> test_roc
Call:
roc.default(response = NTest$Default, predictor = Predict.test)
The area under the curve (AUC) is 95.14% for the validation data and 92.18% for the training data.
> plot(test_roc)
[Figure: ROC curve for the validation data; y-axis: Sensitivity]
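The AUC reported by `roc` can be read as the probability that a randomly chosen defaulter receives a higher predicted probability than a randomly chosen non-defaulter. The sketch below computes it from ranks in base R; `auc_rank` is a hypothetical helper written for illustration and is not part of pROC.

```r
# Rank-based AUC (equivalent to the Mann-Whitney U statistic scaled to [0, 1]).
# response: 0/1 vector of actual outcomes; predictor: predicted probabilities.
auc_rank <- function(response, predictor) {
  r <- rank(predictor)               # average ranks, so ties count as half
  n_pos <- sum(response == 1)
  n_neg <- sum(response == 0)
  (sum(r[response == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Four toy cases: three of the four (defaulter, non-defaulter) pairs are ordered correctly.
auc_rank(c(0, 0, 1, 1), c(0.1, 0.4, 0.35, 0.8))  # 0.75
```

Unlike accuracy, the AUC does not depend on the 0.5 cutoff, which makes it a useful complement when the classes are as imbalanced as they are here.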
Sort the data in descending order of predicted probability of default, divide it into
10 deciles based on that probability, and check how well the model has
performed in each decile.
> NTrain$Pred<-predict(Model_3,NTrain,type="response")
> decile <- function(x)
+ {
+ deciles <- vector(length=10)
+ for (i in seq(0.1,1,.1))
+ {
+ deciles[i*10] <- quantile(x, i, na.rm=T)
+ }
+ return (
+ ifelse(x<deciles[1], 1,
+ ifelse(x<deciles[2], 2,
+ ifelse(x<deciles[3], 3,
+ ifelse(x<deciles[4], 4,
+ ifelse(x<deciles[5], 5,
+ ifelse(x<deciles[6], 6,
+ ifelse(x<deciles[7], 7,
+ ifelse(x<deciles[8], 8,
+ ifelse(x<deciles[9], 9, 10
+ ))))))))))
+ }
> NTrain$deciles <- decile(NTrain$Pred)
> NTestRank<-rank
> View(rank)
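The decile assignment and the per-decile comparison can also be sketched more compactly with `cut` and `aggregate`. The example below runs on simulated scores, and the names `scored`, `pred` and `default` are hypothetical. One difference from the function above: `cut` uses right-closed intervals, so a value exactly equal to a decile boundary falls in the lower bin.

```r
# Simulated predicted probabilities and outcomes (hypothetical stand-ins).
set.seed(42)
n <- 1000
pred <- runif(n)
default <- rbinom(n, 1, pred)

# Decile boundaries at the 0%, 10%, ..., 100% quantiles of the predictions.
breaks <- quantile(pred, probs = seq(0, 1, 0.1), na.rm = TRUE)

# Assign each observation a decile from 1 (lowest scores) to 10 (highest).
scored <- data.frame(
  default = default,
  pred = pred,
  decile = cut(pred, breaks = breaks, include.lowest = TRUE, labels = FALSE)
)

# Mean observed default rate and mean predicted probability per decile.
mean_obs  <- aggregate(default ~ decile, data = scored, FUN = mean)
mean_pred <- aggregate(pred ~ decile, data = scored, FUN = mean)

merge(mean_obs, mean_pred, by = "decile")
```

If the model is well calibrated, the observed and predicted columns are close in every decile, and both rise from decile 1 to decile 10.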
The mean observed default rate and the mean predicted probability are computed per decile for both
the training and test datasets, so that predicted and observed values can be compared.
> mean.obs.train = aggregate(Default ~ rank, data = NTrain, mean)
> mean.pred.train = aggregate(Pred ~ rank, data = NTrain, mean)
> mean.obs.val = aggregate( `Default - 1`~ rank, data = NTest, mean)
> mean.pred.val = aggregate(pred ~ rank, data = NTest, mean)
> par(mfrow=c(1,2))
> plot(mean.obs.train[,2], type="b", col="black", ylim=c(0,0.8), xlab="Decile",
ylab="Prob")
> lines(mean.pred.train[,2], type="b", col="red", lty=2)
> title(main="Training Sample")
> plot(mean.obs.val[,2], type="b", col="black", ylim=c(0,0.8), xlab="Decile", ylab="Prob")
> lines(mean.pred.val[,2], type="b", col="red", lty=2)
> title(main="Validation Sample")
[Figure: mean observed default rate (black) and mean predicted probability (red, dashed) by decile; panels: Training Sample, Validation Sample; x-axis: Decile, y-axis: Prob]
The plot shows that the predicted probabilities track the observed default rates closely in both the
training and test datasets. The model's classification accuracy is about 95% on each.