Credit Risk Analysis Capstone Project

1. Introduction
1.1 Project Overview
The objective of this project is to analyze credit risk using historical loan data from
Lending Club. Using SAS for data manipulation and SQL (through PROC SQL) for querying and
analysis, the project aims to predict loan default risk from borrower attributes and loan
characteristics. The resulting insights are intended to help financial institutions make
informed decisions when managing credit risk.

1.2 Dataset Description


The dataset used for this analysis is sourced from Kaggle's Lending Club Loan Data
repository. It includes detailed information on loan applications, borrower demographics,
loan terms, credit scores, and loan status (whether the loan was fully paid or charged off).
The dataset spans multiple years and consists of millions of records, providing a
comprehensive source of data for credit risk assessment.

2. Data Collection
2.1 Data Source
The Lending Club Loan Data was imported into SAS for data manipulation and
preprocessing. SQL queries were used to extract relevant subsets of data and perform initial
exploratory analysis.

/* Import CSV file into SAS dataset */
PROC IMPORT DATAFILE='C:\project\credit_risk.csv'
    OUT=work.credit_risk_data
    DBMS=CSV REPLACE;
RUN;

/* Sample SQL query to explore dataset */
PROC SQL;
    SELECT COUNT(*) AS num_records
    FROM work.credit_risk_data;
QUIT;
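
Extracting a relevant subset can be sketched with a CREATE TABLE query. The example below keeps only loans whose outcome is final; the status labels 'Fully Paid' and 'Charged Off' and the table name work.credit_risk_closed are assumptions for illustration and should be checked against the actual values of loan_status.

```sas
/* Sketch: keep only loans with a final outcome.
   The status labels and the output table name are assumptions. */
PROC SQL;
    CREATE TABLE work.credit_risk_closed AS
    SELECT *
    FROM work.credit_risk_data
    WHERE loan_status IN ('Fully Paid', 'Charged Off');
QUIT;
```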

3. Exploratory Data Analysis (EDA)


3.1 Summary Statistics

Generate summary statistics using SAS procedures (PROC MEANS, PROC FREQ) and SQL
queries to understand the distribution of variables.

/* Summary statistics using SAS */
PROC MEANS DATA=work.credit_risk_data;
    VAR loan_amnt int_rate annual_inc;
RUN;

/* SQL query for loan status distribution */
PROC SQL;
    SELECT loan_status, COUNT(*) AS num_loans
    FROM work.credit_risk_data
    GROUP BY loan_status;
QUIT;
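
The status counts above can also be condensed into a single charge-off rate. The sketch below relies on PROC SQL evaluating a comparison to 1 or 0, so SUM counts the matching rows and MEAN gives their share. The column aliases are illustrative, and the 'Charged Off' label is assumed to match the data exactly.

```sas
/* Sketch: overall charge-off rate.
   A comparison evaluates to 1 (true) or 0 (false) in PROC SQL. */
PROC SQL;
    SELECT COUNT(*) AS num_loans,
           SUM(loan_status = 'Charged Off') AS num_charged_off,
           MEAN(loan_status = 'Charged Off') AS charged_off_rate FORMAT=PERCENT8.2
    FROM work.credit_risk_data;
QUIT;
```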

3.2 Data Visualizations

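Create basic visualizations using PROC SGPLOT to explore relationships and distributions.
PROC SQL produces no graphics itself, but it can subset the data before plotting.

/* Example of histogram using PROC SGPLOT */
PROC SGPLOT DATA=work.credit_risk_data;
    HISTOGRAM loan_amnt / GROUP=loan_status;
RUN;

/* Example of scatter plot: subset charged-off loans with PROC SQL, then plot with PROC SGPLOT */
PROC SQL;
    CREATE TABLE work.charged_off_loans AS
    SELECT loan_amnt, int_rate
    FROM work.credit_risk_data
    WHERE loan_status = 'Charged Off';
QUIT;

PROC SGPLOT DATA=work.charged_off_loans;
    SCATTER X=loan_amnt Y=int_rate;
RUN;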
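
A box plot is another way to compare a numeric variable across loan outcomes. The following is a minimal sketch, assuming int_rate was imported as a numeric column.

```sas
/* Sketch: compare interest-rate distributions across loan outcomes */
PROC SGPLOT DATA=work.credit_risk_data;
    VBOX int_rate / CATEGORY=loan_status;
RUN;
```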

4. Data Cleaning and Preprocessing


4.1 Handling Missing Values
Use SAS data steps and SQL queries to handle missing values and prepare the dataset for
analysis.

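/* Example of handling missing values in SAS */
DATA work.credit_risk_cleaned;
    SET work.credit_risk_data;
    /* Replace missing values (IS NULL is valid only in WHERE clauses; use MISSING() in an IF statement) */
    IF MISSING(loan_amnt) THEN loan_amnt = 0;
RUN;

/* Example of handling missing values in SQL */
PROC SQL;
    /* Compute the mean in a subquery on the source table, so the update
       does not read from the table it is modifying */
    UPDATE work.credit_risk_cleaned
    SET annual_inc = (SELECT MEAN(annual_inc) FROM work.credit_risk_data)
    WHERE annual_inc IS NULL;
QUIT;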
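
Before choosing an imputation strategy, it can help to count the missing values per variable. As an alternative to mean imputation, the sketch below uses PROC STDIZE (part of SAS/STAT) with REPONLY to replace missing values with the median, which is less sensitive to extreme incomes. The output table name work.credit_risk_imputed is an assumption for illustration.

```sas
/* Sketch: count non-missing (N) and missing (NMISS) values per variable */
PROC MEANS DATA=work.credit_risk_data N NMISS;
    VAR loan_amnt int_rate annual_inc dti;
RUN;

/* Sketch: median imputation; the output table name is an assumption */
PROC STDIZE DATA=work.credit_risk_cleaned
            OUT=work.credit_risk_imputed
            REPONLY          /* replace missing values only, do not standardize */
            METHOD=MEDIAN;
    VAR annual_inc dti;
RUN;
```
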
4.2 Feature Engineering
Select and transform features using SAS data steps and SQL queries based on domain
knowledge and initial EDA findings.

/* Example of feature selection and transformation */
DATA work.credit_risk_features;
    SET work.credit_risk_cleaned (KEEP=loan_amnt int_rate annual_inc dti loan_status);
    /* Feature engineering steps */
RUN;

/* Example of feature transformation in SQL */
PROC SQL;
    CREATE TABLE work.credit_risk_transformed AS
    SELECT loan_amnt, int_rate, annual_inc, dti,
           CASE WHEN loan_status = 'Charged Off' THEN 1 ELSE 0 END AS loan_default
    FROM work.credit_risk_cleaned;
QUIT;
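
The feature engineering step above is left open, so here is one possible sketch of derived features. The names log_annual_inc and loan_to_income and the table work.credit_risk_engineered are hypothetical, and whether these features help would need to be checked against the EDA findings.

```sas
/* Sketch: two derived features (names are illustrative assumptions) */
DATA work.credit_risk_engineered;
    SET work.credit_risk_transformed;
    /* Both features stay missing when income is missing or not positive */
    IF annual_inc > 0 THEN DO;
        log_annual_inc = LOG(annual_inc);          /* reduces right skew of income */
        loan_to_income = loan_amnt / annual_inc;   /* loan size relative to income */
    END;
RUN;
```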

5. Model Development
5.1 Model Selection

Choose a suitable modeling approach in SAS (PROC LOGISTIC, PROC GENMOD) based on
the project requirements and dataset characteristics.

/* Example of logistic regression model in SAS */
PROC LOGISTIC DATA=work.credit_risk_features;
    MODEL loan_status (EVENT='Charged Off') = loan_amnt int_rate annual_inc dti;
RUN;

5.2 Model Training and Evaluation

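Train the selected model and evaluate its performance using SAS procedures (PROC
LOGISTIC with its SCORE statement, PROC FREQ) and SQL queries.

/* Example of model training and evaluation */
/* Note: scoring the training data itself gives an optimistic estimate of performance */
PROC LOGISTIC DATA=work.credit_risk_features;
    MODEL loan_status (EVENT='Charged Off') = loan_amnt int_rate annual_inc dti;
    SCORE DATA=work.credit_risk_features OUT=work.credit_risk_predictions;
RUN;

/* Example of model evaluation in SAS: cross-tabulate the observed class (F_loan_status)
   against the predicted class (I_loan_status) written by the SCORE statement */
PROC FREQ DATA=work.credit_risk_predictions;
    TABLES F_loan_status * I_loan_status;
RUN;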
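
Evaluating on held-out data gives a more honest picture than scoring the training set. The sketch below assumes loan_status is binary in work.credit_risk_features. The 70/30 split, the seed, and the table names credit_risk_split, credit_risk_train, credit_risk_test, and credit_risk_test_scored are illustrative choices, not part of the original analysis.

```sas
/* Flag roughly 70% of rows for training; SEED makes the split repeatable.
   OUTALL keeps every row and adds the indicator variable Selected. */
PROC SURVEYSELECT DATA=work.credit_risk_features
                  OUT=work.credit_risk_split
                  METHOD=SRS SAMPRATE=0.7 SEED=12345 OUTALL;
RUN;

/* Split into training and test tables based on the Selected flag */
DATA work.credit_risk_train work.credit_risk_test;
    SET work.credit_risk_split;
    IF Selected = 1 THEN OUTPUT work.credit_risk_train;
    ELSE OUTPUT work.credit_risk_test;
RUN;

/* Fit on the training rows and score the held-out rows */
PROC LOGISTIC DATA=work.credit_risk_train;
    MODEL loan_status (EVENT='Charged Off') = loan_amnt int_rate annual_inc dti;
    SCORE DATA=work.credit_risk_test OUT=work.credit_risk_test_scored;
RUN;

/* Share of held-out loans whose predicted class matches the observed class */
PROC SQL;
    SELECT MEAN(F_loan_status = I_loan_status) AS holdout_accuracy FORMAT=PERCENT8.2
    FROM work.credit_risk_test_scored;
QUIT;
```

Because charged-off loans are usually a minority, accuracy alone can look good for a model that rarely predicts default, so the cross-tabulation of observed against predicted classes remains worth inspecting.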

6. Documentation and Reporting


6.1 Summary of Findings
Summarize key findings and insights from the credit risk analysis using SAS and SQL.

/* Example of documenting findings */
DATA _NULL_;
    FILE 'credit_risk_analysis_summary.txt';
    PUT "Summary of Findings:";
    PUT "---------------------";
    PUT "The logistic regression model predicts loan default with an accuracy of X%.";
RUN;

6.2 Recommendations
Provide actionable recommendations based on the analysis results to optimize credit risk
management strategies.

/* Example of providing recommendations */
DATA _NULL_;
    FILE 'credit_risk_analysis_recommendations.txt';
    PUT "Recommendations:";
    PUT "-----------------";
    PUT "1. Implement dynamic risk assessment models based on real-time data feeds.";
    PUT "2. Enhance borrower education programs to improve financial literacy.";
RUN;
