Sapient Problem Statement
Financing and loans are essential for any community. Today, banks are eager
to provide loans to individuals with a good, trackable financial or credit
history. But many people are not part of this system: they have no trackable
financial history, and so banks are reluctant to offer them loans. These people
are often exploited by untrustworthy local lenders.
The data is split into two files: one for training (with TARGET) and one for
testing (without TARGET). Both contain static data for all applications. One
row represents one loan in the data sample.
Data Set
File Name               Description         Format
application_train.csv   Train Data Set      csv
sample_submission.csv   Sample Submission   csv
application_test.csv    Test Data Set       csv
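As a quick orientation, the sketch below shows one way to read the two
application files and tell them apart by the presence of the TARGET column. It
uses only the Python standard library. The helper names read_applications and
has_target, the column FEATURE_A, and all sample values are assumptions made
for illustration; they are not part of the data set.

```python
import csv
import io


def read_applications(source):
    """Read an application CSV and return (column_names, rows).

    `source` is any open text file object. Each row is one loan
    application, returned as a dict keyed by column name.
    """
    reader = csv.DictReader(source)
    rows = list(reader)
    return reader.fieldnames or [], rows


def has_target(column_names):
    """True for training-style data, which carries the TARGET column."""
    return "TARGET" in column_names


# Tiny in-memory stand-ins for the real files. The values are made up and
# FEATURE_A is a hypothetical column used only for illustration.
train_csv = io.StringIO("SK_ID_CURR,TARGET,FEATURE_A\n1,0,1000\n2,1,2000\n")
test_csv = io.StringIO("SK_ID_CURR,FEATURE_A\n3,1500\n")

train_cols, train_rows = read_applications(train_csv)
test_cols, test_rows = read_applications(test_csv)

print(has_target(train_cols), len(train_rows))
print(has_target(test_cols), len(test_rows))
```

With the real files, the same function can be called on
open("application_train.csv", newline="") and
open("application_test.csv", newline="") instead of the in-memory samples.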
Data Dictionary
Here's a brief version of what you'll find in the data description file.
Variable     Description
SK_ID_CURR   applicant_id
TARGET       probability of applicant paying back
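The two variables above suggest a submission file with one row per applicant.
The sketch below writes such a file, assuming the layout is an SK_ID_CURR
column followed by a TARGET column holding a probability between 0 and 1. That
layout is an assumption and should be checked against sample_submission.csv.
The function name write_submission and the example predictions are
hypothetical.

```python
import csv
import io


def write_submission(out, predictions):
    """Write (SK_ID_CURR, TARGET) rows to an open text file object.

    `predictions` maps applicant id -> predicted probability.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["SK_ID_CURR", "TARGET"])
    for applicant_id, probability in predictions.items():
        # Reject values that cannot be probabilities before writing them.
        if not 0.0 <= probability <= 1.0:
            raise ValueError(f"probability out of range for {applicant_id}")
        writer.writerow([applicant_id, f"{probability:.6f}"])


# Made-up predictions for illustration only.
buffer = io.StringIO()
write_submission(buffer, {100001: 0.25, 100005: 0.9})
submission_text = buffer.getvalue()
print(submission_text)
```

To produce a file on disk, pass a handle opened with
open("submission.csv", "w", newline="") in place of the in-memory buffer.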
Submission
Data pre-processing:
1. Propose an end-to-end design, from collecting users' data to processing it
and finally accessing the output in real time.
2. This application is expected to process a huge number of applications in
real time, so think about scalability.
3. This application is also the backbone of the entire loan process, so it is
very important to keep it up at all times; think about fault tolerance.
Expected output/submission:
Evaluation Metric
1. Your output file will be evaluated on the area under the ROC curve between
the predicted probability and the observed target [45% weightage]
2. 20% of marks will be awarded for code quality
3. 30% of marks will be awarded for feature engineering and EDA
4. 5% bonus marks will be awarded if the problem is solved using any big data
technology (such as PySpark)
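
To make the first metric concrete, here is a minimal sketch of the area under
the ROC curve, computed by comparing every positive example with every negative
one. It assumes the observed target is coded as 0 or 1. The function name
roc_auc is hypothetical, and this is not necessarily how the organisers score
submissions. The pairwise loop is quadratic, so it is suited to checking small
samples rather than scoring the full data set.

```python
def roc_auc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    Equals the probability that a randomly chosen positive gets a higher
    score than a randomly chosen negative, counting ties as one half.
    """
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labels and scores: three of the four positive/negative pairs
# are ranked correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A score of 0.5 corresponds to random ranking and 1.0 to a perfect ranking. For
full-size data, a library routine such as scikit-learn's roc_auc_score is the
more practical choice.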