No-Code ML With DataRobot

The document discusses the applications of machine learning in finance, highlighting areas such as robo-advisors, fraud detection, and high-frequency trading. It also presents a project overview for predicting loan risk using a machine learning model, detailing inputs and outputs. Additionally, it provides a guide for using the DataRobot platform for building and assessing machine learning models.

Uploaded by

gs23133
Copyright
© All Rights Reserved

NO-CODE MACHINE LEARNING WITH DATAROBOT
INTRODUCTION TO MACHINE LEARNING IN FINANCE/BANKING
MACHINE LEARNING APPLICATIONS IN FINANCE
“Artificial intelligence & machine learning are to trading what fire was to the cavemen!”

1. ROBO ADVISORS
• Robo advisors are AI/ML-based financial advisors that consume a client's financial status, target return, risk appetite, and goals to generate financial advice (examples: Betterment and Wealthfront).

2. ROBOTIC PROCESS AUTOMATION

• ML automates processes to reduce time and cost, such as paperwork automation, chatbots, and customer service automation.
• Natural language processing techniques are used to extract data from documents, perform topic modeling, understand customer sentiment from tweets, respond to customer queries, and process legal documents.

3. FRAUD DETECTION & SECURITY

• AI/ML can automatically detect fraudulent transactions and flag suspicious activity on a customer's account.

4. HIGH FREQUENCY TRADING


• AI/ML analyzes historical stock price data, extracts news sentiment, and predicts future price movements of securities.
• Algorithmic trading models are trained to execute thousands of buy/sell orders in a fraction of a second (high frequency trading).

5. INSURANCE AND LOANS UNDERWRITING


• Insurance companies and banks rely on machine learning to assess client risk and decide whether a client qualifies for a loan or insurance policy based on features such as historical data, credit score, income, and debt amount.
SUCCESS STORIES

• “JP Morgan invests $11.5 billion a year in new data driven technologies. The company’s machine learning-based Contract Intelligence (COiN) platform reviews 12,000 annual commercial loan agreements in a few hours, as opposed to the 360,000 man-hours it would take to do so manually.”

• “Banking institutions prevented $22 billion worth of fraudulent transactions in 2018 with AI/ML”

• “Bank of America introduced Erica chatbot that served 6 million users as of March 2019. Erica provides clients relevant, timely guidance and helps make managing their finances easier.”

• “Electronic trades account for almost 45% of revenues in cash equities trading”, according to a report by U.K. research firm Coalition.

Source: https://medium.com/eleks-labs/4-powerful-use-cases-for-data-science-in-finance-35d50075ff80
Source: https://emerj.com/ai-sector-overviews/ai-in-banking-analysis/
READING TIME & QUIZ: ML APPLICATIONS IN FINANCE
• Please read the article below (15 MINS) and answer the following quiz (10 MINS).
o Link to Article: https://algorithmxlab.com/blog/applications-machine-learning-finance/
PROJECT OVERVIEW: BANK LOAN RISK PREDICTION
PROJECT OVERVIEW & BUSINESS CASE
• The objective of this case study is to develop a machine learning model (classifier) to predict whether a loan is good or bad given customer (borrower) features such as employment length, loan term, interest rate, and debt-to-income ratio.

INPUTS
• Employment Length
• Home Ownership
• Annual Income
• Loan Amount
• Loan Term
• Purpose of Loan
• Interest Rate
• Interest Payments
• Debt to Income Ratio

MACHINE LEARNING MODEL (CLASSIFIER)

OUTPUT
• Good Loan
• Bad Loan
MINI CHALLENGE

• Read this article by McKinsey & Company and answer the following questions:
https://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/the-executives-ai-playbook?page=industries/

1. How much annual value does McKinsey & Company predict AI could
potentially bring in the banking industry?
2. How much of this value is attributed to traditional vs. advanced AI?
3. What about the “risk” category?
4. What are the use cases for the fraud and debt analytics?
5. What type of classifiers can be used?
MINI CHALLENGE SOLUTION

• Read this article by McKinsey & Company and answer the following questions:
https://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/the-executives-ai-playbook?page=industries/

1. How much annual value does McKinsey & Company predict AI could potentially bring in the banking industry?
Answer: 1.0T
2. How much of this value is attributed to traditional vs. advanced AI?
Answer: 660.9B vs. 361.5B
3. What about the “risk” category?
Answer: 372.9B
4. What are the use cases for the fraud and debt analytics?
• Detect money laundering schemes
• Detect fraud
• Detect and prevent future defaults
• Credit underwriting
• Improve debt collection strategies
5. What type of classifiers can be used?
• Support vector machines, logistic regression, and linear classifiers.
DATAROBOT DEMO – PART #1: SIGNUP AND TRAINING DATA UPLOAD
DATAROBOT
• DataRobot is a leading end-to-end enterprise AI/ML platform that automates the process of building, training, and deploying AI/ML models at scale.

GO TO https://www.datarobot.com/ AND CLICK ON FREE TRIAL.

NOW SIGN UP TO DATAROBOT.

AFTER YOU’VE SIGNED UP, CLICK ON THE ML DEVELOPMENT TILE.

CLICK ON LOCAL FILE TO UPLOAD THE DATA. IT MIGHT TAKE 2-3 MINUTES.

IF YOU SCROLL DOWN, YOU’LL SEE A “DATA QUALITY ASSESSMENT” WARNING. SINCE IT’S NOT SIGNIFICANT AT THIS TIME (MOSTLY OUTLIERS), WE WILL IGNORE IT.
DATAROBOT DEMO – PART #2: TARGET SELECTION AND EXPLORATORY DATA ANALYSIS
ONCE YOU’VE UPLOADED YOUR LOCAL FILE, DATAROBOT WILL AUTOMATICALLY PERFORM EXPLORATORY DATA ANALYSIS AS SHOWN ON THE RIGHT-HAND SIDE.

PLEASE SELECT THE TARGET COLUMN. TO SELECT THE TARGET, MOVE THE CURSOR NEAR THE REQUIRED COLUMN AND CLICK ON ‘MAKE AS TARGET’. IN OUR PROJECT, WE WILL CHOOSE “LOAN CONDITION” AS THE TARGET. YOU CAN FURTHER EXPLORE EACH FEATURE BY CLICKING ON ITS COLUMN NAME.
YOU CAN EXPLORE VARIOUS FEATURES BY CLICKING ON THE NAME OF THE FEATURE. LET’S EXPLORE THE FOLLOWING:
• “LOAN CONDITION”, WHICH IS OUR TARGET VARIABLE
• ANNUAL INCOME
• HOME OWNERSHIP
• LOAN AMOUNT
• LOAN “PURPOSE”
• LOAN “INTEREST RATES”
AFTER EXPLORING THE FEATURES, BELOW THE START BUTTON, YOU CAN FIND “SHOW ADVANCED OPTIONS”, WHERE YOU CAN SPECIFY THE NUMBER OF FOLDS FOR VALIDATION AND CHOOSE THE HOLDOUT PERCENTAGE.

LET’S CHOOSE 70% FOR TRAINING, 15% FOR VALIDATION, AND 15% FOR TESTING (“HOLDOUT”). UNDER ADDITIONAL, MAKE SURE TO USE AUC AS THE METRIC FOR TRAINING.
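
To make the "number of folds" option more concrete, here is a minimal Python sketch of how k-fold validation partitions rows. It is a generic illustration and not DataRobot's internal partitioning logic; the function name `k_fold_indices` and the toy sizes (20 rows, 5 folds) are assumptions made for this example.

```python
import random


def k_fold_indices(n_rows, n_folds, seed=0):
    """Yield (train_idx, valid_idx) pairs, one pair per fold."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled rows round-robin so fold sizes differ by at most one.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        valid_idx = folds[k]
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, valid_idx


# Hypothetical toy setup: 20 rows split into 5 folds.
splits = list(k_fold_indices(n_rows=20, n_folds=5))
for train_idx, valid_idx in splits:
    print(len(train_idx), "training rows,", len(valid_idx), "validation rows")
```

Each row is used for validation exactly once and for training in every other fold, which is why the averaged validation score is less sensitive to one lucky or unlucky split.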
DATAROBOT: DIVIDE DATA INTO TRAINING AND TESTING
• The data set is generally divided into 80% for training and 20% for testing.
• Sometimes we include a validation dataset as well, and then divide the data into 60%, 20%, 20% segments for training, validation, and testing, respectively (numbers may vary).
1. Training set: used to fit the model (for example, gradient calculation and weight updates).
2. Validation set: used for cross-validation to assess training quality as training proceeds. Cross-validation helps detect overfitting, which occurs when the algorithm focuses on training set details at the cost of losing generalization ability.
3. Testing set (holdout dataset): used for testing the final trained model.

TRAINING DATASET: 60% | VALIDATION DATASET: 20% | TESTING DATASET: 20%
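
The three-way split described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `split_rows`, the seed, and the 1,000 placeholder rows are assumptions, and the default fractions follow the 70/15/15 choice made in the demo (pass 0.60 and 0.20 for the 60/20/20 variant).

```python
import random


def split_rows(rows, train_frac=0.70, valid_frac=0.15, seed=42):
    """Shuffle rows and split them into training, validation, and holdout sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    holdout = shuffled[n_train + n_valid:]  # everything left over
    return train, valid, holdout


# Hypothetical data: 1,000 row identifiers standing in for loan records.
rows = list(range(1000))
train, valid, holdout = split_rows(rows)
print(len(train), len(valid), len(holdout))
```

Shuffling before slicing matters: if the file is sorted (for example by date or by loan outcome), an unshuffled split would give the three sets different distributions.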
DATAROBOT DEMO – PART #3: MODEL TRAINING & FEATURE IMPORTANCE
NOW CLICK ON THE START BUTTON TO BEGIN YOUR TRAINING.

NOW YOU CAN SEE THE PROGRESS OF YOUR TRAINING IN THE SIDEBAR.

WHILE THE TRAINING IS IN PROGRESS, YOU CAN SEE THE ASSOCIATION BETWEEN FEATURES BY CLICKING ON THE FEATURE ASSOCIATION BOX.

BY CLICKING ON FEATURE ASSOCIATION PAIRS, YOU CAN SEE A MUCH CLEARER ASSOCIATION BETWEEN TWO FEATURES.

UNDER INSIGHTS, YOU CAN SEE FEATURE IMPORTANCE.
DATAROBOT AI DEMO – PART #4: CLASSIFICATION MODELS BASIC DEFINITIONS (PRECISION, RECALL, ROC, & AUC)
CONFUSION MATRIX

                        TRUE CLASS
                        +                            -
PREDICTIONS    +        TRUE +                       FALSE + (TYPE I ERROR)
               -        FALSE - (TYPE II ERROR)      TRUE -
CLASSIFICATION MODEL KPIs

• A confusion matrix is used to describe the performance of a classification model:

o True positives (TP): cases when the classifier predicted TRUE (patient has the disease), and the correct class was TRUE (patient has the disease).

o True negatives (TN): cases when the classifier predicted FALSE (no disease), and the correct class was FALSE (patient does not have the disease).

o False positives (FP) (Type I error): the classifier predicted TRUE, but the correct class was FALSE (patient did not have the disease).

o False negatives (FN) (Type II error): the classifier predicted FALSE (patient does not have the disease), but the patient actually does have the disease.

o Classification Accuracy = (TP + TN) / (TP + TN + FP + FN)

o Misclassification rate (Error Rate) = (FP + FN) / (TP + TN + FP + FN)
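
The definitions above translate directly into code. The following Python sketch counts the four confusion-matrix cells from a list of true labels and a list of predictions, then applies the accuracy and error-rate formulas. The label lists are made-up toy values, and the function names are chosen for this illustration only.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels where 1 = TRUE and 0 = FALSE."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # Type I error
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # Type II error
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def error_rate(tp, tn, fp, fn):
    return (fp + fn) / (tp + tn + fp + fn)


# Toy labels, invented for this example.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print("TP, TN, FP, FN =", tp, tn, fp, fn)
print("accuracy =", accuracy(tp, tn, fp, fn))
print("error rate =", error_rate(tp, tn, fp, fn))
```

Accuracy and error rate always sum to 1, since every prediction falls into exactly one of the four cells.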


PRECISION Vs. RECALL
o Precision = TP/Total TRUE Predictions = TP/ (TP+FP) (When model predicted TRUE class, how
often was it right?)
o Recall = TP/ Actual TRUE = TP/ (TP+FN) (when the class was actually TRUE, how often did the
classifier get it right?)

                        TRUE CLASS
                        +            -
PREDICTIONS    +        TRUE +       FALSE +
               -        FALSE -      TRUE -
PRECISION Vs. RECALL EXAMPLE

FACTS:
• 100 PATIENTS TOTAL
• 91 PATIENTS ARE HEALTHY
• 9 PATIENTS HAVE CANCER

                        TRUE CLASS
                        +            -
PREDICTIONS    +        TP = 1       FP = 1
               -        FN = 8       TN = 90

• Accuracy alone can be misleading and is not enough to assess the performance of a classifier.
• Recall is an important KPI in situations where:
o The dataset is highly unbalanced, for example when there are few cancer patients compared to healthy ones.

o Classification Accuracy = (TP + TN) / (TP + TN + FP + FN) = 91/100 = 91%
o Precision = TP / Total TRUE Predictions = TP / (TP + FP) = 1/2 = 50%
o Recall = TP / Actual TRUE = TP / (TP + FN) = 1/9 = 11%
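
The arithmetic in this example can be checked with a short Python sketch. The counts come from the slide (TP = 1, FP = 1, FN = 8, TN = 90); the "always healthy" baseline at the end is an addition for illustration, showing why accuracy by itself says little on an unbalanced dataset.

```python
def precision(tp, fp):
    """When the model predicted TRUE, how often was it right?"""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """When the class was actually TRUE, how often did the model catch it?"""
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


# Counts from the cancer example: 100 patients, 9 of whom have cancer.
tp, fp, fn, tn = 1, 1, 8, 90
model_accuracy = accuracy(tp, tn, fp, fn)
model_precision = precision(tp, fp)
model_recall = recall(tp, fn)

# Illustrative baseline: a "model" that labels every patient healthy.
baseline_accuracy = accuracy(tp=0, tn=91, fp=0, fn=9)
baseline_recall = recall(tp=0, fn=9)

print(f"model:    accuracy={model_accuracy:.2f} "
      f"precision={model_precision:.2f} recall={model_recall:.2f}")
print(f"baseline: accuracy={baseline_accuracy:.2f} recall={baseline_recall:.2f}")
```

The baseline reaches the same accuracy as the model while detecting no cancer cases at all, which is the reason recall is tracked alongside accuracy here.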
ROC (RECEIVER OPERATING CHARACTERISTIC CURVE)

• The ROC curve is a plot that assesses the model's ability to distinguish between binary (0 or 1) classes.
• The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings.
• The true positive rate is also known as sensitivity, recall, or probability of detection in machine learning.
• The false positive rate is also known as the probability of false alarm.
• Points above the diagonal line represent good classification (better than random).
• The model performance improves as the curve moves towards the upper left corner.

Photo Credit: https://commons.wikimedia.org/wiki/File:Roccurves.png
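
As a rough illustration of "plotting TPR against FPR at various threshold settings", the Python sketch below sweeps the threshold over every distinct predicted score and records one (FPR, TPR) point per threshold. The function name `roc_points` and the four toy scores are assumptions for this example, and the sketch assumes both classes are present in the labels.

```python
def roc_points(y_true, scores):
    """Return ROC points as (FPR, TPR) pairs, from the strictest threshold down."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    # A threshold above every score predicts nothing as positive.
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Toy labels and predicted probabilities, invented for this example.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

points = roc_points(y_true, scores)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

The curve always starts at (0, 0) and ends at (1, 1); what distinguishes models is how quickly TPR rises before FPR does.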


AUC (AREA UNDER CURVE)

• The light blue area represents the Area Under the Curve of the Receiver Operating Characteristic (AUROC).
• The diagonal dashed red line represents the ROC curve of a random predictor with AUROC of 0.5.
• If ROC AUC = 1, the classifier is perfect.
• Predictor #1 is better than predictor #2.
• The higher the AUC, the better the model is at predicting 0s as 0s and 1s as 1s.

[Figure: ROC curves for PREDICTOR #1, PREDICTOR #2, and a RANDOM PREDICTOR. X-axis: FALSE POSITIVE RATE. Y-axis: TRUE POSITIVE RATE.]
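
One common way to turn a ROC curve into a single AUC number is the trapezoidal rule. The Python sketch below applies it to lists of (FPR, TPR) points; the three example curves are hypothetical values chosen to illustrate the random, perfect, and in-between cases, and `auc_trapezoid` is a name made up for this sketch.

```python
def auc_trapezoid(points):
    """Area under a ROC curve given (FPR, TPR) points, via the trapezoidal rule."""
    ordered = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(ordered, ordered[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical curves for illustration.
random_predictor = [(0.0, 0.0), (1.0, 1.0)]                # the diagonal line
perfect_predictor = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]   # hugs the upper left corner
example_predictor = [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]

auc_random = auc_trapezoid(random_predictor)
auc_perfect = auc_trapezoid(perfect_predictor)
auc_example = auc_trapezoid(example_predictor)
print(auc_random, auc_perfect, auc_example)
```

A predictor whose curve sits closer to the upper left corner encloses more area, so comparing AUC values is a threshold-independent way to rank models.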


DATAROBOT DEMO – PART #5: MODEL ASSESSMENT
ONCE THE TRAINING IS DONE, YOU CAN VIEW THE PERFORMANCE OF DIFFERENT MODELS BY CLICKING ON THE MODELS TAB. THIS MIGHT TAKE AT LEAST 30 MINUTES.

YOU CAN ALSO SEE THE ARCHITECTURE OF THE CHOSEN MODEL.

YOU CAN ALSO SEE THE MODEL INFO.

WHEN YOU CLICK THE EVALUATION TAB, YOU CAN SEE DIFFERENT WAYS TO EVALUATE THE MODEL. YOU CAN SEE AUC, CONFUSION MATRIX, AND PREDICTION DISTRIBUTION.
GREAT RESOURCE BY JASON BROWNLEE: https://machinelearningmastery.com/threshold-moving-for-imbalanced-classification/

IF YOU SET THE THRESHOLD TOO HIGH, YOU WILL BECOME SUPER SELECTIVE, MEANING YOUR RECALL (TRUE POSITIVE RATE) WILL BE CLOSE TO 0, BUT YOUR FALSE POSITIVE RATE WILL DROP DRAMATICALLY AS WELL (YOU WON’T RAISE MANY FALSE ALARMS).

IF YOU SET THE THRESHOLD TOO LOW, YOU WILL NO LONGER BE SELECTIVE AND MOST DATA POINTS WILL BE CLASSIFIED AS 1, MEANING YOUR RECALL (TRUE POSITIVE RATE) WILL APPROACH ONE (BECAUSE YOU HAVE DETECTED ALL BAD LOANS), BUT YOUR FALSE POSITIVE RATE WILL INCREASE DRAMATICALLY AS WELL (YOU WILL FLAG MANY GOOD LOANS AS BAD).
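
The threshold trade-off described above can be reproduced with a small Python sketch. Here class 1 stands for a bad loan, as in the text; the eight scores, the three thresholds, and the function name `rates_at_threshold` are invented for illustration and are not taken from the DataRobot project.

```python
def rates_at_threshold(y_true, scores, threshold):
    """Return (recall, false positive rate) when score >= threshold is labelled 1."""
    tp = fp = fn = tn = 0
    for truth, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if truth == 1 and predicted == 1:
            tp += 1
        elif truth == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), fp / (fp + tn)


# Toy data: 1 = bad loan, 0 = good loan; scores are predicted probabilities of class 1.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]

high = rates_at_threshold(y_true, scores, threshold=0.95)  # very selective
mid = rates_at_threshold(y_true, scores, threshold=0.5)
low = rates_at_threshold(y_true, scores, threshold=0.0)    # everything flagged as 1

for name, (rec, fpr) in (("high", high), ("mid", mid), ("low", low)):
    print(f"{name:>4} threshold: recall={rec:.2f}  FPR={fpr:.2f}")
```

Moving the threshold never changes the model itself; it only slides the operating point along the same ROC curve.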
TO EVALUATE FEATURE FIT OF THE MODEL, CLICK ON THE COMPUTE FEATURE FIT BOX.

FOR AI/ML MODEL EXPLAINABILITY PURPOSES, WE CAN EXPLORE FEATURE IMPACTS AND FEATURE EFFECTS AS SHOWN BELOW.

HERE, YOU CAN SEE THE FEATURES AND THEIR EFFECTS ON THE MODEL’S DECISION.
DATAROBOT DEMO – PART #6: HYPERPARAMETERS TUNING
YOU CAN TUNE THE HYPERPARAMETERS BY CLICKING ON THE ADVANCED TUNING OPTION. YOU CAN SPECIFY THE VALUES OF THE PREDICTION PARAMETER.

TUNE THE MODEL PARAMETERS.

ONCE YOU MAKE CHANGES TO THE PARAMETER, CLICK ON UPDATE PARAMETER.

CLICK ON BEGIN TUNING TO START HYPERPARAMETER TUNING. THE NEW MODEL WILL BE ADDED TO THE LEADERBOARD.

THE NEW MODEL IS SHOWN AT THE BOTTOM OF THE LEADERBOARD.

EVEN AFTER PARAMETER TUNING, THE INITIAL MODEL SEEMS TO BE PERFORMING WELL, SO WE ARE STICKING WITH THE ORIGINAL MODEL.
DATAROBOT DEMO – PART #7: MODEL DEPLOYMENT & MAKING INFERENCE
TO DEPLOY THE MODEL, CLICK ON THE DEPLOY AUTOMODEL OPTION ON THE TOP.

AFTER DEPLOYING THE MODEL, CLICK ON THE CREATE APPLICATION OPTION.

CLICK ON APPLICATIONS AND APPLICATION GALLERY.

CLICK THE PREDICTOR OPTION AND THEN DEPLOY IT.

LAUNCH THE PREDICTOR APPLICATION.

YOU WILL ARRIVE AT THIS PAGE AFTER DEPLOYING THE PREDICTOR. CLICK ON THE NEW RECORD.

HERE, YOU CAN GIVE THE INFERENCE INPUT TO THE MODEL.

UPLOAD THE INFERENCE FILE.

ONCE YOU CLICK ON THE INFERENCE RECORD, IT WILL SHOW IF THAT RECORD HAS A HIGH OR LOW CHANCE.

FEATURE              VALUE
emp_length_int       0.6
home_ownership       RENT
income_category      Low
annual_inc           31000
loan_amount          2600
term                 65 months
application_type     INDIVIDUAL
purpose              car
interest_payments    High
interest_rate        16
grade                C
dti                  1.5
region               leinster
DATAROBOT

EXPLORE FEATURES BELOW
