2. Why Machine Learning
Modelling objectives
• Modelling for inference
• Modelling for missing data (interpolation)
• Modelling for prediction (extrapolation)
Modelling for inference
• Modelling for inference (discovery of possible causes): the error is the residual, $e_i = y_i - \hat{y}_i$
[Figure: regression line with fitted values $\hat{y}$ and residuals]
Modelling for prediction
• Modelling for prediction:
• $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + e = y$
• $\beta_0 + \beta_1 x_1 + \beta_2 x_2 = \hat{y}$
• 95% confidence interval of $\hat{y}$ = [lb, ub]
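To make the interval concrete, here is a minimal sketch using statsmodels (the library choice and the simulated data are assumptions; the slides do not name a tool) that fits the two-input regression above and reports the 95% confidence interval of $\hat{y}$ for a new observation:

```python
import numpy as np
import statsmodels.api as sm

# Simulated data for the model y = b0 + b1*x1 + b2*x2 + e (coefficients assumed)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=200)

# Fit OLS; add_constant supplies the intercept column for b0
res = sm.OLS(y, sm.add_constant(X)).fit()

# Point prediction y_hat and its 95% confidence interval [lb, ub]
x_new = sm.add_constant(np.array([[0.5, -1.0]]), has_constant="add")
pred = res.get_prediction(x_new)
print(pred.predicted_mean)        # y_hat
print(pred.conf_int(alpha=0.05))  # [lb, ub]
```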
The regression equation: geometrical
interpretation
[Figure: geometrical interpretation of the regression equation]
Introductory Statistics for Business and Economics by Thomas Wonnacott and Ronald Wonnacott
Regular regression fitting
• The sum-squared-error cost function
• A.k.a. Ordinary least squares (OLS)
• Minimize: $\sum_i e_i^2$
– The point of this function is to minimize the $e_i$ in the previous slide such that their mean is zero
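A minimal sketch of this fit in plain numpy (the simulated data and coefficients are assumptions for illustration):

```python
import numpy as np

# Simulated data: y = 1 + 2*x1 - 0.5*x2 + noise
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=200)

# OLS: choose beta to minimize sum(e_i^2) = ||y - X @ beta||^2
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# The residuals of the fit average to (numerically) zero
e = y - X @ beta
print(beta, e.mean())
```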
Residual
• e is assumed to be
– Centered around zero
– Normal (if doing inference with small samples)
– Statistically independent of the input variables
• x has no impact on the level of e
• x has no impact on the variance (standard deviation) of e
– i.e., the variance of e is constant (homoscedasticity)
Gaussian White Noise
• Gaussian White Noise is a stationary process whose values are i.i.d. (independent and identically distributed) draws from a Gaussian distribution with mean = 0
[Figure: a simulated Gaussian white noise series]
https://www.analyticsvidhya.com/blog/2018/09/non-stationary-time-series-python/
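A minimal sketch of generating and sanity-checking such a series (numpy is an assumption; any random-number library works):

```python
import numpy as np

# Gaussian white noise: i.i.d. draws from N(0, sigma^2) with mean 0
rng = np.random.default_rng(42)
noise = rng.normal(loc=0.0, scale=1.0, size=500)

# Sanity checks: sample mean ~ 0, and no autocorrelation at lag 1
print(noise.mean())
print(np.corrcoef(noise[:-1], noise[1:])[0, 1])  # ~ 0 for white noise
```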
Time series decomposition
[Figure: time series decomposition, panels 1–4]
Time series decomposition
[Figure: time series decomposition, panels 5–6; the remaining residual = Gaussian noise]
https://www.investopedia.com/articles/trading/07/stationary.asp
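A minimal sketch of such a decomposition with statsmodels' seasonal_decompose (the tool and the simulated monthly series are assumptions):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Simulated monthly series: trend + seasonality + Gaussian noise
rng = np.random.default_rng(0)
idx = pd.date_range("2015-01-01", periods=120, freq="MS")
y = pd.Series(
    0.5 * np.arange(120)                            # trend
    + 10 * np.sin(2 * np.pi * np.arange(120) / 12)  # seasonal, period 12
    + rng.normal(scale=2.0, size=120),              # Gaussian noise
    index=idx,
)

# Additive decomposition into trend, seasonal, and residual parts
result = seasonal_decompose(y, model="additive", period=12)
print(result.resid.dropna().head())  # should look like Gaussian noise
```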
SARIMA prediction
Complexity of autoregressive models
1. AR(p):
– p is the number of target lag inputs in the regression
– p depends on the partial autocorrelation graph of the target
2. ARMA(p, q):
– adds q, the number of residual inputs in the regression
– the residuals are generated by the previous application of the ARMA(p, q) regression
– q depends on the autocorrelation graph of the target
3. ARIMA(p, d, q):
– adds d, the number of times the target is differenced
4. SARIMA(p, d, q)×(P, D, Q, m):
– adds m, the seasonal period length; D, the seasonal differencing; P, the number of seasonal lag inputs; and Q, the number of seasonal residual inputs
5. SARIMAX: adds exogenous inputs
6. GARCH: applies an ARMA-type model to the variance of the residuals
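A minimal sketch of fitting a SARIMA model with statsmodels (the library choice, the simulated data, and the (1,1,1)×(1,1,1,12) orders are assumptions for illustration):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Simulated monthly series with trend and yearly seasonality (m = 12)
rng = np.random.default_rng(0)
idx = pd.date_range("2010-01-01", periods=144, freq="MS")
y = pd.Series(
    0.3 * np.arange(144)
    + 8 * np.sin(2 * np.pi * np.arange(144) / 12)
    + rng.normal(scale=1.5, size=144),
    index=idx,
)

# SARIMA(p,d,q)x(P,D,Q,m): here (1,1,1)x(1,1,1,12)
model = SARIMAX(y, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
res = model.fit(disp=False)

# Forecast the next 12 months
print(res.forecast(steps=12))
```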
Traditional statistical modelling
1. Models may require complex target and input transformations to become linear
– Scaling may be required, but so may other transformations, such as a log transform or exponentiation, applied as many times as necessary
2. Model selection may require too much human input:
– May involve checking the stationarity of the residuals
• Correcting this is difficult without expertise
– May require hyper-parameter tuning
• E.g. the regularization hyper-parameter
• Could be made algorithmic
3. Well suited for inference
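As an illustration of point 1, a minimal sketch (with assumed data) of log-transforming a target so that an exponential relationship becomes linear:

```python
import numpy as np

# Exponential relationship: y = exp(1 + 2*x) * multiplicative noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=200)
y = np.exp(1.0 + 2.0 * x) * rng.lognormal(sigma=0.1, size=200)

# Log-transforming the target linearizes it: log(y) = 1 + 2*x + e
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
print(beta)  # approximately [1.0, 2.0]

# Predictions are mapped back with the inverse transform
y_hat = np.exp(X @ beta)
```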
Machine learning and deep learning models
1. Are easily made non-linear and so require less input and target transformation
– Scaling may be required
2. Model selection may require less human input
– Some models, e.g. LSTMs, tolerate non-stationarity of the residuals (but you do need to select the correct cost function)
– May require hyper-parameter tuning
• E.g. the regularization hyper-parameter
• Usually algorithmic (see the sketch below)
3. Not well suited for inference
– Input coefficients are not available or are not as informative
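A minimal sketch of algorithmic hyper-parameter tuning, using scikit-learn's GridSearchCV to pick a ridge regularization strength (the library, data, and parameter grid are assumptions):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Simulated regression data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.5, size=200)

# Search over the regularization hyper-parameter alpha with 5-fold CV
grid = GridSearchCV(Ridge(), {"alpha": [0.01, 0.1, 1.0, 10.0]}, cv=5)
grid.fit(X, y)
print(grid.best_params_)
```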
Performance of Machine and Deep Learning
[Figure: performance of machine and deep learning models]
Learn Keras for Deep Neural Networks by Moolayil
Types of models
• Cross-sectional: target and input variables are not indexed to time
– The order of the target and input variable rows does not contain important information
– The target and input variable rows can be shuffled (in unison)
• Time-series: target and input variables are indexed to time
– The order of the target and input variable rows contains important information
– The target and input variable rows cannot be shuffled
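For the cross-sectional case, a minimal sketch of shuffling input and target rows in unison (numpy is an assumption):

```python
import numpy as np

# Cross-sectional data: 6 rows of inputs X and targets y
rng = np.random.default_rng(0)
X = np.arange(12).reshape(6, 2)
y = np.arange(6)

# One permutation applied to both keeps each row paired with its target
perm = rng.permutation(len(y))
X_shuffled, y_shuffled = X[perm], y[perm]
print(y_shuffled)  # rows reordered, but X/y pairing is preserved
```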
Cross-sectional models
• Classification by target:
– Continuous target: regression models
– Categorical target: classification or clustering models
• Classification by number of inputs and targets:
– Univariate input vs. multivariate input
– Single-target vs. multi-target
• Examples of multivariate models:
– Regression: prediction of crop yield based on inputs like amount of light, fertilizer, water, soil acidity, etc.
– Classification: prediction of credit default (or no default) based on inputs like gender, salary, age, marital status, etc.
Time series models
• Classification by target:
– Continuous target: regression models
– Categorical target: classification or clustering models
• Classification by how the input is generated:
– Endogenous input: the input is a lag of the target
• Econometric models
– a.k.a. autoregressive models
» AR(p), MA(q), ARMA(p,q), ARIMA(p,d,q), etc.
– a.k.a. "univariate", although one or more lags of the target may be used as inputs
• Exponential smoothing, Kalman filters, Markov chains
– Exogenous input: one or more inputs is not a lag of the target
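A minimal sketch of building an endogenous (lagged) input with pandas (an assumption; any dataframe library works):

```python
import pandas as pd

# A small target series indexed to time
y = pd.Series([10, 12, 13, 15, 14, 16],
              index=pd.date_range("2024-01-01", periods=6, freq="D"))

# Endogenous inputs: lags of the target itself
df = pd.DataFrame({"y": y})
df["y_lag1"] = df["y"].shift(1)  # input = target one step back
df["y_lag2"] = df["y"].shift(2)
print(df.dropna())  # rows where both lags are available
```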
Types of machine learning
• Machine Learning systems can be classified according to the amount and type of supervision they get during training:
1. Supervised learning
2. Unsupervised learning
3. Semi-supervised learning
4. Reinforcement learning
Supervised learning
[Figure: a regression example]
Hands-On Machine Learning with Scikit-Learn and TensorFlow by Geron
Supervised learning
• K-Nearest Neighbors
• Linear regression
• Logistic regression
• Support vector machines (SVMs)
• Decision trees and random forests
• Neural networks
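A minimal sketch of one of these, a k-nearest-neighbors classifier in scikit-learn (the iris dataset and k = 5 are assumptions for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Labeled data: supervised learning trains on (inputs, target) pairs
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Fit on the training set, then score on held-out data
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```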
Unsupervised learning
Hands-On Machine Learning with Scikit-Learn and TensorFlow by Geron
Unsupervised learning
• Clustering
– k-Means
– Gaussian mixtures
– Hierarchical Cluster Analysis (HCA)
– Expectation Maximization
• Visualization and dimensionality reduction
– Principal Component Analysis (PCA)
– Kernel PCA
– Locally-Linear Embedding (LLE)
– t-distributed Stochastic Neighbor Embedding (t-SNE)
• Anomaly detection
• Association rule learning
– Apriori
– Eclat
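A minimal sketch of unsupervised clustering with k-Means in scikit-learn (the blob data and k = 3 are assumptions):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Unlabeled data: unsupervised learning sees inputs only, no targets
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# k-Means groups the rows into k clusters by distance to centroids
km = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = km.fit_predict(X)
print(labels[:10], km.cluster_centers_.shape)
```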
Reinforcement Learning
Hands-On Machine Learning with Scikit-Learn and TensorFlow by Geron