2. Why Machine Learning
Modelling objectives
• Modelling for inference
• Modelling for missing data (interpolation)
• Modelling for prediction (extrapolation)
Modelling for inference
• Modelling for inference (discovery of possible causes): the error is the residual, $e_i = y_i - \hat{y}_i$
[Figure: regression line with fitted values $\hat{y}$ and residuals]
Modelling for prediction
• Modelling for prediction:
• $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + e = y$
• $\beta_0 + \beta_1 x_1 + \beta_2 x_2 = \hat{y}$
• 95% confidence interval of $\hat{y}$ = [lb, ub]
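To make the interval concrete, here is a minimal sketch using statsmodels (the library choice and the simulated data are assumptions; the slides do not name a tool) that fits the two-input regression above and reports the 95% confidence interval of $\hat{y}$ for a new observation:

```python
import numpy as np
import statsmodels.api as sm

# Simulated data for the model y = b0 + b1*x1 + b2*x2 + e (coefficients assumed)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=200)

# Fit OLS; add_constant supplies the intercept column for b0
res = sm.OLS(y, sm.add_constant(X)).fit()

# Point prediction y_hat and its 95% confidence interval [lb, ub]
x_new = sm.add_constant(np.array([[0.5, -1.0]]), has_constant="add")
pred = res.get_prediction(x_new)
print(pred.predicted_mean)        # y_hat
print(pred.conf_int(alpha=0.05))  # [lb, ub]
```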
The regression equation: geometrical
interpretation
[Figure: geometrical interpretation of the regression equation]
Introductory Statistics for Business and Economics by Thomas Wonnacott and Ronald Wonnacott
Regular regression fitting
• The sum-squared-error cost function
• A.k.a. Ordinary least squares (OLS)
• Minimize: $\sum_i e_i^2$
– The point of this function is to minimize the $e_i$ in the previous slide such that their mean is zero
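A minimal sketch of this fit in plain numpy (the simulated data and coefficients are assumptions for illustration):

```python
import numpy as np

# Simulated data: y = 1 + 2*x1 - 0.5*x2 + noise
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=200)

# OLS: choose beta to minimize sum(e_i^2) = ||y - X @ beta||^2
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# The residuals of the fit average to (numerically) zero
e = y - X @ beta
print(beta, e.mean())
```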
Residual
• e is assumed to be
– Centered around zero
– Normal (if doing inference with small samples)
– Statistically independent of the input variables
• x has no impact on the level of e
• x has no impact on the variance (standard deviation) of e
– i.e., the variance of e is constant (homoscedasticity)
Gaussian White Noise
• Gaussian White Noise is a stationary process whose values are i.i.d. (independent and identically distributed) draws from a Gaussian distribution with mean = 0
[Figure: a simulated Gaussian white noise series]
https://www.analyticsvidhya.com/blog/2018/09/non-stationary-time-series-python/
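A minimal sketch of generating and sanity-checking such a series (numpy is an assumption; any random-number library works):

```python
import numpy as np

# Gaussian white noise: i.i.d. draws from N(0, sigma^2) with mean 0
rng = np.random.default_rng(42)
noise = rng.normal(loc=0.0, scale=1.0, size=500)

# Sanity checks: sample mean ~ 0, and no autocorrelation at lag 1
print(noise.mean())
print(np.corrcoef(noise[:-1], noise[1:])[0, 1])  # ~ 0 for white noise
```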
Time series decomposition
[Figure: time series decomposition, panels 1–4]
Time series decomposition
[Figure: time series decomposition, panels 5–6; the remaining residual = Gaussian noise]
https://www.investopedia.com/articles/trading/07/stationary.asp
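A minimal sketch of such a decomposition with statsmodels' seasonal_decompose (the tool and the simulated monthly series are assumptions):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Simulated monthly series: trend + seasonality + Gaussian noise
rng = np.random.default_rng(0)
idx = pd.date_range("2015-01-01", periods=120, freq="MS")
y = pd.Series(
    0.5 * np.arange(120)                            # trend
    + 10 * np.sin(2 * np.pi * np.arange(120) / 12)  # seasonal, period 12
    + rng.normal(scale=2.0, size=120),              # Gaussian noise
    index=idx,
)

# Additive decomposition into trend, seasonal, and residual parts
result = seasonal_decompose(y, model="additive", period=12)
print(result.resid.dropna().head())  # should look like Gaussian noise
```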
SARIMA prediction
Complexity of autoregressive models
1. AR(p):
– p is the number of target lag inputs in the regression
– p depends on the partial autocorrelation graph of the target
2. ARMA(p, q):
– adds q, the number of residual inputs in the regression
– the residuals are generated by the previous application of the ARMA(p, q) regression
– q depends on the autocorrelation graph of the target
3. ARIMA(p, d, q):
– adds d, the number of times the target is differenced
4. SARIMA(p, d, q)×(P, D, Q, m):
– adds m, the seasonal period length; D, the seasonal differencing; P, the number of seasonal lag inputs; and Q, the number of seasonal residual inputs
5. SARIMAX: adds exogenous inputs
6. GARCH: applies an ARMA-type model to the variance of the residuals
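A minimal sketch of fitting a SARIMA model with statsmodels (the library choice, the simulated data, and the (1,1,1)×(1,1,1,12) orders are assumptions for illustration):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Simulated monthly series with trend and yearly seasonality (m = 12)
rng = np.random.default_rng(0)
idx = pd.date_range("2010-01-01", periods=144, freq="MS")
y = pd.Series(
    0.3 * np.arange(144)
    + 8 * np.sin(2 * np.pi * np.arange(144) / 12)
    + rng.normal(scale=1.5, size=144),
    index=idx,
)

# SARIMA(p,d,q)x(P,D,Q,m): here (1,1,1)x(1,1,1,12)
model = SARIMAX(y, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
res = model.fit(disp=False)

# Forecast the next 12 months
print(res.forecast(steps=12))
```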
Traditional statistical modelling
1. Models may require complex target and input transformations to become linear
– Scaling may be required, but so may other transformations, such as a log transform or exponentiation, applied as many times as necessary
2. Model selection may require too much human input:
– May involve checking the stationarity of the residuals
• Correcting this is difficult without expertise
– May require hyper-parameter tuning
• E.g. the regularization hyper-parameter
• Could be made algorithmic
3. Well suited for inference
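As an illustration of point 1, a minimal sketch (with assumed data) of log-transforming a target so that an exponential relationship becomes linear:

```python
import numpy as np

# Exponential relationship: y = exp(1 + 2*x) * multiplicative noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=200)
y = np.exp(1.0 + 2.0 * x) * rng.lognormal(sigma=0.1, size=200)

# Log-transforming the target linearizes it: log(y) = 1 + 2*x + e
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
print(beta)  # approximately [1.0, 2.0]

# Predictions are mapped back with the inverse transform
y_hat = np.exp(X @ beta)
```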
Machine learning and deep learning models
1. Are easily made non-linear and so require less input and target transformation
– Scaling may be required
2. Model selection may require less human input
– Some models, e.g. LSTMs, tolerate non-stationarity of the residuals (but you do need to select the correct cost function)
– May require hyper-parameter tuning
• E.g. the regularization hyper-parameter
• Usually algorithmic (see the sketch below)
3. Not well suited for inference
– Input coefficients are not available or are not as informative
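A minimal sketch of algorithmic hyper-parameter tuning, using scikit-learn's GridSearchCV to pick a ridge regularization strength (the library, data, and parameter grid are assumptions):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Simulated regression data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.5, size=200)

# Search over the regularization hyper-parameter alpha with 5-fold CV
grid = GridSearchCV(Ridge(), {"alpha": [0.01, 0.1, 1.0, 10.0]}, cv=5)
grid.fit(X, y)
print(grid.best_params_)
```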
Performance of Machine and Deep Learning
[Figure: performance of machine and deep learning models]
Learn Keras for Deep Neural Networks by Moolayil
Types of models
• Cross-sectional: target and input variables are not indexed to time
– The order of the target and input variable rows does not contain important information
– The target and input variable rows can be shuffled (in unison)
• Time-series: target and input variables are indexed to time
– The order of the target and input variable rows contains important information
– The target and input variable rows cannot be shuffled
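For the cross-sectional case, a minimal sketch of shuffling input and target rows in unison (numpy is an assumption):

```python
import numpy as np

# Cross-sectional data: 6 rows of inputs X and targets y
rng = np.random.default_rng(0)
X = np.arange(12).reshape(6, 2)
y = np.arange(6)

# One permutation applied to both keeps each row paired with its target
perm = rng.permutation(len(y))
X_shuffled, y_shuffled = X[perm], y[perm]
print(y_shuffled)  # rows reordered, but X/y pairing is preserved
```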
Cross-sectional models
• Classification by target:
– Continuous target: regression models
– Categorical target: classification or clustering models
• Classification by number of inputs and targets:
– Univariate input vs. multivariate input
– Single-target vs. multi-target
• Examples of multivariate models:
– Regression: prediction of crop yield based on inputs like amount of light, fertilizer, water, soil acidity, etc.
– Classification: prediction of credit default (or no default) based on inputs like gender, salary, age, marital status, etc.
Time series models
• Classification by target:
– Continuous target: regression models
– Categorical target: classification or clustering models
• Classification by how the input is generated:
– Endogenous input: the input is a lag of the target
• Econometric models
– a.k.a. autoregressive models
» AR(p), MA(q), ARMA(p,q), ARIMA(p,d,q), etc.
– a.k.a. "univariate", although one or more lags of the target may be used as inputs
• Exponential smoothing, Kalman filters, Markov chains
– Exogenous input: one or more inputs is not a lag of the target
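A minimal sketch of building an endogenous (lagged) input with pandas (an assumption; any dataframe library works):

```python
import pandas as pd

# A small target series indexed to time
y = pd.Series([10, 12, 13, 15, 14, 16],
              index=pd.date_range("2024-01-01", periods=6, freq="D"))

# Endogenous inputs: lags of the target itself
df = pd.DataFrame({"y": y})
df["y_lag1"] = df["y"].shift(1)  # input = target one step back
df["y_lag2"] = df["y"].shift(2)
print(df.dropna())  # rows where both lags are available
```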
Types of machine learning
• Machine Learning systems can be classified according to the amount and type of supervision they get during training:
1. Supervised learning
2. Unsupervised learning
3. Semi-supervised learning
4. Reinforcement learning
Supervised learning
[Figure: a regression example]
Hands-On Machine Learning with Scikit-Learn and TensorFlow by Geron
Supervised learning
• K-Nearest Neighbors
• Linear regression
• Logistic regression
• Support vector machines (SVMs)
• Decision trees and random forests
• Neural networks
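A minimal sketch of one of these, a k-nearest-neighbors classifier in scikit-learn (the iris dataset and k = 5 are assumptions for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Labeled data: supervised learning trains on (inputs, target) pairs
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Fit on the training set, then score on held-out data
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```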
Unsupervised learning
Hands-On Machine Learning with Scikit-Learn and TensorFlow by Geron
Unsupervised learning
• Clustering
– k-Means
– Gaussian mixtures
– Hierarchical Cluster Analysis (HCA)
– Expectation Maximization
• Visualization and dimensionality reduction
– Principal Component Analysis (PCA)
– Kernel PCA
– Locally-Linear Embedding (LLE)
– t-distributed Stochastic Neighbor Embedding (t-SNE)
• Anomaly detection
• Association rule learning
– Apriori
– Eclat
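A minimal sketch of unsupervised clustering with k-Means in scikit-learn (the blob data and k = 3 are assumptions):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Unlabeled data: unsupervised learning sees inputs only, no targets
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# k-Means groups the rows into k clusters by distance to centroids
km = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = km.fit_predict(X)
print(labels[:10], km.cluster_centers_.shape)
```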
Reinforcement Learning
Hands-On Machine Learning with Scikit-Learn and TensorFlow by Geron