RDataMining Slides Time Series Analysis PDF
Yanchang Zhao
http://www.RDataMining.com
July 2019
Contents
Introduction
Online Resources
Time Series Analysis with R
Chapter 8: Time Series Analysis and Mining, in the book R and Data Mining: Examples and Case Studies.
http://www.rdatamining.com/docs/RDataMining-book.pdf
Time Series Data in R
- class ts
- represents data which has been sampled at equispaced points in time
- frequency=7: a weekly series
- frequency=12: a monthly series
- frequency=4: a quarterly series
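The frequency settings listed above can be tried out with a short sketch. The series names and values below are hypothetical and chosen only for illustration; ts(), frequency(), end(), cycle() and window() all come from the stats package that ships with R.

```r
# Monthly series: 24 observations starting in January 2020 (hypothetical values)
monthly <- ts(1:24, frequency = 12, start = c(2020, 1))

# Quarterly series: 8 observations starting in Q1 2020 (hypothetical values)
quarterly <- ts(c(5, 7, 6, 9, 6, 8, 7, 10), frequency = 4, start = c(2020, 1))

frequency(monthly)   # number of observations per period
end(quarterly)       # period and position of the last observation
cycle(quarterly)     # position of each observation within its period

# Extract the second half of 2020 from the monthly series
second_half <- window(monthly, start = c(2020, 7), end = c(2020, 12))
```

The start argument takes a (period, position) pair, so c(2020, 7) with frequency 12 refers to the seventh month of 2020.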
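# a is not defined in the extracted slides; this definition is inferred
# from the $tsp values printed below (start 2011.167, end 2012.750, frequency 12)
a <- ts(1:20, frequency = 12, start = c(2011, 3))
str(a)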
## Time-Series [1:20] from 2011 to 2013: 1 2 3 4 5 6 7 8 9 10...
attributes(a)
## $tsp
## [1] 2011.167 2012.750 12.000
##
## $class
## [1] "ts"
What is Time Series Decomposition
Data AirPassengers
AirPassengers: monthly totals of international airline passengers (the Box & Jenkins airline data), 1949 to 1960. It has 144 (= 12 × 12) values.
## load time series data
plot(AirPassengers)
[Figure: time series plot of AirPassengers; y-axis: AirPassengers]
Decomposition
plot(f)
[Figure: decomposition plot with four panels: observed, trend, seasonal, random; x-axis: Time]
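The object f passed to plot(f) is not defined in the text that survives here. A minimal sketch, assuming f is an additive decomposition produced by decompose() from the stats package, and assuming the series is first re-indexed with frequency 12 (the name apts is an assumption):

```r
# Re-index the data as a series with 12 observations per period
apts <- ts(AirPassengers, frequency = 12)

# Additive decomposition into trend, seasonal and random components
f <- decompose(apts)

# The 12 seasonal figures, one per month; they are centred around zero
f$figure

# Where the moving-average trend is defined, the components add back
# up to the observed series
ok <- !is.na(f$random)
recomposed <- (f$trend + f$seasonal + f$random)[ok]
```

Calling plot(f) on this object draws the observed, trend, seasonal and random panels.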
Time Series Forecasting
Forecasting
[Figure: forecast plot with legend entries Actual, Forecast, and Error Bounds (95% Confidence); x-axis: Time]
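The code behind the forecast figure did not survive in this text. A minimal sketch, assuming a seasonal ARIMA model fitted with arima() and extended with predict(), both from the stats package; the model orders and the 24-month horizon are illustrative assumptions and may differ from those used for the figure.

```r
# Fit a seasonal ARIMA model to the monthly data (orders are an assumption)
fit <- arima(AirPassengers,
             order = c(1, 0, 0),
             seasonal = list(order = c(2, 1, 0), period = 12))

# Forecast the next 24 months
fore <- predict(fit, n.ahead = 24)

# Approximate 95% error bounds from the forecast standard errors
z <- qnorm(0.975)
U <- fore$pred + z * fore$se   # upper bound
L <- fore$pred - z * fore$se   # lower bound
```

The actual series, the forecast and both bounds can then be drawn together with ts.plot(AirPassengers, fore$pred, U, L).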
Time Series Clustering
Dynamic Time Warping (DTW)
DTW finds an optimal alignment between two time series [Keogh and Pazzani, 2001].
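## Dynamic Time Warping (DTW)
library(dtw)
idx <- seq(0, 2 * pi, length.out = 100)
a <- sin(idx) + runif(100)/10   # noisy sine wave (query)
b <- cos(idx)                   # cosine wave (reference)
# full argument names instead of partial matches; TRUE instead of T
align <- dtw(a, b, step.pattern = asymmetricP1, keep.internals = TRUE)
dtwPlotTwoWay(align)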
[Figure: two-way DTW alignment plot of a and b; y-axis: Query value; x-axis: index, 0 to 100]
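To make the idea of an optimal alignment concrete, here is a minimal base R sketch of the dynamic programming recurrence behind DTW. It assumes an absolute-difference local cost and the basic symmetric step pattern (match, insert, delete), which is not the asymmetricP1 pattern used in the dtw() call above; the function name dtw_distance is hypothetical.

```r
# Cumulative-cost DTW distance between two numeric vectors
dtw_distance <- function(x, y) {
  n <- length(x)
  m <- length(y)
  # D[i + 1, j + 1] holds the cheapest cost of aligning x[1..i] with y[1..j]
  D <- matrix(Inf, n + 1, m + 1)
  D[1, 1] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- abs(x[i] - y[j])
      D[i + 1, j + 1] <- cost + min(D[i, j],      # advance both series
                                    D[i, j + 1],  # advance x only
                                    D[i + 1, j])  # advance y only
    }
  }
  D[n + 1, m + 1]
}

# A repeated value is absorbed by the warping, so the distance is zero
d_warp <- dtw_distance(c(1, 2, 3), c(1, 2, 2, 3))

# A constant offset cannot be warped away
d_shift <- dtw_distance(c(0, 0), c(1, 1))
```

The dtw package offers many more step patterns, windowing constraints and plotting helpers; this sketch only shows the core recurrence.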
Synthetic Control Chart Time Series
[Figure: Six Classes. Six panels showing series 1, 101, 201, 301, 401 and 501, each plotted against Time (0 to 60)]
Hierarchical Clustering with Euclidean distance
[Figure: cluster dendrogram for dist(sample2), hclust (*, "average"); y-axis: Height; leaves labelled with class IDs 1 to 6]
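The clustering code for this slide is missing from the extracted text; only the dendrogram labels dist(sample2) and hclust with average linkage survive. A minimal sketch of the same steps on hypothetical toy data (three simple classes generated here, not the control chart data), using dist(), hclust() and cutree() from the stats package:

```r
set.seed(42)  # reproducible toy data

# Hypothetical stand-in for the control chart data:
# 3 classes x 5 series, each series has 60 values
tt <- 1:60
make_class <- function(shape) {
  t(replicate(5, shape + rnorm(length(tt))))  # 5 noisy copies, one per row
}
toy <- rbind(
  make_class(rep(30, 60)),    # class 1: flat level
  make_class(30 + 0.5 * tt),  # class 2: increasing trend
  make_class(30 - 0.5 * tt)   # class 3: decreasing trend
)
labels <- rep(1:3, each = 5)

# Euclidean distance between rows, then average-linkage clustering
hc <- hclust(dist(toy), method = "average")

# Cut the tree into 3 clusters and compare with the known classes
memb <- cutree(hc, k = 3)
table(labels, memb)
```

On data this well separated every class falls into its own cluster; the dendrogram above shows that the real control chart classes are mixed much more under Euclidean distance.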
Hierarchical Clustering with DTW Distance
[Figure: cluster dendrogram for myDist, hclust (*, "average"); y-axis: Height; leaves labelled with class IDs 1 to 6]
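How myDist was built is not shown in the surviving text. A minimal sketch of one way to feed a DTW distance into hclust(), assuming a hand-written pairwise loop and as.dist() rather than the dtw package; the helper dtw_dist and the toy series are hypothetical.

```r
# Compact DTW distance (absolute-difference cost, basic symmetric steps)
dtw_dist <- function(x, y) {
  D <- matrix(Inf, length(x) + 1, length(y) + 1)
  D[1, 1] <- 0
  for (i in seq_along(x)) for (j in seq_along(y)) {
    D[i + 1, j + 1] <- abs(x[i] - y[j]) + min(D[i, j], D[i, j + 1], D[i + 1, j])
  }
  D[length(x) + 1, length(y) + 1]
}

# Hypothetical toy data: one series per row
idx <- seq(0, 2 * pi, length.out = 30)
toy <- rbind(
  sin(idx), sin(idx + 0.3), sin(idx + 0.6),  # class 1: phase-shifted sine waves
  rep(2, 30), rep(2.1, 30), rep(1.9, 30)     # class 2: flat series
)
labels <- rep(1:2, each = 3)

# Fill a symmetric matrix of pairwise DTW distances
n <- nrow(toy)
m <- matrix(0, n, n)
for (i in seq_len(n - 1)) for (j in (i + 1):n) {
  m[i, j] <- m[j, i] <- dtw_dist(toy[i, ], toy[j, ])
}

# Convert to a dist object and cluster with average linkage
myDist <- as.dist(m)
hc <- hclust(myDist, method = "average")
memb <- cutree(hc, k = 2)
table(labels, memb)
```

Any distance that can be written as a symmetric matrix can be passed to hclust() this way, which is why the dendrogram header shows myDist instead of a dist() call.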
Time Series Classification
Decision Tree (ctree)
Decision Tree
# accuracy
(sum(classId == pClassId))/nrow(sc)
## [1] 0.8183333
DWT (Discrete Wavelet Transform)
- Wavelet transform provides a multi-resolution representation using wavelets [Burrus et al., 1998].
- Haar Wavelet Transform: the simplest DWT
http://dmr.ath.cx/gfx/haar/
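The multi-resolution idea can be illustrated with a base R sketch of the Haar transform. It assumes the orthonormal convention (pairwise sums and differences divided by sqrt(2)) and an input whose length is a power of two; sign and ordering conventions differ between wavelet libraries, and the function names here are hypothetical.

```r
# One Haar step: pairwise averages (approximation) and differences (detail)
haar_step <- function(x) {
  stopifnot(length(x) %% 2 == 0)
  odd  <- x[seq(1, length(x), by = 2)]
  even <- x[seq(2, length(x), by = 2)]
  list(approx = (odd + even) / sqrt(2),
       detail = (odd - even) / sqrt(2))
}

# Full transform: apply the step repeatedly to the approximation
haar_dwt <- function(x) {
  n <- as.integer(length(x))
  stopifnot(n >= 1, bitwAnd(n, n - 1L) == 0)  # length must be a power of two
  details <- list()
  while (length(x) > 1) {
    s <- haar_step(x)
    details <- c(list(s$detail), details)  # coarser levels go in front
    x <- s$approx
  }
  # Overall approximation first, then details from coarsest to finest
  c(x, unlist(details))
}

x <- c(4, 6, 10, 12, 8, 6, 5, 5)
w <- haar_dwt(x)
```

Because each step is orthonormal, the coefficients w carry the same energy as x, and the first coefficient is the sum of x divided by sqrt(length(x)).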
Decision Tree with DWT
(sum(classId==pClassId)) / nrow(wtSc)
## [1] 0.8883333
plot(ct, ip_args = list(pval = FALSE), ep_args = list(digits = 0))
[Figure: decision tree plot; root node 1 splits on V57; terminal nodes: Node 4 (n = 68), Node 5 (n = 6), Node 7 (n = 9), Node 8 (n = 86), Node 10 (n = 31), Node 13 (n = 80), Node 15 (n = 9), Node 16 (n = 99), Node 18 (n = 12), Node 20 (n = 103), Node 21 (n = 97); each terminal node shows a bar chart of class proportions (0 to 1) over classes 1 to 6]
k-NN Classification
## k-NN classification
k <- 20
newTS <- sc[501, ] + runif(100) * 15
distances <- dist(newTS, sc, method = "DTW")
s <- sort(as.vector(distances), index.return = TRUE)
# class IDs of k nearest neighbours
table(classId[s$ix[1:k]])
##
## 4 6
## 3 17
The TSclust Package
http://cran.r-project.org/web/packages/TSclust/
Online Resources
The End
Thanks!
Email: yanchang(at)RDataMining.com
Twitter: @RDataMining
How to Cite This Work
- Citation
Yanchang Zhao. R and Data Mining: Examples and Case Studies. ISBN 978-0-12-396963-7, December 2012. Academic Press, Elsevier. 256 pages. URL: http://www.rdatamining.com/docs/RDataMining-book.pdf.
- BibTeX
@BOOK{Zhao2012R,
  title = {R and Data Mining: Examples and Case Studies},
  publisher = {Academic Press, Elsevier},
  year = {2012},
  author = {Yanchang Zhao},
  pages = {256},
  month = {December},
  isbn = {978-0-12-396963-7},
  keywords = {R, data mining},
  url = {http://www.rdatamining.com/docs/RDataMining-book.pdf}
}
References
Zhao, Y. (2012). R and Data Mining: Examples and Case Studies, ISBN 978-0-12-396963-7. Academic Press, Elsevier.