Data Analysis Activity 1
STAT 4534
25th September, 2023
Task 2
- The square-root transformation reduces the variance somewhat, but the series still
cannot be classified as homoscedastic. The log transformation reduces the variance
substantially, making the data approximately homoscedastic.
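To make the comparison concrete, here is a minimal sketch on simulated data, not the COVID series itself. It assumes multiplicative noise (standard deviation proportional to the level), which is an illustrative assumption; the names `low`, `high`, and `sd.ratio` are hypothetical.

```r
# Simulated counts at two levels with multiplicative (lognormal) noise.
# Assumption: noise sd is proportional to the level; values are illustrative only.
set.seed(4534)
n <- 1000
low  <- 100   * exp(rnorm(n, sd = 0.2))
high <- 10000 * exp(rnorm(n, sd = 0.2))

# Ratio of the standard deviation at the high level to that at the low level.
# A ratio near 1 indicates that the transform has stabilized the variance.
sd.ratio <- c(
  raw  = sd(high) / sd(low),
  sqrt = sd(sqrt(high)) / sd(sqrt(low)),
  log  = sd(log(high)) / sd(log(low))
)
print(round(sd.ratio, 2))
```

Under this assumption the ratio should be roughly 100 for the raw data, roughly 10 after the square root, and close to 1 after the log, which mirrors the pattern described for Task 2.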
Task 3
- Yes, both time series trend similarly and their changes overlap.
Task 4
- The log-transformed data has roughly constant variance (consistent with a stationary
time series), while the square-root-transformed data has a large, changing variance
and is a non-stationary time series.
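The detrended series compared here are assumed to be the transformed series minus their centered 7-day moving averages, which is what the `filter(..., sides=2, filter=rep(1/7,7))` calls in the appendix suggest. A minimal sketch of that step on a noise-free toy series (all names and values are hypothetical):

```r
# Noise-free toy series: linear trend plus a weekly pattern that sums to zero.
# All values are illustrative assumptions, not the course data.
n <- 42
trend <- 0.5 * (1:n)
weekly <- rep(c(3, -1, -1, 0, 0, -0.5, -0.5), length.out = n)
x <- trend + weekly

# Centered 7-point moving average; the first and last 3 values are NA.
ma <- stats::filter(x, filter = rep(1/7, 7), sides = 2)

# Subtracting the moving average removes the trend and leaves the weekly pattern.
detrended <- as.numeric(x - ma)
print(round(detrended[4:10], 3))
```

Because the window spans exactly one week, the moving average cancels the weekly pattern and tracks only the trend, so the detrended values keep the weekly component that Tasks 5 and 6 then examine.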
Task 5
- We can still observe some periodic effect, which aligns with our reasoning about
the weekend effect.
- Yes, there is some periodic behaviour: the first day of the week (Monday) appears
to produce a biased peak. However, the frequency of COVID cases should not depend
on the day of the week.
- The 7-day running standard deviation shows reduced variance, and the data appear
homoscedastic.
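The differencing and running standard deviation used in this task can be sketched as follows. This is an illustration on a simulated series with hypothetical names; it uses base R (`embed` plus `sd`) for the running standard deviation rather than `runSD` from TTR, so the output is not padded with leading `NA` values.

```r
# Toy series: linear trend + weekly pattern + noise (illustrative assumptions).
set.seed(1)
n <- 70
weekly <- rep(c(3, -1, -1, 0, 0, -0.5, -0.5), length.out = n)
signal <- 0.5 * (1:n) + weekly
x <- signal + rnorm(n, sd = 0.3)

# First difference removes the linear trend; lag-7 difference removes the
# weekly pattern. Together they drop 1 + 7 = 8 observations.
d.signal <- diff(diff(signal), lag = 7)  # noise-free part: numerically zero
d <- diff(diff(x), lag = 7)

# 7-point running standard deviation using base R (embed builds the windows).
run.sd <- apply(embed(d, 7), 1, sd)

cat("observations lost:", n - length(d), "\n")
cat("largest noise-free value:", max(abs(d.signal)), "\n")
```

The eight lost observations are why the appendix plots the differenced series against `date[-(1:8)]`. A roughly flat `run.sd` would be read as evidence of homoscedasticity.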
Task 6
- We can still observe some periodic effect (notable trends in dots above the zero line),
which aligns with our reasoning about the weekend effect.
- Yes, there is some periodic behaviour: the first day of the week (Monday) appears
to produce a biased peak. However, the frequency of COVID cases should not depend
on the day of the week.
- The 7-day running standard deviation shows reduced variance, but the data still
appear heteroscedastic.
Task 7
Based on the above activity, the logarithm transformation is better at making the data
homoscedastic.
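One way to put a number on this conclusion is to compare the spread of the differenced series early and late in the record for each transform. The sketch below does this on simulated data with exponential growth and multiplicative noise; both the data-generating assumptions and the helper `spread.ratio` are hypothetical and not part of the original analysis.

```r
# Simulated daily series: exponential growth with multiplicative noise.
# Assumption: noise sd proportional to the level; parameters are illustrative.
set.seed(2023)
n <- 280
level <- 100 * exp(log(100) * (1:n) / n)   # grows from about 100 to 10000
y <- level * exp(rnorm(n, sd = 0.2))

# Difference once and at lag 7, then compare the spread of the last
# quarter of the differenced series with that of the first quarter.
spread.ratio <- function(z) {
  d <- diff(diff(z), lag = 7)
  q <- floor(length(d) / 4)
  sd(tail(d, q)) / sd(head(d, q))
}

ratios <- c(sqrt = spread.ratio(sqrt(y)), log = spread.ratio(log(y)))
print(round(ratios, 2))
```

A ratio near 1 suggests constant variance over time. Under these assumptions the log series should stay near 1 while the square-root series should show a clearly larger ratio.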
Appendix
library(ggplot2)
library(astsa)
library(zoo)
#Task 1
# `file` is assumed to be the data frame of case data, loaded beforehand
covidcase<-file$new.cases
date<-as.Date(file$date,"%m.%d.%y")
tsplot(date,covidcase)
covidcase<-ts(covidcase,frequency = 7)
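#Task 2
library(TTR)
# Square-root and log transforms (assumed definitions; used in the tasks below)
covidcase.sqrt <- sqrt(covidcase)
covidcase.log <- log(covidcase)
# Original, square-root and log series side by side (assumed panel order)
par(mfrow=c(1,3))
plot(date, covidcase,type="l")
plot(date, covidcase.sqrt,type="l")
plot(date, covidcase.log,type="l")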
#Task 3
par(mfrow=c(1,2))
ma.covidcase.sqrt <- filter(covidcase.sqrt, sides=2,
filter=rep(1/7,7))
ma.covidcase.log <- filter(covidcase.log, sides=2,
filter=rep(1/7,7))
plot(date, ma.covidcase.sqrt,type="l")
plot(date, ma.covidcase.log,type="l")
#Task 4
# Detrend by subtracting the 7-day moving average (assumed definitions)
detrended.covidcase.sqrt <- covidcase.sqrt - ma.covidcase.sqrt
detrended.covidcase.log <- covidcase.log - ma.covidcase.log
par(mfrow=c(1,2))
plot(date, detrended.covidcase.sqrt,type="l")
plot(date, detrended.covidcase.log,type="l")
# Task 5
par(mfrow=c(1,2))
acf(detrended.covidcase.log,na.action=na.pass)
pacf(detrended.covidcase.log,na.action=na.pass)
# First difference, then lag-7 difference; drops 8 observations (assumed definition)
DIFF7.DIFF.covidcase.log <- diff(diff(covidcase.log), lag=7)
plot(date[-(1:8)], DIFF7.DIFF.covidcase.log)
acf(DIFF7.DIFF.covidcase.log)
pacf(DIFF7.DIFF.covidcase.log)
par(mfrow=c(1,1))
sd.DIFF7.DIFF.covidcase.log <-
runSD(DIFF7.DIFF.covidcase.log,7)
plot(date[-(1:8)], sd.DIFF7.DIFF.covidcase.log,type="l")
# Task 6
par(mfrow=c(1,2))
acf(detrended.covidcase.sqrt,na.action=na.pass)
pacf(detrended.covidcase.sqrt,na.action=na.pass)
# First difference, then lag-7 difference; drops 8 observations (assumed definition)
DIFF7.DIFF.covidcase.sqrt <- diff(diff(covidcase.sqrt), lag=7)
plot(date[-(1:8)], DIFF7.DIFF.covidcase.sqrt)
acf(DIFF7.DIFF.covidcase.sqrt)
pacf(DIFF7.DIFF.covidcase.sqrt)
par(mfrow=c(1,1))
sd.DIFF7.DIFF.covidcase.sqrt <-
runSD(DIFF7.DIFF.covidcase.sqrt,7)
plot(date[-(1:8)], sd.DIFF7.DIFF.covidcase.sqrt,type="l")