Data Analysis Activity 1

- The document analyzes COVID-19 case data using time series analysis methods like transformations, differencing, and autocorrelation functions.
- The log transformation is found to be better than the square root transformation at making the data homoscedastic.
- There is evidence of periodic behavior in the autocorrelation functions, which may be explained by reduced testing on weekends.
- The 7-day moving average helps reduce variance and appearance of heteroscedasticity in the differenced and transformed time series.


Jay Kapoor

STAT 4534
25th September, 2023

Data Analysis Activity 1


Task 1

- The time series is not stationary.
- Yes, there are notable trends in the time series.
- Yes, there is both short-term and long-term cyclical behaviour in the time
series data.
- Yes, there is heteroscedasticity.

Task 2

- The square-root transformation reduces some of the variance, but the result
still cannot be classified as homoscedastic, while the log transformation
reduces the variance significantly, making the data homoscedastic.
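
The comparison above can be illustrated with a small sketch in base R. It uses simulated data, not the COVID-19 file, and assumes the noise is multiplicative, so that the spread grows with the level. The helpers `roll_sd` and `sd_ratio` are hypothetical names introduced only for this illustration (the appendix uses `runSD` from TTR instead).

```r
# Simulated illustration (not the COVID-19 data): a growing level with
# multiplicative noise, so the standard deviation scales with the level.
set.seed(1)
n <- 210
level <- exp(seq(log(50), log(5000), length.out = n))  # hypothetical trend
x <- level * exp(rnorm(n, sd = 0.2))                   # hypothetical series

# Rolling standard deviation over a window of k observations (base R only).
roll_sd <- function(v, k = 7) {
  sapply(seq_len(length(v) - k + 1), function(i) sd(v[i:(i + k - 1)]))
}

# Ratio of the mean rolling SD in the last third to that in the first third.
# A ratio near 1 suggests roughly constant variance over time.
sd_ratio <- function(v) {
  s <- roll_sd(v)
  third <- floor(length(s) / 3)
  mean(tail(s, third)) / mean(head(s, third))
}

ratios <- c(raw = sd_ratio(x), sqrt = sd_ratio(sqrt(x)), log = sd_ratio(log(x)))
print(round(ratios, 2))
```

Under this assumption the ratio should be largest for the raw series, smaller for the square root, and closest to 1 for the log, which matches the ordering reported in Task 2.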

Task 3

- Yes, both time series trend similarly and show overlapping changes.

Task 4

- The log-transformed data has constant variance (stationary time series), while
the square-root-transformed data has a very large variance and is a
non-stationary time series.

Task 5

- Yes, we can observe the periodic effect in sample ACF.

- The periodic effect can be explained by systematic delays in the diagnostic
and reporting process due to reduced testing on weekends and holidays and to
curfews/lockdowns in several regions.

- We can still observe some periodic effect, which aligns with our reasoning for
the effect.
- Yes, there is some periodic behaviour: the first day of the week (Monday)
appears to produce a biased peak. However, the frequency of COVID cases should
not depend on the day of the week.

- The 7-day moving standard deviation shows reduced variance, and the data
appears homoscedastic.
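
The weekly effect described above can be sketched on simulated data (not the COVID-19 file). The example assumes white noise plus a fixed pattern that repeats every 7 observations. Because the simulated series has no trend, only the lag-7 difference is applied here, whereas the appendix also takes a first difference.

```r
# Simulated illustration (not the COVID-19 data): white noise plus a fixed
# weekly pattern, e.g. lower reported counts on two days of each week.
set.seed(2)
n <- 280                                             # 40 hypothetical weeks
weekly <- rep(c(0, 0, 0, 0, 0, -1, -1), length.out = n)
y <- weekly + rnorm(n, sd = 0.3)

# Sample ACF before differencing. Element k + 1 holds lag k, because lag 0
# comes first, so element 8 is the lag-7 autocorrelation.
acf_raw <- acf(y, lag.max = 14, plot = FALSE)$acf[, 1, 1]
acf_raw_lag7 <- acf_raw[8]

# Seasonal (lag-7) differencing cancels any pattern that repeats every 7 days.
y_d7 <- diff(y, lag = 7)
acf_d7 <- acf(y_d7, lag.max = 14, plot = FALSE)$acf[, 1, 1]
acf_d7_lag1 <- acf_d7[2]

print(round(c(raw_lag7 = acf_raw_lag7, diff7_lag1 = acf_d7_lag1), 2))
```

A large positive lag-7 autocorrelation in the undifferenced series is the kind of periodic signature discussed in this task. Note that lag-7 differencing of pure noise introduces a negative autocorrelation at lag 7 of its own, so some structure at that lag after differencing is expected.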

Task 6

- Yes, we can observe the periodic effect in sample ACF.

- The periodic effect can be explained by systematic delays in the diagnostic
and reporting process due to reduced testing on weekends and holidays and to
curfews/lockdowns in several regions.

- We can still observe some periodic effect (notable trends in the dots above the
zero line), which aligns with our reasoning about the weekend effect.
- Yes, there is some periodic behaviour: the first day of the week (Monday)
appears to produce a biased peak. However, the frequency of COVID cases should
not depend on the day of the week.

- The 7-day moving standard deviation shows reduced variance, but the data
still appears heteroscedastic.
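
The weekday effect discussed in Tasks 5 and 6 can also be checked by regressing a detrended series on day-of-week indicators. The sketch below is a hedged illustration on simulated data: the dates, the Sunday dip, and the names `dates`, `day`, and `resid_series` are all assumptions made for the example. It derives the weekday from the dates themselves rather than from the position in the series.

```r
# Simulated illustration (not the COVID-19 data).
set.seed(3)
dates <- seq(as.Date("2020-03-04"), by = "day", length.out = 140)  # hypothetical

# ISO weekday number, 1 = Monday ... 7 = Sunday; avoids locale-dependent names.
wday <- as.integer(format(dates, "%u"))
day <- factor(wday, levels = 1:7,
              labels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))

# Hypothetical detrended series with a dip on Sundays only.
resid_series <- ifelse(day == "Sun", -0.5, 0) + rnorm(length(dates), sd = 0.1)

# Regress on the weekday factor; Monday is the baseline level.
fit <- lm(resid_series ~ day)
print(round(coef(fit), 2))
```

A coefficient that is clearly different from zero for one weekday, with the others near zero, would point to a reporting artifact on that day rather than a real change in cases.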

Task 7

Based on the above activity, the logarithm transformation is better at making
the data homoscedastic.

Appendix

# Jay Kapoor ([email protected])


# STAT 4534
# Data Analysis Activity 1

library(ggplot2)
library(astsa)
library(zoo)

#Task 1

le <- read.csv("D:/STAT 4534 time series/Analysis Activity 1/COVID-19.csv")

# Daily new cases and their dates
covidcase <- le$new.cases
date <- as.Date(le$date, "%m.%d.%y")
tsplot(date, covidcase)

# Treat the series as daily data with a weekly period
covidcase <- ts(covidcase, frequency = 7)

#Task 2

#Performing sqrt transformation
par(mfrow=c(1,2))
covidcase.sqrt <- sqrt(covidcase)
tsplot(date, covidcase.sqrt,type="l")

#Performing log transformation
covidcase.log <- log(covidcase)
tsplot(date, covidcase.log,type="l")

library(TTR)  # provides runSD()

#Computing the 7-day moving standard deviation for each transformation
par(mfrow=c(1,3))

sd.covidcase <- runSD(covidcase,7)
plot(date, sd.covidcase,type="l",main="moving sd(new cases)")

sd.covidcase.sqrt <- runSD(covidcase.sqrt,7)
plot(date, sd.covidcase.sqrt,type="l",main="moving sd(sqrt(new cases))")

sd.covidcase.log <- runSD(covidcase.log,7)
plot(date, sd.covidcase.log,type="l",main="moving sd(log(new cases))")

#Task 3

# Centered 7-day moving average of each transformed series
par(mfrow=c(1,2))
ma.covidcase.sqrt <- filter(covidcase.sqrt, sides=2, filter=rep(1/7,7))
ma.covidcase.log <- filter(covidcase.log, sides=2, filter=rep(1/7,7))

plot(date, ma.covidcase.sqrt,type="l")
plot(date, ma.covidcase.log,type="l")

#Task 4

par(mfrow=c(1,2))

# Subtract the moving average to remove the trend
detrended.covidcase.log <- covidcase.log - ma.covidcase.log
detrended.covidcase.sqrt <- covidcase.sqrt - ma.covidcase.sqrt

plot(date, detrended.covidcase.sqrt,type="l")
plot(date, detrended.covidcase.log,type="l")

# Task 5

par(mfrow=c(1,2))
acf(detrended.covidcase.log,na.action=na.pass)
pacf(detrended.covidcase.log,na.action=na.pass)

# Day-of-week indicators; the labels assume the series starts on a Wednesday.
# The seventh day of each cycle is the baseline.
n <- length(covidcase)
Wednesday <- rep(c(1,0,0,0,0,0,0), length.out = n)
Thursday <- rep(c(0,1,0,0,0,0,0), length.out = n)
Friday <- rep(c(0,0,1,0,0,0,0), length.out = n)
Saturday <- rep(c(0,0,0,1,0,0,0), length.out = n)
Sunday <- rep(c(0,0,0,0,1,0,0), length.out = n)
Monday <- rep(c(0,0,0,0,0,1,0), length.out = n)
summary(lm(detrended.covidcase.log ~ Wednesday + Thursday +
Friday + Saturday + Sunday + Monday))

# First difference, then lag-7 (weekly) difference of the log series
DIFF.covidcase.log <- diff(covidcase.log)
DIFF7.DIFF.covidcase.log <- diff(DIFF.covidcase.log,lag=7)

# The two differences together drop the first 8 observations
plot(date[-(1:8)], DIFF7.DIFF.covidcase.log)

acf(DIFF7.DIFF.covidcase.log)
pacf(DIFF7.DIFF.covidcase.log)

par(mfrow=c(1,1))

sd.DIFF7.DIFF.covidcase.log <- runSD(DIFF7.DIFF.covidcase.log,7)
plot(date[-(1:8)], sd.DIFF7.DIFF.covidcase.log,type="l")

# Task 6
par(mfrow=c(1,2))
acf(detrended.covidcase.sqrt,na.action=na.pass)
pacf(detrended.covidcase.sqrt,na.action=na.pass)

summary(lm(detrended.covidcase.sqrt ~ Wednesday + Thursday +
Friday + Saturday + Sunday + Monday))

# First difference, then lag-7 (weekly) difference of the sqrt series
DIFF.covidcase.sqrt <- diff(covidcase.sqrt)
DIFF7.DIFF.covidcase.sqrt <- diff(DIFF.covidcase.sqrt,lag=7)

plot(date[-(1:8)], DIFF7.DIFF.covidcase.sqrt)

acf(DIFF7.DIFF.covidcase.sqrt)
pacf(DIFF7.DIFF.covidcase.sqrt)

par(mfrow=c(1,1))

sd.DIFF7.DIFF.covidcase.sqrt <- runSD(DIFF7.DIFF.covidcase.sqrt,7)
plot(date[-(1:8)], sd.DIFF7.DIFF.covidcase.sqrt,type="l")
