In the last lecture, we looked at the Canadian hare abundance data and found autocorrelation at lag 1.
acf1 <- acf # copy the original acf function
library(TSA) # load package TSA, which replaces default acf
acf <- acf1 # copy back the original
data(hare) # load hare data from TSA
length(hare)
## [1] 31
thisYear <- hare[2:31]
lastYear <- hare[1:30]
plot(lastYear, thisYear, ylab="This Year", xlab="Previous Year", xlim=c(-10,100), ylim=c(-10,100))
cor(lastYear, thisYear) # sample correlation between consecutive years
## [1] 0.7025777
This prompted us to plot the autocorrelation function (ACF) at various lags.
When you have two vectors, \[ (X_1, \ldots, X_n) \hspace{5mm} \mbox{ and } \hspace{5mm} (Y_1, \ldots, Y_n), \] the formula for the sample correlation \(r\) is \[ r = \hat \rho = \frac{1}{n-1} \sum_{i=1}^n \frac{(X_i - \bar X)}{S_X}\frac{(Y_i - \bar Y)}{S_Y}, \] where \(S_X\) and \(S_Y\) are the sample standard deviations of \(X\) and \(Y\), respectively.
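As a quick check of the sample correlation formula, the sketch below computes \(r\) term by term and compares it with R's built-in `cor()`. The vectors `x` and `y` are simulated purely for illustration and are not part of the hare data.

```r
# Illustrative data only (assumption): two correlated vectors of length 30
set.seed(5)
x <- rnorm(30)
y <- 0.5 * x + rnorm(30)
n <- length(x)

# r = 1/(n-1) * sum of products of standardized deviations
# (sd() uses the n-1 denominator, matching S_X and S_Y)
r_by_hand <- sum(((x - mean(x)) / sd(x)) * ((y - mean(y)) / sd(y))) / (n - 1)

all.equal(r_by_hand, cor(x, y)) # the two should agree
```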
Given one vector \(X_1, \ldots, X_{31}\), the autocorrelation function needs to measure the correlation between \[
\mbox{This Year:} \hspace{5mm} (X_2, X_3, X_4, \ldots, X_{31})
\hspace{5mm} \mbox{ vs } \hspace{5mm}
\mbox{Last Year:} \hspace{5mm} (X_1, X_2, X_3, \ldots, X_{30})
\] The formula for the sample autocovariance function at lag 1 is \[
\hat \gamma(1) = \frac{1}{n} \sum_{i=1}^{n-1} (X_i - \bar X) (X_{i+1} - \bar X)
\]
The formula for the sample autocovariance function at lag \(h\) is \[
\hat \gamma(h) = \frac{1}{n} \sum_{i=1}^{n-|h|} (X_i - \bar X) (X_{i+|h|} - \bar X)
\]
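To make the lag-\(h\) sample autocovariance concrete, here is a minimal sketch that pairs each observation with the one \(|h|\) steps later, sums the products of deviations from the overall mean, and divides by \(n\). The helper name `acvf_by_hand` and the simulated series are hypothetical; the result is compared against `acf(..., type="covariance")`.

```r
# Hypothetical helper: sample autocovariance at lag h with the 1/n denominator
acvf_by_hand <- function(x, h) {
  n <- length(x)
  h <- abs(h)                      # the formula depends only on |h|
  xbar <- mean(x)                  # one overall mean for both factors
  sum((x[1:(n - h)] - xbar) * (x[(1 + h):n] - xbar)) / n
}

# Illustrative data only (assumption): 50 simulated values
set.seed(1)
x <- rnorm(50)

by_hand  <- sapply(0:5, function(h) acvf_by_hand(x, h))
built_in <- acf(x, lag.max = 5, type = "covariance", plot = FALSE)$acf[, 1, 1]

all.equal(by_hand, built_in) # lags 0..5 should agree
```

With the `hare` series loaded from TSA, `acvf_by_hand(hare, 1)` could be compared against the lag-1 value that `acf(hare, type="covariance")` prints.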
(Autocorrelation and Autocovariance) Given a sequence of random variables \(\{X_1, \dots, X_n\}\),
ACF : AutoCorrelation Function (at lag \(h\)) \[ \rho(h) = \mbox{COR}(X_t, X_{t-h}) \]
ACVF : AutoCoVariance Function (at lag \(h\)) \[
\gamma(h) = \mbox{COV}(X_t, X_{t-h})
\]
Sample Correlation \[ \hat \rho = \frac{\mbox{COV}(X_t, Y_t)}{\sqrt{V(X_t) V(Y_t)}} \]
Sample Autocovariance Function \[ \hat \gamma(h) = \frac{1}{n} \sum_{t=1}^{n-|h|} (X_{t} - \bar X) (X_{t+|h|} - \bar X) \] \[ \hat \gamma(0) = \frac{1}{n} \sum_{t=1}^{n} (X_{t} - \bar X)^2 \]
The ACF and ACVF are related by: \[
\hat \rho(h) = \frac{\hat \gamma(h)}{\hat \gamma(0)}
\] because \(\hat \gamma(0)\) is the sample variance of \(X\) computed with denominator \(n\) (it equals \(S^2_X\) up to the factor \((n-1)/n\)).
The ACF and ACVF are symmetric in \(h\) (i.e. \(\gamma(h) = \gamma(-h)\)).
\(\rho(0) = 1\) and \(\hat \rho(0) = 1\).
Don’t plot the ACF for \(h\) that is too big relative to \(n\). (Rule of thumb: \(n \geq 50\) and \(h \leq n/4\).)
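The relation \(\hat \rho(h) = \hat \gamma(h)/\hat \gamma(0)\) and the property \(\hat \rho(0) = 1\) can be checked numerically. The sketch below uses a simulated series (an assumption, chosen only so the example is self-contained) and the `type` argument of `acf()`.

```r
# Illustrative data only (assumption): 60 simulated values
set.seed(2)
x <- rnorm(60)

# acf() stores its values in a 3-d array; [, 1, 1] extracts them as a vector
gam <- acf(x, lag.max = 10, type = "covariance", plot = FALSE)$acf[, 1, 1]
rho <- acf(x, lag.max = 10, plot = FALSE)$acf[, 1, 1]

all.equal(rho, gam / gam[1]) # gam[1] is lag 0, so this is gamma.hat(h)/gamma.hat(0)
rho[1]                       # sample ACF at lag 0
```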
##
## Autocorrelations of series 'hare', by lag
##
## 0 1 2 3 4 5 6 7 8 9 10
## 1.000 0.695 0.213 -0.243 -0.505 -0.607 -0.582 -0.371 -0.046 0.325 0.534
## 11 12 13 14
## 0.495 0.277 0.006 -0.217
acf(hare, type="covariance") # compute and plot ACVF
Gam.hat <- acf(hare, type="covariance") # store numbers from ACVF
##
## Autocovariances of series 'hare', by lag
##
## 0 1 2 3 4 5 6 7 8 9
## 754.57 524.16 160.51 -183.37 -380.72 -458.11 -439.43 -280.20 -35.07 245.28
## 10 11 12 13 14
## 403.27 373.61 209.11 4.76 -163.73
##
## Autocovariances of series 'hare', by lag
##
## 0
## 755
## [1] 779.7226
## [1] 754.5702
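The two values differ (779.72 for the sample variance versus 754.57 for \(\hat \gamma(0)\)) because of the difference in the denominator: the sample variance divides by \(n-1\), while \(\hat \gamma(0)\) divides by \(n\) (See 1.1). Indeed, \(779.7226 \times 30/31 \approx 754.57\).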
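The denominator effect is easy to reproduce on any series. The following sketch, on simulated data of the same length as `hare` (an assumption made only for illustration), shows that rescaling `var()` by \((n-1)/n\) recovers the lag-0 autocovariance reported by `acf()`.

```r
# Illustrative data only (assumption): 31 simulated values, same length as hare
set.seed(3)
x <- rnorm(31)
n <- length(x)

gamma0 <- acf(x, type = "covariance", plot = FALSE)$acf[1, 1, 1] # lag-0 autocovariance (divides by n)
s2     <- var(x)                                                 # sample variance (divides by n-1)

c(var = s2, gamma0 = gamma0)          # the two numbers differ slightly
all.equal(gamma0, s2 * (n - 1) / n)   # but agree after rescaling
```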
If your data were iid, \(X_t\) and \(X_{t+h}\) would be uncorrelated. Then the theoretical ACVF and ACF are: \[ \gamma(0) = \mbox{Var}(X_t) \hspace{5mm} \mbox{ and } \hspace{5mm} \gamma(h) = 0 \mbox{ for } h \ne 0, \] \[ \rho(0)=1 \hspace{10mm} \mbox{ and } \hspace{10mm} \rho(h)=0 \mbox{ for } h \ne 0. \]
So for \(h \ne 0\), the sample ACVF and ACF, \(\hat \gamma(h)\) and \(\hat \rho(h)\), are estimating \(0\). Approximate distribution of \(\hat \rho(h)\) when \(\rho(h) = 0\) (the second argument is the standard deviation): \[
\hat \rho(h) \sim N\Big(0, \mbox{SD} = \frac{1}{\sqrt{n}}\Big)
\hspace{10mm} h\ne 0, \mbox{ under iid.}
\] This means that if the data are uncorrelated, then for each \(h \ne 0\), about 95% of sample ACF values should fall within \(\pm 1.96/\sqrt n\).
n <- 100
X <- rnorm(n, 2, 2) # random sample from N(2, SD=2)
Rho.hat <- acf(X, plot=FALSE) # calculate sample ACF of X (w/o plotting)
Th.Rho <- c(1, rep(0,15)) # theoretical ACF is (1,0,0,0,....) for lags 0..15
#-- Make plot --
layout(matrix(1:2, 1, 2)) # make two plots side by side (1 row, 2 cols)
plot(X) # 1st plot: the sample itself
plot(Rho.hat, type="h", xlim=c(0,15), ylim=c(-0.4,1)) # 2nd plot: sample ACF
points(0:15, Th.Rho, col="red") # overlay red Th.Rho on 2nd plot
Let’s repeat this 100 times and see the overall behavior of the sample ACF under no correlation.
#--- Put above in a loop ---
for (i in 1:100) {
X <- rnorm(n, 2, 2) # fresh uncorrelated sample each time
Rh <- acf(X, plot=FALSE) # its sample ACF
plot(Rh, type="p", xlim=c(0,15), ylim=c(-0.4,1)) # plot this iteration's ACF (not the earlier Rho.hat)
par(new=TRUE) # another way to overlay plots
plot(0:15, Th.Rho, type="p", col="red", xlim=c(0,15), ylim=c(-0.4,1), xlab="", ylab="")
par(new=TRUE)
}
Because the random samples we are generating are uncorrelated, the blue dashed lines (\(\pm 1.96/\sqrt n = \pm 0.196\)) contain approximately 95% of the sample ACF values.
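The "approximately 95%" claim can also be checked with a count instead of a picture. The sketch below is a variation on the loop above, not the lecture's own code: it repeats the simulation 500 times (an arbitrary choice) and records the fraction of lags 1 to 15 whose sample ACF lies within \(\pm 1.96/\sqrt n\).

```r
set.seed(4)
n     <- 100
bound <- 1.96 / sqrt(n)            # 0.196 for n = 100

# For each simulated iid sample, the fraction of lags 1..15 inside the band
inside <- replicate(500, {
  x <- rnorm(n, 2, 2)                                   # uncorrelated sample
  r <- acf(x, lag.max = 15, plot = FALSE)$acf[-1, 1, 1] # drop lag 0, which is always 1
  mean(abs(r) <= bound)
})

mean(inside) # overall coverage; should be close to 0.95
```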
If the data are a random sample, then the ACF plot should show almost all the bars within the 95% band under iid (\(\pm 1.96/\sqrt{n}\)).
Blue dotted line in acf() = \(\pm 1.96/\sqrt{n}\)
If too many sample ACF values fall outside the band, that is evidence that the data are autocorrelated.
## [1] 115
## [1] 0.1827709
No sample ACF value is outside of the blue dotted lines \(\Rightarrow\) the LArain data are probably white noise.
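The same visual check can be expressed as a count of lags outside the band. The helper `count_outside` below is hypothetical, and both series are simulated stand-ins (the length 115 simply mirrors the series above; the AR(1) coefficient 0.7 is an arbitrary assumption), so this is only a sketch of the decision rule, not an analysis of the LArain data.

```r
# Hypothetical helper: number of lags 1..lag.max with |sample ACF| > 1.96/sqrt(n)
count_outside <- function(x, lag.max = 15) {
  r <- acf(x, lag.max = lag.max, plot = FALSE)$acf[-1, 1, 1] # drop lag 0
  sum(abs(r) > 1.96 / sqrt(length(x)))
}

set.seed(6)
white <- rnorm(115)                                 # uncorrelated series
ar1   <- arima.sim(model = list(ar = 0.7), n = 115) # autocorrelated series (assumed AR(1), phi = 0.7)

count_outside(white) # expected to be small: about 5% of 15 lags by chance
count_outside(ar1)   # expected to be larger, driven by the low lags
```

A count near zero is consistent with white noise; several exceedances, especially at the first few lags, point toward autocorrelation.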