ACVF and ACF

1. Autocorrelation

In the last lecture, we looked at the Canadian hare abundance data and found autocorrelation at lag 1.

acf1 <- acf    # copy the original acf function
library(TSA)   # load package TSA, which replaces default acf
acf  <- acf1   # copy back the original

data(hare)     # load hare data from TSA

length(hare)

## [1] 31

thisYear <- hare[2:31]
lastYear <- hare[1:30]

plot(lastYear, thisYear, ylab="This Year",xlab="Previous Year", xlim=c(-10,100), ylim=c(-10,100))

cor(lastYear, thisYear)    # calculate correlation

## [1] 0.7025777

This prompted us to plot the autocorrelation function (ACF) at various lags.

acf(hare)

1.1 Formula for ACF

When you have two vectors, \[ (X_1, \ldots, X_n) \hspace{5mm} \mbox{ and } \hspace{5mm} (Y_1, \ldots, Y_n), \] the formula for the sample correlation $r$ is \[ r = \hat \rho = \frac{1}{n-1} \sum_{i=1}^n \frac{(X_i - \bar X)}{S_X}\frac{(Y_i - \bar Y)}{S_Y}, \] where $S_X$ and $S_Y$ are the sample standard deviations of $X$ and $Y$ respectively.

Given one vector $X_1, \ldots, X_{31}$, the autocorrelation function needs to measure the correlation between \[ \mbox{This Year:} \hspace{5mm} (X_2, X_3, X_4, \ldots, X_{31}) \hspace{5mm} \mbox{ vs } \hspace{5mm} \mbox{Last Year:} \hspace{5mm} (X_1, X_2, X_3, \ldots, X_{30}). \]
The formula for the sample autocovariance function at lag 1 is \[ \hat \gamma(1) = \frac{1}{n} \sum_{t=1}^{n-1} (X_t - \bar X) (X_{t+1} - \bar X). \]
The formula for the sample autocovariance function at lag $h$ is \[ \hat \gamma(h) = \frac{1}{n} \sum_{t=1}^{n-|h|} (X_t - \bar X) (X_{t+|h|} - \bar X). \]
Note that the divisor is $n$, not $n-|h|$ or $n-1$.
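
The lag-$h$ formula can be checked numerically. The following is a minimal sketch in base R: the helper name `acvf_manual` and the simulated series are illustrative assumptions, not part of the lecture's data. It calls `stats::acf` explicitly so that it does not depend on which `acf` is currently masked.

```r
# Hypothetical helper: sample autocovariance at lag h, with divisor n
acvf_manual <- function(x, h) {
  n    <- length(x)
  h    <- abs(h)                      # the ACVF is symmetric in h
  xbar <- mean(x)
  sum((x[1:(n - h)] - xbar) * (x[(1 + h):n] - xbar)) / n
}

set.seed(1)                           # illustrative simulated series
x <- rnorm(50)

manual  <- sapply(0:5, function(h) acvf_manual(x, h))
builtin <- drop(stats::acf(x, lag.max = 5, type = "covariance", plot = FALSE)$acf)

all.equal(manual, builtin)            # the two computations should agree
```
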

1.2 ACF and ACVF

(Autocorrelation and Autocovariance) Given a sequence of random variables $\{X_1, \dots, X_n\}$,

ACF : AutoCorrelation Function (at lag $h$) \[ \rho(h) = \mbox{COR}(X_t, X_{t-h}) \]

ACVF : AutoCoVariance Function (at lag $h$) \[ \gamma(h) = \mbox{COV}(X_t, X_{t-h}) \]

1.3 Cov and ACVF

Sample Correlation \[ \hat \rho = \frac{\mbox{COV}(X_t, Y_t)}{\sqrt{V(X_t) V(Y_t)}} \]

Sample Autocovariance Function \[ \hat \gamma(h) = \frac{1}{n} \sum_{t=1}^{n-|h|} (X_{t} - \bar X) (X_{t+|h|} - \bar X) \] \[ \hat \gamma(0) = \frac{1}{n} \sum_{t=1}^{n} (X_{t} - \bar X)^2 = \frac{n-1}{n} S^2_X \]

ACF and ACVF are related as: \[ \hat \rho(h) = \frac{\hat \gamma(h)}{\hat \gamma(0)} \] because $\hat \gamma(0)$ is the sample variance of $X$ computed with divisor $n$ instead of $n-1$, that is, $\hat \gamma(0) = \frac{n-1}{n} S^2_X$.
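
A quick numerical check of this relation, as a sketch on a simulated series (the seed and sample size are arbitrary assumptions). It also compares the lag-1 sample ACF with `cor()` of the lagged pairs, which is why the hare data gave 0.7026 from `cor()` but 0.695 from `acf()`.

```r
set.seed(2)                            # illustrative simulated series
x <- rnorm(40)
n <- length(x)

gam <- drop(stats::acf(x, type = "covariance",  plot = FALSE)$acf)
rho <- drop(stats::acf(x, type = "correlation", plot = FALSE)$acf)

all.equal(rho, gam / gam[1])           # element 1 holds lag 0

# The lag-1 sample ACF is close to, but not identical to, cor() of the
# lagged pairs: cor() centers and scales each sub-vector separately.
c(acf_lag1 = rho[2], cor_lag1 = cor(x[-1], x[-n]))
```
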

1.4 Properties

ACF and ACVF are symmetric in $h$ (i.e. $\gamma(h) = \gamma(-h)$ and $\rho(h) = \rho(-h)$).

$\rho(0) = 1$ and $\hat \rho(0) = 1$.

Don't plot for $h$ that is too big relative to $n$. (A common rule of thumb is $n \geq 50$ and $h \leq n/4$.)
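
The rule of thumb can be applied through the `lag.max` argument of `acf()`. This is a small sketch on a simulated series (an assumption, chosen so that $n \geq 50$). By default `acf()` uses a maximum lag of about $10 \log_{10}(n)$, which is why the hare output stops at lag 14.

```r
set.seed(3)
x <- rnorm(60)                         # illustrative series with n >= 50
n <- length(x)

max_lag <- floor(n / 4)                # rule-of-thumb upper limit on h
rho <- stats::acf(x, lag.max = max_lag, plot = FALSE)
length(rho$acf)                        # max_lag + 1 values, since lag 0 is included
```
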

  acf(hare)                      # compute and plot ACF

  Rho.hat <- acf(hare)           # compute ACF and assign to Rho.hat

  Rho.hat

## 
## Autocorrelations of series 'hare', by lag
## 
##      0      1      2      3      4      5      6      7      8      9     10 
##  1.000  0.695  0.213 -0.243 -0.505 -0.607 -0.582 -0.371 -0.046  0.325  0.534 
##     11     12     13     14 
##  0.495  0.277  0.006 -0.217

  acf(hare, type="covariance")   # compute and plot ACVF

  Gam.hat <- acf(hare, type="covariance")   # store numbers from ACVF

  Gam.hat

## 
## Autocovariances of series 'hare', by lag
## 
##       0       1       2       3       4       5       6       7       8       9 
##  754.57  524.16  160.51 -183.37 -380.72 -458.11 -439.43 -280.20  -35.07  245.28 
##      10      11      12      13      14 
##  403.27  373.61  209.11    4.76 -163.73

  Gam.hat[0]

## 
## Autocovariances of series 'hare', by lag
## 
##   0 
## 755

  var(hare)                      # not same as Gam.hat[0]

## [1] 779.7226

  var(hare) * 30 / 31            # same as Gam.hat[0]

## [1] 754.5702

This is because of the difference in the denominator: var() divides by $n-1$, while the sample ACVF divides by $n$ (see 1.1).

2. Sample ACF under no autocorrelation

If your data are iid, $X_t$ and $X_{t+h}$ are uncorrelated for $h \ne 0$. The theoretical ACVF and ACF are then \[ \gamma(0) = \mbox{Var}(X_t) \hspace{5mm} \mbox{ and } \hspace{5mm} \gamma(h) = 0 \mbox{ for } h \ne 0, \] \[ \rho(0)=1 \hspace{10mm} \mbox{ and } \hspace{10mm} \rho(h)=0 \mbox{ for } h \ne 0. \]

So for $h \ne 0$, the sample ACVF and ACF, $\hat \gamma(h)$ and $\hat \rho(h)$, are estimating $0$. The distribution of $\hat \rho(h)$ when $\rho(h) = 0$ is approximately \[ \hat \rho(h) \sim N\Big(0, \frac{1}{n}\Big) \hspace{10mm} h\ne 0, \mbox{ under iid,} \] that is, mean $0$ and standard deviation $1/\sqrt{n}$. This means that if the data are uncorrelated, then for each $h \ne 0$, about 95% of sample ACF values should fall within $\pm 1.96/\sqrt n$.

2.1 Monte Carlo Simulation

n <- 100
X <- rnorm(n, 2, 2)                # random sample from N(2, SD=2)
Rho.hat <- acf(X, plot=FALSE)      # calculate sample ACF of X (w/o plotting)

Th.Rho <- c(1, rep(0,15))          # theoretical ACF is (1,0,0,0,....)

#-- Make plot --
layout(matrix(1:2, 1, 2))          # make two plots side by side (1row 2col)
plot(X)                                              # 1st plot
plot(Rho.hat, type="h", xlim=c(0,15), ylim=c(-0.4,1))     # 2nd plot
points(0:15, Th.Rho, col="red")    # overlay red Th.Rho on 2nd plot

Let's repeat this 100 times and see the overall behavior of the sample ACF under no correlation.

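#--- Put above in a loop ---
for (i in 1:100) {
    X  <- rnorm(n, 2, 2)        # draw a fresh iid sample
    Rh <- acf(X, plot=FALSE)    # sample ACF of this sample

    plot(Rh, type="p", xlim=c(0,15), ylim=c(-0.4,1))   # plot this sample's ACF (Rh, not Rho.hat)
    par(new=TRUE)               # another way to overlay plots
    plot(0:15, Th.Rho, type="p", col="red", xlim=c(0,15), ylim=c(-0.4,1), xlab="", ylab="")
    par(new=TRUE)
}
par(new=FALSE)                  # reset so the next plot starts a fresh figure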

Because the random sample we are generating is uncorrelated, the band between the blue dashed lines ($\pm 1.96/\sqrt n = \pm 0.196$) contains approximately 95% of the sample ACF values.
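
The 95% claim can also be checked by counting instead of by eye. The sketch below is one way to do it: the number of replications (`n_rep = 500`) and the seed are arbitrary assumptions, and lags 1 to 15 are used to match the plots above.

```r
set.seed(4)
n     <- 100
n_rep <- 500                           # number of simulated series (arbitrary choice)
bound <- 1.96 / sqrt(n)

# Each column holds the sample ACF at lags 1..15 for one iid series
rho_mat <- replicate(n_rep, {
  x <- rnorm(n, 2, 2)
  drop(stats::acf(x, lag.max = 15, plot = FALSE)$acf)[-1]   # drop lag 0
})

coverage <- mean(abs(rho_mat) <= bound)  # fraction of values inside the band
coverage
```

The resulting fraction should be close to 0.95. It will not be exactly 0.95, since the normal distribution is only an approximation for finite $n$.
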

2.2 Diagnosis for LArain data

If the data are a random sample, then the ACF plot should show almost all the bars within the 95% band under iid ($\pm 1.96/\sqrt{n}$).

The blue dashed line in acf() is at $\pm 1.96/\sqrt{n}$.

If too many values of the sample ACF fall outside the band, it is evidence that the data are autocorrelated.

  data(larain)
  plot(larain)

  length(larain)     # this is n

## [1] 115

  acf(larain)

  1.96/sqrt(115)     # size of the blue line

## [1] 0.1827709

No $\hat \rho(h)$ is outside of the blue dashed line $\Rightarrow$ the LArain data are plausibly white noise.
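
The same visual check can be written as a count of lags outside the band. This is a sketch only: it uses a simulated iid series of length 115 as a stand-in (an assumption), not the larain data, so it runs without the TSA package.

```r
set.seed(5)
x <- rnorm(115)                        # stand-in iid series, not the larain data
n <- length(x)

rho   <- drop(stats::acf(x, plot = FALSE)$acf)[-1]   # sample ACF without lag 0
bound <- 1.96 / sqrt(n)

n_outside <- sum(abs(rho) > bound)     # number of lags outside the band
c(lags = length(rho), outside = n_outside, expected = 0.05 * length(rho))
```

Under iid, roughly 5% of the lags are expected to fall outside, so one exceedance out of 20 lags is not by itself evidence of autocorrelation.
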

Summary

To check whether a time series is a random sample (white noise), plot its ACF and see if about 95% of the values lie between the blue dashed lines.
The blue dashed line in acf() is $\pm 1.96 / \sqrt n$.
For ACF and ACVF to be plotted and analyzed, the series must be Weakly Stationary.