


1. Autocorrelation

In the last lecture, we looked at the Canadian Hare Abundance data and found autocorrelation at lag 1.

## [1] 31

## [1] 0.7025777


This prompted us to plot the autocorrelation function (ACF) at various lags.


1.1 Formula for ACF

When you have two vectors, \[ (X_1, \ldots, X_n) \hspace{5mm} \mbox{ and } \hspace{5mm} (Y_1, \ldots, Y_n), \]
+ Formula for Sample Correlation \(r\) is \[ r = \hat \rho = \frac{1}{n-1} \sum_{i=1}^n \frac{(X_i - \bar X)}{S_X}\frac{(Y_i - \bar Y)}{S_Y} \] where \(S_X\) and \(S_Y\) are the sample standard deviations of \(X\) and \(Y\) respectively.
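The sample correlation formula can be sketched directly in code. This is a minimal illustration in plain Python, not the code used to produce the output in these notes; the function name `sample_correlation` and the toy vectors are assumptions made up for the example.

```python
import math

def sample_correlation(x, y):
    """Sample correlation r, using the 1/(n-1) convention for S_X, S_Y and r."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sample standard deviations S_X and S_Y (denominator n - 1)
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    # Sum of products of standardized deviations, divided by n - 1
    total = sum((xi - x_bar) / s_x * (yi - y_bar) / s_y for xi, yi in zip(x, y))
    return total / (n - 1)

# Toy vectors (made up for illustration, not the hare data)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
print(sample_correlation(x, y))
```

Because `y` is an exact linear function of `x` with positive slope, the printed value should be 1 up to floating-point rounding.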



Given one vector \(X_1, \ldots, X_{31}\), the autocorrelation function needs to measure the correlation between \[ \mbox{This Year: } (X_2, X_3, X_4, \ldots, X_{31}) \hspace{5mm} \mbox{ vs } \hspace{5mm} \mbox{Last Year: } (X_1, X_2, X_3, \ldots, X_{30}). \]
+ Formula for Autocovariance Function at Lag 1 is \[ \hat \gamma(1) = \frac{1}{n} \sum_{i=1}^{n-1} (X_i - \bar X) (X_{i+1} - \bar X) \]
+ Formula for Autocovariance Function at Lag h is \[ \hat \gamma(h) = \frac{1}{n} \sum_{i=1}^{n-|h|} (X_i - \bar X) (X_{i+|h|} - \bar X) \]
Note that \(\bar X\) is the mean of all \(n\) observations, and the denominator is \(n\) even though the sum has only \(n-|h|\) terms.
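To make the indexing concrete, here is a small sketch of the lag-\(h\) sample autocovariance: every pair \(|h|\) steps apart is multiplied, the mean of the full series is used throughout, and the sum is divided by \(n\). The name `sample_acvf` and the toy series are assumptions for illustration only.

```python
def sample_acvf(x, h):
    """Sample autocovariance at lag h:
    (1/n) * sum over pairs |h| steps apart of (x_i - x_bar) * (x_{i+|h|} - x_bar)."""
    n = len(x)
    h = abs(h)  # the formula depends on h only through |h|
    if h >= n:
        raise ValueError("lag must be smaller than the series length")
    x_bar = sum(x) / n  # mean of the full series, not of each shifted piece
    # Python indices run 0..n-h-1, which corresponds to i = 1..n-|h| in the formula
    return sum((x[i] - x_bar) * (x[i + h] - x_bar) for i in range(n - h)) / n

# Toy series with period 4 (made up for illustration)
series = [2.0, 4.0, 6.0, 4.0, 2.0, 4.0, 6.0, 4.0]
for lag in range(5):
    print(lag, sample_acvf(series, lag))
```

For this toy series the autocovariance is negative at lag 2 (half a period) and positive again at lag 4 (a full period), the same oscillating pattern the hare data show on a longer cycle.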

1.2 ACF and ACVF

(Autocorrelation and Autocovariance) Given a sequence of random variables \(\{X_1, \dots, X_n\}\),

ACF : AutoCorrelation Function (at lag \(h\)) \[ \rho(h) = \mbox{COR}(X_t, X_{t-h}) \]

ACVF : AutoCoVariance Function (at lag \(h\)) \[ \gamma(h) = \mbox{COV}(X_t, X_{t-h}) \]

1.3 Cov and ACVF

Sample Correlation (the sample covariance scaled by the sample standard deviations) \[ \hat \rho = \frac{COV(X_t, Y_t)}{\sqrt{V(X_t) V(Y_t)}} \]

Sample Autocovariance Function \[ \hat \gamma(h) = \frac{1}{n} \sum_{t=1}^{n-|h|} (X_{t} - \bar X) (X_{t+|h|} - \bar X) \] \[ \hat \gamma(0) = \frac{1}{n} \sum_{t=1}^{n} (X_{t} - \bar X)^2 \]

The ACF and ACVF are related by \[ \hat \rho(h) = \frac{\hat \gamma(h)}{\hat \gamma(0)} \] because \(\hat \gamma(0)\) is the sample variance of \(X\) computed with denominator \(n\) instead of \(n-1\), that is, \(\hat \gamma(0) = \frac{n-1}{n} S^2_X\).
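The lag-1 value printed at the start of the section (0.7025777) is close to, but not the same as, the lag-1 entry of the ACF table in 1.4 (0.695). One plausible explanation, assuming the first number was computed as an ordinary correlation between the two shifted vectors, is that the two estimators use different means and denominators. The sketch below shows the same kind of gap on a made-up series; `acvf`, `acf`, and `pearson` are hypothetical helper names.

```python
import math

def acvf(x, h):
    """Sample autocovariance at lag h (overall mean, denominator n)."""
    n, h = len(x), abs(h)
    x_bar = sum(x) / n
    return sum((x[t] - x_bar) * (x[t + h] - x_bar) for t in range(n - h)) / n

def acf(x, h):
    """Sample autocorrelation: rho_hat(h) = gamma_hat(h) / gamma_hat(0)."""
    return acvf(x, h) / acvf(x, 0)

def pearson(u, v):
    """Ordinary sample correlation between two equal-length vectors."""
    m = len(u)
    u_bar, v_bar = sum(u) / m, sum(v) / m
    num = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    den = math.sqrt(sum((a - u_bar) ** 2 for a in u) * sum((b - v_bar) ** 2 for b in v))
    return num / den

# Toy series (made up for illustration, not the hare data)
x = [3.0, 5.0, 8.0, 6.0, 4.0, 2.0, 3.0, 6.0, 9.0, 7.0]

rho1 = acf(x, 1)                   # ACF definition
r_lagged = pearson(x[1:], x[:-1])  # "this year" vs "last year" correlation
print(rho1, r_lagged)
```

The ordinary correlation centers each shifted vector at its own mean and scales by each vector's own spread, while \(\hat \rho(1)\) uses the overall mean and \(\hat \gamma(0)\) from the full series, so the two numbers agree only approximately.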

1.4 Properties

The ACF and ACVF are symmetric in \(h\), i.e. \(\gamma(h) = \gamma(-h)\) and \(\rho(h) = \rho(-h)\).

\(\rho(0) = 1\) and \(\hat \rho(0) = 1\).

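Don’t plot the ACF for \(h\) that is too big relative to \(n\): only \(n-|h|\) pairs enter the sum, so estimates at large lags are unreliable. (Rule of thumb: \(n \geq 50\) and \(h \leq n/4\).)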

## 
## Autocorrelations of series 'hare', by lag
## 
##      0      1      2      3      4      5      6      7      8      9     10 
##  1.000  0.695  0.213 -0.243 -0.505 -0.607 -0.582 -0.371 -0.046  0.325  0.534 
##     11     12     13     14 
##  0.495  0.277  0.006 -0.217

## 
## Autocovariances of series 'hare', by lag
## 
##       0       1       2       3       4       5       6       7       8       9 
##  754.57  524.16  160.51 -183.37 -380.72 -458.11 -439.43 -280.20  -35.07  245.28 
##      10      11      12      13      14 
##  403.27  373.61  209.11    4.76 -163.73
## 
## Autocovariances of series 'hare', by lag
## 
##   0 
## 755
## [1] 779.7226
## [1] 754.5702

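The last two values differ (779.7226 for the sample variance, 754.5702 for \(\hat \gamma(0)\)) because of the difference in the denominator: the sample variance divides by \(n-1\), while \(\hat \gamma(0)\) divides by \(n\) (see 1.1).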
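As a quick arithmetic check on the two numbers printed above, assuming the larger one (779.7226) is the usual sample variance \(S^2_X\) with denominator \(n-1\), and taking \(n = 31\) as reported at the start of the section: \[ \hat \gamma(0) = \frac{n-1}{n} S^2_X = \frac{30}{31} \times 779.7226 \approx 754.570, \] which agrees with the printed 754.5702 up to rounding of the inputs.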

2. Sample ACF under no autocorrelation

If your data are iid, \(X_t\) and \(X_{t+h}\) are uncorrelated for every \(h \ne 0\). The theoretical ACVF and ACF are then \[ \gamma(0) = Var(X_t) \hspace{5mm} \mbox{ and } \hspace{5mm} \gamma(h) = 0 \mbox{ for } h \ne 0, \] \[ \rho(0)=1 \hspace{10mm} \mbox{ and } \hspace{10mm} \rho(h)=0 \mbox{ for } h \ne 0. \]

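So for \(h \ne 0\), the sample ACVF and ACF, \(\hat \gamma(h)\) and \(\hat \rho(h)\), are estimating \(0\). The distribution of \(\hat \rho(h)\) when \(\rho(h) = 0\) is, approximately for large \(n\), \[ \hat \rho(h) \sim N\Big(0, \frac{1}{n}\Big) \hspace{10mm} h\ne 0, \mbox{ under iid,} \] that is, mean \(0\) and standard deviation \(1/\sqrt{n}\). This means that if the data are uncorrelated, then for each \(h \ne 0\), \(\hat \rho(h)\) falls within \(\pm 1.96/\sqrt n\) with probability about 95%, so about 95% of the sample ACF values should lie inside that band.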
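A small simulation can make the 95% statement tangible. The sketch below generates iid standard normal series, computes the sample ACF at several nonzero lags, and reports the fraction of values inside \(\pm 1.96/\sqrt n\). The series length, number of lags, number of replications, and seed are arbitrary choices for illustration, and the helper names are hypothetical.

```python
import math
import random

def sample_acf(x, h):
    """Sample ACF at lag h >= 0: gamma_hat(h) / gamma_hat(0), both with denominator n."""
    n = len(x)
    x_bar = sum(x) / n
    gamma0 = sum((v - x_bar) ** 2 for v in x) / n
    gamma_h = sum((x[t] - x_bar) * (x[t + h] - x_bar) for t in range(n - h)) / n
    return gamma_h / gamma0

def fraction_inside_band(n=200, max_lag=10, n_series=50, seed=1):
    """Fraction of sample ACF values (lags 1..max_lag) within +/- 1.96/sqrt(n)
    across n_series simulated iid N(0, 1) series of length n."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    band = 1.96 / math.sqrt(n)
    inside = 0
    total = 0
    for _ in range(n_series):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for h in range(1, max_lag + 1):
            if abs(sample_acf(x, h)) <= band:
                inside += 1
            total += 1
    return inside / total

print(fraction_inside_band())
```

The fraction should come out near 0.95. It will not be exactly 0.95, because the normal distribution for \(\hat \rho(h)\) is a large-sample approximation and the simulation itself is random.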

2.2 Diagnosis for LArain data

If the data are a random sample, then the plot of the ACF should show almost all the bars (at lags \(h \ne 0\)) within the 95% band under iid (\(\pm 1.96/\sqrt{n}\)).

Blue dashed line in acf() = \(\pm 1.96/\sqrt{n}\)

If many more than 5% of the ACF values fall outside the band, that is evidence that the data are autocorrelated.

## [1] 115

## [1] 0.1827709

No ACF(h) for \(h \ne 0\) is outside the blue dashed lines \(\Rightarrow\) the LArain data is probably white noise.
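The same check can be written as a few lines of code. The sketch below reproduces the band for \(n = 115\) and, as a contrast, applies the check to the hare ACF values copied from the printout in 1.4 (\(n = 31\)). The function name `lags_outside_band` is a hypothetical helper, and since \(n = 31\) is below the rule-of-thumb size in 1.4, the hare band should be read as a rough guide.

```python
import math

def lags_outside_band(acf_values, n, z=1.96):
    """Return (band, lags) where band = z / sqrt(n) and lags lists every lag
    whose sample ACF is outside +/- band.
    acf_values[k] is the sample ACF at lag k + 1 (lag 0 is excluded)."""
    band = z / math.sqrt(n)
    outside = [k + 1 for k, r in enumerate(acf_values) if abs(r) > band]
    return band, outside

# Band for a series of length 115 (the n printed above for LArain)
band_la, _ = lags_outside_band([], 115)
print(band_la)

# Hare ACF at lags 1..14, copied from the acf printout in 1.4 (n = 31)
hare_acf = [0.695, 0.213, -0.243, -0.505, -0.607, -0.582, -0.371,
            -0.046, 0.325, 0.534, 0.495, 0.277, 0.006, -0.217]
band_hare, outside = lags_outside_band(hare_acf, 31)
print(band_hare, outside)
```

For the hare series, half of the 14 lags fall outside the band, far more than the roughly 5% expected under iid, which is consistent with the autocorrelation found in Section 1.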

Summary

  • To check whether a time series is a random sample (white noise), plot its ACF and see whether about 95% of the bars at lags \(h \ne 0\) fall between the blue dashed lines.
  • The blue dashed line in acf() is \(\pm 1.96 / \sqrt n\).
  • For ACF and ACVF to be plotted and analyzed, the series must be Weakly Stationary.