Class Web Page



1. Is the True Mean 0?

Suppose we have a random sample

## [1] 2.016148

The question is whether the true mean \(\mu\) behind this sample is \(0\) or not.

We know the sample mean \(\bar Y\) is not \(0\). But is this just a random fluctuation, or is it evidence that the true mean is non-zero?

2. Random Sample Case

We can say that the random sample above comes from a model like this: \[ X_t = \mu + \epsilon_t \hspace{15mm} \left\{ \begin{array}{ll} \mu : &\mbox{ Constant Trend (deterministic) }\\ \epsilon_t : &\mbox{ iid noise from } N(0, \sigma)\\ \end{array} \right. \]

\(\mu\) and \(\sigma\) are unknown.

This is equivalent to saying the \(X_i\) are an iid random sample from the \(N(\mu,\sigma)\) distribution.

We will use \(\bar X\) to estimate \(\mu\).

CI for mean \(\mu\)

We know how ‘well’ \(\bar X\) estimates \(\mu\). \[ \bar X \sim N\Big(\mu, \frac{\sigma}{\sqrt n}\Big) \]

That gives us the 95% CI for \(\mu\) \[ \bar X \pm 1.96 \frac{S}{\sqrt n} \] Since \(\sigma\) is unknown, it is replaced by the sample standard deviation \(S\).

If \(0\) is outside of the CI, then we reject \(H_0: \mu=0\) at the 5% significance level.

## [1] 2.016148
## [1] 4.655382
## [1] 0.9124548
## [1] 1.103693 2.016148 2.928603

Since \(0\) is outside of the 95% CI for \(\mu\), we conclude, with 95% confidence, that the true mean \(\mu\) is not \(0\).
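The R code that produced the output above isn't shown. The same computation can be sketched in Python/NumPy; the simulated sample here is hypothetical, so the numbers will not match the output above:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
x = rng.normal(loc=2.0, scale=2.0, size=n)  # hypothetical iid sample

xbar = x.mean()                  # sample mean (estimate of mu)
s = x.std(ddof=1)                # sample standard deviation S
margin = 1.96 * s / np.sqrt(n)   # 95% margin of error
ci = (xbar - margin, xbar + margin)

# Reject H0: mu = 0 at the 5% level iff 0 lies outside the CI
reject = not (ci[0] <= 0 <= ci[1])
print(xbar, ci, reject)
```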

3. Time Series Case

We want to do the same when the data shows autocorrelation.

Model

Suppose we observe \(Y_t\), and model it as: \[ Y_t = \mu + X_t \hspace{10mm} \left\{ \begin{array}{ll} \mu : &\mbox{ Constant Trend (deterministic) }\\ X_t : &\mbox{ Stationary Time Series } \end{array} \right. \] We assume that \(E X_t=0\) and \(V(X_t) = \sigma^2\).


Note that if \(X_t\) has autocorrelation, then \(Y_t\) is also autocorrelated.

We will use \(\bar Y\) to estimate \(\mu\), as usual.

Expectation of Sample Mean

Do the properties of \(\bar Y\) change? The expectation is \[ E (\bar Y) \hspace{3mm} = \hspace{3mm} E \Big(\frac{1}{n} \sum_{t=1}^n Y_t\Big) \hspace{3mm} = \hspace{3mm} \frac{1}{n} \sum_{t=1}^n E (Y_t) \]

For the mean of one \(Y\), we have \[ E(Y_t) \hspace{3mm} = \hspace{3mm} E(\mu + X_t) \hspace{3mm} = \hspace{3mm} \mu + E(X_t) \hspace{3mm} = \hspace{3mm} \mu + 0 \hspace{3mm} = \hspace{3mm} \mu \]

So we do have \(E(\bar Y) = \mu\), the same as in the random sample (RS) case.

i.e., \(\bar Y\) is still an unbiased estimator of \(\mu\).

Variance of Sample Mean

How about the variance of \(\bar Y\)?

Since adding a constant does not change the variance, \[ \begin{aligned} V(\bar Y) \hspace{3mm} = \hspace{3mm} V(\bar Y - \mu ) \hspace{3mm} = \hspace{3mm} V \Big( \frac{1}{n} \sum_{i=1}^n (Y_i - \mu) \Big) \hspace{3mm} = \hspace{3mm} V \Big( \frac{1}{n} \sum_{i=1}^n X_i \Big) \hspace{3mm} = \hspace{3mm} V ( \bar X ) \end{aligned} \]

Now the variance of \(\bar X\) can be written as, \[ V (\bar X) \hspace{3mm} = \hspace{3mm} V \Big( \frac{1}{n} \sum_{i=1}^n X_i \Big) \hspace{3mm} = \hspace{3mm} \frac{1}{n^2} \, \mbox{Cov}\Big(\sum_{i=1}^n X_i, \, \sum_{j=1}^n X_j \Big) \hspace{3mm} = \hspace{3mm} \frac{1}{n^2} \Big[\mbox{ sum of Cov of all pairs }(X_i, X_j) \Big] \]

\(\Big[\mbox{ Sum of Cov of all pairs }(X_i, X_j) \Big]\) means, \[ \hspace{3mm} = \hspace{3mm} \mbox{ Sum of Cov of pairs } \left[ \begin{array}{c|cccc} & X_1 & X_2 & \cdots & X_n \\ \hline X_1 & \ddots & & \cdots & \\ X_2 & & \ddots & \cdots & \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ X_n & & & \cdots & \\ \end{array} \right] \hspace{5mm} = \hspace{5mm} \mbox{ Sum of } \left[ \begin{array}{cccc} \gamma(0) & \gamma(1) & \cdots & \gamma(n-1) \\ \gamma(-1) & \gamma(0) & \cdots & \gamma(n-2) \\ \vdots & \vdots &\ddots & \vdots \\ \gamma(n-1) & \gamma(n-2) & \cdots & \gamma(0) \\ \end{array} \right] \] where \(\gamma(h)\) is the ACVF at lag \(h\).

Combining, we get \[ V(\bar Y) \hspace{3mm} = \hspace{3mm} V(\bar X) \hspace{3mm} = \hspace{3mm} \frac{1}{n^2} \sum_{h=-n}^n (n-|h|) \gamma(h) \hspace{3mm} = \hspace{3mm} \frac{1}{n} \sum_{h=-n}^n \Big(1 -\frac{|h|}{n}\Big) \gamma(h). \]
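As a quick numeric sanity check of this identity (a Python/NumPy sketch with a made-up ACVF, not part of the original page), summing all entries of the \(n \times n\) covariance matrix gives the same value as \(\sum_h (n-|h|)\gamma(h)\):

```python
import numpy as np

n = 6
gamma = 0.7 ** np.arange(n)  # made-up ACVF: gamma(h) = 0.7^h (AR(1)-like)

# Covariance matrix of (X_1, ..., X_n): Sigma[i, j] = gamma(|i - j|)
lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
Sigma = gamma[lags]

total = Sigma.sum()  # "sum of Cov of all pairs (X_i, X_j)"

hs = np.arange(-(n - 1), n)
formula = np.sum((n - np.abs(hs)) * gamma[np.abs(hs)])

print(np.isclose(total, formula))  # the two agree; V(Xbar) = total / n**2
```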


The variance of the sample mean is different in the presence of autocorrelation.

Note that if \(X_t\) were iid as in the first section, then \(\gamma(0)=\sigma^2\), and \(\gamma(h)=0\) for all \(h \ne 0\). The formula above reduces to \[ V (\bar X) = \frac{1}{n} \Big(1 -\frac{|0|}{n}\Big) \gamma(0) \hspace{3mm} = \hspace{3mm} \frac{\sigma^2}{n} \]

CI for \(\mu\) When \(Y_t\) is TS

So we have \[ V (\bar Y) = \frac{\nu^2}{n} \hspace{5mm} \mbox{ where } \hspace{5mm} \nu^2 = \sum_{h=-n}^n \Big(1 -\frac{|h|}{n}\Big) \gamma(h). \] That means, approximately, \[ \bar Y \sim N\Big(\mu , \frac{\nu}{\sqrt n} \Big). \] Then the confidence interval for \(\mu\) is \[ \bar Y \pm 1.96 \sqrt{\frac{ \nu^2 }{n}} \]


In practice, we don’t know the true value of \(\gamma(h)\), so we use the sample version \(\hat \gamma(h)\) (the sample ACVF instead of the theoretical ACVF): \[ \hat \nu^2 = \sum_{h=-n}^{n} \Big(1 -\frac{|h|}{n}\Big) \hat \gamma(h). \]

Also, we can’t reliably estimate \(\hat \gamma(h)\) when \(h\) is close to \(n\), since only a few pairs of observations are that far apart. So the sum runs from \(-\sqrt{n}\) to \(\sqrt{n}\) instead.

Finally, using the symmetry \(\hat \gamma(-h) = \hat \gamma(h)\), the sum simplifies to: \[ \hat \nu^2 = \sum_{h=-\sqrt n}^{\sqrt n} \Big(1 -\frac{|h|}{n}\Big) \hat \gamma(h) \hspace{3mm} = \hspace{3mm} \hat \gamma(0) + 2 \sum_{h=1}^{\sqrt n} \Big(1 -\frac{|h|}{n}\Big) \hat \gamma(h). \]
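The page doesn’t show the code that computes \(\hat \nu^2\); here is one way to sketch it in Python/NumPy (the function names are mine, not from the original):

```python
import numpy as np

def sample_acvf(y, h):
    """Sample autocovariance at lag h: (1/n) * sum_t (y_t - ybar)(y_{t+h} - ybar)."""
    n = len(y)
    d = y - y.mean()
    return np.sum(d[: n - h] * d[h:]) / n

def nu_hat_sq(y):
    """nu^2-hat = gamma_hat(0) + 2 * sum_{h=1}^{sqrt(n)} (1 - h/n) * gamma_hat(h)."""
    n = len(y)
    hmax = int(np.sqrt(n))
    return sample_acvf(y, 0) + 2 * sum(
        (1 - h / n) * sample_acvf(y, h) for h in range(1, hmax + 1)
    )
```

For iid data the autocovariance terms at \(h \ge 1\) are near zero, so \(\hat\nu^2\) stays close to \(\hat\gamma(0)\), recovering the random sample case.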

4. Example: Testing the Mean When There’s Autocorrelation

Because of the presence of significant ACF at lag 1, we need to use the new formula for CI.

## [1] 70
## List of 6
##  $ acf   : num [1:19, 1, 1] 35.06 19.04 9.88 1.38 -5.13 ...
##  $ type  : chr "covariance"
##  $ n.used: int 70
##  $ lag   : num [1:19, 1, 1] 0 1 2 3 4 5 6 7 8 9 ...
##  $ series: chr "Y"
##  $ snames: NULL
##  - attr(*, "class")= chr "acf"
## [1] 1.732918
## [1] 1.197743
## [1] 0.7127774
## [1] -0.6146584  1.7329183  4.0804951

Because the 95% CI includes \(0\), we cannot reject the null hypothesis that the true mean of \(Y\) is \(0\).

Note that if we had used the RS version of the formula, we would have rejected.
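The data behind this example isn’t included in the page, so as an illustration here is a Python/NumPy sketch on a simulated AR(1) series with positive autocorrelation (all numbers are hypothetical), comparing the TS and RS confidence intervals:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 70

# Hypothetical stand-in for the data above: Y_t = mu + X_t,
# with X_t an AR(1) series: X_t = 0.6 * X_{t-1} + e_t
x = np.zeros(n)
e = rng.normal(size=n)
for t in range(1, n):
    x[t] = 0.6 * x[t - 1] + e[t]
y = 1.0 + x

ybar = y.mean()
d = y - ybar
acvf = lambda h: np.sum(d[: n - h] * d[h:]) / n  # sample ACVF

# TS version: nu^2-hat = gamma_hat(0) + 2 sum_{h=1}^{sqrt(n)} (1 - h/n) gamma_hat(h)
hmax = int(np.sqrt(n))
nu2 = acvf(0) + 2 * sum((1 - h / n) * acvf(h) for h in range(1, hmax + 1))
ci_ts = (ybar - 1.96 * np.sqrt(nu2 / n), ybar + 1.96 * np.sqrt(nu2 / n))

# RS version ignores the autocorrelation and uses S / sqrt(n)
se_rs = y.std(ddof=1) / np.sqrt(n)
ci_rs = (ybar - 1.96 * se_rs, ybar + 1.96 * se_rs)

print(ci_ts, ci_rs)  # under positive autocorrelation, ci_ts is typically wider
```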

Example: Color data

This time, let’s just get a CI for the true mean, without testing whether it is different from \(0\).

Note that I’m using an exact copy of the code used in the last section.

## [1] 35
## List of 6
##  $ acf   : num [1:16, 1, 1] 36.04 19.04 11.79 8.08 3.31 ...
##  $ type  : chr "covariance"
##  $ n.used: int 35
##  $ lag   : num [1:16, 1, 1] 0 1 2 3 4 5 6 7 8 9 ...
##  $ series: chr "Y"
##  $ snames: NULL
##  - attr(*, "class")= chr "acf"
## [1] 74.88571
## [1] 1.799286
## [1] 1.029621
## [1] 71.35911 74.88571 78.41232



Summary

  • For a random sample, the sample mean has approximately the distribution \[ N\left(\mu, \hspace{2mm} \frac{\sigma}{\sqrt{n}}\right), \] where \(\sigma\) can be estimated by the sample standard deviation.
  • For autocorrelated data, the sample mean has approximately the distribution \[ N\left(\mu, \hspace{2mm} \frac{\nu}{\sqrt{n}}\right), \] where \(\nu\) can be estimated using the sample ACVF via \[ \hat \nu^2 = \hat \gamma(0) + 2 \sum_{h=1}^{\sqrt n} \Big(1 -\frac{|h|}{n}\Big) \hat \gamma(h). \]
  • For autocorrelated data with positive autocorrelation, the variance of the sample mean is bigger than in the random sample case. We must estimate the variance of the sample mean by summing up the sample ACVF.