Random Sample vs Time Series

1. Random Sample

In Applied Statistics course, you learned couple of formulas for basic statistical inference: \[ \mbox{Assuming } X_1, \ldots, X_n \mbox{ are RS (i.i.d) from } N(\mu, \sigma) \mbox{distribution}\\ \hspace{10mm}\\ \hspace{10mm}\\ 95\% \mbox{ Confidence Interval for mean } \mu: \hspace{15mm} \bar X \pm 1.96 \frac{S}{\sqrt n}\\ \hspace{10mm}\\ 95\% \mbox{ Prediction Interval for the next observation} : \hspace{15mm} \bar X \pm 1.96 \, S \, \sqrt{\frac{1}{n} + 1} \]

Let’s simulate that using R. Random Sample of size 100 from the normal distribution with mean 2 and SD 2 looks like this:

set.seed(1453)
X <- rnorm(100, 0, 1)                # generate RS of size 100 from N(mean=0,SD=1)
plot(X)                              # scatter plot with dots

plot(X,type='o', ylim=c(-3.5, 3.5))  # change type and zoom out on y-axis

hist(X)                              # histogram of X

Inference for Random Sample

Using the two formulas, we get the statistical inference for the simulated sample above.

mean(X)

## [1] -0.05789066

sd(X)

## [1] 0.9623882

#- compute CI
upper_CI = mean(X) + 1.96 * sd(X) * sqrt(1/100)
lower_CI = mean(X) - 1.96 * sd(X) * sqrt(1/100)

#- compute PI
upper_PI = mean(X) + 1.96 * sd(X) * sqrt(1/100 + 1)
lower_PI = mean(X) - 1.96 * sd(X) * sqrt(1/100 + 1)


plot(X,type='o', ylim=c(-3.5, 3.5), xlim=c(1,120))   # same plot as before
abline(h=upper_CI, col="red",  lty=2)           # add red dashed lines for CI
abline(h=lower_CI, col="red",  lty=2)
abline(h=upper_PI, col="blue", lty=2)           # add blue dashed lines for PI
abline(h=lower_PI, col="blue", lty=2)

This is all we can do to forecast if the data is random sample.

2. Example of Time Series

First, let’s load the package called TSA this is the package that comes with Cryer’s textbook.

# required only the first time you use the package
install.packages("TSA")    # if asked to choose CRAN mirror, choose MI or OH site

# required every time you restart R
acf1 <- acf        # Keep original acf() function
library(TSA)       # load package TSA
acf <- acf1        # replace back the new acf with original

LA rainfall (p2)

Now load the data that comes with TSA package. Here’s the data for annual rain falls in LA.

  data(larain)       # load dataset called larain that is in TSA package
  is.ts(larain)      # is larain TS object?

## [1] TRUE

  larain             # see what's inside larain

## Time Series:
## Start = 1878 
## End = 1992 
## Frequency = 1 
##   [1] 20.86 17.41 18.65  5.53 10.74 14.14 40.29 10.53 16.72 16.02 20.82 33.26 12.69
##  [14] 12.84 18.72 21.96  7.51 12.55 11.80 14.28  4.83  8.69 11.30 11.96 13.12 14.77
##  [27] 11.88 19.19 21.46 15.30 13.74 23.92  4.89 17.85  9.78 17.17 23.21 16.67 23.29
##  [40]  8.45 17.49  8.82 11.18 19.85 15.27  6.25  8.11  8.94 18.56 18.63  8.69  8.32
##  [53] 13.02 18.93 10.72 18.76 14.67 14.49 18.24 17.97 27.16 12.06 20.26 31.28  7.40
##  [66] 22.57 17.45 12.78 16.22  4.13  7.59 10.63  7.38 14.33 24.95  4.08 13.69 11.89
##  [79] 13.62 13.24 17.49  6.23  9.57  5.83 15.37 12.31  7.98 26.81 12.91 23.66  7.58
##  [92] 26.32 16.54  9.26  6.54 17.45 16.69 10.70 11.01 14.97 30.57 17.00 26.33 10.92
## [105] 14.41 34.04  8.90  8.92 18.00  9.11 11.57  4.56  6.49 15.07 22.65

  plot(larain,  ylab='Inches', xlab='Year', type='o')  # plot the data with labels

Chemical Process (p3)

Here’s color peoperty of a chemicals mixed in a plant.

  data(color)    # load dataset called color that is in TSA package
  plot(color,  ylab='Color Property',xlab='Batch',type='o')

Abundance of Canadian Hare (p5)

Here’s annual population of rabbits in Canada

  data(hare)    # load dataset
  plot(hare,ylab='Abundance',xlab='Year',type='o')

Monthly Oil Filter Sales (p7)

Sales for oil filters

  data(oilfilters)    # load dataset
  plot(oilfilters,type='o',ylab='Sales')

  is.ts(oilfilters)      # is data in TS format?

## [1] TRUE

  oilfilters             # see what's inside oilfilters

##       Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
## 1983                               2385 3302 3958 3302 2441 3107
## 1984 5862 4536 4625 4492 4486 4005 3744 2546 1954 2285 1778 3222
## 1985 5472 5310 1965 3791 3622 3726 3370 2535 1572 2146 2249 1721
## 1986 5357 5811 2436 4608 2871 3349 2909 2324 1603 2148 2245 1586
## 1987 5332 5787 2886 5475 3843 2537

  #-- Plot with month's initial letter
  plot(oilfilters,type='l',ylab='Sales')
  points(y=oilfilters,x=time(oilfilters), pch=as.vector(season(oilfilters)))

3. Difference Between RS and TS

Back to Hare Data

plot(hare,ylab='Abundance',xlab='Year',type='o')

length(hare)   # see how long is the hare data

## [1] 31

hare           # show what's inside hare

## Time Series:
## Start = 1905 
## End = 1935 
## Frequency = 1 
##  [1] 50 20 20 22 27 50 55 78 70 59 28 20 15 15 25 35 65 78 82 65 26 15 10  1  2  3
## [27] 22 75 95 78 20

#- This year vs Last Year
plot(x=hare[2:31], y=hare[1:30],
     ylab="Last Year's Abundance",
     xlab="This Year's Abundance")

#- This year vs 2 Years ago ---
plot(x=hare[3:31], y=hare[1:29],
     ylab="Abundance 2yrs Ago",
     xlab="This Year's Abundance")

Chemical Process (p3)

Here’s color peoperty of a chemicals mixed in a plant.

data(color)       # load the dataset
plot(color,ylab='Color Property',xlab='Batch',type='o')

#- Plot of color vs previous color
plot(x=color[2:35], y=color[1:34], xlab='Color Property', ylab='Previous Batch')

Back Random Sample X

plot(X, type='o')

#- Plot value of X agains previous value of X
plot(X[2:100], X[1:99], ylab='value of X', xlab='previous value of X')

Not so much pattern here.

4. Autocorrelation

To capture the pattern we saw between current vs previous values of hare data and color data, we use correlation.

#- Hare data This year vs Last Year
plot(x=hare[2:31], y=hare[1:30],
     ylab="Last Year's Abundance",
     xlab="This Year's Abundance")

cor(hare[2:31], hare[1:30])    # Autocorrelation of Lag 1

## [1] 0.7025777

#- Hare data This year vs 2 Years ago
plot(x=hare[3:31], y=hare[1:29],
  ylab="Abundance 2yrs Ago",
  xlab="This Year's Abundance")

cor(hare[3:31], hare[1:29])    # Autocorrelation of Lag 2

## [1] 0.2244224

#- Color data Plot of color vs previous color
plot(x=color[2:35], y=color[1:34], xlab='Color Property', ylab='Previous Batch')

cor(color[2:35], color[1:34])  # Autocorrelation of Lag 1

## [1] 0.554917

This pattern doesn’t exist in random sample X.

  #-- Random Sample X:  Plot value of X agains previous value of X
  plot(X[2:100], X[1:99], ylab='value of X', xlab='previous value of X')

  cor(X[2:100], X[1:99])         # Auto correlation of Lag 1

## [1] -0.002031602

Autocorrelation Function (ACF)

Autocorrelation Function is a collection of all the autocorrelaton at all the different lags.

  acf(hare)          # Autocorrelation Function of hare data

  acf(color)         # Autocorrelation Function of color data

  acf(X)             # Autocorrelation Functoin of RS X

Summary

What is the key difference between Time Series and Random Sample?
How do you check if a data is random sample, and not time series?

Random Sample vs Time Series

Nao Mimoto - Dept. of Statistics : The University of Akron

1. Random Sample

Inference for Random Sample

2. Example of Time Series

LA rainfall (p2)

Chemical Process (p3)

Abundance of Canadian Hare (p5)

Monthly Oil Filter Sales (p7)

3. Difference Between RS and TS

Back to Hare Data

Chemical Process (p3)

Back Random Sample X

4. Autocorrelation

Autocorrelation Function (ACF)

Summary