Class Web Page



1. Trend with Seasonality


Trend with Seasonal period \(s\)

Suppose \(Y_t\) is the observation. The Model says: \[ Y_t \hspace{3mm} = \hspace{3mm} m_t + S_t + X_t \]

\[\begin{align*} m_t &: \mbox{ Trend Component (linear, random)}\\ S_t &: \mbox{ Seasonal Component }\\ X_t &: \mbox{ Stationary Time Series (ARMA?)} \end{align*}\] where \(EX_t=0\).

Seasonality repeats every \(s\) observation. \[ S_{t+s}=S_t. \]

Condition on Seasonal component: \(S_{t+s}=S_t\) and \(\sum_{j=1}^s S_j = 0\).

Seasonality

\(s\) is the seasonality frequency. (e.g. \(s=12\) for monthly seasonality).

Seasonality repeats every \(s\) observation. \[ S_{t+s}=S_t. \]

Seasonality is not a trend. It’s a temporary deviation from overall trend. \[ \sum_{j=1}^s S_j = 0. \]

Ex: Accident Data

Accidental deaths in USA., 1973 to 1978 from Brockwell

  acf1 = acf
  library(TSA)
  acf = acf1      # Load TSA package

  D  = read.csv("https://nmimoto.github.io/datasets/acci.csv", header=T)
  Acci = ts(D, start=c(1973,1), freq=12)  #- Turn D into ts object with frequency

  plot(Acci, type='o', ylab="num of accidents")

  Acci
##        Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct
## 1973  9007  8106  8928  9137 10017 10826 11317 10744  9713  9938
## 1974  7750  6981  8038  8422  8714  9512 10120  9823  8743  9129
## 1975  8162  7306  8124  7870  9387  9556 10093  9620  8285  8433
## 1976  7717  7461  7776  7925  8634  8945 10078  9179  8037  8488
## 1977  7792  6957  7726  8106  8890  9299 10625  9302  8314  8850
## 1978  7836  6892  7791  8129  9115  9434 10484  9827  9110  9070
##        Nov   Dec
## 1973  9161  8927
## 1974  8710  8680
## 1975  8160  8034
## 1976  7874  8647
## 1977  8265  8796
## 1978  8633  9240
  #--- plot with month ---
  plot(Acci, type="l", ylab="num of accidents")
  points(y=Acci, x=time(Acci), pch=as.vector(season(Acci)))


Ex: Oil Filter Sales Data

Oil Filter Sales Data (Cryer p7) inside TSA package.

    data(oilfilters)     # load data inside TSA package
    Oil = oilfilters

    is.ts(Oil)     # is data in TS format?
## [1] TRUE
    Oil            # look inside
##       Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
## 1983                               2385 3302 3958 3302 2441 3107
## 1984 5862 4536 4625 4492 4486 4005 3744 2546 1954 2285 1778 3222
## 1985 5472 5310 1965 3791 3622 3726 3370 2535 1572 2146 2249 1721
## 1986 5357 5811 2436 4608 2871 3349 2909 2324 1603 2148 2245 1586
## 1987 5332 5787 2886 5475 3843 2537
    frequency(Oil) # what is the frequency?
## [1] 12
    #--- plot with month ---
    plot(Oil, type="l",ylab="Sales")
    points(y=Oil, x=time(Oil), pch=as.vector(season(Oil)))


3 ways to remove seasonality

There are three poplular ways of removing seasonality:

  • MA filter

  • Seasonal Average

  • Seaonal Differencing

2. MA filter

If \(s\) is odd, let it be \(2q+1\). Then use linear MA filter. \[ \hat m_t \hspace{3mm} = \hspace{3mm} \frac{1}{(2q+1)} \sum_{i=-q}^q Y_{t-j} \] If \(s\) is even, let it be \(2q\). Then use \[ \hat m_t \hspace{3mm} = \hspace{3mm} \frac{1}{2q} \Big(.5 Y_{t-q} + \sum_{i=-q-1}^{q-1} Y_{t-j} + .5 Y_{t+q}\Big) \]

Because seasonality should sum up to zero for each seasonal cycle, we have estimated trend: \[ Y_t = m_t + S_t + X_t \] \[ \hat m_t \hspace{3mm} = \hspace{3mm} \frac{1}{(2q+1)} \sum_{i=-q}^q m_{t-j} + \underbrace{ \frac{1}{(2q+1)} \sum_{i=-q}^q S_{t-j} }_{0} + \underbrace{ \frac{1}{(2q+1)} \sum_{i=-q}^q X_{t-j} }_{\mbox{ small }}. \] Then estimate for (\(S_t\) + \(Y_t\)) is \[ w_k \hspace{3mm} = \hspace{3mm} X_{t} - \hat m_{t} \hspace{10mm} q < t < n-q. \]

Use this to estimate the seasonal part, \[ \hat S_k \hspace{3mm} = \hspace{3mm} w_k - \frac{1}{s} \sum_{i=1}^s w_i. \] Now we have deseasonalized data (\(m_t\) + \(X_t\)) \[ d_t \hspace{3mm} = \hspace{3mm} Y_t - \hat S_t \hspace{10mm} t=1 \cdots n \] Now go back and re-estimate trend using \(d_t\).

MA filter on Accident Data

  m.hat = filter(Acci, c(.5, rep(1,10), .5)/12)
  plot(Acci, type="o")
  lines(m.hat, col="red")

Filter reveals quadtaric trend

3. Seasonal Average

Suppose we have monthly seasonality, \(s=12\). Then take average for each month. For example, average for January will be \[ \bar S_1 \hspace{3mm} = \hspace{3mm} \sum_{j=0} X_{1+12j} \] and July average will be \[ \bar S_7 \hspace{3mm} = \hspace{3mm} \sum_{j=0} X_{7+12j}. \] Note that this seasonal average will take out the trend part as well.

Seasonal Av on Accidents Data

  plot(Acci,    type="o")

  #--- Take Monthly Averages
  Mav1  = aggregate(c(Acci), list(month=cycle(Acci)), mean)$x       #- 1yr long Mtly Av
  M.av1 = ts(Mav1[cycle(Acci)], start=start(Acci), freq=frequency(Acci))    #- Mtly Av
  Acci.DS = Acci-M.av1                                        #- Subtract from original

  plot(M.av1,   type="o")               # Monthly Average

  plot(Acci,    type="o", main="Acci (Blk) vs Mtly Av (Blue)")
  lines(M.av1,  col="blue")

  plot(Acci.DS, type="o", main="De-seasonalized Acci (Blk - Blue)")
  abline(h=0)

Quadratic trend can be found here too.

  source('https://nmimoto.github.io/R/TS-00.txt')
  Stationarity.tests(Acci.DS)
## Warning in kpss.test(A): p-value smaller than printed p-value
##        KPSS   ADF    PP
## p-val: 0.01 0.885 0.397

Small-Large-Large means all of them points to non-stationarity.

Fit deseasonalized series with ARIMA

  Fit1 = auto.arima(Acci.DS)
  Fit1
## Series: Acci.DS 
## ARIMA(0,1,1) 
## 
## Coefficients:
##           ma1
##       -0.4056
## s.e.   0.1082
## 
## sigma^2 = 66457:  log likelihood = -494.53
## AIC=993.07   AICc=993.24   BIC=997.59
  Randomness.tests(Fit1$resid)

##   B-L test H0: the series is uncorrelated
##   M-L test H0: the square of the series is uncorrelated
##   J-B test H0: the series came from Normal distribution
##   SD         : Standard Deviation of the series
##       BL15 BL20  BL25  ML15  ML20    JB      SD
## [1,] 0.755 0.53 0.351 0.557 0.632 0.768 255.614
  Fit2 = auto.arima(Acci.DS, d=0)
  Fit2
## Series: Acci.DS 
## ARIMA(1,0,1) with zero mean 
## 
## Coefficients:
##          ar1      ma1
##       0.9471  -0.3656
## s.e.  0.0451   0.1205
## 
## sigma^2 = 66208:  log likelihood = -501.55
## AIC=1009.1   AICc=1009.46   BIC=1015.93
  Randomness.tests(Fit2$resid)

##   B-L test H0: the series is uncorrelated
##   M-L test H0: the square of the series is uncorrelated
##   J-B test H0: the series came from Normal distribution
##   SD         : Standard Deviation of the series
##       BL15 BL20  BL25  ML15  ML20    JB      SD
## [1,] 0.597 0.31 0.231 0.823 0.799 0.926 255.441


ARIMA with Monthly Average model

After we subtracted the seasonality (Monthly average \(S_t\)), We either have ARIMA(0,1,1) or ARIMA(2,0,0).

\[\begin{align*} Y_t &= \hspace{3mm} S_t + X_t \\\\ X_t &= \hspace{3mm} \phi_1 X_{t-1} + \phi_2 X_{t-2} + \epsilon_t \hspace{10mm} \mbox{(if ARIMA(2,0,0))} \end{align*}\]

4. Seasonal Differencing

(Box-Jenkins Method) For guessed seasonality \(s\), define \[ \bigtriangledown_s = (1-B^s) \] (Don’t confuse this with \(\bigtriangledown^s\)). Then we have \[ \bigtriangledown_s X_t \hspace{3mm} = \hspace{3mm} (1-B^s) X_t \hspace{3mm} = \hspace{3mm} X_t - X_{t-s} \\\\ \hspace{3mm} = \hspace{3mm} m_t - m_{t-s} + \underbrace{ S_t - S_{t-s} }_{0} + Y_t - Y_{t-s}. \] since \(S_t=S_{t-s}\), this eliminates the seasonality. Now fit ARIMA to \(\bigtriangledown_s X_t\).

Seasonal Difference on Accidents Data

    plot(Acci, type='o', ylab="num of accidents")

    plot(diff(Acci, 12))          # take seasonal difference (Del_12)

    plot(diff(diff(Acci, 12)))    # take (Del)(Del_12)

    Stationarity.tests(diff(diff(Acci, 12)))
## Warning in adf.test(A): p-value smaller than printed p-value
## Warning in pp.test(A): p-value smaller than printed p-value
## Warning in kpss.test(A): p-value greater than printed p-value
##        KPSS  ADF   PP
## p-val:  0.1 0.01 0.01
    Fit3 = auto.arima(diff(diff(Acci, 12)))
    Fit3
## Series: diff(diff(Acci, 12)) 
## ARIMA(0,0,1)(0,0,1)[12] with non-zero mean 
## 
## Coefficients:
##           ma1     sma1     mean
##       -0.4963  -0.6146  21.0220
## s.e.   0.1351   0.1932  11.9921
## 
## sigma^2 = 98283:  log likelihood = -424.27
## AIC=856.53   AICc=857.27   BIC=864.84
    Randomness.tests(Fit3$resid)

##   B-L test H0: the series is uncorrelated
##   M-L test H0: the square of the series is uncorrelated
##   J-B test H0: the series came from Normal distribution
##   SD         : Standard Deviation of the series
##       BL15  BL20  BL25  ML15  ML20    JB      SD
## [1,] 0.588 0.457 0.453 0.385 0.497 0.355 307.622


Seasonal ARIMA model

We took \(\bigtriangledown_{12}\), then \(\bigtriangledown\), then fit MA(1). Therefore our model is \[ \bigtriangledown \bigtriangledown_{12} Y_t \hspace{3mm} = \hspace{3mm} X_t \\ X_t \hspace{3mm} = \hspace{3mm} \epsilon_t + \theta_1 \epsilon_{t-1} \] This is called \[ \mbox{ sARIMA}(p,d,q) (P,D,Q)_{12} \hspace{3mm} \mbox{ model } \\ \\ \mbox{ sARIMA}(0,1,1) (0,1,0)_{12} \hspace{3mm} \mbox{ model } \]

5. Ex: Oil Filter Sales Data


5a. Monthly Average

  plot(Oil, type="o")

  #--- Take Monthly Averages
  D0 = Oil
  Mav1  = aggregate(c(D0), list(month=cycle(D0)), mean)$x  #- 1yr long Mtly Av
  M.av = ts(Mav1[cycle(D0)], start=start(D0), freq=frequency(D0))
  Ds.Oil = D0-M.av                                     #- Subtract from original

  plot(M.av,  type="o")

  plot(Oil, type="o", main="Oil (Blk) vs Mtly Av (Blue)")
  lines(M.av, col="blue")

  plot(Ds.Oil, type="o", main="De-seasonalized Oil (Blk - Blue)")
  abline(h=0)

  Stationarity.tests(Ds.Oil)
## Warning in pp.test(A): p-value smaller than printed p-value
##         KPSS   ADF   PP
## p-val: 0.025 0.039 0.01
  Fit3 = auto.arima(Ds.Oil, stepwise=FALSE, approximation=FALSE)
  Fit3
## Series: Ds.Oil 
## ARIMA(0,1,1) 
## 
## Coefficients:
##           ma1
##       -0.8626
## s.e.   0.0696
## 
## sigma^2 = 378095:  log likelihood = -368.67
## AIC=741.35   AICc=741.62   BIC=745.05
  Fit4 = auto.arima(Ds.Oil, d=0, stepwise=FALSE, approximation=FALSE)
  Fit4
## Series: Ds.Oil 
## ARIMA(0,0,0) with zero mean 
## 
## sigma^2 = 379437:  log likelihood = -376.42
## AIC=754.85   AICc=754.93   BIC=756.72
  Randomness.tests(Fit4$resid)

##   B-L test H0: the series is uncorrelated
##   M-L test H0: the square of the series is uncorrelated
##   J-B test H0: the series came from Normal distribution
##   SD         : Standard Deviation of the series
##       BL15  BL20  BL25  ML15  ML20    JB      SD
## [1,] 0.518 0.525 0.241 0.818 0.804 0.173 622.503


Monthly Average model

After we subtracted the seasonality (Monthly average \(S_t\)), it was WN. \[\begin{align*} Y_t &= \hspace{3mm} S_t + X_t \\\\ X_t &= \hspace{3mm} \epsilon_t \end{align*}\]

5b. Sesonal ARIMA Model

  plot(Oil)

  plot(diff(Oil, 12))                # take seasonal difference (Del_12)

  Stationarity.tests(diff(Oil, 12))
## Warning in adf.test(A): p-value smaller than printed p-value
## Warning in pp.test(A): p-value smaller than printed p-value
##        KPSS  ADF   PP
## p-val: 0.07 0.01 0.01
  Fit4 = auto.arima(diff(Oil, 12), stepwise=FALSE, approximation=FALSE)
  Fit4
## Series: diff(Oil, 12) 
## ARIMA(0,0,0) with non-zero mean 
## 
## Coefficients:
##            mean
##       -217.3889
## s.e.   132.7486
## 
## sigma^2 = 652522:  log likelihood = -291.57
## AIC=587.14   AICc=587.5   BIC=590.31
  Randomness.tests(Fit4$resid)

##   B-L test H0: the series is uncorrelated
##   M-L test H0: the square of the series is uncorrelated
##   J-B test H0: the series came from Normal distribution
##   SD         : Standard Deviation of the series
##      BL15  BL20  BL25  ML15  ML20    JB      SD
## [1,] 0.71 0.645 0.502 0.967 0.993 0.055 807.789


sArima Model

We took \(\bigtriangledown_{12}\), then it looks like a WN. Therefore our model is \[ \bigtriangledown_{12} Y_t \hspace{3mm} = \hspace{3mm} X_t \\ \\ X_t \hspace{3mm} = \hspace{3mm} \epsilon_t \] This is called \[ \mbox{ sARIMA}(p,d,q) (P,D,Q)_{12} \hspace{3mm} \mbox{ model } \\ \\ \mbox{ sARIMA}(0,0,0) (0,1,0)_{12} \hspace{3mm} \mbox{ model } \]

Summary

  • Seasonalilty may be eliminated by MA filter or Seaonal Average

  • Sesonal ARIMA means that you take seasonal difference before fitting ARIMA