This is called ARIMA modeling.
Recall the backward operator \(B\), \[ B \, X_t = X_{t-1}. \]
Then define difference operator \(\bigtriangledown\) (del), \[ \bigtriangledown = (1-B). \]
For example, \[
\bigtriangledown X_t = (1-B)X_t = X_t - X_{t-1} \\ \\
\bigtriangledown^2 X_t = (1-B)(1-B) X_t = X_t - 2X_{t-1} + X_{t-2}
\]
Suppose your time series have a line plus a zero-mean staionary noise. \[ Y_t = a + bt + X_t \] That means \[ Y_{t-1} = a + b(t-1) + X_{t-1} \\ \\ \bigtriangledown Y_t = Y_t - Y_{t-1} = b + X_t - X_{t-1} \]
If \(X_t\) is stationary, then \(X_t-X_{t-1}\) is also stationary. Thus we have now \[ \bigtriangledown Y_t = \mu + M_t \] We can try to model this with ARMA(p,q) with intercept \(\mu = b\).
set.seed(42352)
= 1:100
t = 3 - 10*t + arima.sim(n=100, list(ar = c(.7,.2)), sd=100)
Y
= diff(Y) # take the difference
Y2
layout(matrix(1:2,1,2))
plot(Y,type="o") # Y is simulated data
plot(Y2, type="o" )
mean(Y2)
## [1] -9.461098
If \(m_t\) is quadratic, then \[\begin{align} Y_t &= a + bt + c t^2 + X_t \\\\ Y_{t-1} &= a + b(t-1) + c (t-1)^2+ X_{t-1} \\\\ Y_{t-2} &= a + b(t-2) + c (t-2)^2 + X_{t-2}. \end{align}\]
That means
\[\begin{align} \bigtriangledown^2 Y_t &= Y_t - 2Y_{t-1} + Y_{t-2} \\\\ &= 2c + X_t - 2X_{t-1} + X_{t-2} \end{align}\]
If \(X_t\) is stationary, so is \(X_t - 2X_{t-1} + X_{t-2}\).
If \(m_t\) is polynomial of deg \(k\), with coefficients $c_0, c_1, $ then applying \(k\) power of difference operator will remove the trend. \[ \bigtriangledown^k Y_t = k! c_k + \bigtriangledown^k X_t. \]
Then you will end up with some statonary series \(\bigtriangledown^k X_t\) with constant trend \(k! c_k\).
= 1:100
t = t^2
tsq = 3 - .5*t + .1*tsq + arima.sim(n=100, list(ar = c(.7,.2)))*10
Y
layout(matrix(1:2, 1, 2))
plot(Y,type="o")
plot(diff(Y),type="o")
plot(diff(diff(Y)),type="o")
- Box-Jenkins Differencing Turns many non-stationary trends into constants.
If you difference linar trend \(a+bt\) once, it becomes constant \(b\).
If you difference quadratic trend \(a+bt+ct^2\) twice, it becomes constant \(2c\).
If your data is from (Linear Trend + ARMA), then
Suppose we have a model with random trend \[ Y_t = M_t + X_t, \] where \(X_t\) is a stationary series and \(M_t\) is a random walk: \[ M_t = \sum_{i=1}^t e_i \mbox{ where } e_i \sim_{iid} N(0, 1). \]
With iid errors \((e_1,e_2,e_3,\ldots)\) random walk is generated as
\[\begin{align} M_1 &= e_1 \\ M_2 &= e_1 + e_2 \\ M_3 &= e_1 + e_2 + e_3 \\ M_4 &= e_1 + e_2 + e_3 + e_4 \\ &\vdots \end{align}\]
We can use cumsum() function in R to do this easily.
Mean of RW is \[ E[M_t] = E\Big(\sum_{i=1}^t e_i\Big) = \sum_{i=1}^t E(e_i) = 0 \]
Variance of RW is \[ V[M_t] = V\Big(\sum_{i=1}^t e_i\Big) = \sum_{i=1}^t V(e_i) = \sigma^2 t \]
#- RW without drift
= cumsum(rnorm(100))
M plot(M, type="l", ylim=c(-40,40), main="50 random walks")
for (i in 1:50) {
= rnorm(100) #- copy and paste these 3 lines many times
e = cumsum(e)
M lines(M)
}
Note that mean is 0, but variance increase as time goes on.
Since \[ M_t = \sum_{i=1}^t e_i, \] we have \[ \bigtriangledown M_t = M_t - M_{t-1} = \sum_{i=1}^t e_i - \sum_{i=1}^{n-1} e_i = \epsilon_t \] And we know that \(\epsilon_t \sim N(0,\sigma)\).
So if \(M_t\) is Random Walk, then
\(\bigtriangledown M_t\) should look
like iid \(N(0,\sigma)\) noise.
= rnorm(100) #- 100 random number from N(0,1)
e = cumsum(e)
M
layout(matrix(1:2, 1, 2))
plot(M,type="o", main="Random Walk")
plot(diff(M), type="o", main="difference of RW");
acf(diff(M), main="ACF of the RW difference")
hist(diff(M))
Random Walk with drift \(\delta\) is \[ M_t = \sum_{i=1}^t e_i \mbox{ where } e_i \sim_{iid} N(\delta, 1). \]
Now each step have little more chance of being positive than negative:
If \(M_t\) is Random Walk with drift, then \(\bigtriangledown M_t\) should look like iid \(N(\delta,\sigma)\) noise.
Mean of RW is now \[ E[M_t] = E\Big(\sum_{i=1}^t e_i\Big) = \sum_{i=1}^t E(e_i) = \delta \]
Variance of RW is unchanged: \[ V[M_t] = V\Big(\sum_{i=1}^t e_i\Big) = \sum_{i=1}^t V(e_i) = \sigma^2 t \]
layout(matrix(1:2, 1, 2))
#- RW with drift (delta = .3)
= rnorm(100, .3, 1) #- 100 random number from N(1,1)
e = cumsum(e)
M
#- repeat this 50 times
plot(M,type="l", ylim=c(-40,40), main="50 RW with drift .3")
for (i in 1:50) {
= rnorm(100, .3, 1)
e = cumsum(e)
M lines(M)
}
#- RW without drift
= rnorm(100) #- 100 random number from N(0,1)
e = cumsum(e)
M
#- repeat this 50 times
plot(M,type="l", ylim=c(-40,40), main="50 RW w/o drift")
for (i in 1:50) {
= rnorm(100) #- copy and paste these 3 lines many times
e = cumsum(e)
M lines(M)
}
set.seed(87656)
= rnorm(100, .3, 1)
e = cumsum(e)
M
plot(M,type="o")
Does this RW has drift or not?
Test if mean of the difference is 0.
plot(diff(M), type="o", main="difference of RW")
c(mean(diff(M)), sd(diff(M)), 1.96*sd(diff(M))/sqrt(100) )
## [1] 0.3292302 1.0239543 0.2006950
95% CI for the mean is \(.33 \pm
.20\), so the sample mean is significantly different from 0. We
conclude that the above RW has a drift.
If \(Y_t = M_t + X_t\), then \[
Y_t = \sum_{i=1}^t e_i + X_t
\\Y_{t-1} = \sum_{i=1}^{t-1} e_i + X_{t-1}
\] That means \[
\bigtriangledown Y_t = Y_t - Y_{t-1}
= \epsilon_t + X_t - X_{t-1}
\] Since \(X_t, \epsilon_t\) are
stationary, so is \(\epsilon_t+X_t-X_{t-1}\).
If you difference (RW without drift), it becomes N(\(0,\sigma^2\)) iid noise.
If you difference (RW with drift \(\delta\)), it becomes N(\(\delta,\sigma^2\)) iid noise.
When difference is applied to (RW w/o drift + ARMA), it should result in stationary series with mean 0, that can be fit with another ARMA.
If you difference (RW w drift + ARMA), it should result in stationary series with mean \(\delta\), that can be fit with another ARMA.
Daily adjusted close price of Apple, Inc. Between 10/17/2007 to 8/4/2008 from Yahoo! finance website
library(forecast)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
library(quantmod) # Do install.packages("quantmod") if not installed
## Loading required package: xts
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: TTR
source('https://nmimoto.github.io/R/TS-00.txt')
getSymbols("AAPL") #- download from Yahoo!
## 'getSymbols' currently uses auto.assign=TRUE by default, but will
## use auto.assign=FALSE in 0.5-0. You will still be able to use
## 'loadSymbols' to automatically load data. getOption("getSymbols.env")
## and getOption("getSymbols.auto.assign") will still be checked for
## alternate defaults.
##
## This message is shown once per session and may be disabled by setting
## options("getSymbols.warning4.0"=FALSE). See ?getSymbols for details.
## [1] "AAPL"
= as.ts(AAPL$AAPL.Adjusted[400:200])
X plot(X, type="o")
layout(matrix(1:2, 1, 2))
plot(diff(X), type="o")
acf(diff(X))
= auto.arima(X, stepwise=FALSE, approximation=FALSE)
Fit1 Fit1
## Series: X
## ARIMA(0,1,0)
##
## sigma^2 = 0.01962: log likelihood = 109.34
## AIC=-216.67 AICc=-216.65 BIC=-213.38
# Arima(0,1,0) w/o drift is suggested
# Force drift term. Is it significant?
Arima(X, order=c(0,1,0), include.drift=TRUE)
## Series: X
## ARIMA(0,1,0) with drift
##
## Coefficients:
## drift
## -0.0030
## s.e. 0.0099
##
## sigma^2 = 0.01971: log likelihood = 109.38
## AIC=-214.76 AICc=-214.7 BIC=-208.17
# Force AR(1) after the difference.
Arima(X, order=c(1,1,0))
## Series: X
## ARIMA(1,1,0)
##
## Coefficients:
## ar1
## -0.0261
## s.e. 0.0706
##
## sigma^2 = 0.0197: log likelihood = 109.41
## AIC=-214.81 AICc=-214.75 BIC=-208.21
# Force MA(1) after the difference.
Arima(X, order=c(0,1,1))
## Series: X
## ARIMA(0,1,1)
##
## Coefficients:
## ma1
## -0.0242
## s.e. 0.0680
##
## sigma^2 = 0.01971: log likelihood = 109.4
## AIC=-214.8 AICc=-214.74 BIC=-208.2
Randomness.tests(Fit1$residuals)
##
## 'tseries' version: 0.10-49
##
## 'tseries' is a package for time series analysis and
## computational finance.
##
## See 'library(help="tseries")' for details.
## B-L test H0: the series is uncorrelated
## M-L test H0: the square of the series is uncorrelated
## J-B test H0: the series came from Normal distribution
## SD : Standard Deviation of the series
## BL15 BL20 BL25 ML15 ML20 JB SD
## [1,] 0.389 0.563 0.52 0.096 0.155 0.001 0.14