Suppose \(\bigtriangledown Y_t = X_t\) and \(X_t \sim ARMA(p,q)\).
Since we know how to forecast an ARMA process, we know how to compute \[ \hat X_n(h) \hspace{3mm} = \hspace{3mm} a_1 X_n + \cdots + a_n X_1. \]
How can we calculate \(\hat Y_n(h)\) so that the MSE, \[ E\bigg[\Big(Y_{n+h}-\hat Y_n(h)\Big)^2\bigg], \] is minimized?
Consider the two collections of random variables \[ (1,Y_0, Y_1, \ldots, Y_n), \hspace{10mm} (1,Y_0, X_1, \ldots, X_n). \]
They actually span the same vector space, because \[ \left[ \begin{array}{c} Y_1 - Y_0 = X_1 \\ Y_2 - Y_1 = X_2 \\ Y_3 - Y_2 = X_3 \\ \vdots \\ Y_n - Y_{n-1} = X_n \\ \end{array} \right] \hspace{2mm} \iff \hspace{2mm} \left[ \begin{array}{c} Y_1 =X_1 + Y_0 \\ Y_2 =X_2 + Y_1 \\ Y_3 =X_3 + Y_2 \\ \vdots \\ Y_n =X_n + Y_{n-1} \\ \end{array} \right] \]
Because \(Y_{n+1}-Y_n = X_{n+1}\), we can rewrite MSE in \(Y_t\) as \[ E\bigg[\Big(Y_{n+1}-\hat Y_n(1)\Big)^2\bigg] \hspace{3mm} = \hspace{3mm} E\bigg[ \Big( (X_{n+1}+Y_n) - \hat Y_n(1)\Big)^2\bigg]. \]
Also, note that we can write
\[\begin{align} \hat Y_n(1) &= \hspace{3mm} a'_0 + a'_1 Y_n + \cdots + a'_n Y_1 \\ \\ &= \hspace{3mm} a_0 + a_1 X_n + \cdots + a_n X_1 + b_0 Y_0 \\ \\ &= \hspace{3mm} \hat X_n(1) + b_0 Y_0. \end{align}\]
That means \[ MSE \hspace{3mm} = \hspace{3mm} E\bigg[ \Big( (X_{n+1}+Y_n) - (\hat X_n(1) + b_0 Y_0) \Big)^2\bigg]. \]
If \(Y_0\) is uncorrelated with \((X_1, X_2, \ldots, X_n)\), then \(b_0=0\) and the \(Y_0\) term drops out. Treating the observations \(Y_1, \ldots, Y_n\) as given, we get
\[\begin{align} MSE &= \hspace{3mm} E\bigg[ \Big( X_{n+1} - \hat X_n(1) + Y_n \Big)^2\bigg] \\ \\ &= \hspace{3mm} E\bigg[ \Big( X_{n+1} - \hat X_n(1) \Big)^2 + 2 \, Y_n \Big(X_{n+1}- \hat X_n(1)\Big) + Y_n^2 \bigg] \\ \\ &= \hspace{3mm} E\bigg[ \Big( X_{n+1} - \hat X_n(1) \Big)^2 \bigg] + 2 \, Y_n E\bigg[ X_{n+1} - \hat X_n(1) \bigg] + Y_n^2 \\ \\ &= \hspace{3mm} E\bigg[ \Big( X_{n+1} - \hat X_n(1) \Big)^2 \bigg] + 0 + c. \end{align}\]
We know that \(\hat X_n(1)\) is the minimizer of the first term.
Therefore, \[ \hat Y_n(1) \hspace{3mm} = \hspace{3mm} Y_n + \hat X_n(1) \]
Similarly, summing the differences up to lead \(h\) gives \[ \hat Y_n(h) \hspace{3mm} = \hspace{3mm} Y_n + \hat X_n(1) + \cdots + \hat X_n(h). \]
So forecast the ARMA(\(p,q\)) series as usual, then add the accumulated forecasts to the last observation \(Y_n\).
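The identity above can be checked numerically. Here is a minimal sketch (not part of the original notes) using the forecast package: forecasting the differenced series with an ARMA fit and cumulatively adding the forecasts to \(Y_n\) reproduces the direct ARIMA(0,1,1) forecast, since both fits maximize the same likelihood of the differenced data.

library(forecast)
set.seed(123)
Y = cumsum(arima.sim(model=list(ma=0.5), n=300))   # Y_t whose differences are MA(1)
X = diff(Y)                                        # X_t = Y_t - Y_{t-1}
FitX = Arima(X, order=c(0,0,1), include.mean=FALSE)   # fit ARMA to the differences
FitY = Arima(Y, order=c(0,1,1))                       # fit ARIMA to Y directly
Yhat.via.X = tail(Y, 1) + cumsum(forecast(FitX, h=5)$mean)  # Y_n + cumulative ARMA forecasts
Yhat.arima = as.numeric(forecast(FitY, h=5)$mean)           # direct ARIMA forecast
cbind(Yhat.via.X, Yhat.arima)                               # the two columns agree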
(Cryer, p. 88) Monthly spot price for crude oil, Cushing, OK (in U.S. dollars per barrel), 01/1986 - 01/2006.
The data set is included in the TSA package.
#--- Load package TSA ---
acf.orig = acf      # keep the default acf()
library(TSA)        # load TSA package, which will replace acf()
acf = acf.orig      # overwrite the default acf() back
library(forecast)
source('https://nmimoto.github.io/R/TS-00.txt')
data(oil.price)
Oil = oil.price     # rename the data
plot(Oil, ylab='Price per Barrel', type='o')
layout(matrix(1:2, 1, 2))
acf(Oil)
pacf(Oil)
layout(1)
Stationarity.tests(Oil)
## KPSS ADF PP
## p-val: 0.01 0.99 0.94
The original series is not stationary (KPSS p-value is small; ADF and PP p-values are large).
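Stationarity.tests() is a wrapper defined in the TS-00.txt script sourced above. As a sketch (assuming the wrapper uses the standard implementations), the same three p-values can be obtained directly from the tseries package:

library(tseries)
kpss.test(Oil)   # H0: series is stationary    (small p-value: reject)
adf.test(Oil)    # H0: series has a unit root  (large p-value: cannot reject)
pp.test(Oil)     # H0: series has a unit root  (large p-value: cannot reject)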
#- Take difference of lag 1
plot(diff(Oil), ylab='Change in Oil',type='l')
#- Take log difference of lag 1
plot(diff(log(Oil)),ylab='Change in Log(Oil)',type='l')
Stationarity.tests(diff(log(Oil)))
## KPSS ADF PP
## p-val: 0.1 0.01 0.01
layout(matrix(1:2, 1, 2))
acf( diff(log(Oil)) )
pacf( diff(log(Oil)) )
layout(1)
We should take the log of the original price before differencing.
Differencing after the log transformation makes the process stationary (KPSS p-value is large; ADF and PP p-values are small).
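Why log first? For small changes, the lag-1 log difference approximates the relative (percent) change, which stabilizes the variance of a price series whose swings grow with its level. A quick check (not from the notes):

p = as.numeric(Oil)
cbind(log.diff = head(diff(log(p))),
      rel.chg  = head(diff(p) / head(p, -1)))   # nearly identical for small changes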
Now we go back to the original data and use an ARIMA(\(p, 1, q\)) model with \(\lambda=0\) (log transformation).
We use the lambda=0 option inside auto.arima() to take the log.
#--- Look for best ARIMA ---
Fit1 = auto.arima(Oil, d=1, lambda=0, stepwise=FALSE, approximation=FALSE)
Fit1
## Series: Oil
## ARIMA(0,1,1)
## Box Cox transformation: lambda= 0
##
## Coefficients:
## ma1
## 0.2956
## s.e. 0.0693
##
## sigma^2 = 0.006717: log likelihood = 260.29
## AIC=-516.58 AICc=-516.53 BIC=-509.62
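Written out, this fit says that with \(\lambda=0\) the ARIMA(0,1,1) model is applied to the log of the series: \[ \bigtriangledown \log(\text{Oil}_t) \hspace{3mm} = \hspace{3mm} e_t + 0.2956 \, e_{t-1}, \hspace{10mm} \hat\sigma^2 = 0.006717. \]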
Randomness.tests(Fit1$residuals)
## B-L test H0: the series is uncorrelated
## M-L test H0: the square of the series is uncorrelated
## J-B test H0: the series came from Normal distribution
## SD : Standard Deviation of the series
## BL15 BL20 BL25 ML15 ML20 JB SD
## [1,] 0.251 0.143 0.226 0.018 0.08 0 0.082
#- plot forecast of ARIMA
plot(forecast(Fit1))
#- look at forecasted values
forecast(Fit1, 10)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Feb 2006 67.19173 60.49258 74.63277 57.22103 78.89982
## Mar 2006 67.19173 56.57998 79.79376 51.65874 87.39526
## Apr 2006 67.19173 53.96381 83.66215 48.05071 93.95759
## May 2006 67.19173 51.91027 86.97179 45.28255 99.70129
## Jun 2006 67.19173 50.19094 89.95108 43.00900 104.97172
## Jul 2006 67.19173 48.69896 92.70688 41.06918 109.92984
## Aug 2006 67.19173 47.37416 95.29939 39.37287 114.66598
## Sep 2006 67.19173 46.17873 97.76642 37.86358 119.23670
## Oct 2006 67.19173 45.08706 100.13358 36.50325 123.68017
## Nov 2006 67.19173 44.08093 102.41910 35.26484 128.02353
ARIMA(0,1,1) was chosen by auto.arima() with AICc = -516.53.
Where is the forecast tending to?
Does it make sense to check for a drift term?
Arima(Oil, lambda=0, order=c(0,1,1), include.drift=TRUE)
## Series: Oil
## ARIMA(0,1,1) with drift
## Box Cox transformation: lambda= 0
##
## Coefficients:
## ma1 drift
## 0.2939 0.0041
## s.e. 0.0696 0.0068
##
## sigma^2 = 0.006735: log likelihood = 260.47
## AIC=-514.94 AICc=-514.83 BIC=-504.49
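The drift estimate (0.0041) is well within one standard error (0.0068) of zero, and AICc worsens from -516.53 to -514.83, so the drift term is not supported. A quick way to compare the two fits side by side (Fit2 is a name introduced here for illustration):

Fit2 = Arima(Oil, lambda=0, order=c(0,1,1), include.drift=TRUE)
c(no.drift = Fit1$aicc, with.drift = Fit2$aicc)   # smaller AICc is better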
source('https://nmimoto.github.io/R/TS-00.txt')
Pred = Rolling1step.forecast(Oil,
                             window.size=200,
                             Arima.order=c(0,1,1),
                             include.mean=TRUE,
                             lambda=0)
##
## Total length 241 , window size 200 .
## Last 41 obs retrospectively forecasted with Rolling 1-step
## prediction using same order and fixed window size.
##
## Average Prediction Error: 0.6882
## root Mean Squared Error of Prediction: 3.4253
Pred
## $Actual
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct
## 2002 29.66 28.84
## 2003 32.95 35.83 33.51 28.17 28.11 30.66 30.75 31.57 28.31 30.34
## 2004 34.31 34.68 36.74 36.75 40.27 38.02 40.78 44.90 45.94 53.28
## 2005 46.84 48.15 54.19 52.98 49.83 56.35 58.99 64.98 65.59 62.26
## 2006 65.48
## Nov Dec
## 2002 26.35 29.46
## 2003 31.11 32.13
## 2004 48.47 43.15
## 2005 58.32 59.41
## 2006
##
## $Predicted
## Jan Feb Mar Apr May Jun Jul
## 2002
## 2003 30.50791 33.61183 36.52679 32.63263 26.88306 28.50607 31.38438
## 2004 32.48065 34.90391 34.60949 37.43075 36.54014 41.49565 37.05008
## 2005 42.19800 48.28976 48.10695 56.09574 52.14670 49.19231 58.55757
## 2006 59.94085
## Aug Sep Oct Nov Dec
## 2002 28.63313 29.95311 28.56043 25.78242
## 2003 30.54957 31.91767 27.22245 31.37592 31.02924
## 2004 41.90884 45.81231 45.97783 55.69972 46.61958
## 2005 59.13204 66.67239 65.30059 61.46572 57.49142
## 2006
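Rolling1step.forecast() is defined in the TS-00.txt script sourced above. As a rough sketch (an assumption about what the wrapper does, not its exact code), a rolling 1-step forecast refits the model on a fixed-size window, forecasts one step ahead, and slides the window forward:

n    = length(Oil)       # 241 observations in total
ws   = 200               # window size
pred = numeric(n - ws)   # 41 one-step forecasts
for (i in 1:(n - ws)) {
  win     = Oil[i:(ws + i - 1)]                  # current 200-obs window
  fit     = Arima(win, order=c(0,1,1), lambda=0)
  pred[i] = forecast(fit, h=1)$mean              # 1-step-ahead forecast
}
sqrt(mean((Oil[(ws + 1):n] - pred)^2))           # RMSE of prediction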