(Cryer, p. 101) The Box-Cox transformation is a generalization of taking the log of a time series: \[ {\large f_{\lambda}(x) = \left\{ \begin{array}{ll} \frac{x^\lambda-1}{\lambda} & \mbox{ if } \lambda \ne 0 \\ \log(x) & \mbox{ if } \lambda=0 \end{array} \right. } \] Usually \(-1<\lambda<2\).
One can show \[ \frac{x^\lambda - 1}{\lambda} \to \log(x) \hspace{10mm} \mbox{ as } \lambda \to 0. \]
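A one-line justification (not in the original notes): writing \(x^\lambda = e^{\lambda \log x}\) and expanding the exponential, \[ \frac{x^\lambda - 1}{\lambda} = \frac{e^{\lambda \log x} - 1}{\lambda} = \frac{\lambda \log x + \frac{1}{2}(\lambda \log x)^2 + \cdots}{\lambda} = \log x + O(\lambda) \longrightarrow \log x \hspace{10mm} \mbox{ as } \lambda \to 0. \]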
Sometimes called the normalizing transformation.
This transformation is for positive data only.
If the data contain negative observations, add a constant before transforming.
You can try \(\lambda = \pm 1, \pm 1/2, \pm 1/3, \pm 1/4, 0\) and see which one helps (see the sketch below).
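To compare candidate values by eye, here is a minimal sketch (not from the original notes); y is simulated here as a stand-in for any positive series, and with real data you would use, e.g., the copper series D1 loaded below.

library(forecast)
#- Sketch: histograms of BoxCox(y, lambda) for a few candidate lambdas
#- (y is a simulated placeholder for a positive series, e.g. D1 below)
set.seed(1)
y = exp(rnorm(200))
cand = c(-1, -1/2, -1/3, -1/4, 0, 1/4, 1/3, 1/2, 1)
par(mfrow=c(3,3))
for (lam in cand) {
  hist(BoxCox(y, lambda=lam), main=paste("lambda =", round(lam, 2)), xlab="")
}
par(mfrow=c(1,1))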
D  = read.csv("https://nmimoto.github.io/datasets/copper.csv")
D1 = ts(D[,2], start=1)   #- extract only second column as time series
plot(D1, type="o")
hist(D1)
library(forecast)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
source('https://nmimoto.github.io/R/TS-00.txt')
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
##     as.Date, as.Date.numeric
# ?Arima # open help page for Arima()
#- plot log of price
plot(log(D1), type="o")
hist(log(D1))
#- estimate the best lambda to use
BoxCox.lambda(D1)
## [1] 0.2283439
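BoxCox.lambda() is from the forecast package; by default it chooses \(\lambda\) with Guerrero's method. If I recall the API correctly, method="loglik" requests the profile-likelihood choice instead, which may give a slightly different value.

#- Alternative selection method (assumes the method= argument of forecast::BoxCox.lambda)
BoxCox.lambda(D1, method="loglik")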
D2 = BoxCox(D1, .228)
plot(D2, type="o")
hist(D2)
qqnorm(D2)
Randomness.tests(D2)
##
## 'tseries' version: 0.10-49
##
## 'tseries' is a package for time series analysis and
## computational finance.
##
## See 'library(help="tseries")' for details.
## B-L test H0: the series is uncorrelated
## M-L test H0: the square of the series is uncorrelated
## J-B test H0: the series came from Normal distribution
## SD : Standard Deviation of the series
##      BL15 BL20 BL25  ML15  ML20    JB    SD
## [1,]    0    0    0 0.191 0.383 0.194 0.926
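Randomness.tests() is defined in the course file TS-00.txt. Based on the legend it prints, a rough stand-in using standard functions (an assumption about its internals, not the actual course code) would be:

#- Rough stand-in for Randomness.tests(), assumed from the printed legend
library(tseries)
Box.test(D2,   lag=15, type="Ljung-Box")$p.value   # B-L: is the series uncorrelated?
Box.test(D2^2, lag=15, type="Ljung-Box")$p.value   # M-L: is the squared series uncorrelated?
jarque.bera.test(D2)$p.value                       # J-B: did the series come from a Normal distribution?
sd(D2)                                             # SD of the series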
X = InvBoxCox(D2, .228)   #- to transform BACK
plot(X)
# Transform outside of auto.arima
D2 = BoxCox(D1, .228)
auto.arima(D2, stepwise=FALSE, approximation=FALSE)
## Series: D2
## ARIMA(1,0,0) with non-zero mean
##
## Coefficients:
## ar1 mean
## 0.5883 -0.2406
## s.e. 0.0582 0.1289
##
## sigma^2 = 0.5679: log likelihood = -223
## AIC=452 AICc=452.12 BIC=461.85
Fit01 = Arima(D2, order=c(1,0,0), include.mean=FALSE)   #- estimated mean (-0.24) is within 2 s.e. of 0, so refit without it
Fit01
## Series: D2
## ARIMA(1,0,0) with zero mean
##
## Coefficients:
## ar1
## 0.6115
## s.e. 0.0572
##
## sigma^2 = 0.5744: log likelihood = -224.65
## AIC=453.29 AICc=453.36 BIC=459.86
Randomness.tests(Fit01$resid)
## B-L test H0: the series is uncorrelated
## M-L test H0: the square of the series is uncorrelated
## J-B test H0: the series came from Normal distribution
## SD : Standard Deviation of the series
##       BL15  BL20  BL25  ML15 ML20    JB    SD
## [1,] 0.538 0.548 0.455 0.192 0.16 0.016 0.752
forecast(Fit01, 10)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 198 -1.22903892 -2.200290 -0.2577876 -2.714440 0.2563619
## 199 -0.75158680 -1.890050 0.3868763 -2.492716 0.9895422
## 200 -0.45961337 -1.654611 0.7353846 -2.287205 1.3679783
## 201 -0.28106461 -1.496529 0.9343999 -2.139957 1.5778279
## 202 -0.17187776 -1.394908 1.0511524 -2.042341 1.6985855
## 203 -0.10510738 -1.330955 1.1207401 -1.979879 1.7696646
## 204 -0.06427568 -1.291175 1.1626237 -1.940656 1.8121050
## 205 -0.03930612 -1.266599 1.1879864 -1.916288 1.8376758
## 206 -0.02403664 -1.251476 1.2034029 -1.901243 1.8531701
## 207 -0.01469898 -1.242193 1.2127955 -1.891990 1.8625918
plot(forecast(Fit01, 10))
Then we would have to transform the forecast back by hand; it is much better to do this automatically.
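For example, the manual route would look roughly like this (fc01 is a name introduced here, not in the original notes):

#- Back-transform the Fit01 forecast by hand with InvBoxCox()
fc01 = forecast(Fit01, 10)
InvBoxCox(fc01$mean,  lambda=.228)   # point forecasts on the original scale
InvBoxCox(fc01$lower, lambda=.228)   # lower prediction limits
InvBoxCox(fc01$upper, lambda=.228)   # upper prediction limits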
If we use the lambda= option inside auto.arima() (or Arima()), the forecast is automatically transformed back to the original scale.
#- Use Lambda on D1
auto.arima(D1, lambda=.228, stepwise=FALSE, approximation=FALSE)
## Series: D1
## ARIMA(1,0,0) with non-zero mean
## Box Cox transformation: lambda= 0.228
##
## Coefficients:
## ar1 mean
## 0.5883 -0.2406
## s.e. 0.0582 0.1289
##
## sigma^2 = 0.5679: log likelihood = -223
## AIC=452 AICc=452.12 BIC=461.85
Fit02 = Arima(D1, order=c(1,0,0), lambda=.228, include.mean=FALSE)
Fit02
## Series: D1
## ARIMA(1,0,0) with zero mean
## Box Cox transformation: lambda= 0.228
##
## Coefficients:
## ar1
## 0.6115
## s.e. 0.0572
##
## sigma^2 = 0.5744: log likelihood = -224.65
## AIC=453.29 AICc=453.36 BIC=459.86
Randomness.tests(Fit02$resid)
## B-L test H0: the series is uncorrelated
## M-L test H0: the square of the series is uncorrelated
## J-B test H0: the series came from Normal distribution
## SD : Standard Deviation of the series
##       BL15  BL20  BL25  ML15 ML20    JB    SD
## [1,] 0.538 0.548 0.455 0.192 0.16 0.016 0.752
forecast(Fit02, 10)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 198 0.2364182 0.04713407 0.7666891 0.01453752 1.282934
## 199 0.4384818 0.08436469 1.4488279 0.02510431 2.440725
## 200 0.6153759 0.12527460 1.9736036 0.03944955 3.289327
## 201 0.7479086 0.16033733 2.3328044 0.05311317 3.848925
## 202 0.8391748 0.18658431 2.5658196 0.06400887 4.202652
## 203 0.8990764 0.20472609 2.7130030 0.07183593 4.422279
## 204 0.9373005 0.21667851 2.8046794 0.07711566 4.557575
## 205 0.9612860 0.22432751 2.8613452 0.08054352 4.640621
## 206 0.9761854 0.22913666 2.8962183 0.08271780 4.691508
## 207 0.9853842 0.23212785 2.9176253 0.08407748 4.722660
plot(forecast(Fit02, 10))
source('https://nmimoto.github.io/R/TS-00.txt')
Pred = Rolling1step.forecast(D1,
                             window.size=150,
                             Arima.order=c(1,0,0),
                             include.mean=FALSE,
                             lambda=0.228)
##
## Total length 197 , window size 150 .
## Last 47 obs retrospectively forecasted with Rolling 1-step
## prediction using same order and fixed window size.
##
## Average Prediction Error: -0.08
## root Mean Squared Error of Prediction: 0.4416
Pred
## $Actual
## Time Series:
## Start = 151
## End = 197
## Frequency = 1
## [1] 0.299 0.346 0.234 0.366 1.345 1.868 0.295 0.260 0.306 0.432
## [11] 0.161 0.224 0.203 0.323 0.622 0.599 0.730 0.939 1.314 2.027
## [21] 1.301 1.117 1.534 2.341 1.067 1.189 0.791 0.526 1.460 1.287
## [31] 0.259 0.316 0.250 0.664 0.715 0.765 0.322 0.708 0.855 0.478
## [41] 0.014 0.097 0.525 0.191 0.492 0.208 0.068
##
## $Predicted
## Time Series:
## Start = 151
## End = 197
## Frequency = 1
## [1] 0.6715176 0.5277462 0.5662849 0.4641295 0.5791707 1.1856627
## [7] 1.4417316 0.5207068 0.4881054 0.5302150 0.6280443 0.3734288
## [13] 0.4380231 0.4033290 0.5156391 0.7503588 0.7332452 0.8265025
## [19] 0.9624080 1.1828987 1.5569153 1.1752567 1.0696378 1.3021239
## [25] 1.7353007 1.0406630 1.1126303 0.8677404 0.6824487 1.2641255
## [31] 1.1679785 0.4636781 0.5130638 0.4503854 0.7828232 0.8137655
## [37] 0.8507556 0.5199959 0.8157138 0.9106253 0.6549077 0.1330519
## [43] 0.2966850 0.6973821 0.4177320 0.6763140 0.4330218
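Rolling1step.forecast() is also defined in the course file TS-00.txt. A hand-rolled sketch of the same idea (an assumption about its internals, based on the message it prints) looks like this:

#- Sketch of rolling 1-step prediction with a fixed window of size 150
#- (assumed behavior of Rolling1step.forecast(), not the actual TS-00.txt code)
win  = 150
n    = length(D1)
pred = numeric(n - win)
for (i in 1:(n - win)) {
  train   = window(D1, start=i, end=i + win - 1)    # fixed-size training window
  fit     = Arima(train, order=c(1,0,0), include.mean=FALSE, lambda=0.228)
  pred[i] = forecast(fit, h=1)$mean                 # 1-step-ahead, back-transformed by lambda=
}
sqrt(mean((D1[(win+1):n] - pred)^2))                # root MSE of prediction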
The Box-Cox transformation is a generalization of the log transformation that tries to maximize the normality of the data.
It does not make sense to apply it to a series with a trend.
Normality of the series is not a concern in model estimation.
Log (\(\lambda=0\)) is the most popular transformation (illustrated below).
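To illustrate the last point, a short sketch (not from the original notes): lambda=0 requests the log transformation, so the two calls below fit the same model to log(D1); the lambda= version additionally back-transforms its forecasts to the original scale.

#- lambda=0 is the log transformation
auto.arima(log(D1), stepwise=FALSE, approximation=FALSE)
auto.arima(D1, lambda=0, stepwise=FALSE, approximation=FALSE)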