
1. Power Transformation

(Cryer p. 101) A generalization of taking the log of a time series: \[ {\large f_{\lambda}(x) = \left\{ \begin{array}{ll} \frac{x^\lambda-1}{\lambda} & \mbox{ if } \lambda \ne 0\\\\ \ln(x) & \mbox{ if } \lambda=0 \\ \end{array} \right. } \] Usually \(-1<\lambda<2\).

One can show \[ \frac{x^\lambda - 1}{\lambda} \to \log(x) \hspace{10mm} \mbox{ as } \lambda \to 0. \]
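A quick R sketch of this definition and the limit (the name power.tf is just for illustration; the BoxCox() function from the forecast package, used below, implements the same transformation):

  #- the power transformation as a function, and a numerical check of the limit
  power.tf = function(x, lambda) {
    if (lambda == 0) log(x) else (x^lambda - 1) / lambda
  }
  sapply(c(1, .5, .1, .01, .001), function(lam) power.tf(5, lam))
  log(5)    #- the values above approach log(5) as lambda -> 0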

This is sometimes called a normalizing transformation.

This transformation is for positive data only.

If the data contain zero or negative observations, add a constant to make them positive before transforming.
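For example, a series with non-positive values could be shifted first (a sketch with a made-up vector y; the shift is arbitrary as long as it makes every value positive):

  #- shift a made-up series with non-positive values before transforming
  library(forecast)
  y     = c(-2.1, 0, 1.3, 4.7)
  shift = abs(min(y)) + 1       #- constant large enough to make all values positive
  BoxCox(y + shift, lambda=.5)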

You can try \(\lambda = \pm 1, \pm 1/2, \pm 1/3, \pm 1/4, 0\) and see which one works best.

2. Copper Data

  D  = read.csv("https://nmimoto.github.io/datasets/copper.csv")
  D1 = ts(D[,2], start=1)    #- extract only second column as time series
  plot(D1, type="o")

  hist(D1)

  library(forecast)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

  source('https://nmimoto.github.io/R/TS-00.txt')
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
  #   ?Arima    # open help page for Arima()

  #- plot log of price
  plot(log(D1), type="o")

  hist(log(D1))

  #- estimate the best lambda to use
  BoxCox.lambda(D1)
## [1] 0.2283439
  D2 = BoxCox(D1, .228)
  plot(D2, type="o")

  hist(D2)

  qqnorm(D2)

  Randomness.tests(D2)
## 
##     'tseries' version: 0.10-49
## 
##     'tseries' is a package for time series analysis and
##     computational finance.
## 
##     See 'library(help="tseries")' for details.

##   B-L test H0: the series is uncorrelated
##   M-L test H0: the square of the series is uncorrelated
##   J-B test H0: the series came from Normal distribution
##   SD         : Standard Deviation of the series
##      BL15 BL20 BL25  ML15  ML20    JB    SD
## [1,]    0    0    0 0.191 0.383 0.194 0.926
  X = InvBoxCox(D2, .228)    #- to transform BACK
  plot(X)


a) Transform by hand

  # Transform outside of auto.arima
  D2 = BoxCox(D1, .228)

  auto.arima(D2, stepwise=FALSE, approximation=FALSE)
## Series: D2 
## ARIMA(1,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1     mean
##       0.5883  -0.2406
## s.e.  0.0582   0.1289
## 
## sigma^2 = 0.5679:  log likelihood = -223
## AIC=452   AICc=452.12   BIC=461.85
  Fit01 = Arima(D2, order=c(1,0,0), include.mean=FALSE)
  Fit01
## Series: D2 
## ARIMA(1,0,0) with zero mean 
## 
## Coefficients:
##          ar1
##       0.6115
## s.e.  0.0572
## 
## sigma^2 = 0.5744:  log likelihood = -224.65
## AIC=453.29   AICc=453.36   BIC=459.86
  Randomness.tests(Fit01$resid)

##   B-L test H0: the series is uncorrelated
##   M-L test H0: the square of the series is uncorrelated
##   J-B test H0: the series came from Normal distribution
##   SD         : Standard Deviation of the series
##       BL15  BL20  BL25  ML15 ML20    JB    SD
## [1,] 0.538 0.548 0.455 0.192 0.16 0.016 0.752
  forecast(Fit01, 10)
##     Point Forecast     Lo 80      Hi 80     Lo 95     Hi 95
## 198    -1.22903892 -2.200290 -0.2577876 -2.714440 0.2563619
## 199    -0.75158680 -1.890050  0.3868763 -2.492716 0.9895422
## 200    -0.45961337 -1.654611  0.7353846 -2.287205 1.3679783
## 201    -0.28106461 -1.496529  0.9343999 -2.139957 1.5778279
## 202    -0.17187776 -1.394908  1.0511524 -2.042341 1.6985855
## 203    -0.10510738 -1.330955  1.1207401 -1.979879 1.7696646
## 204    -0.06427568 -1.291175  1.1626237 -1.940656 1.8121050
## 205    -0.03930612 -1.266599  1.1879864 -1.916288 1.8376758
## 206    -0.02403664 -1.251476  1.2034029 -1.901243 1.8531701
## 207    -0.01469898 -1.242193  1.2127955 -1.891990 1.8625918
  plot(forecast(Fit01, 10))

Then we would have to transform back by hand, as sketched below; it’s much better to have this done automatically.
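A manual back-transformation with InvBoxCox() would look like this (a sketch; fc is just a temporary name):

  #- back-transform the forecast from Fit01 by hand
  fc = forecast(Fit01, 10)
  InvBoxCox(fc$mean,  .228)     #- point forecasts on the original scale
  InvBoxCox(fc$lower, .228)     #- 80% and 95% lower bounds on the original scale
  InvBoxCox(fc$upper, .228)     #- 80% and 95% upper bounds on the original scale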

b) Transform inside auto.arima()

If we use the lambda= option inside auto.arima(), then the forecast is automatically transformed back into the original scale.

  #- Use Lambda on D1
  auto.arima(D1, lambda=.228, stepwise=FALSE, approximation=FALSE)
## Series: D1 
## ARIMA(1,0,0) with non-zero mean 
## Box Cox transformation: lambda= 0.228 
## 
## Coefficients:
##          ar1     mean
##       0.5883  -0.2406
## s.e.  0.0582   0.1289
## 
## sigma^2 = 0.5679:  log likelihood = -223
## AIC=452   AICc=452.12   BIC=461.85
  Fit02 = Arima(D1, order=c(1,0,0), lambda=.228, include.mean=FALSE)
  Fit02
## Series: D1 
## ARIMA(1,0,0) with zero mean 
## Box Cox transformation: lambda= 0.228 
## 
## Coefficients:
##          ar1
##       0.6115
## s.e.  0.0572
## 
## sigma^2 = 0.5744:  log likelihood = -224.65
## AIC=453.29   AICc=453.36   BIC=459.86
  Randomness.tests(Fit02$resid)

##   B-L test H0: the series is uncorrelated
##   M-L test H0: the square of the series is uncorrelated
##   J-B test H0: the series came from Normal distribution
##   SD         : Standard Deviation of the series
##       BL15  BL20  BL25  ML15 ML20    JB    SD
## [1,] 0.538 0.548 0.455 0.192 0.16 0.016 0.752
  forecast(Fit02, 10)
##     Point Forecast      Lo 80     Hi 80      Lo 95    Hi 95
## 198      0.2364182 0.04713407 0.7666891 0.01453752 1.282934
## 199      0.4384818 0.08436469 1.4488279 0.02510431 2.440725
## 200      0.6153759 0.12527460 1.9736036 0.03944955 3.289327
## 201      0.7479086 0.16033733 2.3328044 0.05311317 3.848925
## 202      0.8391748 0.18658431 2.5658196 0.06400887 4.202652
## 203      0.8990764 0.20472609 2.7130030 0.07183593 4.422279
## 204      0.9373005 0.21667851 2.8046794 0.07711566 4.557575
## 205      0.9612860 0.22432751 2.8613452 0.08054352 4.640621
## 206      0.9761854 0.22913666 2.8962183 0.08271780 4.691508
## 207      0.9853842 0.23212785 2.9176253 0.08407748 4.722660
  plot(forecast(Fit02, 10))


Rolling 1-step forecast

  source('https://nmimoto.github.io/R/TS-00.txt')

  Pred = Rolling1step.forecast(D1,
                               window.size=150,
                               Arima.order=c(1,0,0),
                               include.mean=FALSE,
                               lambda=0.228)
## 
##   Total length 197 , window size 150 .
##   Last 47 obs retrospectively forecasted with Rolling 1-step
##       prediction using same order and fixed window size.
## 
##   Average Prediction Error:  -0.08
##   root Mean Squared Error of Prediction:   0.4416

  Pred
## $Actual
## Time Series:
## Start = 151 
## End = 197 
## Frequency = 1 
##  [1] 0.299 0.346 0.234 0.366 1.345 1.868 0.295 0.260 0.306 0.432
## [11] 0.161 0.224 0.203 0.323 0.622 0.599 0.730 0.939 1.314 2.027
## [21] 1.301 1.117 1.534 2.341 1.067 1.189 0.791 0.526 1.460 1.287
## [31] 0.259 0.316 0.250 0.664 0.715 0.765 0.322 0.708 0.855 0.478
## [41] 0.014 0.097 0.525 0.191 0.492 0.208 0.068
## 
## $Predicted
## Time Series:
## Start = 151 
## End = 197 
## Frequency = 1 
##  [1] 0.6715176 0.5277462 0.5662849 0.4641295 0.5791707 1.1856627
##  [7] 1.4417316 0.5207068 0.4881054 0.5302150 0.6280443 0.3734288
## [13] 0.4380231 0.4033290 0.5156391 0.7503588 0.7332452 0.8265025
## [19] 0.9624080 1.1828987 1.5569153 1.1752567 1.0696378 1.3021239
## [25] 1.7353007 1.0406630 1.1126303 0.8677404 0.6824487 1.2641255
## [31] 1.1679785 0.4636781 0.5130638 0.4503854 0.7828232 0.8137655
## [37] 0.8507556 0.5199959 0.8157138 0.9106253 0.6549077 0.1330519
## [43] 0.2966850 0.6973821 0.4177320 0.6763140 0.4330218
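The error summaries reported above can be recomputed from the returned components (a sketch; here the prediction error is taken as actual minus predicted), and the whole procedure can also be written out by hand, assuming the same fixed window of 150, AR(1) order, and lambda as above:

  #- recompute the prediction-error summaries from the returned components
  err = Pred$Actual - Pred$Predicted
  mean(err)             #- average prediction error
  sqrt(mean(err^2))     #- root mean squared error of prediction

  #- the same rolling 1-step forecast written out by hand (fixed window of 150)
  n = length(D1);  w = 150
  pred = numeric(n - w)
  for (i in 1:(n - w)) {
    win     = window(D1, start=i, end=i + w - 1)    #- fixed-size estimation window
    fit     = Arima(win, order=c(1,0,0), include.mean=FALSE, lambda=.228)
    pred[i] = forecast(fit, 1)$mean                 #- 1-step-ahead forecast
  }
  sqrt(mean((D1[(w + 1):n] - pred)^2))              #- compare with the rMSEP above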



Summary

  • Box-Cox is a generalization of log that tries to maximize the normality of the data.

  • It does not make sense to apply this to a series with a trend.

  • Normality is not a concern in model estimation.

  • Log (lambda=0) is the most popular transformation.