To construct the Maximum Likelihood Estimator, we need the distribution of
\(\epsilon_t\).
Suppose \(\epsilon_t \sim
N(0,\sigma^2)\). Then a causal ARMA(\(p,q\)) process \(X_t\) is normal as well, since it is a linear combination of the normal \(\epsilon_t\): \[
X_t \hspace{3mm} = \hspace{3mm} \sum_{i=0}^\infty \psi_i \epsilon_{t-i}, \qquad
X_t \sim N\Big(0, \gamma(0)\Big)
\]
Every \(X_t\) has the
same distribution, and the covariance between \(X_t\) and \(X_{t+1}\) is \(\gamma(1)\).
That means \(\{X_1, \ldots, X_n\}\) is an \(n\)-dimensional multivariate normal vector. \[ \left[ \begin{array}{c} X_1 \\ X_2 \\ \vdots \\ X_n \\ \end{array} \right] \sim N_n \Big(0, \mathbf \Sigma\Big), \hspace{5mm} \mbox{ where } {\small \mathbf \Sigma = \left[ \begin{array}{c c c c} \gamma(0) &\gamma(1) &\cdots &\gamma(n-1) \\ \gamma(1) &\gamma(0) &\cdots &\gamma(n-2) \\ \vdots &\vdots & & \vdots \\ \gamma(n-1) &\gamma(n-2) &\cdots &\gamma(0) \\ \end{array} \right]} \]
The mean of each element is 0, and the covariance matrix is the autocovariance matrix: \[ \mathbf \Sigma = \mathbf \Gamma_n \]
The joint pdf of the multivariate normal vector \(\mathbf X_n = (X_1, \ldots, X_n)'\) with covariance matrix \(\mathbf \Gamma_n\) gives the likelihood \[ L(\phi, \theta, \sigma) = \frac{1}{(2 \pi)^{n/2} (\mbox{det}\,\mathbf \Gamma_n)^{1/2} } \exp\Big( - \frac{1}{2} \mathbf X'_n \mathbf \Gamma_n^{-1} \mathbf X_n \Big) \]
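As an illustration of this likelihood (not the algorithm arima() actually uses, which filters the series through a state-space recursion), here is a minimal sketch that evaluates the Gaussian log-likelihood directly from \(\mathbf \Gamma_n\); the helper name arma.loglik is ours, and X is assumed to be a demeaned series.

# Sketch: Gaussian log-likelihood of a causal ARMA, evaluated via Gamma_n.
# Uses gamma(0) = sigma^2 * sum(psi_i^2) and Gamma_n = gamma(0) * Toeplitz(ACF).
arma.loglik = function(X, ar, ma, sigma2) {
  n      = length(X)
  psi    = c(1, ARMAtoMA(ar = ar, ma = ma, lag.max = 200))   # psi-weights, truncated
  gamma0 = sigma2 * sum(psi^2)                               # gamma(0)
  Gam    = gamma0 * toeplitz(ARMAacf(ar = ar, ma = ma, lag.max = n - 1))
  logdet = as.numeric(determinant(Gam, logarithm = TRUE)$modulus)
  -0.5 * (n*log(2*pi) + logdet + drop(t(X) %*% solve(Gam, X)))
}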
When the MLE is used, the parameter estimates are found by a numerical search,
under the assumption that the \(\epsilon_t\) are normal (Gaussian
likelihood).
For an ARMA model, you will not be penalized much when the assumption of
normality is violated (see the simulation at the end of this section).
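To make the numerical search concrete, one could hand the sketched arma.loglik() above to a general-purpose optimizer; again this is only an illustration of the idea, with rough hand-picked starting values, for a demeaned ARMA(1,1) series X.

# Numerical ML search for an ARMA(1,1) (illustration only; arima() does this better)
neg.ll = function(par) -arma.loglik(X, ar = par[1], ma = par[2], sigma2 = par[3]^2)
fit = optim(c(0.5, 0.1, 1), neg.ll)   # par = (phi1, theta1, sigma)
fit$par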
For a large sample from an ARMA(\(p,q\)) process, the MLE has the approximate sampling
distribution \[
{\large
\hat \beta \approx N\Big(\beta, V(\hat \beta)\Big)
}
\] where \(\beta = (\phi_1, \ldots,
\phi_p, \theta_1, \ldots, \theta_q, \sigma)^T\).
For AR models, \(V(\hat \beta)\) is the same as the asymptotic covariance matrix of the Yule-Walker estimators. For low-order models it has a closed form (a numerical check follows the formulas below).
\[\begin{align*}
\mbox{ AR(1): } \hspace{10mm} V(\hat \phi_1)
&= \frac{1}{n} (1-\phi_1^2) \\\\
\mbox{ AR(2): } \hspace{3mm} V(\hat \phi_1, \hat \phi_2) &=
\frac{1}{n}
\left[
\begin{array}{cc}
1-\phi_2^2 & -\phi_1 (1+\phi_2) \\
-\phi_1 (1+\phi_2) & 1-\phi_2^2 \\
\end{array}
\right]
\end{align*}\]
\[\begin{align*}
\mbox{ MA(1): } \hspace{10mm} V(\hat \theta_1) &=
\frac{1}{n} (1-\theta_1^2) \\\\
\mbox{ MA(2): } \hspace{3mm} V(\hat \theta_1, \hat \theta_2) &=
\frac{1}{n}
\left[
\begin{array}{cc}
1-\theta_2^2 & -\theta_1 (1+\theta_2) \\
-\theta_1 (1+\theta_2) & 1-\theta_2^2 \\
\end{array}
\right]
\end{align*}\]
\[\begin{align*}
\mbox{ ARMA(1,1): } \hspace{3mm} V(\hat \phi_1, \hat \theta_1) \hspace{3mm} = \hspace{3mm}
\Big(\frac{1}{n}\Big)
\frac{1- \phi_1 \theta_1}{(\phi_1-\theta_1)^2}
\left[
\begin{array}{cc}
(1-\phi_1^2)(1-\phi_1 \theta_1) & (1-\phi_1^2)(1-\theta_1^2) \\
(1-\phi_1^2)(1-\theta_1^2) & (1-\theta_1^2)(1-\phi_1\theta_1) \\
\end{array}
\right]
\end{align*}\]
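As a quick numerical check of these formulas, and of the Yule-Walker claim above, the sketch below (illustrative values, not part of the original output) compares the AR(1) formula with the asymptotic variances reported by R's Yule-Walker and ML fitting routines; all three should be close.

# Check V(phi1-hat) = (1 - phi1^2)/n against ar() and arima() (illustrative values)
set.seed(1)
W = arima.sim(n = 500, list(ar = 0.7))
(1 - 0.7^2)/500                                                # formula: ~0.00102
ar(W, aic = FALSE, order.max = 1, method = "yule-walker")$asy.var.coef
arima(W, order = c(1,0,0))$var.coef["ar1","ar1"]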
mu = 5
X  = arima.sim(n = 100, list(ar = c(0.7), ma = c(0.4)) ) + mu
plot(X, type="o")

Fit1 = arima(X, order=c(1,0,1))
Fit1
##
## Call:
## arima(x = X, order = c(1, 0, 1))
##
## Coefficients:
## ar1 ma1 intercept
## 0.7738 0.3631 4.6714
## s.e. 0.0670 0.0905 0.5992
##
## sigma^2 estimated as 1.062: log likelihood = -145.66, aic = 299.32
str(Fit1) # see what's inside
## List of 14
## $ coef : Named num [1:3] 0.774 0.363 4.671
## ..- attr(*, "names")= chr [1:3] "ar1" "ma1" "intercept"
## $ sigma2 : num 1.06
## $ var.coef : num [1:3, 1:3] 0.00449 -0.00247 -0.00078 -0.00247 0.00818 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:3] "ar1" "ma1" "intercept"
## .. ..$ : chr [1:3] "ar1" "ma1" "intercept"
## $ mask : logi [1:3] TRUE TRUE TRUE
## $ loglik : num -146
## $ aic : num 299
## $ arma : int [1:7] 1 1 0 0 1 0 0
## $ residuals: Time-Series [1:100] from 1 to 100: -0.116 0.416 -2.176 -1.79 -0.252 ...
## $ call : language arima(x = X, order = c(1, 0, 1))
## $ series : chr "X"
## $ code : int 0
## $ n.cond : int 0
## $ nobs : int 100
## $ model :List of 10
## ..$ phi : num 0.774
## ..$ theta: num 0.363
## ..$ Delta: num(0)
## ..$ Z : num [1:2] 1 0
## ..$ a : num [1:2] -0.685 0.178
## ..$ P : num [1:2, 1:2] 0 0 0 0
## ..$ T : num [1:2, 1:2] 0.774 0 1 0
## ..$ V : num [1:2, 1:2] 1 0.363 0.363 0.132
## ..$ h : num 0
## ..$ Pn : num [1:2, 1:2] 1 0.363 0.363 0.132
## - attr(*, "class")= chr "Arima"
Fit1$coef       # parameter estimates (MLE)
## ar1 ma1 intercept
## 0.7737811 0.3630587 4.6714018
Fit1$var.coef   # variance of MLE using asymptotic formula
## ar1 ma1 intercept
## ar1 0.0044875716 -0.0024678955 -0.0007804059
## ma1 -0.0024678955 0.0081815327 0.0003076798
## intercept -0.0007804059 0.0003076798 0.3590977368
c(Fit1$var.coef[1,1], Fit1$var.coef[2,2], Fit1$var.coef[3,3])
## [1] 0.004487572 0.008181533 0.359097737
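Using the asymptotic normality of the MLE, approximate 95% confidence intervals can be read off the fitted object; a minimal sketch:

# Approximate 95% CIs: coef +/- 1.96 * asymptotic SE
se = sqrt(diag(Fit1$var.coef))
cbind(lower = Fit1$coef - 1.96*se, upper = Fit1$coef + 1.96*se)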
#- Repeat the fit above itt = 500 times, recording Est$coef each time.
#  Compare the simulated variance and the theoretical asymptotic variance ---
n   = 100
itt = 500

MLE  = matrix(0, itt, 7)   # matrix to record estimated coefficients
Vars = matrix(0, itt, 7)   # matrix to record estimated variances

set.seed(1373)
for (i in 1:itt) {
  mu = 5
  # X = arima.sim(n = n, list(ar = c(0.7), ma = c(0.4)) ) + mu                    # Case 1
  X = arima.sim(n = n, list(ar = c(0.7, .2, -.3), ma = c(0.6, -.4, .2)) ) + mu    # Case 2

  Est      = arima(X, order=c(3,0,3))
  MLE[i,]  = Est$coef
  Vars[i,] = diag(Est$var.coef)
}
## Warning in log(s2): NaNs produced
## Warning in arima(X, order = c(3, 0, 3)): possible convergence
## problem: optim gave code = 1
## (these two warnings repeat several times over the 500 fits)
#--- Results for the six ARMA coefficients ----
layout(matrix(1:6, 2, 3, byrow=TRUE))
hist(MLE[,1], main="phi1 (.7)");    abline(v= .7, col="blue")
hist(MLE[,2], main="phi2 (.2)");    abline(v= .2, col="blue")
hist(MLE[,3], main="phi3 (-.3)");   abline(v=-.3, col="blue")
hist(MLE[,4], main="theta1 (.6)");  abline(v= .6, col="blue")
hist(MLE[,5], main="theta2 (-.4)"); abline(v=-.4, col="blue")
hist(MLE[,6], main="theta3 (.2)");  abline(v= .2, col="blue")

Result = cbind( apply(MLE, 2, mean), apply(MLE, 2, sd), sqrt(apply(Vars, 2, mean)) )
colnames(Result) = c("Mean", "SE by Sim", "SE from Output")
Result
## Mean SE by Sim SE from Output
## [1,] 0.54219689 0.6287727 0.3371195
## [2,] -0.07738958 0.4732947 0.4203374
## [3,] -0.08756457 0.3389365 0.2285187
## [4,] 0.53327595 0.6740321 0.3415460
## [5,] 0.15170613 0.7645478 0.3128849
## [6,] 0.36054578 0.2943624 0.2197626
## [7,] 4.99426805 0.3392923 0.3362629
#- Repeat with a larger sample: n = 400, itt = 500 replications.
#  Compare the simulated variance and the theoretical asymptotic variance ---
n   = 400
itt = 500

MLE  = matrix(0, itt, 7)   # matrix to record estimated coefficients
Vars = matrix(0, itt, 7)   # matrix to record estimated variances

set.seed(373)
for (i in 1:itt) {
  mu = 5
  # X = arima.sim(n = n, list(ar = c(0.7), ma = c(0.4)) ) + mu                    # Case 1
  X = arima.sim(n = n, list(ar = c(0.7, .2, -.3), ma = c(0.6, -.4, .2)) ) + mu    # Case 2

  Est      = arima(X, order=c(3,0,3))
  MLE[i,]  = Est$coef
  Vars[i,] = diag(Est$var.coef)
}
## Warning in arima(X, order = c(3, 0, 3)): possible convergence
## problem: optim gave code = 1
## (this warning repeats several times over the 500 fits)
#--- Results for the six ARMA coefficients ----
layout(matrix(1:6, 2, 3, byrow=TRUE))
hist(MLE[,1], main="phi1 (.7)");    abline(v= .7, col="blue")
hist(MLE[,2], main="phi2 (.2)");    abline(v= .2, col="blue")
hist(MLE[,3], main="phi3 (-.3)");   abline(v=-.3, col="blue")
hist(MLE[,4], main="theta1 (.6)");  abline(v= .6, col="blue")
hist(MLE[,5], main="theta2 (-.4)"); abline(v=-.4, col="blue")
hist(MLE[,6], main="theta3 (.2)");  abline(v= .2, col="blue")

Result2 = cbind( apply(MLE, 2, mean), apply(MLE, 2, sd), sqrt(apply(Vars, 2, mean)) )
colnames(Result2) = c("Mean", "SE by Sim", "SE from Output")
Result2
## Mean SE by Sim SE from Output
## [1,] 0.6678227 0.2988139 0.2304187
## [2,] 0.1539419 0.2233818 0.1926529
## [3,] -0.2613771 0.1685802 0.1325948
## [4,] 0.4052964 0.3008026 0.2321341
## [5,] -0.2138445 0.3588771 0.2714871
## [6,] 0.1935095 0.1432537 0.1237579
## [7,] 4.9906898 0.1896803 0.1706930
#- compare to when n=100
Result
## Mean SE by Sim SE from Output
## [1,] 0.54219689 0.6287727 0.3371195
## [2,] -0.07738958 0.4732947 0.4203374
## [3,] -0.08756457 0.3389365 0.2285187
## [4,] 0.53327595 0.6740321 0.3415460
## [5,] 0.15170613 0.7645478 0.3128849
## [6,] 0.36054578 0.2943624 0.2197626
## [7,] 4.99426805 0.3392923 0.3362629
# Last fitted model from the n = 400 loop:
Est
##
## Call:
## arima(x = X, order = c(3, 0, 3))
##
## Coefficients:
## ar1 ar2 ar3 ma1 ma2 ma3 intercept
## 0.4911 0.3470 -0.2790 0.6066 -0.1464 0.1046 5.2788
## s.e. 0.2374 0.2094 0.1605 0.2387 0.3327 0.1523 0.1893
##
## sigma^2 estimated as 1.145: log likelihood = -595.49, aic = 1206.98
#- Check robustness to non-normal errors: n = 100, 1000 replications (mu = 5 from above)
n = 100
MLE  = Vars  = matrix(0, 1000, 3)   # matrices to record estimates and their variances
MLE2 = Vars2 = matrix(0, 1000, 3)
MLE3 = Vars3 = matrix(0, 1000, 3)

for (i in 1:1000) {
  #-- Normal errors ---
  X = arima.sim(n = n, list(ar = c(0.7), ma = c(0.4)) ) + mu

  Est      = arima(X, order=c(1,0,1))
  MLE[i,]  = Est$coef
  Vars[i,] = c(Est$var.coef[1,1], Est$var.coef[2,2], Est$var.coef[3,3])

  #-- Non-normal errors: t-distribution with df = 4 ---
  et = rt(n, 4)
  Y  = arima.sim(n = n, list(ar = c(0.7), ma = c(0.4)), innov = et ) + mu

  Est2      = arima(Y, order=c(1,0,1))
  MLE2[i,]  = Est2$coef
  Vars2[i,] = c(Est2$var.coef[1,1], Est2$var.coef[2,2], Est2$var.coef[3,3])

  #-- Non-normal errors: centered exponential (mean 0) ---
  et3 = rexp(n, 1) - 1
  Z   = arima.sim(n = n, list(ar = c(0.7), ma = c(0.4)), innov = et3 ) + mu   # renamed from T, which masks TRUE

  Est3      = arima(Z, order=c(1,0,1))
  MLE3[i,]  = Est3$coef
  Vars3[i,] = c(Est3$var.coef[1,1], Est3$var.coef[2,2], Est3$var.coef[3,3])
}
# Plotting Results
layout(matrix(1:6, 3,2, byrow=TRUE));
hist(MLE[,1], xlim=c(0,1), main="e ~ Norm: phi1 (.7)" ); abline(v= .7, col="blue")
hist(MLE[,2], xlim=c(0,1), main="e ~ Norm: theta1 (.4)"); abline(v= .4, col="blue")
hist(MLE2[,1], xlim=c(0,1), main="e ~ t(v): phi1 (.7)" ); abline(v= .7, col="blue")
hist(MLE2[,2], xlim=c(0,1), main="e ~ t(v): theta1 (.4)"); abline(v= .4, col="blue")
hist(MLE3[,1], xlim=c(0,1), main="e ~ exp(1)-1: phi1 (.7)" ); abline(v= .7, col="blue")
hist(MLE3[,2], xlim=c(0,1), main="e ~ exp(1)-1: theta1 (.4)"); abline(v= .4, col="blue")
Result3 = rbind( c(mean(MLE[,1]),  sd(MLE[,1]),  mean(MLE[,2]),  sd(MLE[,2])  ),
                 c(mean(MLE2[,1]), sd(MLE2[,1]), mean(MLE2[,2]), sd(MLE2[,2]) ),
                 c(mean(MLE3[,1]), sd(MLE3[,1]), mean(MLE3[,2]), sd(MLE3[,2]) ) )
colnames(Result3) = c("Mean phi1", "SE phi1", "Mean th1", "SE th1")
Result3
## Mean phi1 SE phi1 Mean th1 SE th1
## [1,] 0.6614065 0.09300538 0.4169312 0.1112552
## [2,] 0.6573426 0.09131514 0.4186479 0.1124989
## [3,] 0.6613954 0.09062829 0.4187407 0.1136855
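As a quick follow-up (not in the original code), one can eyeball how non-normal the innovations are in the last replication of each case via normal Q-Q plots of the fitted residuals:

# Q-Q plots of residuals from the last fit of each error distribution
layout(matrix(1:3, 1, 3))
qqnorm(Est$residuals,  main="Normal errors");  qqline(Est$residuals)
qqnorm(Est2$residuals, main="t(4) errors");    qqline(Est2$residuals)
qqnorm(Est3$residuals, main="exp(1)-1 errors"); qqline(Est3$residuals)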
We assume that \(\epsilon_t\) is normal and estimate the ARMA parameters by MLE.
For the MLE, the asymptotic sampling distributions of the estimators are known (formulas above).
For small \(n\), the standard errors in the R output may not be accurate, as the simulations above show.
When the assumption that \(\epsilon_t\) is normal is violated, the performance of the MLE for an ARMA model is not significantly affected.
Since the MLE is computed by numerical optimization, it needs a good starting point. By default, arima() in R uses conditional-sum-of-squares (CSS) estimates as starting values (method = "CSS-ML").
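A minimal illustration of the estimation methods available in arima(), using the ARMA(1,1) series X simulated earlier:

# Estimation methods in arima(); "CSS-ML" is the default
arima(X, order = c(1,0,1), method = "CSS")     # conditional sum of squares only
arima(X, order = c(1,0,1), method = "ML")      # full maximum likelihood
arima(X, order = c(1,0,1), method = "CSS-ML")  # CSS estimates as starting values for ML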