


1. MLE for ARMA(p,q)


MLE

To construct the Maximum Likelihood Estimator, we need the distribution of \(\epsilon_t\).
Suppose \(\epsilon_t \sim N(0,\sigma^2)\). Then a causal ARMA(\(p,q\)) process \(X_t\) must be normal as well, \[ X_t \hspace{3mm} = \hspace{3mm} \sum_{i=0}^\infty \psi_i \epsilon_{t-i} \\ \\ X_t \sim N\Big(0, \gamma(0)\Big) \] We know that \(X_{t+1}\) has the same distribution, and the covariance between \(X_t\) and \(X_{t+1}\) is \(\gamma(1)\).

That means \(\{X_1, \ldots, X_n\}\) forms an \(n\)-dimensional multivariate normal vector. \[ \left[ \begin{array}{ccccc} X_1 \\ X_2 \\ \vdots \\ X_n \\ \end{array} \right] \sim N_n \Big(0, \mathbf \Sigma\Big), \hspace{5mm} \mbox{ where } {\small \mathbf \Sigma = \left[ \begin{array}{c c c c} \gamma(0) &\gamma(1) &\cdots &\gamma(n-1) \\ \gamma(1) &\gamma(0) &\cdots &\gamma(n-2) \\ \vdots &\vdots & & \vdots \\ \gamma(n-1) &\gamma(n-2) &\cdots &\gamma(0) \\ \end{array} \right]} \]

The mean of each element is 0, and the covariance matrix is the autocovariance matrix \[ \mathbf \Sigma = \mathbf \Gamma_n \]

The joint pdf of a multivariate normal vector with covariance matrix \(\mathbf \Gamma_n\) gives the likelihood \[ L(\phi, \theta, \sigma) = \frac{1}{(2 \pi)^{n/2} (\mbox{det}\mathbf \Gamma_n)^{1/2} } \exp\Big( - \frac{1}{2} \mathbf X'_n \mathbf \Gamma_n^{-1} \mathbf X_n \Big) \]

When the MLE is used, the parameter estimates are searched for numerically, under the assumption that the \(\epsilon_t\) are normal (Gaussian likelihood).
For ARMA models, the penalty for violating the normality assumption is small (see the simulation in Section 3).
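
To make this concrete, here is a minimal sketch (added for illustration; not part of the original notes) that evaluates the exact Gaussian log-likelihood at one trial parameter value, building \(\mathbf \Gamma_n\) from the ACF. The parameter values and the truncation of the \(\psi\)-weights at lag 200 are illustrative choices.

  # Sketch: evaluate the Gaussian log-likelihood of an ARMA(1,1) series
  # at trial values (phi, theta, sigma2), using the matrix Gamma_n above.
  set.seed(1)
  n = 100
  x = arima.sim(n = n, list(ar = 0.7, ma = 0.4))             # mean-zero series

  phi = 0.7; theta = 0.4; sigma2 = 1                         # trial values
  psi = c(1, ARMAtoMA(ar = phi, ma = theta, lag.max = 200))  # psi-weights (truncated)
  g0  = sigma2 * sum(psi^2)                                  # gamma(0)
  rho = ARMAacf(ar = phi, ma = theta, lag.max = n - 1)       # rho(0), ..., rho(n-1)
  Gam = g0 * toeplitz(rho)                                   # Gamma_n

  logdet = as.numeric(determinant(Gam, logarithm = TRUE)$modulus)
  quad   = drop( t(x) %*% solve(Gam) %*% x )
  -0.5 * ( n * log(2 * pi) + logdet + quad )   # log-likelihood at these values

A numerical optimizer then searches over \((\phi, \theta, \sigma^2)\) to maximize this quantity.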

2. Large-Sample Property

For a large sample from an ARMA(\(p,q\)) process, the MLE has approximate sampling distribution \[ {\large \hat \beta \approx N\Big(\beta, V(\hat \beta)\Big) } \] where \(\beta = (\phi_1, \ldots, \phi_p, \theta_1, \ldots, \theta_q, \sigma)^T\).

Ex: AR(p)

This is the same as the asymptotic covariance matrix of the Yule-Walker estimators.

\[\begin{align*} \mbox{ AR(1): } \hspace{10mm} V(\hat \phi_1) &= \frac{1}{n} (1-\phi_1^2) \\\\ \mbox{ AR(2): } \hspace{3mm} V(\hat \phi_1, \hat \phi_2) &= \frac{1}{n} \left[ \begin{array}{cc} 1-\phi_2^2 & -\phi_1 (1+\phi_2) \\ -\phi_1 (1+\phi_2) & 1-\phi_2^2 \\ \end{array} \right] \end{align*}\]
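
As a quick sanity check (a sketch added here, not in the original notes), the AR(1) formula can be compared with the standard error that arima() reports:

  # Sketch: theoretical asymptotic SE of phi-hat for AR(1) vs. arima() output
  set.seed(11)
  n   = 500
  phi = 0.7
  x   = arima.sim(n = n, list(ar = phi))

  fit = arima(x, order = c(1, 0, 0))
  sqrt((1 - phi^2) / n)                # asymptotic SE from the formula
  sqrt(fit$var.coef["ar1", "ar1"])     # SE reported by arima()

The two numbers should agree closely for \(n\) this large.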

Ex: MA(q)

\[\begin{align*} \mbox{ MA(1): } \hspace{10mm} V(\hat \theta_1) &= \frac{1}{n} (1-\theta_1^2) \\\\ \mbox{ MA(2): } \hspace{3mm} V(\hat \theta_1, \hat \theta_2) &= \frac{1}{n} \left[ \begin{array}{cc} 1-\theta_2^2 & -\theta_1 (1+\theta_2) \\ -\theta_1 (1+\theta_2) & 1-\theta_2^2 \\ \end{array} \right] \end{align*}\]

Ex: ARMA(1,1)

\[\begin{align*} V(\hat \phi_1, \hat \theta_1) \hspace{3mm} = \hspace{3mm} \frac{1}{n} \cdot \frac{1- \phi_1 \theta_1}{(\phi_1-\theta_1)^2} \left[ \begin{array}{cc} (1-\phi_1^2)(1-\phi_1 \theta_1) & -(1-\phi_1^2)(1-\theta_1^2) \\ -(1-\phi_1^2)(1-\theta_1^2) & (1-\theta_1^2)(1-\phi_1\theta_1) \\ \end{array} \right] \end{align*}\]
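
The formula is easy to wrap in a small helper. The sketch below (not from the original notes; the function name is made up) implements it directly. Note that the matrix blows up as \(\phi_1 \to \theta_1\), where the two parameters are not identifiable, and the sign convention for \(\theta_1\) must match the formula above.

  # Sketch: asymptotic covariance of (phi1-hat, theta1-hat) for ARMA(1,1)
  arma11.avar = function(phi, theta, n) {
    k = (1 - phi * theta) / (n * (phi - theta)^2)
    k * matrix(c( (1 - phi^2) * (1 - phi * theta),
                 -(1 - phi^2) * (1 - theta^2),
                 -(1 - phi^2) * (1 - theta^2),
                  (1 - theta^2) * (1 - phi * theta) ), 2, 2)
  }
  sqrt(diag(arma11.avar(phi = 0.7, theta = 0.4, n = 100)))   # asymptotic SEs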

3. Finite Sample Simulation


A single simulated series and its MLE fit for n=100

  mu = 5
  X  = arima.sim(n = 100, list(ar = c(0.7), ma = c(0.4)) ) + mu
  plot(X, type="o")

  Fit1 = arima(X, order=c(1,0,1))
  Fit1
## 
## Call:
## arima(x = X, order = c(1, 0, 1))
## 
## Coefficients:
##          ar1     ma1  intercept
##       0.7738  0.3631     4.6714
## s.e.  0.0670  0.0905     0.5992
## 
## sigma^2 estimated as 1.062:  log likelihood = -145.66,  aic = 299.32
  str(Fit1)          # see what's inside
## List of 14
##  $ coef     : Named num [1:3] 0.774 0.363 4.671
##   ..- attr(*, "names")= chr [1:3] "ar1" "ma1" "intercept"
##  $ sigma2   : num 1.06
##  $ var.coef : num [1:3, 1:3] 0.00449 -0.00247 -0.00078 -0.00247 0.00818 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:3] "ar1" "ma1" "intercept"
##   .. ..$ : chr [1:3] "ar1" "ma1" "intercept"
##  $ mask     : logi [1:3] TRUE TRUE TRUE
##  $ loglik   : num -146
##  $ aic      : num 299
##  $ arma     : int [1:7] 1 1 0 0 1 0 0
##  $ residuals: Time-Series [1:100] from 1 to 100: -0.116 0.416 -2.176 -1.79 -0.252 ...
##  $ call     : language arima(x = X, order = c(1, 0, 1))
##  $ series   : chr "X"
##  $ code     : int 0
##  $ n.cond   : int 0
##  $ nobs     : int 100
##  $ model    :List of 10
##   ..$ phi  : num 0.774
##   ..$ theta: num 0.363
##   ..$ Delta: num(0) 
##   ..$ Z    : num [1:2] 1 0
##   ..$ a    : num [1:2] -0.685 0.178
##   ..$ P    : num [1:2, 1:2] 0 0 0 0
##   ..$ T    : num [1:2, 1:2] 0.774 0 1 0
##   ..$ V    : num [1:2, 1:2] 1 0.363 0.363 0.132
##   ..$ h    : num 0
##   ..$ Pn   : num [1:2, 1:2] 1 0.363 0.363 0.132
##  - attr(*, "class")= chr "Arima"
  Fit1$coef          # parameter estimates (MLE)
##       ar1       ma1 intercept 
## 0.7737811 0.3630587 4.6714018
  Fit1$var.coef      # variance of MLE using asymptotic formula
##                     ar1           ma1     intercept
## ar1        0.0044875716 -0.0024678955 -0.0007804059
## ma1       -0.0024678955  0.0081815327  0.0003076798
## intercept -0.0007804059  0.0003076798  0.3590977368
  c(Fit1$var.coef[1,1], Fit1$var.coef[2,2], Fit1$var.coef[3,3])
## [1] 0.004487572 0.008181533 0.359097737
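
A common next step (sketched here; not part of the original output) is to combine the estimates with their asymptotic standard errors into approximate 95% confidence intervals:

  # Approximate 95% CIs from the MLEs and their asymptotic SEs
  se = sqrt(diag(Fit1$var.coef))
  cbind(lower = Fit1$coef - 1.96 * se,
        upper = Fit1$coef + 1.96 * se)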


Now put the above in a loop of 500 replications.

  #- Repeat the above 500 times, recording Est$coef and diag(Est$var.coef) each time.
  #     Compare the simulated variance to the theoretical asymptotic variance ---

  n=100
  itt=500

  MLE  = matrix(0,itt,7)  #matrix to record estimated coefficients
  Vars = matrix(0,itt,7)  #matrix to record estimated asymptotic variances

  set.seed(1373)
  for (i in 1:itt) {
    mu = 5
    # X  = arima.sim(n = n,  list(ar = c(0.7), ma = c(0.4))  ) + mu                    # Case 1
    X  = arima.sim(n = n, list(ar = c(0.7, .2, -.3), ma = c(0.6, -.4, .2))  ) + mu    # Case 2

    Est      = arima(X, order=c(3,0,3))
    MLE[i,]  = Est$coef
    Vars[i,] = diag(Est$var.coef)
  }
## Warning in log(s2): NaNs produced
## Warning in arima(X, order = c(3, 0, 3)): possible convergence
## problem: optim gave code = 1
##   (both warnings repeated for a number of the 500 replications)
  #--- Results for all six ARMA coefficients ----
  layout(matrix(1:6, 2,3, byrow=TRUE))
  hist(MLE[,1], main="phi1 (.7)");     abline(v= .7, col="blue")
  hist(MLE[,2], main="phi2 (.2)");     abline(v= .2, col="blue")
  hist(MLE[,3], main="phi3 (-.3)");    abline(v=-.3, col="blue")
  hist(MLE[,4], main="theta1 (.6)");   abline(v= .6, col="blue")
  hist(MLE[,5], main="theta2 (-.4)");  abline(v=-.4, col="blue")
  hist(MLE[,6], main="theta3 (.2)");   abline(v= .2, col="blue")

  Result = cbind( apply(MLE, 2, mean), apply(MLE, 2, sd), sqrt(apply(Vars, 2, mean)) )
  colnames(Result) = c("Mean","SE by Sim",  "SE from Output")
  Result
##             Mean SE by Sim SE from Output
## [1,]  0.54219689 0.6287727      0.3371195
## [2,] -0.07738958 0.4732947      0.4203374
## [3,] -0.08756457 0.3389365      0.2285187
## [4,]  0.53327595 0.6740321      0.3415460
## [5,]  0.15170613 0.7645478      0.3128849
## [6,]  0.36054578 0.2943624      0.2197626
## [7,]  4.99426805 0.3392923      0.3362629
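
Since both the estimates and their reported variances were recorded, we can also check (a sketch, using the MLE and Vars matrices above) how often the nominal 95% interval covers the true value. Because the asymptotic SEs understate the simulation SEs at n = 100, coverage should fall short of 95%:

  # Empirical coverage of nominal 95% intervals over the 500 replications
  truth = c(0.7, 0.2, -0.3, 0.6, -0.4, 0.2, 5)
  cover = abs(MLE - matrix(truth, itt, 7, byrow = TRUE)) <= 1.96 * sqrt(Vars)
  apply(cover, 2, mean)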


Increase n to 400

  #- Repeat the above 500 times, recording Est$coef and diag(Est$var.coef) each time.
  #     Compare the simulated variance to the theoretical asymptotic variance ---

  n=400
  itt=500

  MLE  = matrix(0,itt,7)  #matrix to record estimated coefficients
  Vars = matrix(0,itt,7)  #matrix to record estimated asymptotic variances

  set.seed(373)
  for (i in 1:itt) {
    mu = 5
    # X  = arima.sim(n = n,  list(ar = c(0.7), ma = c(0.4))  ) + mu                    # Case 1
    X  = arima.sim(n = n, list(ar = c(0.7, .2, -.3), ma = c(0.6, -.4, .2))  ) + mu    # Case 2

    Est      = arima(X, order=c(3,0,3))
    MLE[i,]  = Est$coef
    Vars[i,] = diag(Est$var.coef)
  }
## Warning in arima(X, order = c(3, 0, 3)): possible convergence
## problem: optim gave code = 1
##   (repeated for a number of the 500 replications)
  #--- Results for all six ARMA coefficients ----
  layout(matrix(1:6, 2,3, byrow=TRUE))
  hist(MLE[,1], main="phi1 (.7)");     abline(v= .7, col="blue")
  hist(MLE[,2], main="phi2 (.2)");     abline(v= .2, col="blue")
  hist(MLE[,3], main="phi3 (-.3)");    abline(v=-.3, col="blue")
  hist(MLE[,4], main="theta1 (.6)");   abline(v= .6, col="blue")
  hist(MLE[,5], main="theta2 (-.4)");  abline(v=-.4, col="blue")
  hist(MLE[,6], main="theta3 (.2)");   abline(v= .2, col="blue")

  Result2 = cbind( apply(MLE, 2, mean), apply(MLE, 2, sd), sqrt(apply(Vars, 2, mean)) )
  colnames(Result2) = c("Mean","SE by Sim",  "SE from Output")
  Result2
##             Mean SE by Sim SE from Output
## [1,]  0.6678227 0.2988139      0.2304187
## [2,]  0.1539419 0.2233818      0.1926529
## [3,] -0.2613771 0.1685802      0.1325948
## [4,]  0.4052964 0.3008026      0.2321341
## [5,] -0.2138445 0.3588771      0.2714871
## [6,]  0.1935095 0.1432537      0.1237579
## [7,]  4.9906898 0.1896803      0.1706930
  #- compare to when n=100
  Result
##             Mean SE by Sim SE from Output
## [1,]  0.54219689 0.6287727      0.3371195
## [2,] -0.07738958 0.4732947      0.4203374
## [3,] -0.08756457 0.3389365      0.2285187
## [4,]  0.53327595 0.6740321      0.3415460
## [5,]  0.15170613 0.7645478      0.3128849
## [6,]  0.36054578 0.2943624      0.2197626
## [7,]  4.99426805 0.3392923      0.3362629
  #
  Est
## 
## Call:
## arima(x = X, order = c(3, 0, 3))
## 
## Coefficients:
##          ar1     ar2      ar3     ma1      ma2     ma3  intercept
##       0.4911  0.3470  -0.2790  0.6066  -0.1464  0.1046     5.2788
## s.e.  0.2374  0.2094   0.1605  0.2387   0.3327  0.1523     0.1893
## 
## sigma^2 estimated as 1.145:  log likelihood = -595.49,  aic = 1206.98
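
Asymptotic standard errors shrink like \(1/\sqrt{n}\), so quadrupling \(n\) from 100 to 400 should roughly halve them. A quick check (not in the original notes) using the two result tables:

  # Ratio of simulated SEs: n=400 vs n=100 (expect roughly 0.5)
  round(Result2[, 2] / Result[, 2], 2)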


When errors are not Normal

  n   = 100
  mu  = 5
  MLE  = Vars  = matrix(0,1000,3)  #matrices to record estimates and variances (Normal errors)
  MLE2 = Vars2 = matrix(0,1000,3)  #same, for t(4) errors
  MLE3 = Vars3 = matrix(0,1000,3)  #same, for centered exponential errors

  for (i in 1:1000) {

    #-- Normal errors ---
    X   = arima.sim(n = n, list(ar = c(0.7), ma = c(0.4))  ) + mu

    Est      = arima(X, order=c(1,0,1))
    MLE[i,]  = Est$coef
    Vars[i,] = c(Est$var.coef[1,1], Est$var.coef[2,2], Est$var.coef[3,3])

    #-- non-normal errors: t-distribution ---
    et = rt(n, 4)                          # t with df=4 (heavy tails; variance = 2)
    Y  = arima.sim(n=n, list(ar = c(0.7), ma = c(0.4)), innov=et  ) + mu

    Est2 = arima(Y, order=c(1,0,1))
    MLE2[i,]  = Est2$coef
    Vars2[i,] = c(Est2$var.coef[1,1], Est2$var.coef[2,2], Est2$var.coef[3,3])

    #-- non-normal errors: centered exponential ---
    et3 = rexp(n, 1)-1                          # exp(1) shifted to mean 0
    Z  = arima.sim(n=n, list(ar = c(0.7), ma = c(0.4)), innov=et3  ) + mu   # avoid naming a variable T (it masks TRUE)

    Est3 = arima(Z, order=c(1,0,1))
    MLE3[i,]  = Est3$coef
    Vars3[i,] = c(Est3$var.coef[1,1], Est3$var.coef[2,2], Est3$var.coef[3,3])

  }

  # Plotting Results

  layout(matrix(1:6, 3,2, byrow=TRUE));
  hist(MLE[,1],  xlim=c(0,1), main="e ~ Norm: phi1 (.7)"  );  abline(v= .7, col="blue")
  hist(MLE[,2],  xlim=c(0,1), main="e ~ Norm: theta1 (.4)");  abline(v= .4, col="blue")
  hist(MLE2[,1], xlim=c(0,1), main="e ~ t(v): phi1 (.7)"  );  abline(v= .7, col="blue")
  hist(MLE2[,2], xlim=c(0,1), main="e ~ t(v): theta1 (.4)");  abline(v= .4, col="blue")
  hist(MLE3[,1], xlim=c(0,1), main="e ~ exp(1)-1: phi1 (.7)"  );  abline(v= .7, col="blue")
  hist(MLE3[,2], xlim=c(0,1), main="e ~ exp(1)-1: theta1 (.4)");  abline(v= .4, col="blue")

  Result3 = rbind( c(mean(MLE[,1]),   sd(MLE[,1]),  mean(MLE[,2]),   sd(MLE[,2] ) ),
                   c(mean(MLE2[,1]),  sd(MLE2[,1]), mean(MLE2[,2]),  sd(MLE2[,2]) ),
                   c(mean(MLE3[,1]),  sd(MLE3[,1]), mean(MLE3[,2]),  sd(MLE3[,2]) ) )

  rownames(Result3) = c("Normal", "t(4)", "exp-1")   # label the error types
  colnames(Result3) = c("Mean phi1" ,  "SE phi1", "Mean th1", "SE th1")
  Result3
##        Mean phi1    SE phi1  Mean th1    SE th1
## Normal 0.6614065 0.09300538 0.4169312 0.1112552
## t(4)   0.6573426 0.09131514 0.4186479 0.1124989
## exp-1  0.6613954 0.09062829 0.4187407 0.1136855
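
Although the parameter estimates are barely affected, the non-normality is still visible in the residuals. A quick sketch (not in the original notes) using the last fit of each error type from the loop:

  # Residual QQ-plots for the final iteration's three fits
  layout(matrix(1:3, 1, 3))
  qqnorm(residuals(Est),  main = "e ~ Normal");   qqline(residuals(Est))
  qqnorm(residuals(Est2), main = "e ~ t(4)");     qqline(residuals(Est2))
  qqnorm(residuals(Est3), main = "e ~ exp(1)-1"); qqline(residuals(Est3))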





Summary

  • We assume that \(\epsilon_t\) is normal and estimate the ARMA parameters by MLE.

  • For the MLE, the asymptotic sampling distributions of the estimators are known (formulas above).

  • For small \(n\), the standard errors in the R output may not be accurate (see the \(n=100\) simulation above).

  • When the normality assumption on \(\epsilon_t\) is violated, the performance of the MLE for ARMA models is not much affected.

  • Since the MLE is computed by numerical optimization, it needs a good starting point. By default, R's arima() obtains starting values by conditional sum of squares (CSS) and then refines them by full maximum likelihood (method = "CSS-ML").