To construct the Maximum Likelihood Estimator, we need the distribution of
\(\epsilon_t\).
Suppose \(\epsilon_t \sim N(0,\sigma^2)\). Then a causal ARMA(\(p,q\)) process \(X_t\) is normal as well, being a linear combination of normal errors: \[
X_t = \sum_{i=0}^\infty \psi_i \epsilon_{t-i}, \qquad X_t \sim N\Big(0, \gamma(0)\Big)
\] We know that \(X_{t+1}\) has the same distribution, and that the covariance between \(X_t\) and \(X_{t+1}\) is \(\gamma(1)\).
That means \(\{X_1, \ldots, X_n\}\) is an \(n\)-dimensional multivariate normal vector: \[ \left[ \begin{array}{c} X_1 \\ X_2 \\ \vdots \\ X_n \\ \end{array} \right] \sim N_n \Big(0, \mathbf \Sigma\Big), \hspace{5mm} \mbox{ where } {\small \mathbf \Sigma = \left[ \begin{array}{cccc} \gamma(0) &\gamma(1) &\cdots &\gamma(n-1) \\ \gamma(1) &\gamma(0) &\cdots &\gamma(n-2) \\ \vdots &\vdots & & \vdots \\ \gamma(n-1) &\gamma(n-2) &\cdots &\gamma(0) \\ \end{array} \right]} \]
The mean of each element is 0, and the covariance matrix is \(\mathbf \Sigma = \mathbf \Gamma_n\), the autocovariance matrix. The joint pdf of a multivariate normal vector with covariance matrix \(\mathbf \Gamma_n\) then gives the Gaussian likelihood \[ L(\phi, \theta, \sigma) = \frac{1}{(2 \pi)^{n/2} (\det \mathbf \Gamma_n)^{1/2} } \exp\Big( - \frac{1}{2} \mathbf X'_n \mathbf \Gamma_n^{-1} \mathbf X_n \Big), \hspace{5mm} \mbox{where } \mathbf X_n = (X_1, \ldots, X_n)'. \]
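To make the formula concrete, the Gaussian log-likelihood can be evaluated directly from \(\mathbf \Gamma_n\) in R. This is only an illustrative sketch with arbitrary parameter values; it costs \(O(n^3)\) and is not how arima() computes the likelihood (arima() uses a Kalman-filter recursion). It assumes R's parameterization \(X_t = \phi X_{t-1} + \epsilon_t + \theta \epsilon_{t-1}\), for which \(\gamma(0) = \sigma^2 (1 + 2\phi\theta + \theta^2)/(1-\phi^2)\).
# Sketch: evaluate the Gaussian log-likelihood of an ARMA(1,1) directly from Gamma_n.
# Illustrative only; parameter values are arbitrary choices.
phi = 0.7; theta = 0.4; sigma2 = 1; n = 100
set.seed(1)
X = arima.sim(n = n, list(ar = phi, ma = theta))
gamma0 = sigma2 * (1 + 2*phi*theta + theta^2) / (1 - phi^2)   # ACVF at lag 0
acvf = gamma0 * ARMAacf(ar = phi, ma = theta, lag.max = n-1)  # gamma(0), ..., gamma(n-1)
Gamma_n = toeplitz(acvf)                                      # the covariance matrix above
loglik = as.numeric( -0.5 * ( n*log(2*pi) + determinant(Gamma_n)$modulus
                              + t(X) %*% solve(Gamma_n) %*% X ) )
loglik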
When the MLE is used, the parameter estimates are searched for numerically,
under the assumption that the \(\epsilon_t\) are normal (Gaussian
likelihood).
For ARMA models, the penalty for violating the normality assumption is
small, as the simulation with non-normal errors at the end of this section demonstrates.
For a large sample from an ARMA(\(p,q\)) process, the MLE has the approximate sampling
distribution \[
\hat \beta \approx N\Big(\beta, V(\hat \beta)\Big),
\] where \(\beta = (\phi_1, \ldots,
\phi_p, \theta_1, \ldots, \theta_q, \sigma)^T\).
For pure AR models, this is the same as the asymptotic covariance matrix of the Yule-Walker estimators.
\[\begin{align*}
\mbox{ AR(1): } \hspace{10mm} V(\hat \phi_1)
&= \frac{1}{n} (1-\phi_1^2) \\\\
\mbox{ AR(2): } \hspace{3mm} V(\hat \phi_1, \hat \phi_2) &=
\frac{1}{n}
\left[
\begin{array}{cc}
1-\phi_2^2 & -\phi_1 (1+\phi_2) \\
-\phi_1 (1+\phi_2) & 1-\phi_2^2 \\
\end{array}
\right]
\end{align*}\]
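For example, with \(\phi_1 = 0.7\) and \(n = 100\), the AR(1) formula gives \(V(\hat \phi_1) = (1 - 0.7^2)/100 = 0.0051\), i.e. a standard error of \(\sqrt{0.0051} \approx 0.07\).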
\[\begin{align*}
\mbox{ MA(1): } \hspace{10mm} V(\hat \theta_1) &=
\frac{1}{n} (1-\theta_1^2) \\\\
\mbox{ MA(2): } \hspace{3mm} V(\hat \theta_1, \hat \theta_2) &=
\frac{1}{n}
\left[
\begin{array}{cc}
1-\theta_2^2 & -\theta_1 (1+\theta_2) \\
-\theta_1 (1+\theta_2) & 1-\theta_2^2 \\
\end{array}
\right]
\end{align*}\]
\[\begin{align*}
\mbox{ ARMA(1,1): } \hspace{3mm} V(\hat \phi_1, \hat \theta_1) &=
\frac{1}{n} \,
\frac{1- \phi_1 \theta_1}{(\phi_1-\theta_1)^2}
\left[
\begin{array}{cc}
(1-\phi_1^2)(1-\phi_1 \theta_1) & -(1-\theta_1^2)(1-\phi_1^2) \\
-(1-\theta_1^2)(1-\phi_1^2) & (1-\theta_1^2)(1-\phi_1\theta_1) \\
\end{array}
\right]
\end{align*}\]
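As a quick sanity check, the AR(1) formula can be compared with the asymptotic variance reported by arima(); a minimal sketch (the values of n and phi below are arbitrary choices):
set.seed(42)
n = 500; phi = 0.7
x = arima.sim(n = n, list(ar = phi))
fit = arima(x, order = c(1,0,0))
fit$var.coef["ar1","ar1"]  # variance reported by arima()
(1 - phi^2) / n            # (1 - phi_1^2)/n from the formula above
The two numbers should be close for moderate \(n\), with agreement improving as \(n\) grows.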
mu = 5
X = arima.sim(n = 100, list(ar = c(0.7), ma = c(0.4)) ) + mu
plot(X, type="o")
Fit1 = arima(X, order=c(1,0,1))
Fit1
##
## Call:
## arima(x = X, order = c(1, 0, 1))
##
## Coefficients:
## ar1 ma1 intercept
## 0.7738 0.3631 4.6714
## s.e. 0.0670 0.0905 0.5992
##
## sigma^2 estimated as 1.062: log likelihood = -145.66, aic = 299.32
str(Fit1) # see what's inside
## List of 14
## $ coef : Named num [1:3] 0.774 0.363 4.671
## ..- attr(*, "names")= chr [1:3] "ar1" "ma1" "intercept"
## $ sigma2 : num 1.06
## $ var.coef : num [1:3, 1:3] 0.00449 -0.00247 -0.00078 -0.00247 0.00818 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:3] "ar1" "ma1" "intercept"
## .. ..$ : chr [1:3] "ar1" "ma1" "intercept"
## $ mask : logi [1:3] TRUE TRUE TRUE
## $ loglik : num -146
## $ aic : num 299
## $ arma : int [1:7] 1 1 0 0 1 0 0
## $ residuals: Time-Series [1:100] from 1 to 100: -0.116 0.416 -2.176 -1.79 -0.252 ...
## $ call : language arima(x = X, order = c(1, 0, 1))
## $ series : chr "X"
## $ code : int 0
## $ n.cond : int 0
## $ nobs : int 100
## $ model :List of 10
## ..$ phi : num 0.774
## ..$ theta: num 0.363
## ..$ Delta: num(0)
## ..$ Z : num [1:2] 1 0
## ..$ a : num [1:2] -0.685 0.178
## ..$ P : num [1:2, 1:2] 0 0 0 0
## ..$ T : num [1:2, 1:2] 0.774 0 1 0
## ..$ V : num [1:2, 1:2] 1 0.363 0.363 0.132
## ..$ h : num 0
## ..$ Pn : num [1:2, 1:2] 1 0.363 0.363 0.132
## - attr(*, "class")= chr "Arima"
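Among these components, arma is a compact specification of the model: per ?arima, it gives the numbers of AR, MA, seasonal AR, and seasonal MA coefficients, followed by the period and the numbers of non-seasonal and seasonal differences, so 1 1 0 0 1 0 0 encodes our ARMA(1,1) fit. The coef and var.coef components are what we use below.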
Fit1$coef # parameter estimates (MLE)
## ar1 ma1 intercept
## 0.7737811 0.3630587 4.6714018
Fit1$var.coef # variance of MLE using asymptotic formula
## ar1 ma1 intercept
## ar1 0.0044875716 -0.0024678955 -0.0007804059
## ma1 -0.0024678955 0.0081815327 0.0003076798
## intercept -0.0007804059 0.0003076798 0.3590977368
c(Fit1$var.coef[1,1], Fit1$var.coef[2,2], Fit1$var.coef[3,3])
## [1] 0.004487572 0.008181533 0.359097737
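Combining coef and var.coef gives approximate 95% confidence intervals for the parameters; a small sketch (the 1.96 multiplier relies on the asymptotic normality stated above):
# Approximate 95% CIs from the MLE and its asymptotic standard errors
se = sqrt(diag(Fit1$var.coef))
cbind(Estimate = Fit1$coef, Lower = Fit1$coef - 1.96*se, Upper = Fit1$coef + 1.96*se)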
#- Repeat the above itt = 500 times, recording Est$coef each time.
# Compare the simulated variance with the theoretical asymptotic variance ---
n=100
itt=500
MLE = matrix(0,itt,7) # matrix to record estimated coefficients
Vars = matrix(0,itt,7) # matrix to record estimated variances
set.seed(1373)
for (i in 1:itt) {
mu = 5
# X = arima.sim(n = n, list(ar = c(0.7), ma = c(0.4)) ) + mu # Case 1
X = arima.sim(n = n, list(ar = c(0.7, .2, -.3), ma = c(0.6, -.4, .2)) ) + mu # Case 2
Est = arima(X, order=c(3,0,3))
MLE[i,] = Est$coef
Vars[i,] = diag(Est$var.coef)
}
## Warning in log(s2): NaNs produced
## Warning in log(s2): NaNs produced
## Warning in log(s2): NaNs produced
## Warning in arima(X, order = c(3, 0, 3)): possible convergence
## problem: optim gave code = 1
## Warning in log(s2): NaNs produced
## Warning in arima(X, order = c(3, 0, 3)): possible convergence
## problem: optim gave code = 1
## Warning in log(s2): NaNs produced
## Warning in log(s2): NaNs produced
## Warning in arima(X, order = c(3, 0, 3)): possible convergence
## problem: optim gave code = 1
## Warning in arima(X, order = c(3, 0, 3)): possible convergence
## problem: optim gave code = 1
## Warning in arima(X, order = c(3, 0, 3)): possible convergence
## problem: optim gave code = 1
## Warning in arima(X, order = c(3, 0, 3)): possible convergence
## problem: optim gave code = 1
## Warning in arima(X, order = c(3, 0, 3)): possible convergence
## problem: optim gave code = 1
## Warning in arima(X, order = c(3, 0, 3)): possible convergence
## problem: optim gave code = 1
## Warning in arima(X, order = c(3, 0, 3)): possible convergence
## problem: optim gave code = 1
#--- Results for phi1-phi3, theta1-theta3, and mu ----
layout(matrix(1:6, 2,3, byrow=TRUE));
hist(MLE[,1], main="phi1 (.7)"); abline(v= .7, col="blue")
hist(MLE[,2], main="phi2 (.2)"); abline(v= .2, col="blue")
hist(MLE[,3], main="phi3 (-.3)"); abline(v=-.3, col="blue")
hist(MLE[,4], main="theta1 (.6)"); abline(v= .6, col="blue")
hist(MLE[,5], main="theta2 (-.4)"); abline(v=-.4, col="blue")
hist(MLE[,6], main="theta3 (.2)"); abline(v= .2, col="blue")
Result = cbind( apply(MLE, 2, mean), apply(MLE, 2, sd), sqrt(apply(Vars, 2, mean)) )
colnames(Result) = c("Mean","SE by Sim", "SE from Output")
Result
## Mean SE by Sim SE from Output
## [1,] 0.54219689 0.6287727 0.3371195
## [2,] -0.07738958 0.4732947 0.4203374
## [3,] -0.08756457 0.3389365 0.2285187
## [4,] 0.53327595 0.6740321 0.3415460
## [5,] 0.15170613 0.7645478 0.3128849
## [6,] 0.36054578 0.2943624 0.2197626
## [7,] 4.99426805 0.3392923 0.3362629
#- Repeat the above itt = 500 times with n = 400, recording Est$coef each time.
# Compare the simulated variance with the theoretical asymptotic variance ---
n=400
itt=500
MLE = matrix(0,itt,7) # matrix to record estimated coefficients
Vars = matrix(0,itt,7) # matrix to record estimated variances
set.seed(373)
for (i in 1:itt) {
mu = 5
# X = arima.sim(n = n, list(ar = c(0.7), ma = c(0.4)) ) + mu # Case 1
X = arima.sim(n = n, list(ar = c(0.7, .2, -.3), ma = c(0.6, -.4, .2)) ) + mu # Case 2
Est = arima(X, order=c(3,0,3))
MLE[i,] = Est$coef
Vars[i,] = diag(Est$var.coef)
}
## Warning in arima(X, order = c(3, 0, 3)): possible convergence
## problem: optim gave code = 1
## Warning in arima(X, order = c(3, 0, 3)): possible convergence
## problem: optim gave code = 1
## Warning in arima(X, order = c(3, 0, 3)): possible convergence
## problem: optim gave code = 1
## Warning in arima(X, order = c(3, 0, 3)): possible convergence
## problem: optim gave code = 1
## Warning in arima(X, order = c(3, 0, 3)): possible convergence
## problem: optim gave code = 1
## Warning in arima(X, order = c(3, 0, 3)): possible convergence
## problem: optim gave code = 1
## Warning in arima(X, order = c(3, 0, 3)): possible convergence
## problem: optim gave code = 1
#--- Results for phi1-phi3, theta1-theta3, and mu ----
layout(matrix(1:6, 2,3, byrow=TRUE));
hist(MLE[,1], main="phi1 (.7)"); abline(v= .7, col="blue")
hist(MLE[,2], main="phi2 (.2)"); abline(v= .2, col="blue")
hist(MLE[,3], main="phi3 (-.3)"); abline(v=-.3, col="blue")
hist(MLE[,4], main="theta1 (.6)"); abline(v= .6, col="blue")
hist(MLE[,5], main="theta2 (-.4)"); abline(v=-.4, col="blue")
hist(MLE[,6], main="theta3 (.2)"); abline(v= .2, col="blue")
Result2 = cbind( apply(MLE, 2, mean), apply(MLE, 2, sd), sqrt(apply(Vars, 2, mean)) )
colnames(Result2) = c("Mean","SE by Sim", "SE from Output")
Result2
## Mean SE by Sim SE from Output
## [1,] 0.6678227 0.2988139 0.2304187
## [2,] 0.1539419 0.2233818 0.1926529
## [3,] -0.2613771 0.1685802 0.1325948
## [4,] 0.4052964 0.3008026 0.2321341
## [5,] -0.2138445 0.3588771 0.2714871
## [6,] 0.1935095 0.1432537 0.1237579
## [7,] 4.9906898 0.1896803 0.1706930
#- compare to when n=100
Result
## Mean SE by Sim SE from Output
## [1,] 0.54219689 0.6287727 0.3371195
## [2,] -0.07738958 0.4732947 0.4203374
## [3,] -0.08756457 0.3389365 0.2285187
## [4,] 0.53327595 0.6740321 0.3415460
## [5,] 0.15170613 0.7645478 0.3128849
## [6,] 0.36054578 0.2943624 0.2197626
## [7,] 4.99426805 0.3392923 0.3362629
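At n = 100 the simulated SEs are noticeably larger than the asymptotic SEs from the output (e.g. 0.63 vs 0.34 for phi1), while at n = 400 the gap narrows considerably (0.30 vs 0.23): the asymptotic formula becomes reliable only as n grows.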
#- the last fitted model from the n = 400 loop
Est
##
## Call:
## arima(x = X, order = c(3, 0, 3))
##
## Coefficients:
## ar1 ar2 ar3 ma1 ma2 ma3 intercept
## 0.4911 0.3470 -0.2790 0.6066 -0.1464 0.1046 5.2788
## s.e. 0.2374 0.2094 0.1605 0.2387 0.3327 0.1523 0.1893
##
## sigma^2 estimated as 1.145: log likelihood = -595.49, aic = 1206.98
n=100
mu = 5
MLE = Vars = matrix(0,1000,3) # estimates and variances: normal errors
MLE2 = Vars2 = matrix(0,1000,3) # estimates and variances: t-distributed errors
MLE3 = Vars3 = matrix(0,1000,3) # estimates and variances: exponential errors
for (i in 1:1000) {
#-- Normal errors ---
X = arima.sim(n = n, list(ar = c(0.7), ma = c(0.4)) ) + mu
Est = arima(X, order=c(1,0,1))
MLE[i,] = Est$coef
Vars[i,] = c(Est$var.coef[1,1], Est$var.coef[2,2], Est$var.coef[3,3])
#-- non-normal errors ---
et = rt(n, 4) # t-distribution with df=4.
Y = arima.sim(n=n, list(ar = c(0.7), ma = c(0.4)), innov=et ) + mu
Est2 = arima(Y, order=c(1,0,1))
MLE2[i,] = Est2$coef
Vars2[i,] = c(Est2$var.coef[1,1], Est2$var.coef[2,2], Est2$var.coef[3,3])
#-- non-normal errors ---
et3 = rexp(n, 1)-1 # exponential shifted to have mean 0
Z = arima.sim(n=n, list(ar = c(0.7), ma = c(0.4)), innov=et3 ) + mu # Z avoids masking T (TRUE)
Est3 = arima(Z, order=c(1,0,1))
MLE3[i,] = Est3$coef
Vars3[i,] = c(Est3$var.coef[1,1], Est3$var.coef[2,2], Est3$var.coef[3,3])
}
# Plotting Results
layout(matrix(1:6, 3,2, byrow=TRUE));
hist(MLE[,1], xlim=c(0,1), main="e ~ Norm: phi1 (.7)" ); abline(v= .7, col="blue")
hist(MLE[,2], xlim=c(0,1), main="e ~ Norm: theta1 (.4)"); abline(v= .4, col="blue")
hist(MLE2[,1], xlim=c(0,1), main="e ~ t(4): phi1 (.7)" ); abline(v= .7, col="blue")
hist(MLE2[,2], xlim=c(0,1), main="e ~ t(4): theta1 (.4)"); abline(v= .4, col="blue")
hist(MLE3[,1], xlim=c(0,1), main="e ~ exp(1)-1: phi1 (.7)" ); abline(v= .7, col="blue")
hist(MLE3[,2], xlim=c(0,1), main="e ~ exp(1)-1: theta1 (.4)"); abline(v= .4, col="blue")
Result3 = rbind(c(mean(MLE[,1]), sd(MLE[,1]), mean(MLE[,2]), sd(MLE[,2]) ),
c(mean(MLE2[,1]), sd(MLE2[,1]), mean(MLE2[,2]), sd(MLE2[,2]) ),
c(mean(MLE3[,1]), sd(MLE3[,1]), mean(MLE3[,2]), sd(MLE3[,2]) ) )
colnames(Result3) = c("Mean phi1" , "SE phi1", "Mean th1", "SE th1")
Result3
## Mean phi1 SE phi1 Mean th1 SE th1
## [1,] 0.6614065 0.09300538 0.4169312 0.1112552
## [2,] 0.6573426 0.09131514 0.4186479 0.1124989
## [3,] 0.6613954 0.09062829 0.4187407 0.1136855
We assume that \(\epsilon_t\) is normal and estimate the ARMA parameters by MLE.
For the MLE, the asymptotic sampling distributions of the estimators are known (formulas above).
For small \(n\), the standard errors in the R output may not be accurate.
When the assumption that \(\epsilon_t\) is normal is violated, the performance of the MLE for ARMA models is not significantly affected.
Since the MLE is computed by numerical optimization, it needs a good starting point; by default, arima() uses the CSS (conditional sum of squares) estimates as the starting point.
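The estimation method can also be chosen explicitly; a brief sketch comparing CSS only, full ML, and the default CSS-ML (the model and parameter values are illustrative):
# Compare arima() estimation methods: CSS (starting values only),
# ML (full likelihood), and the default CSS-ML (CSS start, then ML)
set.seed(7)
x = arima.sim(n = 200, list(ar = c(0.7), ma = c(0.4))) + 5
rbind(CSS      = coef(arima(x, order = c(1,0,1), method = "CSS")),
      ML       = coef(arima(x, order = c(1,0,1), method = "ML")),
      "CSS-ML" = coef(arima(x, order = c(1,0,1), method = "CSS-ML")))
All three should give similar estimates here; CSS alone is faster but only approximates the full likelihood.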