To construct the Maximum Likelihood Estimator, we need the distribution of
\(\epsilon_t\).
Suppose \(\epsilon_t \sim
N(0,\sigma^2)\). Then a causal ARMA(\(p,q\)) process \(X_t\) is normal as well, since it is a linear combination of the normal \(\epsilon_t\): \[
X_t \hspace{3mm} = \hspace{3mm} \sum_{i=0}^\infty \psi_i \epsilon_{t-i}, \qquad
X_t \sim N\Big(0, \gamma(0)\Big)
\]
Every \(X_t\) has the
same distribution, and the covariance between \(X_t\) and \(X_{t+1}\) is \(\gamma(1)\).
That means \(\{X_1, \ldots, X_n\}\) is an \(n\)-dimensional multivariate normal vector. \[ \left[ \begin{array}{c} X_1 \\ X_2 \\ \vdots \\ X_n \\ \end{array} \right] \sim N_n \Big(0, \mathbf \Sigma\Big), \hspace{5mm} \mbox{ where } {\small \mathbf \Sigma = \left[ \begin{array}{c c c c} \gamma(0) &\gamma(1) &\cdots &\gamma(n-1) \\ \gamma(1) &\gamma(0) &\cdots &\gamma(n-2) \\ \vdots &\vdots & & \vdots \\ \gamma(n-1) &\gamma(n-2) &\cdots &\gamma(0) \\ \end{array} \right]} \]
The mean of each element is 0, and the covariance matrix is the autocovariance matrix: \[ \mathbf \Sigma = \mathbf \Gamma_n \]
The joint pdf of the multivariate normal vector \(\mathbf X_n = (X_1, \ldots, X_n)'\) with covariance matrix \(\mathbf \Gamma_n\) gives the likelihood \[ L(\phi, \theta, \sigma) = \frac{1}{(2 \pi)^{n/2} (\mbox{det}\,\mathbf \Gamma_n)^{1/2} } \exp\Big( - \frac{1}{2} \mathbf X'_n \mathbf \Gamma_n^{-1} \mathbf X_n \Big) \]
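As an illustration of this likelihood (not the algorithm arima() actually uses, which filters the series through a state-space recursion), here is a minimal sketch that evaluates the Gaussian log-likelihood directly from \(\mathbf \Gamma_n\); the helper name arma.loglik is ours, and X is assumed to be a demeaned series.

# Sketch: Gaussian log-likelihood of a causal ARMA, evaluated via Gamma_n.
# Uses gamma(0) = sigma^2 * sum(psi_i^2) and Gamma_n = gamma(0) * Toeplitz(ACF).
arma.loglik = function(X, ar, ma, sigma2) {
  n      = length(X)
  psi    = c(1, ARMAtoMA(ar = ar, ma = ma, lag.max = 200))   # psi-weights, truncated
  gamma0 = sigma2 * sum(psi^2)                               # gamma(0)
  Gam    = gamma0 * toeplitz(ARMAacf(ar = ar, ma = ma, lag.max = n - 1))
  logdet = as.numeric(determinant(Gam, logarithm = TRUE)$modulus)
  -0.5 * (n*log(2*pi) + logdet + drop(t(X) %*% solve(Gam, X)))
}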
When the MLE is used, the parameter estimates are found by a numerical search,
under the assumption that the \(\epsilon_t\) are normal (Gaussian
likelihood).
For an ARMA model, you will not be penalized much when the assumption of
normality is violated (see the simulation at the end of this section).
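To make the numerical search concrete, one could hand the sketched arma.loglik() above to a general-purpose optimizer; again this is only an illustration of the idea, with rough hand-picked starting values, for a demeaned ARMA(1,1) series X.

# Numerical ML search for an ARMA(1,1) (illustration only; arima() does this better)
neg.ll = function(par) -arma.loglik(X, ar = par[1], ma = par[2], sigma2 = par[3]^2)
fit = optim(c(0.5, 0.1, 1), neg.ll)   # par = (phi1, theta1, sigma)
fit$par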
For a large sample from an ARMA(\(p,q\)) process, the MLE has the approximate sampling
distribution \[
{\large
\hat \beta \approx N\Big(\beta, V(\hat \beta)\Big)
}
\] where \(\beta = (\phi_1, \ldots,
\phi_p, \theta_1, \ldots, \theta_q, \sigma)^T\).
For AR models, \(V(\hat \beta)\) is the same as the asymptotic covariance matrix of the Yule-Walker estimators. For low-order models it has a closed form (a numerical check follows the formulas below).
\[\begin{align*}
\mbox{ AR(1): } \hspace{10mm} V(\hat \phi_1)
&= \frac{1}{n} (1-\phi_1^2) \\\\
\mbox{ AR(2): } \hspace{3mm} V(\hat \phi_1, \hat \phi_2) &=
\frac{1}{n}
\left[
\begin{array}{cc}
1-\phi_2^2 & -\phi_1 (1+\phi_2) \\
-\phi_1 (1+\phi_2) & 1-\phi_2^2 \\
\end{array}
\right]
\end{align*}\]
\[\begin{align*}
\mbox{ MA(1): } \hspace{10mm} V(\hat \theta_1) &=
\frac{1}{n} (1-\theta_1^2) \\\\
\mbox{ MA(2): } \hspace{3mm} V(\hat \theta_1, \hat \theta_2) &=
\frac{1}{n}
\left[
\begin{array}{cc}
1-\theta_2^2 & -\theta_1 (1+\theta_2) \\
-\theta_1 (1+\theta_2) & 1-\theta_2^2 \\
\end{array}
\right]
\end{align*}\]
\[\begin{align*}
\mbox{ ARMA(1,1): } \hspace{3mm} V(\hat \phi_1, \hat \theta_1) \hspace{3mm} = \hspace{3mm}
\Big(\frac{1}{n}\Big)
\frac{1- \phi_1 \theta_1}{(\phi_1-\theta_1)^2}
\left[
\begin{array}{cc}
(1-\phi_1^2)(1-\phi_1 \theta_1) & (1-\phi_1^2)(1-\theta_1^2) \\
(1-\phi_1^2)(1-\theta_1^2) & (1-\theta_1^2)(1-\phi_1\theta_1) \\
\end{array}
\right]
\end{align*}\]
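As a quick numerical check of these formulas, and of the Yule-Walker claim above, the sketch below (illustrative values, not part of the original output) compares the AR(1) formula with the asymptotic variances reported by R's Yule-Walker and ML fitting routines; all three should be close.

# Check V(phi1-hat) = (1 - phi1^2)/n against ar() and arima() (illustrative values)
set.seed(1)
W = arima.sim(n = 500, list(ar = 0.7))
(1 - 0.7^2)/500                                                # formula: ~0.00102
ar(W, aic = FALSE, order.max = 1, method = "yule-walker")$asy.var.coef
arima(W, order = c(1,0,0))$var.coef["ar1","ar1"]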
mu = 5
X  = arima.sim(n = 100, list(ar = c(0.7), ma = c(0.4)) ) + mu
plot(X, type="o")

Fit1 = arima(X, order=c(1,0,1))
Fit1
##
## Call:
## arima(x = X, order = c(1, 0, 1))
##
## Coefficients:
## ar1 ma1 intercept
## 0.7738 0.3631 4.6714
## s.e. 0.0670 0.0905 0.5992
##
## sigma^2 estimated as 1.062: log likelihood = -145.66, aic = 299.32
str(Fit1) # see what's inside
## List of 14
## $ coef : Named num [1:3] 0.774 0.363 4.671
## ..- attr(*, "names")= chr [1:3] "ar1" "ma1" "intercept"
## $ sigma2 : num 1.06
## $ var.coef : num [1:3, 1:3] 0.00449 -0.00247 -0.00078 -0.00247 0.00818 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:3] "ar1" "ma1" "intercept"
## .. ..$ : chr [1:3] "ar1" "ma1" "intercept"
## $ mask : logi [1:3] TRUE TRUE TRUE
## $ loglik : num -146
## $ aic : num 299
## $ arma : int [1:7] 1 1 0 0 1 0 0
## $ residuals: Time-Series [1:100] from 1 to 100: -0.116 0.416 -2.176 -1.79 -0.252 ...
## $ call : language arima(x = X, order = c(1, 0, 1))
## $ series : chr "X"
## $ code : int 0
## $ n.cond : int 0
## $ nobs : int 100
## $ model :List of 10
## ..$ phi : num 0.774
## ..$ theta: num 0.363
## ..$ Delta: num(0)
## ..$ Z : num [1:2] 1 0
## ..$ a : num [1:2] -0.685 0.178
## ..$ P : num [1:2, 1:2] 0 0 0 0
## ..$ T : num [1:2, 1:2] 0.774 0 1 0
## ..$ V : num [1:2, 1:2] 1 0.363 0.363 0.132
## ..$ h : num 0
## ..$ Pn : num [1:2, 1:2] 1 0.363 0.363 0.132
## - attr(*, "class")= chr "Arima"
Fit1$coef       # parameter estimates (MLE)
## ar1 ma1 intercept
## 0.7737811 0.3630587 4.6714018
Fit1$var.coef   # variance of MLE using asymptotic formula
## ar1 ma1 intercept
## ar1 0.0044875716 -0.0024678955 -0.0007804059
## ma1 -0.0024678955 0.0081815327 0.0003076798
## intercept -0.0007804059 0.0003076798 0.3590977368
c(Fit1$var.coef[1,1], Fit1$var.coef[2,2], Fit1$var.coef[3,3])
## [1] 0.004487572 0.008181533 0.359097737
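Using the asymptotic normality of the MLE, approximate 95% confidence intervals can be read off the fitted object; a minimal sketch:

# Approximate 95% CIs: coef +/- 1.96 * asymptotic SE
se = sqrt(diag(Fit1$var.coef))
cbind(lower = Fit1$coef - 1.96*se, upper = Fit1$coef + 1.96*se)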
#- Repeat the fit above itt = 500 times, recording Est$coef each time.
#  Compare the simulated variance and the theoretical asymptotic variance ---
n   = 100
itt = 500

MLE  = matrix(0, itt, 7)   # matrix to record estimated coefficients
Vars = matrix(0, itt, 7)   # matrix to record estimated variances

set.seed(1373)
for (i in 1:itt) {
  mu = 5
  # X = arima.sim(n = n, list(ar = c(0.7), ma = c(0.4)) ) + mu                    # Case 1
  X = arima.sim(n = n, list(ar = c(0.7, .2, -.3), ma = c(0.6, -.4, .2)) ) + mu    # Case 2

  Est      = arima(X, order=c(3,0,3))
  MLE[i,]  = Est$coef
  Vars[i,] = diag(Est$var.coef)
}
## Warning in log(s2): NaNs produced
## Warning in arima(X, order = c(3, 0, 3)): possible convergence
## problem: optim gave code = 1
## (these two warnings repeat several times over the 500 fits)
#--- Results for the six ARMA coefficients ----
layout(matrix(1:6, 2, 3, byrow=TRUE))
hist(MLE[,1], main="phi1 (.7)");    abline(v= .7, col="blue")
hist(MLE[,2], main="phi2 (.2)");    abline(v= .2, col="blue")
hist(MLE[,3], main="phi3 (-.3)");   abline(v=-.3, col="blue")
hist(MLE[,4], main="theta1 (.6)");  abline(v= .6, col="blue")
hist(MLE[,5], main="theta2 (-.4)"); abline(v=-.4, col="blue")
hist(MLE[,6], main="theta3 (.2)");  abline(v= .2, col="blue")

Result = cbind( apply(MLE, 2, mean), apply(MLE, 2, sd), sqrt(apply(Vars, 2, mean)) )
colnames(Result) = c("Mean", "SE by Sim", "SE from Output")
Result
## Mean SE by Sim SE from Output
## [1,] 0.54219689 0.6287727 0.3371195
## [2,] -0.07738958 0.4732947 0.4203374
## [3,] -0.08756457 0.3389365 0.2285187
## [4,] 0.53327595 0.6740321 0.3415460
## [5,] 0.15170613 0.7645478 0.3128849
## [6,] 0.36054578 0.2943624 0.2197626
## [7,] 4.99426805 0.3392923 0.3362629
#- Repeat with a larger sample: n = 400, itt = 500 replications.
#  Compare the simulated variance and the theoretical asymptotic variance ---
n   = 400
itt = 500

MLE  = matrix(0, itt, 7)   # matrix to record estimated coefficients
Vars = matrix(0, itt, 7)   # matrix to record estimated variances

set.seed(373)
for (i in 1:itt) {
  mu = 5
  # X = arima.sim(n = n, list(ar = c(0.7), ma = c(0.4)) ) + mu                    # Case 1
  X = arima.sim(n = n, list(ar = c(0.7, .2, -.3), ma = c(0.6, -.4, .2)) ) + mu    # Case 2

  Est      = arima(X, order=c(3,0,3))
  MLE[i,]  = Est$coef
  Vars[i,] = diag(Est$var.coef)
}
## Warning in arima(X, order = c(3, 0, 3)): possible convergence
## problem: optim gave code = 1
## (this warning repeats several times over the 500 fits)
#--- Results for the six ARMA coefficients ----
layout(matrix(1:6, 2, 3, byrow=TRUE))
hist(MLE[,1], main="phi1 (.7)");    abline(v= .7, col="blue")
hist(MLE[,2], main="phi2 (.2)");    abline(v= .2, col="blue")
hist(MLE[,3], main="phi3 (-.3)");   abline(v=-.3, col="blue")
hist(MLE[,4], main="theta1 (.6)");  abline(v= .6, col="blue")
hist(MLE[,5], main="theta2 (-.4)"); abline(v=-.4, col="blue")
hist(MLE[,6], main="theta3 (.2)");  abline(v= .2, col="blue")

Result2 = cbind( apply(MLE, 2, mean), apply(MLE, 2, sd), sqrt(apply(Vars, 2, mean)) )
colnames(Result2) = c("Mean", "SE by Sim", "SE from Output")
Result2
## Mean SE by Sim SE from Output
## [1,] 0.6678227 0.2988139 0.2304187
## [2,] 0.1539419 0.2233818 0.1926529
## [3,] -0.2613771 0.1685802 0.1325948
## [4,] 0.4052964 0.3008026 0.2321341
## [5,] -0.2138445 0.3588771 0.2714871
## [6,] 0.1935095 0.1432537 0.1237579
## [7,] 4.9906898 0.1896803 0.1706930
#- compare to when n=100
Result
## Mean SE by Sim SE from Output
## [1,] 0.54219689 0.6287727 0.3371195
## [2,] -0.07738958 0.4732947 0.4203374
## [3,] -0.08756457 0.3389365 0.2285187
## [4,] 0.53327595 0.6740321 0.3415460
## [5,] 0.15170613 0.7645478 0.3128849
## [6,] 0.36054578 0.2943624 0.2197626
## [7,] 4.99426805 0.3392923 0.3362629
# Last fitted model from the n = 400 loop:
Est
##
## Call:
## arima(x = X, order = c(3, 0, 3))
##
## Coefficients:
## ar1 ar2 ar3 ma1 ma2 ma3 intercept
## 0.4911 0.3470 -0.2790 0.6066 -0.1464 0.1046 5.2788
## s.e. 0.2374 0.2094 0.1605 0.2387 0.3327 0.1523 0.1893
##
## sigma^2 estimated as 1.145: log likelihood = -595.49, aic = 1206.98
#- Check robustness to non-normal errors: n = 100, 1000 replications (mu = 5 from above)
n = 100
MLE  = Vars  = matrix(0, 1000, 3)   # matrices to record estimates and their variances
MLE2 = Vars2 = matrix(0, 1000, 3)
MLE3 = Vars3 = matrix(0, 1000, 3)

for (i in 1:1000) {
  #-- Normal errors ---
  X = arima.sim(n = n, list(ar = c(0.7), ma = c(0.4)) ) + mu

  Est      = arima(X, order=c(1,0,1))
  MLE[i,]  = Est$coef
  Vars[i,] = c(Est$var.coef[1,1], Est$var.coef[2,2], Est$var.coef[3,3])

  #-- Non-normal errors: t-distribution with df = 4 ---
  et = rt(n, 4)
  Y  = arima.sim(n = n, list(ar = c(0.7), ma = c(0.4)), innov = et ) + mu

  Est2      = arima(Y, order=c(1,0,1))
  MLE2[i,]  = Est2$coef
  Vars2[i,] = c(Est2$var.coef[1,1], Est2$var.coef[2,2], Est2$var.coef[3,3])

  #-- Non-normal errors: centered exponential (mean 0) ---
  et3 = rexp(n, 1) - 1
  Z   = arima.sim(n = n, list(ar = c(0.7), ma = c(0.4)), innov = et3 ) + mu   # renamed from T, which masks TRUE

  Est3      = arima(Z, order=c(1,0,1))
  MLE3[i,]  = Est3$coef
  Vars3[i,] = c(Est3$var.coef[1,1], Est3$var.coef[2,2], Est3$var.coef[3,3])
}
# Plotting Results
layout(matrix(1:6, 3,2, byrow=TRUE));
hist(MLE[,1], xlim=c(0,1), main="e ~ Norm: phi1 (.7)" ); abline(v= .7, col="blue")
hist(MLE[,2], xlim=c(0,1), main="e ~ Norm: theta1 (.4)"); abline(v= .4, col="blue")
hist(MLE2[,1], xlim=c(0,1), main="e ~ t(v): phi1 (.7)" ); abline(v= .7, col="blue")
hist(MLE2[,2], xlim=c(0,1), main="e ~ t(v): theta1 (.4)"); abline(v= .4, col="blue")
hist(MLE3[,1], xlim=c(0,1), main="e ~ exp(1)-1: phi1 (.7)" ); abline(v= .7, col="blue")
hist(MLE3[,2], xlim=c(0,1), main="e ~ exp(1)-1: theta1 (.4)"); abline(v= .4, col="blue")
Result3 = rbind( c(mean(MLE[,1]),  sd(MLE[,1]),  mean(MLE[,2]),  sd(MLE[,2])  ),
                 c(mean(MLE2[,1]), sd(MLE2[,1]), mean(MLE2[,2]), sd(MLE2[,2]) ),
                 c(mean(MLE3[,1]), sd(MLE3[,1]), mean(MLE3[,2]), sd(MLE3[,2]) ) )
colnames(Result3) = c("Mean phi1", "SE phi1", "Mean th1", "SE th1")
Result3
## Mean phi1 SE phi1 Mean th1 SE th1
## [1,] 0.6614065 0.09300538 0.4169312 0.1112552
## [2,] 0.6573426 0.09131514 0.4186479 0.1124989
## [3,] 0.6613954 0.09062829 0.4187407 0.1136855
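As a quick follow-up (not in the original code), one can eyeball how non-normal the innovations are in the last replication of each case via normal Q-Q plots of the fitted residuals:

# Q-Q plots of residuals from the last fit of each error distribution
layout(matrix(1:3, 1, 3))
qqnorm(Est$residuals,  main="Normal errors");  qqline(Est$residuals)
qqnorm(Est2$residuals, main="t(4) errors");    qqline(Est2$residuals)
qqnorm(Est3$residuals, main="exp(1)-1 errors"); qqline(Est3$residuals)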
We assume that \(\epsilon_t\) is normal and estimate the ARMA parameters by MLE.
For the MLE, the asymptotic sampling distributions of the estimators are known (formulas above).
For small \(n\), the standard errors in the R output may not be accurate, as the simulations above show.
When the assumption that \(\epsilon_t\) is normal is violated, the performance of the MLE for an ARMA model is not significantly affected.
Since the MLE is computed by numerical optimization, it needs a good starting point. By default, arima() in R uses conditional-sum-of-squares (CSS) estimates as starting values (method = "CSS-ML").
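A minimal illustration of the estimation methods available in arima(), using the ARMA(1,1) series X simulated earlier:

# Estimation methods in arima(); "CSS-ML" is the default
arima(X, order = c(1,0,1), method = "CSS")     # conditional sum of squares only
arima(X, order = c(1,0,1), method = "ML")      # full maximum likelihood
arima(X, order = c(1,0,1), method = "CSS-ML")  # CSS estimates as starting values for ML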