Many data sets consist of observations collected at regular intervals over time.
A common feature is that successive observations are not independent.
Time Series Analysis is about fitting models to break the observations down into
Then predict the future by projecting the non-random component and the pattern forward in time.
Example: Air Passenger Miles data (monthly over 10 years).
Other examples similarly: unemployment statistics, product quality, milk yield per cow.
Versatile: many subjects use ideas and techniques from TS.
We shall analyse data and make forecasts using stationary models, then
investigate what can be done with non-stationary data.
The collection of incoming data will be denoted by $X_1, \ldots, X_n$, or $\{X_t : 1 \le t \le n\}$. It is assumed that these observations are made at equally-spaced times.
Time Series is about analysing dependence. But the analysis is predicated on stationarity.
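A time series $X$ is stationary if its mean $E(X_t) = \mu$ is the same for all $t$ and the covariance $\mathrm{Cov}(X_t, X_{t-k}) = \gamma_k$ depends only on the lag $k$, not on $t$.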
Note that $\gamma_0 = \mathrm{Var}(X_t)$.
The notation '$X$ is I(0)' is sometimes used to indicate a stationary process. $X$ is I(1) if $X$ itself is not stationary, but the changes in $X$,
| $\nabla X_t = X_t - X_{t-1}$, |
are stationary.
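As a hedged illustration of the I(1) idea, the sketch below builds a random walk and differences it with NumPy. The random walk is a standard example of a non-stationary series chosen here as an assumption; it is not one of the data sets mentioned above, and the seed and length are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)          # arbitrary seed, for reproducibility only
eps = rng.normal(size=300)              # innovations
walk = np.cumsum(eps)                   # X_t = X_{t-1} + eps_t: not stationary
diff = np.diff(walk)                    # changes X_t - X_{t-1}

# The changes are just the innovations again, so they are stationary.
print(np.allclose(diff, eps[1:]))
```

The variance of `walk` grows with time, while `diff` recovers the stationary innovation sequence, which is what 'I(1)' describes.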
The autocorrelation function (ACF) is
| $\rho_k = \mathrm{Corr}(X_t, X_{t-k}) = \gamma_k / \gamma_0$. |
We need a model which fits the data, so it must allow $X_t$ to depend on $X_{t-1}$ (and earlier $X$ values as well, in most cases).
We look at stationary linear models. (Non-stationary models can't be projected; non-linear ones are harder than linear.)
A first-order autoregression, AR(1), has equation
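| $X_t = \mu + \alpha (X_{t-1} - \mu) + \varepsilon_t$, |
where the innovations $\varepsilon_t$ are uncorrelated with mean 0 and variance $\sigma^2$.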
A first-order moving average, MA(1), is
| $X_t = \mu + \varepsilon_t + \beta \varepsilon_{t-1}$. |
Each has a single parameter in addition to the standard mean and innovation variance.
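The two first-order models can be simulated directly from their defining equations. The following is a minimal sketch, assuming Normal innovations; the function names `simulate_ar1` and `simulate_ma1`, the burn-in length and the parameter values are all illustrative choices, not part of the notes.

```python
import numpy as np

def simulate_ar1(n, mu, alpha, sigma, rng, burn_in=200):
    """Simulate X_t = mu + alpha*(X_{t-1} - mu) + eps_t (hypothetical helper)."""
    eps = rng.normal(0.0, sigma, size=n + burn_in)
    x = np.empty(n + burn_in)
    x[0] = mu + eps[0]
    for t in range(1, n + burn_in):
        x[t] = mu + alpha * (x[t - 1] - mu) + eps[t]
    return x[burn_in:]                  # discard the start-up transient

def simulate_ma1(n, mu, beta, sigma, rng):
    """Simulate X_t = mu + eps_t + beta*eps_{t-1} (hypothetical helper)."""
    eps = rng.normal(0.0, sigma, size=n + 1)
    return mu + eps[1:] + beta * eps[:-1]

rng = np.random.default_rng(0)          # arbitrary seed
ar = simulate_ar1(500, mu=10.0, alpha=0.7, sigma=1.0, rng=rng)
ma = simulate_ma1(500, mu=10.0, beta=0.7, sigma=1.0, rng=rng)
print(ar.mean(), ma.mean())
```

Both series fluctuate around the mean 10; the AR(1) wanders away from it for longer stretches because each value carries over a fraction of the previous deviation.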
For a stationary model the ACF is $\{\rho_k : k \ge 0\}$, where $\rho_k$ is the correlation coefficient of $X_t$ with $X_{t-k}$. Occasionally you need to use $\rho_k$ for $k < 0$. Remember that $\rho$ is an even function, so $\rho_{-k} = \rho_k$.
For a non-stationary model the ACF is undefined.
For AR(1) with parameter $\alpha$ the ACF is $\rho_k = \alpha^{|k|}$.
For MA(1) with parameter $\beta$ we have $\rho_k = 0$ except for $\rho_0 = 1$ and $\rho_1 = \beta/(1 + \beta^2)$.
This distinguishes the MA(1) from the AR(1): for AR(1) the effects of the early observations continue to be felt.
This suggests a technique for identifying a first-order MA: see if the ACF
is close to 0 except at lag 1.
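The contrast between the two theoretical ACFs is easy to tabulate. This is a small sketch of the two formulas for non-negative lags; `acf_ar1` and `acf_ma1` are hypothetical helper names and the parameter value 0.5 is arbitrary.

```python
import numpy as np

def acf_ar1(alpha, max_lag):
    """ACF of an AR(1): geometric decay alpha**k for k = 0..max_lag."""
    return alpha ** np.arange(max_lag + 1)

def acf_ma1(beta, max_lag):
    """ACF of an MA(1): 1 at lag 0, beta/(1+beta^2) at lag 1, 0 afterwards."""
    rho = np.zeros(max_lag + 1)
    rho[0] = 1.0
    if max_lag >= 1:
        rho[1] = beta / (1.0 + beta ** 2)
    return rho

print(acf_ar1(0.5, 3))   # decays geometrically
print(acf_ma1(0.5, 3))   # cuts off after lag 1
```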
Checking whether a decrease is 'close to geometric'
is much harder. The partial ACF (PACF) was introduced to
combat this: the PACF of an AR(1) is 0 except at lag 1; the PACF of a MA(1)
decreases geometrically.
The PACF is denoted by $\phi_k$ and defined to be the conditional correlation of $X_t$ and $X_{t-k}$ given all the values from $t-k+1$ to $t-1$, i.e. the extent of the relationship between $X_t$ and $X_{t-k}$ which is not accounted for by an AR($k-1$) model.
The PACF is fiendishly difficult to calculate by hand except in very simple cases.
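Although hand calculation is painful, a computer can obtain the PACF from the ACF. The sketch below uses the Durbin-Levinson recursion, which is one standard way to do this and is an assumption here (the notes do not say which method they have in mind); `pacf_from_acf` is a hypothetical helper name.

```python
import numpy as np

def pacf_from_acf(rho, max_lag):
    """Durbin-Levinson recursion: PACF at lags 0..max_lag from an ACF rho[0..max_lag]."""
    rho = np.asarray(rho, dtype=float)
    pacf = np.zeros(max_lag + 1)
    pacf[0] = 1.0
    prev = np.array([])                       # coefficients of the fitted AR(k-1)
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = rho[1]
            cur = np.array([phi_kk])
        else:
            num = rho[k] - np.dot(prev, rho[k - 1:0:-1])
            den = 1.0 - np.dot(prev, rho[1:k])
            phi_kk = num / den
            cur = np.append(prev - phi_kk * prev[::-1], phi_kk)
        pacf[k] = phi_kk
        prev = cur
    return pacf

lags = np.arange(6)
pacf_ar = pacf_from_acf(0.6 ** lags, 5)                      # AR(1), parameter 0.6
pacf_ma = pacf_from_acf([1.0, 0.4, 0.0, 0.0, 0.0, 0.0], 5)   # MA(1), parameter 0.5
print(pacf_ar)
print(pacf_ma)
```

For the AR(1) input the result is zero beyond lag 1, while for the MA(1) input the values alternate in sign and shrink roughly geometrically, matching the description above.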
The general p-th order autoregression AR(p) has equation
| $X_t = \mu + \alpha_1 (X_{t-1} - \mu) + \alpha_2 (X_{t-2} - \mu) + \cdots + \alpha_p (X_{t-p} - \mu) + \varepsilon_t$, |
so that the current value of X is driven by a number of previous values, stretching back p time periods, as well as having the single innovation added in.
The general q-th order moving average MA(q) has equation
| $X_t = \mu + \varepsilon_t + \beta_1 \varepsilon_{t-1} + \beta_2 \varepsilon_{t-2} + \cdots + \beta_q \varepsilon_{t-q}$. |
In both cases it is possible to calculate the theoretical ACF. As for the simple versions, the ACF of MA(q) is 0 for k > q, whereas the ACF of AR(p) decreases as a sum of geometrics for k > p.
For the MA the calculations are simple. For the AR a matrix inversion is required. This technique depends on the Yule-Walker equations.
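Both calculations can be sketched in a few lines. For the MA the autocovariances are sums of products of coefficients; for the AR the first p autocorrelations come from solving the Yule-Walker equations as a linear system, and later lags follow by recursion. The helper names `ma_acf` and `ar_acf` and the example parameters are assumptions made for illustration.

```python
import numpy as np

def ma_acf(beta, max_lag):
    """Theoretical ACF of an MA(q) with coefficients beta_1..beta_q."""
    b = np.concatenate(([1.0], np.asarray(beta, dtype=float)))   # beta_0 = 1
    q = len(b) - 1
    gamma = np.array([np.dot(b[:len(b) - k], b[k:]) if k <= q else 0.0
                      for k in range(max_lag + 1)])
    return gamma / gamma[0]

def ar_acf(alpha, max_lag):
    """Theoretical ACF of a stationary AR(p) via the Yule-Walker equations."""
    alpha = np.asarray(alpha, dtype=float)
    p = len(alpha)
    A = np.eye(p)
    b = np.zeros(p)
    # Equation k: rho_k - sum_j alpha_j * rho_|k-j| = 0, with rho_0 = 1 known.
    for k in range(1, p + 1):
        for j in range(1, p + 1):
            m = abs(k - j)
            if m == 0:
                b[k - 1] += alpha[j - 1]
            else:
                A[k - 1, m - 1] -= alpha[j - 1]
    rho = np.ones(max(max_lag, p) + 1)
    rho[1:p + 1] = np.linalg.solve(A, b)
    for k in range(p + 1, max_lag + 1):
        rho[k] = np.dot(alpha, rho[k - 1:k - p - 1:-1])   # rho_{k-1}, ..., rho_{k-p}
    return rho[:max_lag + 1]

print(ma_acf([0.5], 3))
print(ar_acf([0.5, 0.3], 5))
```

With a single AR coefficient the function reduces to the geometric AR(1) formula, which gives a quick sanity check.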
Both MA and AR have limitations; in particular, a series whose sample ACF (SACF) and sample PACF (SPACF) are both clearly non-zero at many lags cannot be fitted well by either a low-order AR or a low-order MA. The mixed autoregressive-moving average model, ARMA(p, q), adds flexibility:
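| $X_t = \mu + \alpha_1 (X_{t-1} - \mu) + \alpha_2 (X_{t-2} - \mu) + \cdots + \alpha_p (X_{t-p} - \mu) + \varepsilon_t + \beta_1 \varepsilon_{t-1} + \beta_2 \varepsilon_{t-2} + \cdots + \beta_q \varepsilon_{t-q}$ |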
The backshift operator $B$ acts on a whole process at once. Its effect can be summarised as $(BX)_t = X_{t-1}$.
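Taking $\mu = 0$ to keep the notation short (otherwise replace $X_t$ by $X_t - \mu$), we can therefore write the equation of a moving average as
| $X_t = (1 + \beta_1 B + \beta_2 B^2 + \cdots + \beta_q B^q)\varepsilon_t$ |
and that of an autoregression as
| $(1 - \alpha_1 B - \alpha_2 B^2 - \cdots - \alpha_p B^p) X_t = \varepsilon_t$ |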
These equations may be inverted. For example, the MA(1) equation $X = (1 + \beta B)\varepsilon$ may alternatively be written
| $\varepsilon = (1 + \beta B)^{-1} X = (1 - \beta B + \beta^2 B^2 - \cdots) X$, |
an infinite-order autoregression. Note that the coefficients die away only if $-1 < \beta < 1$ (for $|\beta| > 1$ they grow without bound): if $\beta$ is within these limits the MA is invertible.
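The inversion can be checked numerically: for an invertible MA(1) with zero mean, a truncated version of the infinite sum should recover the latest innovation almost exactly. This is a sketch under assumed values (parameter 0.5, truncation after 30 terms, arbitrary seed).

```python
import numpy as np

rng = np.random.default_rng(2)                 # arbitrary seed
beta, J = 0.5, 30                              # MA parameter and truncation point
eps = rng.normal(size=200)
x = eps[1:] + beta * eps[:-1]                  # x[i] is driven by eps[i+1] and eps[i]

weights = (-beta) ** np.arange(J + 1)          # 1, -beta, beta^2, ...
t = len(x) - 1
recovered = np.dot(weights, x[t::-1][:J + 1])  # sum over j of (-beta)^j * x_{t-j}

# The truncation error is of order beta**(J+1), negligible here.
print(recovered, eps[-1])
```

Repeating the experiment with a parameter outside (-1, 1) makes the weights explode, so the truncated sum no longer settles down.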
The general MA is invertible if the equation
| $\phi(z) \equiv 1 + \beta_1 z + \beta_2 z^2 + \cdots + \beta_q z^q = 0$ |
has no solution on or inside the unit circle.
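The root condition can be tested numerically. A minimal sketch, assuming the MA coefficients are supplied in order of increasing lag; `ma_is_invertible` is a hypothetical helper, and `numpy.roots` expects coefficients from the highest power down, hence the reversal.

```python
import numpy as np

def ma_is_invertible(beta):
    """True if 1 + beta_1 z + ... + beta_q z^q has all roots strictly outside the unit circle."""
    coeffs = np.concatenate(([1.0], np.asarray(beta, dtype=float)))[::-1]
    roots = np.roots(coeffs)                   # highest-degree coefficient first
    return bool(np.all(np.abs(roots) > 1.0))

print(ma_is_invertible([0.5]))         # root at z = -2
print(ma_is_invertible([2.0]))         # root at z = -0.5
print(ma_is_invertible([-1.0, 0.5]))   # complex roots 1 +/- i
```

Roots lying very close to the unit circle are subject to rounding error, so a borderline answer from this check should be treated with caution.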
All stationary ARs are invertible. The condition that $\theta(z)$ have no roots on or inside the unit circle is implied by the requirement that the Yule-Walker equations have a solution.
For any invertible MA there are one or more non-invertible MAs which have the same ACF. From now on we shall always deal with the invertible version. Parameter estimation routines produce invertible estimated models.
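For the first-order case the non-invertible twin can be written down explicitly: replacing the MA(1) parameter by its reciprocal leaves the lag-1 autocorrelation unchanged, as this short worked equation shows (the notation $\rho_1(\beta)$ for the lag-1 autocorrelation as a function of the parameter is introduced here for convenience).

```latex
\rho_1\!\left(\frac{1}{\beta}\right)
  = \frac{1/\beta}{1 + 1/\beta^{2}}
  = \frac{\beta}{\beta^{2} + 1}
  = \rho_1(\beta)
```

So parameters 0.5 and 2 both give a lag-1 autocorrelation of 0.4, and only the first is invertible.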
The sample ACF is $r_k = c_k / c_0$, calculated from the sample autocovariance function $c_k$, where
| $c_k = \frac{1}{n} \sum_{i=k+1}^{n} (x_i - m)(x_{i-k} - m)$, |
and $m$ here refers to the sample mean, used as an estimate for $\mu$.
We would like to use the sample ACF as an estimator for the theoretical underlying ACF.
If a MA(1) model fits the data the SACF should be close to 0 for k > 1. For an AR(1) model to fit, the SACF should be roughly geometrically decreasing. And remember that the ACF is only defined for stationary Time Series, so don't use the SACF of a clearly non-stationary data set.
The correlogram is a plot of $r_k$ against $k$. It is often used as a diagnostic tool to suggest a model, as well as in the analysis of residuals.
If the MA($q$) is the true model, then for $k > q$ the mean of $r_k$ is approximately 0 and its variance is approximately $n^{-1}\left(1 + 2\sum_{j=1}^{q} \rho_j^2\right)$; in practice the unknown $\rho_j$ are replaced by the sample values $r_j$. If the sample ACF seems close to 0 for $k > q$, then MA($q$) seems a reasonable model. We can use the variance calculation to construct intervals around 0: if all $r_k$ with $k > q$ fall inside the relevant intervals, then MA($q$) is likely to be a good fit.
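The interval calculation can be sketched as follows. The multiplier 1.96 (an approximate 95% interval based on a Normal approximation) is an assumption, as is the use of sample autocorrelations in place of the theoretical ones; `bartlett_band` is a hypothetical helper name.

```python
import numpy as np

def bartlett_band(r, q, n, z=1.96):
    """Half-width of the interval around 0 for r_k, k > q, if MA(q) is the true model."""
    r = np.asarray(r, dtype=float)
    var = (1.0 + 2.0 * np.sum(r[1:q + 1] ** 2)) / n
    return z * np.sqrt(var)

r = [1.0, 0.4, 0.05, -0.03]            # illustrative sample ACF values, not real data
band0 = bartlett_band(r, q=0, n=100)   # white-noise hypothesis
band1 = bartlett_band(r, q=1, n=100)   # MA(1) hypothesis
print(band0, band1)
```

With these made-up values, the lag-1 autocorrelation lies outside the white-noise band while lags 2 and 3 lie inside the MA(1) band, which is the pattern that would point towards an MA(1).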
It is much harder to judge whether the SACF is decreasing roughly geometrically. We use the partial ACF and sample PACF.
The formula for calculating the PACF is unmemorable. But for an AR(p) the PACF is zero for k > p, whereas it decreases approximately geometrically for a MA(q).
The sample PACF has mean 0 and s.d. approximately $1/\sqrt{n}$ for $k > p$ if the true model is AR($p$). It is often used in conjunction with the sample ACF for diagnostic purposes.
Clearly $\mu$ is the expectation of $X$, so it seems sensible to use the sample mean $\sum X_i / n$ as an estimator for $\mu$. Similarly a reasonable estimator for $\gamma_0 = \mathrm{Var}(X_t)$ is the sample variance of $X$.
For AR(1) and MA(1) the simplest method of estimating the additional parameter is the method of moments (MoM): select the value of $\alpha$ or $\beta$ which makes the theoretical value $\rho_1$ equal to the sample value $r_1$.
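The moment equations can be solved in closed form. For the AR(1) the estimate is simply the lag-1 sample autocorrelation; for the MA(1) the equation is a quadratic with two reciprocal roots, and the sketch below returns the invertible one. The helper names are hypothetical, and the choice to raise an error when no real root exists is an illustrative design decision.

```python
import math

def mom_ar1(r1):
    """Method-of-moments estimate for AR(1): rho_1 = alpha, so alpha_hat = r1."""
    return r1

def mom_ma1(r1):
    """Solve beta/(1+beta^2) = r1 for the invertible root (|beta| <= 1)."""
    if r1 == 0.0:
        return 0.0
    if abs(r1) > 0.5:
        raise ValueError("no real solution: an MA(1) cannot have |rho_1| > 0.5")
    return (1.0 - math.sqrt(1.0 - 4.0 * r1 ** 2)) / (2.0 * r1)

print(mom_ar1(0.4), mom_ma1(0.4))
```

With a lag-1 sample autocorrelation of 0.4 the MA(1) estimate is 0.5; the other root of the quadratic, 2, is the non-invertible alternative.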
But this is not the way the parameters are estimated in practice.
(Least squares is the same as maximum likelihood when Normality is assumed.) If the $\varepsilon_t$ are supposed i.i.d. $N(0, \sigma^2)$, we can write down the likelihood for an AR(1). It depends only on $\alpha$, $\mu$, $\sigma$ and $X_0$, and it is trivial to maximise over these 4 parameters at once.
It is hardly more difficult for an AR(p), although the additional parameters are matched by additional past X values that need estimating.
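A simplified version of the AR(1) fit can be sketched with ordinary least squares. Note the assumption: this sketch conditions on the first observation and regresses each value on its predecessor, instead of treating the unobserved starting value as a fourth parameter as described above, so it is an approximation to the full procedure. The name `fit_ar1_cls` and the simulation settings are hypothetical.

```python
import numpy as np

def fit_ar1_cls(x):
    """Conditional least squares for AR(1): regress x_t on (1, x_{t-1})."""
    x = np.asarray(x, dtype=float)
    y, lagged = x[1:], x[:-1]
    design = np.column_stack([np.ones_like(lagged), lagged])
    (c, alpha), *_ = np.linalg.lstsq(design, y, rcond=None)
    mu = c / (1.0 - alpha)                     # intercept c equals mu * (1 - alpha)
    resid = y - (c + alpha * lagged)
    sigma2 = np.mean(resid ** 2)               # innovation variance estimate
    return alpha, mu, sigma2

# Simulated check with assumed true values alpha = 0.6, mu = 10, sigma = 1.
rng = np.random.default_rng(3)
x = np.empty(2000)
x[0] = 10.0
for t in range(1, len(x)):
    x[t] = 10.0 + 0.6 * (x[t - 1] - 10.0) + rng.normal()
alpha_hat, mu_hat, sigma2_hat = fit_ar1_cls(x)
print(alpha_hat, mu_hat, sigma2_hat)
```

The same regression idea extends to an AR(p) by adding further lagged columns to the design matrix.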
The likelihood of a MA is much harder to write down. If $\varepsilon_0$, $\varepsilon_{-1}$, etc. were known, there would be no problem. They can only be estimated, along with the parameters. Sadly the likelihood is not a linear or quadratic function of the $\varepsilon_i$. Estimates may not be reliable.
An invertible MA is time-reversible. A popular technique is "backforecasting": estimate the parameters crudely, put these into the model and run it backwards in time; now $\varepsilon_0$, $\varepsilon_{-1}$, etc. are in the future, so can be forecast (see forecasting later). The forecast values are used to produce more refined parameter estimates, and so it goes on. The method is more accurate than MLE and fairly quick on a computer, though a number of iterations are usually needed.
The MA component of an ARMA model will cause the same difficulties as for MA processes.
Future values of $X$ depend on past values of $X$, on past innovations, and on innovations which have not yet occurred.
For example, for an ARMA(1,1) model we have
| $X_{n+1} = \mu + \alpha (X_n - \mu) + \varepsilon_{n+1} + \beta \varepsilon_n$ |
and
| $X_{n+2} = \mu + \alpha (X_{n+1} - \mu) + \varepsilon_{n+2} + \beta \varepsilon_{n+1}$ |
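To obtain point estimators of $X_{n+i}$ ($i = 1, 2, \ldots$) based on the information available at time $n$ we take the above equations and replace each unknown quantity by its best available value: the parameters by their estimates, past innovations by the corresponding residuals, future innovations by their expectation 0, and future values of $X$ by their own forecasts.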
The calculation of variances, and hence of prediction intervals, is very complicated, as parameter estimates may be correlated. It is best left to a computer.
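Point forecasts, by contrast, are easy to compute. The following is a minimal sketch for an ARMA(1,1) with parameters treated as known, assuming the usual recipe of setting future innovations to zero and using residuals for past innovations. The start-up values (zero for the innovation before the data begin, the mean for the value before the data begin) are assumptions, and `arma11_forecast` is a hypothetical helper; prediction intervals are not attempted.

```python
import numpy as np

def arma11_forecast(x, mu, alpha, beta, horizon):
    """Point forecasts for steps 1..horizon (horizon >= 1) from an ARMA(1,1)."""
    x = np.asarray(x, dtype=float)
    e = 0.0          # assumed start-up value for the innovation before the data
    prev = mu        # assumed start-up value for the observation before the data
    for xt in x:
        # Residual at time t: observed value minus its one-step prediction.
        e = xt - mu - alpha * (prev - mu) - beta * e
        prev = xt
    f = mu + alpha * (x[-1] - mu) + beta * e    # one step ahead uses the last residual
    forecasts = [f]
    for _ in range(1, horizon):
        f = mu + alpha * (f - mu)               # later steps: future innovations set to 0
        forecasts.append(f)
    return np.array(forecasts)

print(arma11_forecast([1.0, 2.0], mu=0.0, alpha=0.5, beta=0.5, horizon=3))
```

Beyond the first step the moving average term drops out, so the forecasts decay geometrically towards the mean at rate given by the AR parameter.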