Time Series: Practice

8.1 Practical considerations

8.1.1 Reduction to stationarity

Most time series encountered are not stationary.

Differencing. The difference operator $\nabla$ is defined as $\nabla = 1 - B$, where $B$ is the backshift operator; in other words, $(\nabla X)_t = X_t - X_{t-1}$.

Differencing can remove or reduce deterministic trends. In some cases the investigator may wish to difference again.
Differencing forms an integral part of the Box-Jenkins methodology. If the $d$th difference of $X$, $\nabla^d X$, is an ARMA(p,q) process, then $X$ is said to be an ARIMA(p,d,q) process.
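As a minimal sketch of how differencing acts on deterministic trends (the helper name `difference` and the example series are illustrative, not part of the notes), one difference reduces a linear trend to a constant and two differences do the same for a quadratic trend:

```python
import numpy as np

def difference(x, d=1):
    """Apply the difference operator (1 - B) to the series d times."""
    x = np.asarray(x, dtype=float)
    for _ in range(d):
        x = x[1:] - x[:-1]   # X_t - X_{t-1}; each pass shortens the series by one
    return x

t = np.arange(10)
linear = 3.0 + 2.0 * t        # deterministic linear trend
quadratic = 1.0 + t ** 2      # deterministic quadratic trend

first = difference(linear)            # constant series equal to the slope
second = difference(quadratic, d=2)   # constant series after two differences
```

Each difference loses one observation, which is worth remembering when the series is short.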

If there is reason to believe that the underlying, deterministic trend takes a particular form, this can be regarded as a time-varying mean and subtracted from the process, to leave a random component which should be analysed separately.

Seasonal means. If the seasonal effect is fairly regular from year to year, a separate mean is estimated for each 'season'. The seasonal means are then subtracted from the observations and the residuals analysed as usual.
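The method of seasonal means can be sketched as follows. The function name `remove_seasonal_means` and the quarterly figures are made up for illustration, and the series is assumed to start in season 0:

```python
import numpy as np

def remove_seasonal_means(x, period):
    """Estimate one mean per season and subtract it from each observation."""
    x = np.asarray(x, dtype=float)
    season = np.arange(len(x)) % period   # season label of each observation
    means = np.array([x[season == s].mean() for s in range(period)])
    return x - means[season], means

# Hypothetical quarterly data: a fixed seasonal pattern plus an irregular part
pattern = np.array([10.0, 20.0, 30.0, 40.0])
irregular = np.array([1.0, -1.0, 0.0, 2.0,
                      0.0, 1.0, -1.0, -1.0,
                      -1.0, 0.0, 1.0, -1.0])
x = np.tile(pattern, 3) + irregular

residuals, means = remove_seasonal_means(x, period=4)
```

Here the irregular part was chosen to average to zero within each quarter, so the estimated means reproduce `pattern` exactly; with real data they will only approximate the seasonal effect.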

Seasonal differencing. This is slightly less stable than the method of seasonal means, but it fits into the Box-Jenkins schema. The seasonal difference operator for monthly data is $\nabla_{12} = 1 - B^{12}$, so that $(\nabla_{12} X)_t = X_t - X_{t-12}$.
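A short sketch of seasonal differencing on an artificial monthly series (the series and the name `seasonal_difference` are assumptions made for this example): a period-12 pattern cancels exactly, and a linear trend with slope 0.5 per month leaves the constant 12 × 0.5 = 6.

```python
import numpy as np

def seasonal_difference(x, period=12):
    """Apply (1 - B^period): subtract the value one full season earlier."""
    x = np.asarray(x, dtype=float)
    return x[period:] - x[:-period]

t = np.arange(48)                              # four years of monthly data
seasonal = 5.0 * np.sin(2 * np.pi * t / 12)    # repeats every 12 months
x = 0.5 * t + seasonal                         # linear trend plus seasonality

d12 = seasonal_difference(x, period=12)        # seasonal part cancels
```

The first 12 observations are lost, a heavier price than the single observation lost to an ordinary difference.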

The Box-Jenkins approach treats the reduction to stationarity as an integral part of the model fitting technique. MINITAB has the capability to carry out the extended model-fitting quite seamlessly.

Method of moving averages. Replace $X_t$ with a smoothed version of $X_t$ which takes account of seasonal variation. For example, for quarterly data we might use

$$(X_{t-2} + 2X_{t-1} + 2X_t + 2X_{t+1} + X_{t+2})/8.$$
Thus, at each time period, equal weight is given to each seasonal component.

This method works well at eliminating seasonal variation, but of course it also smooths out much of the underlying time series variation which we are hoping to detect. Different smoothing functions are sometimes used in an attempt to combat this effect.
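The quarterly filter with weights (1, 2, 2, 2, 1)/8 can be tried out on an artificial series. In this sketch the data and the name `quarterly_moving_average` are illustrative assumptions; the seasonal pattern is chosen to sum to zero over a year, so the filter removes it and returns the linear trend unchanged.

```python
import numpy as np

def quarterly_moving_average(x):
    """Centred moving average with weights (1, 2, 2, 2, 1)/8.

    Two values are lost at each end, where the window does not fit.
    """
    weights = np.array([1.0, 2.0, 2.0, 2.0, 1.0]) / 8.0
    return np.convolve(np.asarray(x, dtype=float), weights, mode="valid")

t = np.arange(20)
pattern = np.array([3.0, -1.0, -4.0, 2.0])    # quarterly effects, summing to zero
x = 10.0 + 0.5 * t + pattern[t % 4]           # trend plus seasonal effect

smoothed = quarterly_moving_average(x)        # aligned with t[2:-2]
```

Each quarter receives total weight 2/8 in every window, which is what "equal weight to each seasonal component" means here.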

Data transformations. We always assume that the process of innovations has constant variance $\sigma^2$. If this is not the case (for example, if $\mathrm{Var}(\varepsilon_t)$ seems to depend on the fitted value) then a variance-stabilising transformation may be in order.
A common choice for such a transformation is the log function.
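To illustrate why the log is a natural choice, here is a small simulation under the assumption that the noise is multiplicative, so that its standard deviation is proportional to the level of the series. The growth curve, the noise level and the helper `spread_ratio` are all invented for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(200)
level = np.exp(0.05 * t)                              # hypothetical growth curve
x = level * np.exp(0.1 * rng.standard_normal(200))    # multiplicative noise

raw_dev = x - level                    # deviations on the original scale
log_dev = np.log(x) - np.log(level)    # deviations after taking logs

def spread_ratio(dev):
    """Standard deviation of the second half divided by that of the first half."""
    half = len(dev) // 2
    return dev[half:].std() / dev[:half].std()

raw_ratio = spread_ratio(raw_dev)   # far above 1: spread grows with the level
log_ratio = spread_ratio(log_dev)   # near 1: roughly constant spread
```

On the log scale the multiplicative noise becomes additive with constant variance, which is the situation the models in these notes assume.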

Variance-stabilisation is not the only reason for transformation. If the data look very non-normal (skewed), a transformation may bring the sample closer to normality.

8.1.2 Conduct of an investigation

Plot the SACF and SPACF to see whether an MA(q) or AR(p) model will fit. MINITAB draws dotted red lines to indicate which values appear to be significantly different from 0. If neither a pure MA nor a pure AR fits the bill, try ARMA(1,1).
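The first step can be sketched without MINITAB. The code below computes the SACF of a simulated MA(1) series and flags lags outside the band ±2/√n, a common rough 95% limit for white noise; this band is an assumption of the sketch and need not coincide with the lines MINITAB draws. The function name `sample_acf` and the simulated series are illustrative.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag (divisor n at every lag)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    c0 = np.dot(xc, xc) / n
    return np.array([np.dot(xc[k:], xc[:n - k]) / n / c0
                     for k in range(max_lag + 1)])

rng = np.random.default_rng(0)
eps = rng.standard_normal(501)
x = eps[1:] + 0.8 * eps[:-1]        # simulated MA(1) series of length 500

r = sample_acf(x, max_lag=10)
bound = 2.0 / np.sqrt(len(x))       # rough significance band (assumed here)
significant = np.flatnonzero(np.abs(r[1:]) > bound) + 1   # lags outside the band
```

For an MA(q) process the SACF should cut off after lag q, so a single clearly significant value at lag 1 points towards MA(1). An occasional marginal value at a higher lag is to be expected by chance.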

Look at the series of residuals. Try a Normal probability plot, plot residuals against fitted values, and find the SACF and SPACF of the residuals. If there is evidence that they are not all i.i.d. Normal(0, $\sigma^2$), add a parameter to the model and try again.

8.1.3 When to stop adding parameters

Each additional parameter improves the fit and reduces the residual sum of squares (RSS). But it also adds to worries about overfitting.
The Akaike Information Criterion (AIC) states that one should minimise
$$-2 \log L_{\max} + 2\,(\text{no. of parameters}) = n \log(\mathrm{RSS}) + 2(p + q) + \text{const.},$$
i.e. an extra parameter should reduce the RSS by a factor of at least $e^{2/n}$; otherwise it is not worth including.
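The break-even factor can be checked numerically. In this sketch the RSS values and the sample size are hypothetical, and `aic_from_rss` is an illustrative helper that drops the additive constant:

```python
import math

def aic_from_rss(rss, n, n_params):
    """AIC up to an additive constant: n log(RSS) + 2 (p + q)."""
    return n * math.log(rss) + 2 * n_params

n = 100
rss_small = 50.0                              # hypothetical RSS with p + q = 2
break_even = rss_small / math.exp(2.0 / n)    # RSS needed to justify one more parameter

aic_small = aic_from_rss(rss_small, n, 2)
aic_break_even = aic_from_rss(break_even, n, 3)   # same AIC as the smaller model
aic_better = aic_from_rss(48.0, n, 3)             # RSS below break-even: AIC falls
aic_worse = aic_from_rss(49.5, n, 3)              # RSS above break-even: AIC rises
```

With n = 100 the extra parameter must cut the RSS by about 2% to be worth including; the larger the sample, the smaller the reduction required.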

This criterion only operates well when the data can be assumed Normal, and even then it tends to permit too many parameters. Akaike has also introduced the Bayesian Information Criterion, which is

$$-2 \log L_{\max} + [1 + \log(n)]\,(\text{no. of parameters}).$$
Other criteria also exist.

8.1.4 Automatic methods

Exponential smoothing (Holt, 1958). If $X$ is a stationary or I(1), trend-free time series, estimate $X_{n+1}$ by
$$\sum_{k \ge 0} c_k x_{n-k},$$
where the $c_k$ are weights, summing to 1.
Holt suggested geometric weights, $c_k = \alpha(1-\alpha)^k$ for non-negative $k$. This form leads to simple updating equations.

$\alpha = 0.2$ is often used. The smoothing parameter may be estimated, but small variations tend not to make much difference.
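With geometric weights the forecast obeys the updating rule: new forecast = α × (latest observation) + (1 − α) × (previous forecast). The sketch below applies this rule and checks it against the explicit weighted sum. Starting the recursion from the first observation is an assumption of this example (other initialisations are possible), and the data are made up.

```python
import numpy as np

def exponential_smoothing_forecast(x, alpha=0.2):
    """One-step-ahead forecast by recursive updating.

    Assumption: the recursion is started from the first observation.
    """
    x = np.asarray(x, dtype=float)
    forecast = x[0]
    for obs in x[1:]:
        forecast = alpha * obs + (1.0 - alpha) * forecast
    return forecast

x = np.array([5.0, 6.0, 4.0, 5.5, 5.2, 4.8])   # hypothetical observations
alpha = 0.2
recursive = exponential_smoothing_forecast(x, alpha)

# Explicit form: geometric weights on x_n, x_{n-1}, ..., with the leftover
# weight (1 - alpha)^(n-1) placed on the starting value x[0].
n = len(x)
k = np.arange(n - 1)
weights = alpha * (1.0 - alpha) ** k
explicit = np.dot(weights, x[::-1][:n - 1]) + (1.0 - alpha) ** (n - 1) * x[0]
```

Only the previous forecast needs to be stored, which is why the method suits automatic, routine forecasting.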

Exponential smoothing is optimal if X is ARIMA(0,1,1). More advanced versions exist to cope with trends and seasonal variation.

8.2 Multivariate time series

8.2.1 Possible scenarios

Multivariate time series may arise in a number of ways.

The time series are measuring the same quantity: for example, where aircraft noise meters are set up at a number of locations. In this case we expect high correlation between the series.

Alternatively, they could all depend on some fundamental underlying quantity. Thus different forms of investment strategy will depend on the base lending rate.

The purpose of the investigation may be to uncover a causal relationship between two or more time series: one may be driving the other, possibly with a lag of a few time periods.

8.2.2 Cointegrated time series

In practice most econometric time series are not I(0) (stationary), but I(1) (integrated of order 1: the sequence of first differences is stationary).
Investigating two I(1) processes using standard methods (see later) can give unreliable results. The exception is when they are quite closely related, in the sense that there is a stationary process Zt given by

$$Z_t = uX_t + vY_t$$

for some nonzero constants $u$ and $v$.
In this case X and Y are said to be cointegrated.
To test whether X and Y are cointegrated, use ordinary least squares to fit $Y_t = aX_t + b$, then analyse the residuals: if they are stationary, then it is reasonable to suppose that X and Y are cointegrated.
When two series are cointegrated, the values of either one can be used to forecast the future of both.
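The residual-based check can be sketched on simulated data. Here X is a random walk and Y shares its stochastic trend, so the pair is cointegrated by construction. The lag-1 sample autocorrelation is used only as an informal indicator of stationarity; that choice is an assumption of this sketch and a formal unit-root test on the residuals would be needed in practice. All names and parameter values are illustrative.

```python
import numpy as np

def lag1_autocorrelation(z):
    """Lag-1 sample autocorrelation of a series."""
    z = np.asarray(z, dtype=float) - np.mean(z)
    return np.dot(z[1:], z[:-1]) / np.dot(z, z)

rng = np.random.default_rng(1)
n = 500
x = np.cumsum(rng.standard_normal(n))         # I(1): a random walk
y = 2.0 * x + 1.0 + rng.standard_normal(n)    # shares the trend of x

a, b = np.polyfit(x, y, 1)                    # OLS fit of y = a x + b
residuals = y - (a * x + b)

r_x = lag1_autocorrelation(x)                 # close to 1 for an I(1) series
r_res = lag1_autocorrelation(residuals)       # close to 0 if residuals are stationary noise
```

The fitted slope recovers the relationship between the two series, and the residuals behave like the stationary noise that was added to Y.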

8.2.3 Vector time series

A VAR(1) process has the form $X_t = AX_{t-1} + E_t$, where $A$ is a matrix and the other terms are random vectors. The $E_t$ are assumed to be independent Normal random vectors with unchanging variance-covariance matrix. For convenience we assume that the expectation of $X_t$ is zero.

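Here $E[X_t X_{t-1}^T] = A\Sigma_X$, where $\Sigma_X = E[X_t X_t^T]$ is the variance-covariance matrix of $X_t$, and one may deduce that

$$E[X_t X_{t-k}^T] = A^k \Sigma_X.$$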

A may be estimated by means of the lag-1 cross-correlation function.
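Since the lag-1 moment matrix equals A times the variance-covariance matrix of X, one natural moment estimator is the sample lag-1 cross-covariance matrix multiplied by the inverse of the sample covariance matrix. The sketch below uses covariances rather than correlations, simulates its own data, and picks the matrix A arbitrarily; all of this is assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
A = np.array([[0.5, 0.1],
              [0.0, 0.3]])        # hypothetical VAR(1) coefficient matrix
n = 5000

# Simulate X_t = A X_{t-1} + E_t with independent standard Normal innovations
X = np.zeros((n, 2))
for t in range(1, n):
    X[t] = A @ X[t - 1] + rng.standard_normal(2)

Xc = X - X.mean(axis=0)
gamma0 = Xc[:-1].T @ Xc[:-1] / (n - 1)    # estimate of Sigma_X
gamma1 = Xc[1:].T @ Xc[:-1] / (n - 1)     # estimate of E[X_t X_{t-1}^T]

A_hat = gamma1 @ np.linalg.inv(gamma0)    # moment estimator of A
```

With 5000 observations the estimate lies close to the true matrix; with short series the sampling error can be considerable.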

General VARMA processes may be treated similarly, although Moving Average components tend to make life more difficult. If VARIMA is sought, note that some of the components of the vector Xt may need to be differenced a different number of times than some of the others.

8.3 The frequency domain

8.3.1 The spectrum

The frequency domain approach aims to find periodicities hidden in the data and to use them to predict future fluctuations. Frequency domain analysis predates the time domain approach.

(Wiener-Khinchin Theorem): if $X$ is stationary, then there is an increasing function $F$ such that

$$\gamma_k = \int_0^{\pi} \cos(k\omega)\, dF(\omega).$$

If $X$ has no deterministic component then $F$ is (absolutely) continuous and can be differentiated: $f = dF/d\omega$ is the spectral density function, or spectrum, of $X$.

We have

$$f(\omega) = \frac{1}{\pi}\Bigl(\gamma_0 + 2\sum_{k=1}^{\infty} \gamma_k \cos \omega k\Bigr).$$

8.3.2 Estimating the spectrum

Spectral analysis involves using the data to produce an estimate for the spectrum of X, and from this deducing properties of X.

The obvious estimator to use is

$$I(\omega) = \frac{1}{\pi}\Bigl(c_0 + 2\sum_{k=1}^{N-1} c_k \cos \omega k\Bigr).$$

Unfortunately this estimator is inconsistent: its variance does not shrink to zero as the number of observations grows. There are a variety of ways of smoothing the estimator to try to produce something more useful.
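A minimal sketch of the unsmoothed estimator follows, taking the c_k to be the sample autocovariances (divisor N) of N observations, which is an assumption about the notation. The test series, a cosine with 8 cycles in 64 observations plus a little noise, is invented to show a hidden periodicity appearing as a peak.

```python
import numpy as np

def sample_autocovariances(x):
    """Sample autocovariances c_0, ..., c_{N-1}, each with divisor N."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    return np.array([np.dot(xc[k:], xc[:n - k]) / n for k in range(n)])

def periodogram(c, omega):
    """I(w) = (c_0 + 2 sum_k c_k cos(w k)) / pi, summed over k = 1, ..., N-1."""
    k = np.arange(1, len(c))
    return (c[0] + 2.0 * np.sum(c[1:] * np.cos(omega * k))) / np.pi

rng = np.random.default_rng(3)
N = 64
t = np.arange(N)
x = np.cos(2 * np.pi * 8 * t / N) + 0.1 * rng.standard_normal(N)

c = sample_autocovariances(x)
freqs = 2 * np.pi * np.arange(1, N // 2 + 1) / N    # frequencies in (0, pi]
I = np.array([periodogram(c, w) for w in freqs])
peak = freqs[np.argmax(I)]                          # frequency of the largest ordinate
```

The peak falls at the frequency of the cosine. The remaining ordinates fluctuate erratically from one frequency to the next, which is the practical face of the inconsistency.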

8.3.3 Spectrum of AR and MA

The spectrum of the innovations process is $\sigma^2/\pi$.

A fundamental result is that, if $X = \psi(B)Y$ (where $B$ is the backshift operator) then

$$f_X(\omega) = \psi(e^{i\omega})\,\psi(e^{-i\omega})\, f_Y(\omega).$$

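Therefore the spectral density of an MA process with moving-average polynomial $\phi$ is

$$f(\omega) = \frac{\sigma^2\, \phi(e^{i\omega})\, \phi(e^{-i\omega})}{\pi}$$

and that of an AR process with autoregressive polynomial $\theta$ is

$$f(\omega) = \frac{\sigma^2}{\pi\, \theta(e^{i\omega})\, \theta(e^{-i\omega})}.$$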

ARMA processes can be handled similarly.
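These formulae are easy to evaluate numerically. The sketch below does so for an MA(1) and an AR(1) process; the parameter values and the sign conventions (MA polynomial 1 + bz, AR polynomial 1 − az) are assumptions made for the example, and `filter_gain` is an illustrative helper name.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def filter_gain(coeffs, omega):
    """psi(e^{iw}) psi(e^{-iw}) = |psi(e^{iw})|^2 for a real polynomial psi.

    coeffs lists the coefficients in increasing powers of z.
    """
    z = np.exp(1j * np.asarray(omega, dtype=float))
    return np.abs(P.polyval(z, coeffs)) ** 2

sigma2 = 1.0
omega = np.linspace(0.0, np.pi, 5)

b = 0.6   # MA(1): X_t = e_t + b e_{t-1}
f_ma = sigma2 * filter_gain([1.0, b], omega) / np.pi

a = 0.6   # AR(1): X_t - a X_{t-1} = e_t
f_ar = sigma2 / (np.pi * filter_gain([1.0, -a], omega))
```

With positive parameters both spectra are largest at low frequencies, reflecting the positive correlation between neighbouring values; the AR(1) spectrum has the sharper peak at zero.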

An algorithm called the Fast Fourier Transform enables the ACF to be calculated rapidly from the spectral density. In some cases this may be the quickest way of finding the ACF.
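As a sketch of the idea, the autocovariances of an MA(1) process can be recovered from its spectral density by sampling the density on an equally spaced grid over [0, 2π) and applying an inverse FFT. The parameter values and the grid size are arbitrary choices for this example. For a finite MA the grid sum is exact; in general it approximates the integral.

```python
import numpy as np

sigma2, b = 1.0, 0.6    # hypothetical MA(1): X_t = e_t + b e_{t-1}
M = 16                  # number of grid points on [0, 2 pi)
omega = 2 * np.pi * np.arange(M) / M

# MA(1) spectral density on (0, pi), extended symmetrically to the whole circle
f = sigma2 * (1 + 2 * b * np.cos(omega) + b ** 2) / np.pi

# gamma_k = int_0^pi cos(k w) f(w) dw = (1/2) int_0^{2 pi} e^{ikw} f(w) dw,
# and the grid approximation of the latter is pi * ifft(f)[k].
gamma = np.pi * np.fft.ifft(f).real
acf = gamma / gamma[0]
```

Only about the first M/2 entries are meaningful as lags, since the output is periodic and entry M − k repeats entry k.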


This page is maintained by Russell Gerrard: R.J.Gerrard@city.ac.uk